Transforming Motoko's SHA256 Implementation to SHA512

Posted Jan. 5 2022 by Tarek.

The tricky part in going from SHA256 to SHA512 has to do with computations using 128-bit numbers. While SHA256 requires computations with at most 64-bit numbers, SHA512 requires 128-bits. Motoko provides Nat8, Nat16, Nat32, Nat64 primitive types for representing natural numbers with upper bounds \(2^8 - 1, 2^{16} - 1, 2^{32} - 1, 2^{64} - 1\) respectively. As those primitives are bounded, they also get bitwise and wrapping operators out of the box:

// Declare a 16-bit natural number
var n1: Nat16 = 100;
// Perform a computation modulo 2^16
n1 *%= 1000;
assert(n1 == 34464);

// Basically n2 = n1 % 2^8:
var n2: Nat8 = Nat8.fromIntWrap(Nat16.toNat(n1));
assert(n2 == 160);

// Left shift modulo 2^8
n1 = n1 << 1;
assert(n1 == 88);

Nat, on the other hand, is a primitive for representing unbounded natural numbers where wrapping operators do not make sense and are absent. Bitwise operators are also absent for Nat, but whether it is attributable to the unboundedness of the numbers as well is not clear.

For working with 128-bit numbers the choice is then between using Nat, or using two Nat64 numbers. Using Nat sounded simpler so I'm going with Nat.

1. Initialization

The first step is to update the round constants and initial hash values which become 64-bits and carry different values in SHA512:

private let K : [Nat64] = [
    0x428a2f98d728ae22, 0x7137449123ef65cd, 0xb5c0fbcfec4d3b2f,
    ...
    ]
private let S : [Nat64] = [
    0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b,
    ...
    ]

The class Digest has some SHA256-specific member declarations:

// copy the initial hash values
private let s = Array.thaw<Nat32>(S);
// initialize the currently unprocessed buffer to 64 bytes
private let x = Array.init<Nat8>(64, 0);
// total length of incoming data
private var len: Nat64 = 0;

Converting to SHA512 values:

private let s = Array.thaw<Nat64>(S);
private let x = Array.init<Nat8>(128, 0);
private var len = 0;

Note that len is now of type Nat since there isn't a Nat128 primitive in motoko.

2. Processing

The function write contains the following line:

len +%= Nat64.fromIntWrap(p.size());

which is necessary because the length size in SHA256 is 64 bits. Since we don't have Nat128, and we cannot do +%= for Nat, lets break down the above to:

let tmp1 = p.size() % 2**64;
let tmp2 = len + tmp1;
len := tmp2 % 2**64;

which can be shortened into:

len += p.size();
len %= 2**64;

While I expect the above to be less performant than the original, a few benchmarks I carried out showed no clear distinction between performance of both. A more accurate measure would be in cycles, but at the time of writing there does not seem to be a way to auto-compute consumptions of cycles locally.

Finally using 128 bits the above becomes:

len += p.size();
len %= 2**128;

Next, a couple of simple changes in lines 72 and 77 based on the fact that BLOCK_SIZE=128 for SHA512:

-        let n = Nat.min(p.size(), 64 - nx);
+        let n = Nat.min(p.size(), BLOCK_SIZE - nx);
-        if (nx == 64) {
+        if (nx == BLOCK_SIZE) {

I was particularly annoyed by the following block of code starting at line 86:

if (p.size() >= 64) {
  let n = Nat64.toNat(Nat64.fromIntWrap(p.size()) & (^ 63));
  // size of buf is an exact multiple of 64
  let buf = Array.tabulate<Nat8>(n, func (i) {
    return p[i];
  });
  block(buf);
  // put the remaining unprocessed items in p
  p := Array.tabulate<Nat8>(p.size() - n, func (i) {
    return p[n + i];
  });
};

The purpose of Nat64.toNat(Nat64.fromIntWrap(p.size()) & (^ 63)) is to zero the last 6 bits of p.size() so that it becomes a multiple of BLOCK_SIZE. Simply replacing 63 with 127 doesn't work, because Nat64 also needs to become Nat128 which we do not have. Using Nat here is also not sufficient because Nat does not have bitwise operators defined. Initially I tried to break p.size() into two Nat64 numbers, apply the bitwise operation on the proper one of them and recombine. I was not happy with that since I avoided representing 128 bit numbers in two 64 bit numbers in first place. Alternatively I went with the following change:

while (p.size() >= BLOCK_SIZE) {
  let buf = Array.tabulate<Nat8>(BLOCK_SIZE, func (i) {
    return p[i];
  });
  block(buf);
  p := Array.tabulate<Nat8>(p.size() - BLOCK_SIZE, func (i) {
    return p[BLOCK_SIZE + i];
  });
};

So instead of sending an exact multiple of BLOCK_SIZE to block(), I iterate through the data and send a buffer of size exactly equals BLOCK_SIZE to block(). Performance-wise it should be identical to before since block() did the p := Array.tabulate for each block of its incoming data, but now this is here, and block() gets only one block at a time.

3. Sum

Line 105 begins a check for whether the last block has enough space to append the total length of processed input. If not then it extends another block leaving enough space for writing the size of the length after padding. The size of the length is 8 bytes for SHA256 and 16 bytes for SHA512.

-      if (56 > t) {
-        m := 56 - t;
+
+      if (BLOCK_SIZE > t + LENGTH_SIZE) {
+        m := BLOCK_SIZE - LENGTH_SIZE - t;
       } else {
-        m := 120 - t;
+        m := BLOCK_SIZE * 2 - LENGTH_SIZE - t;
      };

Line 114 shifts left by 3 bits. But n is now of type Nat so I replaced shifting with the equivalent multiplication by \(2^3 = 8\):

       var n = len;
       ...
-      n := n << 3;
+      n := n * 8;

And a similar thing in the following block at line 120 for shifting right:

-      buf := Array.init<Nat8>(8, 0);
-      for (i in Iter.range(0, 7)) {
-        let j : Nat64 = 56 -% 8 *% Nat64.fromIntWrap(i);
-        buf[i] := Nat8.fromIntWrap(Nat64.toNat(n >> j));
+      buf := Array.init<Nat8>(LENGTH_SIZE, 0);
+      for (i in Iter.range(0, LENGTH_SIZE - 1)) {
+        let j: Nat = BLOCK_SIZE - 8 - 8 * i;
+        buf[i] := Nat8.fromIntWrap(n / (2**j));
       };

The output hash size is 64 bytes in SHA512:

-      let hash = Array.init<Nat8>(32, 0);
+      let hash = Array.init<Nat8>(64, 0);

Finally copying all hash values into the array hash 1 byte at a time:

    for (i in Iter.range(0, 7)) {
-        for (j in Iter.range(0, 3)) {
-          let k : Nat32 = 24 -% 8 *% Nat32.fromIntWrap(j);
-          hash[4 * i + j] := Nat8.fromIntWrap(Nat32.toNat(s[i] >> k));
+        for (j in Iter.range(0, WORD_SIZE - 1)) {
+          let k : Nat64 = Nat64.fromIntWrap(WORD_SIZE * 8)
+            - 8 -% 8 *% Nat64.fromIntWrap(j);
+          hash[8 * i + j] := Nat8.fromIntWrap(Nat64.toNat(s[i] >> k));
        };
    };