# Porting SHA512 to Motoko

Motoko is the language for the Internet Computer. Like any relatively new programming language it might lack some libraries whose presence we take for granted in popular languages. Specifically, the diversity of cryptographic primitives are still limited. I've spent some time the past few days porting SHA2-512 to Motoko. In the processes I've learned about Motoko, as well as about the SHA2 algorithm. In the rest of this post I will just use "SHA512" when referring to "SHA2-512".

Read this post if you are:

- Interested in learning Motoko
- Interested in big number computations in Motoko
- Interested in SHA

## Resources

Wikipedia does a really good job describing the SHA2 algorithm and highlighting the differences between its variants. A pseudo code is also given which is easy to follow.

There is also an existing SHA2-256 implementation already available in Motoko. This is a great starting point to understand how big number arithmetic, byte manipulation, and arrays should be approached in Motoko. Since I do not have a frame of reference for best practices, I hope this library is following them already.

Finally, of help were those SHA512 implementations in C and python.

## Plan

`motoko-sha`

describes itself as *the* SHA package. Although it packages
`SHA2-256`

only at the moment, it seems the project is intended to host other
variants as they become ported. Since the existing `SHA2-256`

implementation is
quite a valuable reference, I'm basing my `SHA512`

port on it, and will be
submitting it as a PR to `motoko-sha`

on Github rather than having it in a
standalone repo. My plan will go as follows:

**1. Understand:**

Understand the general `SHA2`

algorithm. This would help lay build some
expectations before reading the `SHA256`

implementation.

**2. Map:**

Read the `SHA256`

implementation and map it to my understanding/expectations of
the `SHA256`

algorithm as I know it.

**3. Transform:**

Identify which parts are specific to `SHA256`

and how they could be changed to
become the `SHA512`

algorithm.

## Overview of SHA2

The SHA2 algorithm contains the following variants:

- SHA224
- SHA256
- SHA384
- SHA512
- SHA-512/224
- SHA-512/256

Each variant defines a `BLOCK_SIZE`

and `NUM_ROUNDS`

amongst other constants.
All variants contain three main phases.

### 1. Initialization

A set of hash values are initialized to predefined constants. The number and
values of those hashes depend on the algorithm variant. Similarly each variant
has a specific number of rounds `NUM_ROUNDS`

, and there is a list of constants
predefined for each round.

### 2. Processing

Incoming data are broken into fixed-size chunks of size `BLOCK_SIZE`

. Each
chunk gets operated on and ends up updating all hash values. If size of the
last chunk is not a multiple of `BLOCK_SIZE`

, it does not get processed for
now.

### 3. Sum

If the is any remaining unprocessed data, it gets padded to match the
`BLOCK_SIZE`

and we process as chunk.

All hash values are concatenated and this becomes the hash output.

## Mapping to the Existing SHA256 Implementation

This is the source
code
being analyzed. The block size is `64`

bytes and the number of rounds is `64`

.
These are not explicitly defined but used inline.

### 1. Initialization

Line
19
defines the `64`

round constants of size 32-bits each.

```
private let K : [Nat32] = [
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
...
];
```

Line
38
initialized the `8`

initial hash values of 32-bits each.

```
private let S : [Nat32] = [
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];
```

### 2. Processing

The processing phase is a collaboration between the functions
`write`

and
`block`

.

The block at Line
71
checks if there is already buffered data and appends incoming bytes to it
at most until it fills `BLOCK_SIZE`

.

```
let n = Nat.min(p.size(), 64 - nx);
for (i in Iter.range(0, n - 1)) {
x[nx + i] := p[i];
};
```

If the buffer size reaches `BLOCK_SIZE`

send to `block`

where the hash values
get updated:

```
if (nx == 64) {
let buf = Array.freeze<Nat8>(x);
block(buf);
nx := 0;
};
```

If the remaining data has enough bytes to fill a `BLOCK_SIZE`

, it will be
processed by `block`

up to its last chunk if it's not yet of size `BLOCK_SIZE`

.

### 3. Sum

The function
`sum`

handles padding and outputting the final hash.

## Transforming into SHA512

The tricky part in going from SHA256 to SHA512 has to do with computations
using 128-bit numbers. While SHA256 requires computations with at most 64-bit
numbers, SHA512 requires 128-bits. Motoko provides `Nat8, Nat16, Nat32, Nat64`

primitive types for representing natural numbers with upper bounds
\(2^8 - 1, 2^{16} - 1, 2^{32} - 1, 2^{64} - 1\) respectively. As those primitives are
bounded, they also get bitwise and wrapping operators out of the box:

```
// Declare a 16-bit natural number
var n1: Nat16 = 100;
// Perform a computation modulo 2^16
n1 *%= 1000;
assert(n1 == 34464);
// Basically n2 = n1 % 2^8:
var n2: Nat8 = Nat8.fromIntWrap(Nat16.toNat(n1));
assert(n2 == 160);
// Left shift modulo 2^8
n1 = n1 << 1;
assert(n1 == 88);
```

`Nat`

, on the other hand, is a primitive for representing unbounded natural
numbers where wrapping operators do not make sense and are absent. Bitwise
operators are also absent for `Nat`

, but whether it is attributable to the
unboundedness of the numbers as well is not clear.

For working with 128-bit numbers the choice is then between using `Nat`

, or
using two `Nat64`

numbers. Using `Nat`

sounded simpler so I'm going with
`Nat`

.

### 1. Initialization

The first step is to update the round constants and initial hash values which become 64-bits and carry different values in SHA512:

```
private let K : [Nat64] = [
0x428a2f98d728ae22, 0x7137449123ef65cd, 0xb5c0fbcfec4d3b2f,
...
]
private let S : [Nat64] = [
0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b,
...
]
```

The class `Digest`

has some SHA256-specific
member declarations:

```
// copy the initial hash values
private let s = Array.thaw<Nat32>(S);
// initialize the currently unprocessed buffer to 64 bytes
private let x = Array.init<Nat8>(64, 0);
// total length of incoming data
private var len: Nat64 = 0;
```

Converting to SHA512 values:

```
private let s = Array.thaw<Nat64>(S);
private let x = Array.init<Nat8>(128, 0);
private var len = 0;
```

Note that `len`

is now of type `Nat`

since there isn't a `Nat128`

primitive in
motoko.

### 2. Processing

The function `write`

contains the following
`line`

:

```
len +%= Nat64.fromIntWrap(p.size());
```

which is necessary because the length size in SHA256 is 64 bits. Since we don't
have `Nat128`

, and we cannot do `+%=`

for `Nat`

, lets break down
the above to:

```
let tmp1 = p.size() % 2**64;
let tmp2 = len + tmp1;
len := tmp2 % 2**64;
```

which can be shortened into:

```
len += p.size();
len %= 2**64;
```

While I expect the above to be less performant than the original, a few benchmarks I carried out showed no clear distinction between performance of both. A more accurate measure would be in cycles, but at the time of writing there does not seem to be a way to auto-compute consumptions of cycles locally.

Finally using 128 bits the above becomes:

```
len += p.size();
len %= 2**128;
```

Next, a couple of simple changes in lines
72
and
77
based on the fact that `BLOCK_SIZE=128`

for SHA512:

```
- let n = Nat.min(p.size(), 64 - nx);
+ let n = Nat.min(p.size(), BLOCK_SIZE - nx);
```

```
- if (nx == 64) {
+ if (nx == BLOCK_SIZE) {
```

I was particularly annoyed by the following block of code starting at line 86:

```
if (p.size() >= 64) {
let n = Nat64.toNat(Nat64.fromIntWrap(p.size()) & (^ 63));
// size of buf is an exact multiple of 64
let buf = Array.tabulate<Nat8>(n, func (i) {
return p[i];
});
block(buf);
// put the remaining unprocessed items in p
p := Array.tabulate<Nat8>(p.size() - n, func (i) {
return p[n + i];
});
};
```

The purpose of `Nat64.toNat(Nat64.fromIntWrap(p.size()) & (^ 63))`

is to zero
the last 6 bits of `p.size()`

so that it becomes a multiple of `BLOCK_SIZE`

.
Simply replacing 63 with 127 doesn't work, because `Nat64`

also needs to
become `Nat128`

which we do not have. Using `Nat`

here is also not sufficient
because `Nat`

does not have bitwise operators defined. Initially I tried to
break `p.size()`

into two `Nat64`

numbers, apply the bitwise operation on the
proper one of them and recombine. I was not happy with that since I avoided
representing 128 bit numbers in two 64 bit numbers in first place.
Alternatively I went with the following change:

```
while (p.size() >= BLOCK_SIZE) {
let buf = Array.tabulate<Nat8>(BLOCK_SIZE, func (i) {
return p[i];
});
block(buf);
p := Array.tabulate<Nat8>(p.size() - BLOCK_SIZE, func (i) {
return p[BLOCK_SIZE + i];
});
};
```

So instead of sending an exact multiple of `BLOCK_SIZE`

to `block()`

, I iterate
through the data and send a buffer of size exactly equals `BLOCK_SIZE`

to
`block()`

. Performance-wise it should be identical to before since `block()`

did the `p := Array.tabulate`

for each block of its incoming data, but now this
is here, and `block()`

gets only one block at a time.

### 3. Sum

Line 105 begins a check for whether the last block has enough space to append the total length of processed input. If not then it extends another block leaving enough space for writing the size of the length after padding. The size of the length is 8 bytes for SHA256 and 16 bytes for SHA512.

```
- if (56 > t) {
- m := 56 - t;
+
+ if (BLOCK_SIZE > t + LENGTH_SIZE) {
+ m := BLOCK_SIZE - LENGTH_SIZE - t;
} else {
- m := 120 - t;
+ m := BLOCK_SIZE * 2 - LENGTH_SIZE - t;
};
```

Line 114
shifts left by 3 bits. But `n`

is now of type `Nat`

so I replaced shifting with
the equivalent multiplication by \(2^3 = 8\):

```
var n = len;
...
- n := n << 3;
+ n := n * 8;
```

And a similar thing in the following block at line 120 for shifting right:

```
- buf := Array.init<Nat8>(8, 0);
- for (i in Iter.range(0, 7)) {
- let j : Nat64 = 56 -% 8 *% Nat64.fromIntWrap(i);
- buf[i] := Nat8.fromIntWrap(Nat64.toNat(n >> j));
+ buf := Array.init<Nat8>(LENGTH_SIZE, 0);
+ for (i in Iter.range(0, LENGTH_SIZE - 1)) {
+ let j: Nat = BLOCK_SIZE - 8 - 8 * i;
+ buf[i] := Nat8.fromIntWrap(n / (2**j));
};
```

The output hash size is 64 bytes in SHA512:

```
- let hash = Array.init<Nat8>(32, 0);
+ let hash = Array.init<Nat8>(64, 0);
```

Finally copying all hash values into the array `hash`

1 byte at a time:

```
for (i in Iter.range(0, 7)) {
- for (j in Iter.range(0, 3)) {
- let k : Nat32 = 24 -% 8 *% Nat32.fromIntWrap(j);
- hash[4 * i + j] := Nat8.fromIntWrap(Nat32.toNat(s[i] >> k));
+ for (j in Iter.range(0, WORD_SIZE - 1)) {
+ let k : Nat64 = Nat64.fromIntWrap(WORD_SIZE * 8)
+ - 8 -% 8 *% Nat64.fromIntWrap(j);
+ hash[8 * i + j] := Nat8.fromIntWrap(Nat64.toNat(s[i] >> k));
};
};
```

## Conclusion

Overall it was a fun introduction to Motoko and how it deals with numbers. It
would have been nice if there were up to 256 bit variants instead of only 64
just for the sake of being familiar to those coming from a solidity background.
But I like how it distinguishes between natural numbers and integers, and how
it deals with overflows in the bounded variants. I submitted a pull
request to `motoko-sha`

repository containing my changes.