# Tamarin Prover Introduction posted June 2017

I've made a quick intro on Tamarin Prover, which is a **protocol verification tool**. I just wanted to show people how practical and fun it looks =)

Hey! I'm **David**, the author of the Real-World Cryptography book. I'm a crypto engineer at O(1) Labs on the Mina cryptocurrency, previously I was the security lead for Diem (formerly Libra) at Novi (Facebook), and a security consultant for the Cryptography Services of NCC Group. This is my blog about **cryptography** and **security** and other related topics that I find interesting.

Quick access to articles on this page:

- - June 2017 - Tamarin Prover Introduction
- - June 2017 - A New Public-Key Cryptosystem via Mersenne Numbers
- - June 2017 - Is Symmetric Security Solved?
- - June 2017 - Implementation of Kangaroo Twelve in Go
- - June 2017 - Maybe you shouldn't skip SHA-3
- - May 2017 - Loop unrolling
- - May 2017 - StrobeGo
- - May 2017 - Crypto Blogging Award 2016

more on the next page...

I've made a quick intro on Tamarin Prover, which is a **protocol verification tool**. I just wanted to show people how practical and fun it looks =)

A new paper made its way to eprint the other day.

a lot of keywords here are really interesting. But first, **what is a Mersenne prime?**

A mersenne prime is simply a prime \(p\) such that \(p=2^n - 1\). The nice thing about that, is that the programming way of writing such a number is

`(1 << n) - 1`

which is a long series of `1`

s.

A number modulo this prime can be any bitstring of the mersenne prime's length.

OK we know what's a Mersenne prime. How do we build our new **public key cryptosystem** now?

Let's start with a private key: `(secret, privkey)`

, two bitstrings of low hamming weight. Meaning that they do not have a lot of bits set to `1`

.

Now something very intuitive happens: the inverse of such a bitstring will probably have a high hamming weight, which let us believe that \(secret \cdot privkey^{-1} \pmod{p}\) **looks random**. This will be our **public key**.

Now that we have a `private key`

and a `public key`

. **How do we encrypt ?**

The paper starts with a very simple scheme on how to encrypt a bit \(b\).

\[ciphertext = (-1)^b \cdot ( A \cdot pubkey + B ) \pmod{p} \]

with \(A\) and \(B\) two public numbers that have low hamming weights as well.

We can see intuitively that the ciphertext will have a high hamming weight (and thus might look random).

If you are not convinced, all of this is based on actual proofs that such operations between low and high hamming weight bitstrings will yield other low or high hamming weight bitstrings. All of this really work because we are modulo a \(1111\cdots\) kind of number. The following lemmas taken from the paper are proven in section 2.1.

**How do you decrypt such an encrypted bit?**

This is how:

\[ciphertext \cdot privkey \pmod{p}\]

This will yield either a low hamming weight number → the original bit \(b\) was a \(0\),

or a high hamming weight number → the original bit \(b\) was a \(1\).

You can convince yourself by following the equation:

And again, intuitively you can see that everything is low hamming weight except for the value of \((-1)^b\).

This scheme doesn't look CCA secure nor practical. The paper goes on with an explanation of a more involved cryptosystem in section 6.

**EDIT**: there is already a reduction of the security estimates published on eprint.

Recently **T. Duong**, **D. Bleichenbacher**, **Q. Nguyen** and **B. Przydatek** released a crypto library intitled Tink. At its center lied an implementation of AES-GCM somehow different from the rest: it did not take a nonce as one of its input.

A few days ago, at the Crypto SummerSchool in Croatia, **Nik Kinkel** told me that he would generally recommend against letting developers tweak the nonce value, based on how AES-GCM tended to be heavily mis-used in the wild. For a recap, if a nonce is used twice to encrypt two different messages **AES-GCM will leak the authentication key**.

I think it's a fair improvement of AES-GCM to remove the nonce argument. By doing so, nonces have to be **randomly generated**. Now the new danger is that the same nonce is randomly generated twice for the same key. The birthday bound tells us that after \(2^{n/2}\) messages, \(n\) being the bit-size of a nonce, you have great odds of generating a previous nonce.

The optimal rekey point has been studied by Abdalla and Bellare and can computed with the cubic root of the nonce space. If more nonces are generated after that, chances of a nonce collision are too high. For AES-GCM this means that after \(2^{92/3} = 1704458900\) different messages, the key should be rotated.

This is of course assuming that you use 92-bit nonces with 32-bit counters. Some protocol and implementations will actually fix the first 32 bits of these 92-bit nonces reducing the birthday bound even further.

Isn't that a bit low?

Yes it kinda is. An interesting construction by Dan J. Bernstein called XSalsa20 (and that can be extended to XChacha20) allow us to use nonces of 192 bits. This would mean that you should be able to use the same key for up to \(2^{192/3} = 18446744073709551616\) messages. Which is already twice more that what a `BIG INT`

can store in a database

It seems like Sponge-based AEADs should benefit from large nonces as well since their rate can store even more bits. This might be a turning point for these constructions in the last round of the CAESAR competition. There are currently 4 of these: Ascon, Ketje, Keyak and NORX.

With that in mind, is nonce mis-use resistance now fixed?

**EDIT**: Here is a list of recent papers on the subject:

- Reconsidering the Security Bound of AES-GCM-SIV
- Better Bounds for Block Cipher Modes of Operation via Nonce-Based Key Derivation
- Increasing the Lifetime of Symmetric Keys for the GCM Mode by Internal Re-keying
- updatable Authenticated Encryption

I've released an implementation of KangarooTwelve in Go

It is heavily based on the official Go's x/crypto/sha3 library. But because of minor implementation details the relevant files have been copied and modified there so you do not need Go's SHA-3 implementation to run this package. Hopefully one day Go's SHA-3 library will be more flexible to allow other keccak construction to rely on it.

I have tested this implementation with different test vectors and it works fine. Note that it has not received proper peer review. If you look at the code and find issues (or not) please let me know!

See here why you should use KangarooTwelve instead of SHA-3. But see here first why you should still not skip SHA-3.

This implementation does not yet make use of **SIMD** to parallelize the implementation. But we can already see improvements due to the smaller number of rounds:

100 bytes | 1000 bytes | 10,000 bytes | |

K12 | 761 ns/op | 1875 ns/op | 15399 ns/op |

SHA3 | 854 ns/op | 3962 ns/op | 34293 ns/op |

SHAKE128 | 668 ns/op | 2853 ns/op | 29661 ns/op |

This was done with a very simple bench script on my 2 year-old macbook pro.

6 commentsAdam Langley from Google wrote a blogpost yesterday called Maybe Skip SHA-3. It started some discussions both on HN and on r/crypto

Speed drives adoption, Daniel J. Bernstein (djb) probably understand this more than anyone else. And this is what led Adam Langley to decide to either stay on SHA-2 or move to BLAKE2. Is that a good advice? Should we all follow his steps?

I think it is important to remember that Google, as well as the other big players, have an agenda for speed. I'm not saying that they do not care about security, it would be stupid for me to say that, especially seeing all of the improvements we've had in the field thanks to them in the last few years (Project Zero, Android, Chrome, Wycheproof, Tink, BoringSSL, email encryption, Key Transparency, Certificate Transparency, ...)

What I'm saying is that although they care deeply about security, they are also looking for compromises to gain speed. This can be seen with the push for 0-RTT in TLS, but I believe we're seeing it here as well with a push for either KangarooTwelve (K12) or BLAKE2 instead of SHA-3/SHAKE and BLAKE (which are more secure versions of K12 and BLAKE2).

Adam Langley even went as far as to recommend folks to stay on SHA-2.

But how can we keep advising SHA-2 when we know it can be **badly** misused? Yes I'm talking about length extension attacks. These attacks that prevent you from using `SHA-2(key|data)`

to create a Message Authentication Code (MAC).

We recently cared so much about **misuses** of AES-GCM (**documented misuses!**) that Adam Langley's previous blog post to the SHA-3 one is about AES-GCM-SIV.
We cared so much about **simple APIs**, that recently Tink simply removed the nonce argument from AES-GCM's API.

If we cared so much about documented misusage of crypto APIs, how can we not care about this **undocumented** misuse of SHA-2? Intuitively, if SHA-2 was to behave like a random oracle there would be no problem at all with the `SHA-2(key|data)`

construction. Actually, none of the more secure hash functions like Blake and SHA-3 have this problem.

If we cared so much about simple APIs, why would we advise people to "fix" SHA-2 by truncating 256 bits of SHA-512's output (SHA-512/256)?

The reality is that **you should use SHA-3**. I'm making this as a broad recommendation for people who do not know much about cryptography. You can't go wrong with the NIST's standard.

Now let's see how we can dilute a good advice.

If you care about speed you should use SHAKE or BLAKE.

If you really care about speed you should use KangarooTwelve or BLAKE2.

If you really really care about speed you should use SHA-512/256. (**edit**: people are pointing to me that Blake2 is also faster than that)

If you really really really care about speed you should use CRC32 (don't, it's a joke).

How big of a **security compromise** are you willing to make? We know the big players have decided, but have they decided for you?

**Is SHA-3 that slow**?

Where is a hash function usually used? One major use is in signing messages. You hash the message before signing it because you can't sign something that is too big.

Here is a number taken from djb's benchmarks on how many cycles it takes to sign with ed25519: `187746`

Here is a number taken from djb's benchmarks on how many cycles it takes to hash a byte with keccak512: `15`

Keccak has a page with more numbers for software performance

Spoiler Alert: you probably don't care about a few cycles.

Not to say there are no cases where this is relevant, but if you're doing a huge amount of hashing and you're hashing big messages, you should rather switch to a tree-hashing mode. KangarooTwelve, BLAKE2bp and BLAKE2sp all support that.

**EDIT**: The Keccak team just released a response as well.

Reading cryptographic implementations, you can quickly spot a lot of weirdness and oddities. Unfortunately, this is because very few cryptographic implementations aim for readability (hence the existence of readable implementations). But on the other hand, there are a lot of cool techniques in there which are interesting to learn about.

Bit-slicing is one I've never really talked about, so here is a self-explanatory tweet quoting Thomas Pornin:

an example of the bitslicing technique to get constant time crypto code (from https://t.co/pz5aQs9iGR ) pic.twitter.com/UwJxEIgLlM

— David Wong (@lyon01_david) April 25, 2017

But I'm here to talk about loop unrolling. Here's what wikipedia has to say about it:

Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as the space-time tradeoff.

Want to see an example? Here's how Keccak was implemented in go

While the θ step could have easily fit in a loop:

```
for x := 0; x < 5; x++ {
B[x] = a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4]
}
```

It was **unrolled** to allow for faster execution, getting rid of loop registers and checks for the end of the loop at every iteration:

```
// Round 1
bc0 = a[0] ^ a[5] ^ a[10] ^ a[15] ^ a[20]
bc1 = a[1] ^ a[6] ^ a[11] ^ a[16] ^ a[21]
bc2 = a[2] ^ a[7] ^ a[12] ^ a[17] ^ a[22]
bc3 = a[3] ^ a[8] ^ a[13] ^ a[18] ^ a[23]
bc4 = a[4] ^ a[9] ^ a[14] ^ a[19] ^ a[24]
d0 = bc4 ^ (bc1<<1 | bc1>>63)
```

Here is an implementation of MD5 in go. You can see that the `gen.go`

file is generating a different `md5block.go`

file which will be the unrolled code. Well yeah, sometimes it's just easier than doing it by hand :]

Here is an implementation of SHA-1 in assembly.

It's assembly, I'm not going to try to explain that.

Here's another example with Keccak in C. The unrolling is done in the preprocessing phase via C macros like this one:

`#define REPEAT5(e) e e e e e`

That's it! If you still haven't got it, you might have the qualities to run for president.

If you know more tricks let me know!

comment on this storyI wrote an opinionated and readable implementation of the Strobe Protocol Framework in Go here.

It hasn't been thoroughly tested, and its main goal was to help me understand the specification of Strobe while kicking around ideas to challenge the spec.

It started a discussion on the mailing list if you're interested.

I hope that this implementation can help someone else understand the paper/spec better. The plan is to have a usable implementation by the time Strobe's specification is more stable (not that it isn't, its version is actually 1.0.2 at the moment).

Following is a list of divergence from the current specification:

The main **Operate** public function that allows you to interact with STROBE as a developer has a different API. In pseudo code:

`Operate(meta bool, operation string, data []byte, length int, more bool) → (data []byte)`

I've also written high level functions to call the different known operations like `KEY`

, `PRF`

, `send_ENC`

, etc... One example:

```
// `meta` is appropriate for checking the integrity of framing data.
func (s *strobe) send_MAC(meta bool, output_length int) []byte {
return s.Operate(meta, "send_MAC", []byte{}, output_length, false)
}
```

The meta boolean variable is used to indicate if the operation is a META operation.

```
// operation is meta?
if meta {
flags.add("M")
}
```

What is a META operation? From the specification:

For any operation, there is a corresponding "meta" variant. The meta operation works exactly the same way as the ordinary operation. The two are distinguished only in an "M" bit that is hashed into the protocol transcript for the meta operations. This is used to prevent ambiguity in protocol transcripts. This specification describes uses for certain meta operations. The others are still legal, but their use is not recommended. The usage of the meta flag is described below in Section 6.3.

Each operation that has a relevant META variant, also contains explanation on it. For example the ENC section:

send_meta_ENC and recv_meta_ENC are used for encrypted framing data.

The operations are called via their **strings equivalent** instead of their flag value in byte. This could in theory be irrelevant if only the high level functions were made available to the developers.

A length argument is also available for operations like `PRF`

, `send_MAC`

and `RATCHET`

that require a length. A non-zero length will create a zero buffer of the required length, which is not optimal for the `RATCHET`

operation.

```
// does the operation requires a length?
if operation == "PRF" || operation == "send_MAC" || operation == "RATCHET" {
if length == 0 {
panic("A length should be set for this operation.")
}
// create an empty data slice of the relevant size
data = bytes.Repeat([]byte{0}, length)
} else {
if length != 0 {
panic("Output length must be zero except for PRF, send_MAC and RATCHET operations.")
}
// copy the data not to modify the original data
data = make([]byte, len(data_))
copy(data, data_)
}
```

One of my grip with the Strobe's specification is that nothing is truly explained through words: the reader is expected to follow reference snippets of code written in Python. Everything has been optimized and it is quite hard to understand what is really happening (hence why I wrote this implementation in the first place). To remediate this, I wrote a clear path of what is happening for each operation. It is great for other readers who want to understand what is happening, but obviously not the optimal thing to do for an efficient implementation :)

It looks like a bunch of `if`

and `else if`

(Go does not have support for `switch`

):

```
if operation == "AD" { // A
// nothing happens
} else if operation == "KEY" { // AC
cbefore = true
forceF = true
} else if operation == "PRF" { // IAC
cbefore = true
forceF = true
...
```

I have omited any function name starting with `_`

as Go has a different way of making functions available out of a package: make the first letter upper case.

The _duplex function was written after Golang's official SHA-3 implementation and is not optimal. The reason is that SHA-3 does not need to XOR anything in the state until after either the end of the rate (block size) or the end of the input has been reached. Because of this, Go's SHA-3 implementation waits for these to happen while STROBE often cannot: STROBE often needs the result right away. This led to a few oddities.

I do not use the `pos`

variable as it is easily obtainable from the buffered state. This is another (good) consequence of Go's decision on waiting before running the Keccak function. Some other implementations might want to keep track of it.

The specification states that cur_flags should be initialized to `None`

. Which is annoying to program (see how I did it for I0). This is why I took the decision that a flag cannot be zero, and I'm initializing cur_flags to zero.

I'm later asserting that the flag is valid, according to a list of valid flags

```
// operation is valid?
var flags flag
var ok bool
if flags, ok = OperationMap[operation]; !ok {
panic("not a valid operation")
}
```

The specification states that the initialization should be done with R = R + 2. This is to have a match with cSHAKE's padding when using some STROBE instances like Strobe-128/1600 and Strobe-256/1600.

I like having constants and not modifying them. So I initialize with a smaller rate (block_size) and do not conform with cSHAKE's specification. Is it important? I think yes. It does not make sense to use cSHAKE's padding on Strobe-128/800, Strobe-256/800 and Strobe-128/400 as their rate is smaller than cSHAKE's defined rate in its padding.

**EDIT**: I ended up reverting this change. As I only implemented Strobe-256/1600, it is easy to conform to cSHAKE by just initializing with `R + 2`

which will not modify the constant R.

I spent a lot of time reading this blogpost today and thought to myself: this is a great blog post. If there was a blogging award for security/crypto blogposts this would probably be the front runner for 2017.

But I can't really blog about it, because we're still waiting for a lot more good blog posts to come this year.

So what I decided to do instead, is to go through all the blog posts that I liked last year (it's easy, I have a list here) and find out which ones stood out from the rest.

please do not get offended if yours is not in there, I might have just missed it!

How does a blog post make it to the list? It has to be:

**Interesting**. I need to learn something out of it, whatever the topic is. If it's only about results I'm generally not interested.**Pedagogical**. Don't dump your unfiltered knowledge on me, I'm dumb. Help me with diagrams and explain it to me like I'm 5.**Well written**. I can't read boring. Bonus point if it's funny :)

So here it is, a list of my favorite Crypto-y/Security blogposts for the year of 2016:

- The amazing series of Thai Duong's The Internet of Broken Protocols. Don't read the answer right away, try to solve them yourself, it's fun :)
- Crypto Classics: Wiener's RSA Attack. The explanation is top notch. If I could have something like that for every crypto attack I would know everything.
- The most incredible hiding-in-the wild backdoor found: OpenSSL key recovery attack on DH small subgroups (CVE-2016-0701) (it also merged into a paper and was nominated in the pwnie awards).
- Adam Langley on Cryptographic Agility which shows a lot of the arguments behind TLS 1.3's design.
- BearSSL and many of its documentation pages (yes I know, these are not blog posts but they read like blog posts).
- Filippo with his explanation (and finding) of the pen and paper Bleichenbacher 2006 signature forgery
- The hilarious Graphing when your Facebook friends are awake. (Not really crypto.)
- Another hilarious post: Playing games with an attacker: how I messed with someone trying to breach the CryptoWall tracker. (Again, more security than crypto.)
- All the blogposts on Matthew Green's blog. The one on Sweet32, the iMessage attack, the DROWN attack, ...
- The Goldreich–Goldwasser–Halevi (GGH) Cryptosystem and Nyguen’s Attack by Kelby Ludwig. Will teach you about lattices, a lattice-based cryptosystem and an attack while showing you how to see these things in Sage.
- The Hunting For Vulnerabilities in Signal series by Jean-Philippe Aumasson and Markus Vervier.
- The excellent Cloudflare blog has many crypto blog posts, for example this article by Filippo on TLS nonce-nse
- A Survival Guide to a PhD.
- Literally everything on the blog of Stephane Bortzmeyer (warning: it's in french).
- The crypto work of Quarkslab. Here's their analysis of Confide.
- Ransomware: Past, Present, and Future. I haven't read this one but it seems so complete I couldn't not include it :)

Did I miss one? Make me know in the comments :]

comment on this storyMy book **Real-World Cryptography** is finished and shipping! You can purchase it here.

If you don't know **where to start**, you might want to check these popular articles:

- - Speed and Cryptography
- - Problems that UDP and only UDP has
- - ASN.1 vs DER vs PEM vs x509 vs PKCS#7 vs ....
- - How to Backdoor Diffie-Hellman: quick explanation
- - A New Public-Key Cryptosystem via Mersenne Numbers
- - The 9 Lives of Bleichenbacher's CAT: New Cache ATtacks on TLS Implementations
- - Developers Are Not Idiots

Here are the latest **links** posted:

- 28 Nov Insecure Random Number Generation
- 27 Nov Bitslicing For Constant Time Cryptography
- 26 Nov Linear Cryptanalysis Notebook
- 25 Nov An Intensive Introduction To Cryptography
- 24 Nov Rsa Vs Ecdsa For Dnssec

You can also **suggest a link**.