## What is SPAKE2?

SPAKE2 is a *password-authenticated key exchange* (PAKE) protocol.

This means it is a set of steps (a protocol)
to allow two parties to share a session key (“key exchange”)
based on a password that they both know (“password-authenticated”).

There are many such protocols,
but as mentioned last post, I know next to nothing about cryptography,
so if you want to learn about them, you’ll have to go elsewhere.

SPAKE2 is designed under a certain set of assumptions and constraints.

First, we don’t know if the person we’re talking to is the person we think we are talking to,
but we want to find out.
That is, we need to *authenticate* them,
and we want to use the password to do this (hence “password-authenticated”).

Second, the shared password is expected to be weak,
such as a PIN code,
or a couple of English words stuck together.

## How does SPAKE2 solve this?

To explain how SPAKE2 solves this,
it can help to go through a couple of approaches that definitely do *not* work.

For example, we could just send the password over the wire.
This is a terrible idea.
Not only does it expose the password to eavesdroppers,
but it also gives us no evidence that the other side knows the password.
After all, we could send them the password, and they could send it right back.

We need to send something over the wire that is *not* the password,
but that could only have been generated *with* the password.

So perhaps our next refinement might be that we send our name,
somehow cryptographically signed with password.

This is better than just sending the password,
but still leaves us exposed to offline dictionary attacks.
After all, our name is well-known in plain text,
so an eavesdropper can look out for it in the protocol,
snaffle up the ciphertext,
and then run a dictionary against it at their leisure.
It also leaves open the question of how we will generate a session key.

SPAKE2 goes a few steps further than this.
Rather than sending a signed version of some known text,
each side sends an “encrypted” version of a *random value*,
using the password as a key.

Each side then decrypts the value it receives from the other side,
and then uses its random value and the other random value
as inputs to a hash function that generates a session key.

If the passwords are the same, the session key will be the same,
and both sides will be able to communicate.

That is the shorter answer for “How does SPAKE2 work?”.
The longer answer involves rather a lot of mathematics.

## Show me the mathematics

When I was learning SPAKE2, this was a bit of a problem for me.
I hit three major hurdles.

- Notation—maths just has obscure notation
- Terminology—maths uses non-descriptive words for concepts
- Concepts—some are merely unfamiliar, others genuinely difficult

In this post, I want to help you over all of these hurdles,
such that you can go and read papers and blog posts
by people who actually understand what they are talking about.
This means that I’ll try to go out of my way to explain the notation and terminology
while also going through the core concepts.

I want to stress that I am not an expert.
What you’re reading here is me figuring this out for myself,
with a little help from my friends and the Internet.

### Overview

We can think of SPAKE2 as having five stages:

- Public system parameters, established before any exchange takes place
- A secret password shared between two parties
- An exchange of data
- Using that exchange to calculate a key
- Generating a session key

We’ll deal with each in turn.

### System parameters

First, we start with some system parameters.
These are things that both ends of the SPAKE2 protocol need to have baked into their code.
As such, they are public.

These parameters are:

- a group, \(G\), of prime order \(p\)
- a generator element, \(g\), from that group
- two arbitrary elements, \(M\), \(N\), of the group

What’s a group? A group \((G, +)\) is a set together with a binary operator such that:

- adding any two members of the group gets you another member of the group
(closed under \(+\))
- the operation + is associative, that is \(X + (Y + Z) = (X + Y) + Z\) (associativity)
- there’s an element, \(0\),
such that \(X + 0 = X = 0 + X\) for any element x in the group (identity)
- for every element, \(X\), there’s an element \(-X\),
such that \(X + (-X) = 0\) (inverse)

It’s important to note that \(+\) stands for a generic binary operation with these properties,
not necessarily any kind of addition,
and \(0\) stands for the identity, rather than the numeral 0.

To get a better sense of this somewhat abstract concept,
it can help to look at a few concrete examples.
These don’t have much to do with SPAKE2 per se,
they are just here to help explain groups.

The integers with addition form a group with \(0\) as the identity,
because you can add and subtract (i.e. add the negative) them and get other integers,
and because addition is associative.

The integers with multiplication are *not* a group, because what’s the inverse of 2?

But the rational numbers greater than zero with multiplication *do* form a group,
with 1 as the identity.

Likewise, integers with multiplication modulo some fixed number do form a group—a finite group.
For example, for integers with multiplication modulo 7,
the identity is 1, multiplication is associative,
and the inverse of 2 is 4, because \((2 \cdot 4) \mod 7 = 1\).

But but! When we are talking about groups in the abstract,
we’ll still call the operation \(+\) and the identity \(0\),
even if the implementation is that the operation is multiplication.

But but but! This is not at all a universally followed convention,
so when you are reading about groups, you’ll often see the operation
written as a product (e.g. \(XY\) or \(X \cdot Y\) instead of \(X + Y\))
and the identity written as \(1\).

Still with me?

Why groups?

You might be wondering why we need this “group” abstraction at all.
It might seem like unnecessary complexity.

Abstractions like groups are a lot like the programming concept of an abstract interface.
You might write a function in terms of an interface
because you want to allow for lots of different possible implementations.
Doing so also allows you to ignore details about specific concrete implementations
so you can focus on what matters—the external behaviour.

It’s the same thing here. The group could be an elliptic curve,
or something to do with prime numbers, or something else entirely—SPAKE2 doesn’t care.
We want to define our protocol to allow lots of different underlying implementations,
and without getting bogged down in how they actually work.

For SPAKE2, we have an additional requirement for the group:
it is finite and has a prime number of elements.
We’ll use \(p\) to refer to this number—this is what is meant by “of prime order \(p\)” above.

Due to the magic of group theory , this gives \(G\) some bonus properties:

- it is
*cyclic*, we can generate all of the elements of the group by picking one (not the identity) and adding it to itself over and over
- it is abelian, that is \(X + Y = Y + X\),
for any two elements \(X\), \(Y\) in \(G\) (commutativity)

Which explains what we mean by “a generator element”, \(g\),
it’s just an element from the group that’s not the identity.
We can use it to make any other element of the group by adding it to itself.
If the group weren’t cyclic, we could, in general, only use \(g\) to generate a subgroup.

The process of adding an element to itself over and over is called *scalar multiplication* .
In mathematical notation, we write it like this:

\begin{equation*}
Y = nX
\end{equation*}

Or slightly more verbosely like:

\begin{equation*}
Y = n \cdot X
\end{equation*}

Where \(n\) is an integer
and \(X\) is a member of the group,
and \(Y\) is the result of adding \(X\) to itself \(n\) times.

If \(n\) is 0, \(Y\) is \(0\). If \(n\) is 1, \(Y\) is \(X\).

Just as sometimes the group operator is written with product notation rather than addition,
so to scalar multiplication is sometimes written with exponentiation,
to denote *multiplying* a thing by itself n times. e.g.

\begin{equation*}
Y = X^n
\end{equation*}

I’m going to stick to the \(n \cdot X\) notation in this post,
and I’m always going to put the scalar *first*.

Also, I am mostly going to use upper case (e.g. \(X\), \(Y\)) to refer to group elements
(with the exception of the generator element, \(g\))
and lower case (e.g. \(n\), \(k\)) to refer to scalars.
I’m going to try to be consistent, but it’s always worth checking for yourself.

Because the group \(G\) is cyclic,
if we have some group element \(X\) and a generator \(g\),
there will always be a number, \(k\), such that:

\begin{equation*}
X = k \cdot g
\end{equation*}

Here, \(k\) would be called the discrete log of \(X\) with respect to base \(g\).
“Log” is a nod to exponentiation notation,
“discrete” because this is a finite group.

Time to recap.

SPAKE2 has several public parameters, which are

- a group, \(G\), of prime order \(p\),
which means it’s cyclic, abelian, and we can do scalar multiplication on it
- a generator element, \(g\), from that group,
that we will do a lot of scalar multiplication with
- two arbitrary elements, \(M\), \(N\), of the group,
where no one knows the discrete log with respect to \(g\).

### Shared secret password

The next thing we need to begin a SPAKE2 exchange is a shared secret password.

In human terms, this will be a short string of bytes, or a PIN.

In terms of the mathematical SPAKE2 protocol, this must be a scalar, \(pw\).

How we go from a string of bytes to a scalar is completely out of scope for the SPAKE2 paper.
This confused me when trying to implement SPAKE2 in Haskell,
and I don’t claim to fully understand it now.

We HKDF expand the password in order to get a more uniform distribution of scalars .
This still leaves us with a byte string, though.

To turn that into an integer (i.e. a scalar), we simply base256 decode the byte string.

This gives us \(pw\), which we use in the next step.

### Data exchange

At this point, the user has entered a password and we’ve converted it into a scalar.

We need some way to convince the other side that we know the password
without *actually sending* the password to them.

This means two things:

- We have to send them something
*based on* the password
- We get to use all of the shiny mathematics we introduced earlier

The process for both sides is the same, but each side needs to know who’s who.
One side is side A, and other is side B,
and how they figure out which is which happens outside the protocol.

Each draw a random scalar between \(0\) and \(p\): \(x\) for side A, \(y\) for side B.
They then use that to generate an element: \(X = x \cdot g\) for side A,
\(Y = y \cdot g\) for side B.

They then “blind” this value by adding it to one of the elements that make up the system parameters,
scalar multiplied by \(pw\), our password.

Thus, side A makes \(X^{\star} = X + pw \cdot M\)
and side B makes \(Y^{\star} = Y + pw \cdot N\).

They then each send this to the other side and wait to receive the equivalent message.

Again, the papers don’t say how to encode the message,
so python-spake2 just base-256 encodes it
and has side A prepend the byte `A` (`0x41`)
and side B prepend `B` (`0x42`).

### Calculating a key

Once each side has the other side’s message, they can start to calculate a key.

Side A calculates its key like this:

\begin{equation*}
K_A = x \cdot (Y^{\star} - pw \cdot N)
\end{equation*}

The bit inside the parentheses is side A recovering \(Y\),
since we defined \(Y^{\star}\) as:

\begin{equation*}
Y^{\star} = Y + pw \cdot N
\end{equation*}

We can rewrite that in terms of \(Y\) by subtracting \(pw \cdot N\) from both sides:

\begin{equation*}
Y = Y^{\star} - pw \cdot N
\end{equation*}

Which means, as long as both sides have the same value for \(pw\),
can swap in \(Y\) like so:

\begin{align*}
K_A& = x \cdot Y \\
& = x \cdot (y \cdot g) \\
& = xy \cdot g
\end{align*}

Remember that when we created \(Y\) in the first place,
we did so by multiplying our generator \(g\) by a random scalar \(y\).

Side B calculates its key in the same way:

\begin{align*}
K_B& = y \cdot (X^{\star} - pw \cdot N) \\
& = y \cdot X \\
& = y \cdot (x \cdot g) \\
& = yx \cdot g \\
& = xy \cdot g
\end{align*}

Thus, if both sides used the same password, \(K_A = K_B\).

### Generating a session key

Both sides now have:

- \(X^{\star}\)
- \(Y^{\star}\)
- Either \(K_A\) or \(K_B\)
- \(pw\), or at least their own opinion of what \(pw\) is

To these we add the heretofore unmentioned \(A\) and \(B\),
which are meant to be identifiers for side A and side B respectively.
Each side presumably communicates these to each other out-of-band to SPAKE2.

We then hash all of these together, using a hash algorithm, \(H\),
that both sides have previously agreed upon, so that:

\begin{equation*}
SK = H(A, B, X^{\star}, Y^{\star}, pw, K)
\end{equation*}

Where \(K\) is either \(K_A\) or \(K_B\).

I don’t really understand why this step is necessary—why not use \(K\)?
Nor do I understand why each of the inputs to the hash is necessary—\(K\)
is already derived from \(X^{\star}\), why do we need \(X^{\star}\)?

In the code, we change this ever so slightly:

\begin{equation*}
SK = H(H(pw), H(A), H(B), X^{\star}, Y^{\star}, K)
\end{equation*}

Basically, we hash all of the variable length fields to make them fixed length
to avoid collisions between values.

python-spake2 uses SHA256 as the hash algorithm for this step.
I’ve got no idea why this and not, say, HKDF.

And this is the session key. SPAKE2 is done!