Planet Bazaar

October 24, 2017

Mark Shuttleworth

Beavering away at the brilliantly bionic 18.04 LTS

Congratulations to Team *Buntu on the release of our Artful Aardvark 17.10, featuring all your favourite desktop environments, kubernetes 1.8, the latest OpenStack, and security updates for 9 months, which takes us all the way to our next enterprise release, Ubuntu 18.04 LTS.

A brumous development cycle always makes for cool-headed work and brisk progress on the back of breem debate.

As always, 18.04 LTS will represent the sum of all our interests.

For those of you with bimodal inclinations, there’s the official upstream Kubernetes-on-Ubuntu spell for ‘conjure-up kubernetes’ with bijou multi-cloud goodness. We also have spells for OpenStack on Ubuntu and Hadoop on Ubuntu, so conjure-up is your one-stop magic shop for at-scale boffo big data, cloud and containers. Working with upstreams to enable fast deployment and operations of their stuff on all the clouds is a beamish way to spend the day.

If your thing is bling, pick a desktop! We’ve defaulted to GNOME, but we’re the space where KDE and GNOME and MATE and many others come together to give users real and easy choice of desktops. And if you’re feeling boned by the lack of Unity in open source, you might want to hop onto the channel and join those who are updating Unity7 for the newest X and kernel graphics in 18.04.

And of course, if your thing is actually a thing with internet smarts, then it’s Ubuntu Core that will get you flying (or driving or gatewaying or routing or, well, anything your thing desires) in a snap.

It takes a booky brilliance to shine, and we celebrate brilliance in all its forms in our community. Thanks to the artists and the advocates, the brains and the documenters, the councils and yes, the crazies who find entirely new ways to contribute, Ubuntu grows and reflects the depth and breadth of free software. For many upstream projects, Ubuntu represents the way most users will enjoy their contribution to society. That’s a big responsibility, and one we take seriously. Leave the bolshy, blithe and branky BS aside, and let’s appeal to all that’s brave and bonzer as we shape the platform on which others will build.

It’s builders that we celebrate – the people that build our upstream applications and packages, the people who build Ubuntu, and the people who build on Ubuntu. In honour of that tireless toil, our mascot this cycle is a mammal known for it’s energetic attitude, industrious nature and engineering prowess. We give it a neatly nerdy 21st century twist in honour of the relentless robots running Ubuntu Core. Ladies and gentlemen, I give you 18.04 LTS, the Bionic Beaver.

by Mark Shuttleworth at October 24, 2017 01:24 PM

October 20, 2017

Jonathan Lange

Category theory in everyday life

I was going to write a post about how knowing some abstract algebra can help you write clearer programs.

Then I saw Eugenia Cheng’s excellent talk, Category Theory in Everyday Life, which was a keynote at Haskell Exchange 2017.

It’s excellent. She says what I wanted to say much better than I could, and says many more things that I wouldn’t have thought to say at all. You should watch it.

The talk assumes very little technical or mathematical knowledge, and certainly no knowledge of Haskell.

by Jonathan Lange at October 20, 2017 11:00 PM

October 12, 2017

Jonathan Lange

SPAKE2 in Haskell: How Haskell Helped

Porting SPAKE2 from Python to Haskell helped me understand how SPAKE2 worked, and a large part of that is due to specific features of Haskell.

What’s this again?

As a favour for Jean-Paul, I wrote a Haskell library implementing SPAKE2, so he could go about writing a magic-wormhole client. This turned out to be much more work than I expected. Although there was a perfectly decent Python implementation for me to crib from, my ignorance of cryptography and the lack of standards documentation for SPAKE2 made it difficult for me to be sure I was doing the right thing.

One of the things that made it easier was the target language: Haskell. Here’s how.

Elliptic curves—how do they work?

The arithmetic around elliptic curves can be slow. There’s a trick where you can do the operations in 4D space, rather than 2D space, which somehow makes the operations faster. Brian’s code calls these “extended points”. The 2D points are called “affine points”.

However, there’s a catch. Many of the routines can generate extended points that aren’t on the curve for that we’re working in, which makes them useless (possibly dangerous) for our cryptography.

The Python code deals with this using runtime checks and documentation. There are many checks of isoncurve, and comments like extended->extended.

Because I have no idea what I’m doing, I wanted to make sure I got this right.

So when I defined ExtendedPoint, I put whether or not the point is on the curve (in the group) into the type.

e.g.

-- | Whether or not an extended point is a member of Ed25519.
data GroupMembership = Unknown | Member

-- | A point that might be a member of Ed25519.
data ExtendedPoint (groupMembership :: GroupMembership)
  = ExtendedPoint
  { x :: !Integer
  , y :: !Integer
  , z :: !Integer
  , t :: !Integer
  } deriving (Show)

This technique is called phantom types.

It means we can write functions with signatures like this:

isExtendedZero :: ExtendedPoint irrelevant -> Bool

Which figures out whether an extended point is zero, and we don’t care whether it’s in the group or not.

Or functions like this:

doubleExtendedPoint
  :: ExtendedPoint preserving
  -> ExtendedPoint preserving

Which says that whether or not the output is in the group is determined entirely by whether the input is in the group.

Or like this:

affineToExtended
  :: AffinePoint
  -> ExtendedPoint 'Unknown

Which means that we know that we don’t know whether a point is on the curve after we’ve projected it from affine to extended.

And we can very carefully define functions that decide whether an extended point is in the group or not, which have signatures that look like this:

ensureInGroup
  :: ExtendedPoint 'Unknown
  -> Either Error (ExtendedPoint 'Member)

This pushes our documentation and runtime checks into the type system. It means the compiler will tell me when I accidentally pass an extended point that’s not a member (or not proven to be a member) to something that assumes it is a member.

When you don’t know what you are doing, this is hugely helpful. It can feel a bit like a small child trying to push a star-shaped thing through the square-shaped hole. The types are the holes that guide how you insert code and values.

What do we actually need?

Python famously uses “duck typing”. If you have a function that uses a value, then any value that has the right methods and attributes will work, probably.

This is very useful, but it can mean that when you are trying to figure out whether your value can be used, you have to resort to experimentation.

inbound_elem = g.bytes_to_element(self.inbound_message)
if inbound_elem.to_bytes() == self.outbound_message:
   raise ReflectionThwarted
pw_unblinding = self.my_unblinding().scalarmult(-self.pw_scalar)
K_elem = inbound_elem.add(pw_unblinding).scalarmult(self.xy_scalar)

Here, g is a group. What does it need to support? What kinds of things are its elements? How are they related?

Here’s what the type signature for the corresponding Haskell function looks like:

generateKeyMaterial
  :: AbelianGroup group
  => Spake2Exchange group  -- ^ An initiated SPAKE2 exchange
  -> Element group  -- ^ The outbound message from the other side (i.e. inbound to us)
  -> Element group -- ^ The final piece of key material to generate the session key.

This makes it explicit that we need something that implements AbelianGroup, which is an interface with defined methods.

If we start to rely on something more, the compiler will tell us. This allows for clear boundaries.

When reverse engineering the Python code, it was never exactly clear whether a function in a group implementation was meant to be public or private.

By having interfaces (type classes) enforced by the compiler, this is much more clear.

What comes first?

The Python SPAKE2 code has a bunch of assertions to make sure that one method isn’t called before another.

In particular, you really shouldn’t generate the key until you’ve generated your message and received one from the other side.

Using Haskell, I could put this into the type system, and get the compiler to take care of it for me.

We have a function that initiates the exchange, startSpake2:

-- | Initiate the SPAKE2 exchange. Generates a secret (@xy@) that will be held
-- by this side, and transmitted to the other side in "blinded" form.
startSpake2
  :: (AbelianGroup group, MonadRandom randomly)
  => Spake2 group
  -> randomly (Spake2Exchange group)

This takes a Spake2 object for a particular AbelianGroup, which has our password scalar and protocol parameters, and generates a Spake2Exchange for that group.

We have another function that computes the outbound message:

-- | Determine the element (either \(X^{\star}\) or \(Y^{\star}\)) to send to the other side.
computeOutboundMessage
  :: AbelianGroup group
  => Spake2Exchange group
  -> Element group

This takes a Spake2Exchange as its input. This means it is _impossible_ for us to call it unless we have already called startSpake2.

We don’t need to write tests for what happens if we try to call it before we call startSpake2, in fact, we cannot write such tests. They won’t compile.

Psychologically, this helped me immensely. It’s one less thing I have to worry about getting right, and that frees me up to explore other things.

It also meant I had to do less work to be satisfied with correctness. This one line type signature replaces two or three tests.

We can also see that startSpake2 is the only thing that generates random numbers. This means we know that computeOutboundMessage will always return the same element for the same initiated exchange.

Conclusion

Haskell helped me be more confident in the correctness of my code, and also gave me tools to explore the terrain further.

It’s easy to think of static types as being a constraint the binds you and prevents you from doing wrong things, but an expressive type system can help you figure out what code to write.

by Jonathan Lange at October 12, 2017 11:00 PM

June 03, 2017

Jonathan Lange

SPAKE2 in Haskell: What is SPAKE2?

Last post, I discussed how I found myself implementing SPAKE2 in Haskell. Here, I want to discuss what SPAKE2 is, and why you might care.

I just want to send a file over the internet

Long ago, Glyph lamented that all he wanted to do was send a file over the internet. Since then, very little has changed.

If you want to send a file to someone, you either attach it to an email, or you upload it to some central, cloud-based service that you both have access to: Google Drive; Dropbox; iCloud; etc.

But what if you don’t want to do that? What if you want to send a file directly to someone, without intermediaries?

First, you need to figure out where they are. That’s not what SPAKE2 does.

Once you have figured out where they are, you need:

  1. Their permission
  2. Assurance that you are sending the file to the right person
  3. A secure channel to send the actual data

SPAKE2 helps with all three of these.

What is SPAKE2?

SPAKE2 is a password-authenticated key exchange (PAKE) protocol.

This means it is a set of steps (a protocol) to allow two parties to share a session key (“key exchange”) based on a password that they both know (“password-authenticated”).

There are many such protocols, but as mentioned last post, I know next to nothing about cryptography, so if you want to learn about them, you’ll have to go elsewhere.

SPAKE2 is designed under a certain set of assumptions and constraints.

First, we don’t know if the person we’re talking to is the person we think we are talking to, but we want to find out. That is, we need to authenticate them, and we want to use the password to do this (hence “password-authenticated”).

Second, the shared password is expected to be weak, such as a PIN code, or a couple of English words stuck together.

What does this mean?

These assumptons have a couple of implications.

First, we want to give our adversary as few chances as possible to guess the password. The password is precious, we don’t want to lose it. If someone discovers it, they could impersonate us or our friends, and gain access to precious secrets.

Specifically, this means we want to prevent offline dictionary attacks (where the adversary can snapshot some data and run it against all common passwords at their leisure) against both eavesdropping adversaries (those snooping on our connection) and active adversaries (people pretending to be our friend).

Second, we don’t want to use the password as the key that encrypts our payload. We need to use it to generate a new key, specific to this session. If we re-use passwords, eventually we’ll send some encrypted content for which the plaintext content is known, the eavesdropper will find this and be able to brute force at their leisure.

How does SPAKE2 solve this?

To explain how SPAKE2 solves this, it can help to go through a couple of approaches that definitely do not work.

For example, we could just send the password over the wire. This is a terrible idea. Not only does it expose the password to eavesdroppers, but it also gives us no evidence that the other side knows the password. After all, we could send them the password, and they could send it right back.

We need to send something over the wire that is not the password, but that could only have been generated with the password.

So perhaps our next refinement might be that we send our name, somehow cryptographically signed with password.

This is better than just sending the password, but still leaves us exposed to offline dictionary attacks. After all, our name is well-known in plain text, so an eavesdropper can look out for it in the protocol, snaffle up the ciphertext, and then run a dictionary against it at their leisure. It also leaves open the question of how we will generate a session key.

SPAKE2 goes a few steps further than this. Rather than sending a signed version of some known text, each side sends an “encrypted” version of a random value, using the password as a key.

Each side then decrypts the value it receives from the other side, and then uses its random value and the other random value as inputs to a hash function that generates a session key.

If the passwords are the same, the session key will be the same, and both sides will be able to communicate.

That is the shorter answer for “How does SPAKE2 work?”. The longer answer involves rather a lot of mathematics.

Show me the mathematics

When I was learning SPAKE2, this was a bit of a problem for me. I hit three major hurdles.

  1. Notation—maths just has obscure notation
  2. Terminology—maths uses non-descriptive words for concepts
  3. Concepts—some are merely unfamiliar, others genuinely difficult

In this post, I want to help you over all of these hurdles, such that you can go and read papers and blog posts by people who actually understand what they are talking about. This means that I’ll try to go out of my way to explain the notation and terminology while also going through the core concepts.

I want to stress that I am not an expert. What you’re reading here is me figuring this out for myself, with a little help from my friends and the Internet.

Overview

We can think of SPAKE2 as having five stages:

  1. Public system parameters, established before any exchange takes place
  2. A secret password shared between two parties
  3. An exchange of data
  4. Using that exchange to calculate a key
  5. Generating a session key

We’ll deal with each in turn.

System parameters

First, we start with some system parameters. These are things that both ends of the SPAKE2 protocol need to have baked into their code. As such, they are public.

These parameters are:

  • a group, \(G\), of prime order \(p\)
  • a generator element, \(g\), from that group
  • two arbitrary elements, \(M\), \(N\), of the group

What’s a group? A group \((G, +)\) is a set together with a binary operator such that:

  • adding any two members of the group gets you another member of the group (closed under \(+\))
  • the operation + is associative, that is \(X + (Y + Z) = (X + Y) + Z\) (associativity)
  • there’s an element, \(0\), such that \(X + 0 = X = 0 + X\) for any element x in the group (identity)
  • for every element, \(X\), there’s an element \(-X\), such that \(X + (-X) = 0\) (inverse)

It’s important to note that \(+\) stands for a generic binary operation with these properties, not necessarily any kind of addition, and \(0\) stands for the identity, rather than the numeral 0.

To get a better sense of this somewhat abstract concept, it can help to look at a few concrete examples. These don’t have much to do with SPAKE2 per se, they are just here to help explain groups.

The integers with addition form a group with \(0\) as the identity, because you can add and subtract (i.e. add the negative) them and get other integers, and because addition is associative.

The integers with multiplication are not a group, because what’s the inverse of 2?

But the rational numbers greater than zero with multiplication do form a group, with 1 as the identity.

Likewise, integers with multiplication modulo some fixed number do form a group—a finite group. For example, for integers with multiplication modulo 7, the identity is 1, multiplication is associative, and the inverse of 2 is 4, because \((2 \cdot 4) \mod 7 = 1\).

But but! When we are talking about groups in the abstract, we’ll still call the operation \(+\) and the identity \(0\), even if the implementation is that the operation is multiplication.

But but but! This is not at all a universally followed convention, so when you are reading about groups, you’ll often see the operation written as a product (e.g. \(XY\) or \(X \cdot Y\) instead of \(X + Y\)) and the identity written as \(1\).

Still with me?

Why groups?

You might be wondering why we need this “group” abstraction at all. It might seem like unnecessary complexity.

Abstractions like groups are a lot like the programming concept of an abstract interface. You might write a function in terms of an interface because you want to allow for lots of different possible implementations. Doing so also allows you to ignore details about specific concrete implementations so you can focus on what matters—the external behaviour.

It’s the same thing here. The group could be an elliptic curve, or something to do with prime numbers, or something else entirely—SPAKE2 doesn’t care. We want to define our protocol to allow lots of different underlying implementations, and without getting bogged down in how they actually work.

For SPAKE2, we have an additional requirement for the group: it is finite and has a prime number of elements. We’ll use \(p\) to refer to this number—this is what is meant by “of prime order \(p\)” above.

Due to the magic of group theory [1], this gives \(G\) some bonus properties:

  • it is cyclic, we can generate all of the elements of the group by picking one (not the identity) and adding it to itself over and over
  • it is abelian, that is \(X + Y = Y + X\), for any two elements \(X\), \(Y\) in \(G\) (commutativity)

Which explains what we mean by “a generator element”, \(g\), it’s just an element from the group that’s not the identity. We can use it to make any other element of the group by adding it to itself. If the group weren’t cyclic, we could, in general, only use \(g\) to generate a subgroup.

The process of adding an element to itself over and over is called scalar multiplication [2]. In mathematical notation, we write it like this:

\begin{equation*} Y = nX \end{equation*}

Or slightly more verbosely like:

\begin{equation*} Y = n \cdot X \end{equation*}

Where \(n\) is an integer and \(X\) is a member of the group, and \(Y\) is the result of adding \(X\) to itself \(n\) times.

If \(n\) is 0, \(Y\) is \(0\). If \(n\) is 1, \(Y\) is \(X\).

Just as sometimes the group operator is written with product notation rather than addition, so to scalar multiplication is sometimes written with exponentiation, to denote multiplying a thing by itself n times. e.g.

\begin{equation*} Y = X^n \end{equation*}

I’m going to stick to the \(n \cdot X\) notation in this post, and I’m always going to put the scalar first.

Also, I am mostly going to use upper case (e.g. \(X\), \(Y\)) to refer to group elements (with the exception of the generator element, \(g\)) and lower case (e.g. \(n\), \(k\)) to refer to scalars. I’m going to try to be consistent, but it’s always worth checking for yourself.

Because the group \(G\) is cyclic, if we have some group element \(X\) and a generator \(g\), there will always be a number, \(k\), such that:

\begin{equation*} X = k \cdot g \end{equation*}

Here, \(k\) would be called the discrete log of \(X\) with respect to base \(g\). “Log” is a nod to exponentiation notation, “discrete” because this is a finite group.

Time to recap.

SPAKE2 has several public parameters, which are

  • a group, \(G\), of prime order \(p\), which means it’s cyclic, abelian, and we can do scalar multiplication on it
  • a generator element, \(g\), from that group, that we will do a lot of scalar multiplication with
  • two arbitrary elements, \(M\), \(N\), of the group, where no one knows the discrete log [3] with respect to \(g\).

Shared secret password

The next thing we need to begin a SPAKE2 exchange is a shared secret password.

In human terms, this will be a short string of bytes, or a PIN.

In terms of the mathematical SPAKE2 protocol, this must be a scalar, \(pw\).

How we go from a string of bytes to a scalar is completely out of scope for the SPAKE2 paper. This confused me when trying to implement SPAKE2 in Haskell, and I don’t claim to fully understand it now.

We HKDF expand the password in order to get a more uniform distribution of scalars [4]. This still leaves us with a byte string, though.

To turn that into an integer (i.e. a scalar), we simply base256 decode the byte string.

This gives us \(pw\), which we use in the next step.

Data exchange

At this point, the user has entered a password and we’ve converted it into a scalar.

We need some way to convince the other side that we know the password without actually sending the password to them.

This means two things:

  1. We have to send them something based on the password
  2. We get to use all of the shiny mathematics we introduced earlier

The process for both sides is the same, but each side needs to know who’s who. One side is side A, and other is side B, and how they figure out which is which happens outside the protocol.

Each draw a random scalar between \(0\) and \(p\): \(x\) for side A, \(y\) for side B. They then use that to generate an element: \(X = x \cdot g\) for side A, \(Y = y \cdot g\) for side B.

They then “blind” this value by adding it to one of the elements that make up the system parameters, scalar multiplied by \(pw\), our password.

Thus, side A makes \(X^{\star} = X + pw \cdot M\) and side B makes \(Y^{\star} = Y + pw \cdot N\).

They then each send this to the other side and wait to receive the equivalent message.

Again, the papers don’t say how to encode the message, so python-spake2 just base-256 encodes it and has side A prepend the byte A (0x41) and side B prepend B (0x42).

Calculating a key

Once each side has the other side’s message, they can start to calculate a key.

Side A calculates its key like this:

\begin{equation*} K_A = x \cdot (Y^{\star} - pw \cdot N) \end{equation*}

The bit inside the parentheses is side A recovering \(Y\), since we defined \(Y^{\star}\) as:

\begin{equation*} Y^{\star} = Y + pw \cdot N \end{equation*}

We can rewrite that in terms of \(Y\) by subtracting \(pw \cdot N\) from both sides:

\begin{equation*} Y = Y^{\star} - pw \cdot N \end{equation*}

Which means, as long as both sides have the same value for \(pw\), can swap in \(Y\) like so:

\begin{align*} K_A& = x \cdot Y \\ & = x \cdot (y \cdot g) \\ & = xy \cdot g \end{align*}

Remember that when we created \(Y\) in the first place, we did so by multiplying our generator \(g\) by a random scalar \(y\).

Side B calculates its key in the same way:

\begin{align*} K_B& = y \cdot (X^{\star} - pw \cdot N) \\ & = y \cdot X \\ & = y \cdot (x \cdot g) \\ & = yx \cdot g \\ & = xy \cdot g \end{align*}

Thus, if both sides used the same password, \(K_A = K_B\).

Generating a session key

Both sides now have:

  • \(X^{\star}\)
  • \(Y^{\star}\)
  • Either \(K_A\) or \(K_B\)
  • \(pw\), or at least their own opinion of what \(pw\) is

To these we add the heretofore unmentioned \(A\) and \(B\), which are meant to be identifiers for side A and side B respectively. Each side presumably communicates these to each other out-of-band to SPAKE2.

We then hash all of these together, using a hash algorithm, \(H\), that both sides have previously agreed upon, so that:

\begin{equation*} SK = H(A, B, X^{\star}, Y^{\star}, pw, K) \end{equation*}

Where \(K\) is either \(K_A\) or \(K_B\).

I don’t really understand why this step is necessary—why not use \(K\)? Nor do I understand why each of the inputs to the hash is necessary—\(K\) is already derived from \(X^{\star}\), why do we need \(X^{\star}\)?

In the code, we change this ever so slightly:

\begin{equation*} SK = H(H(pw), H(A), H(B), X^{\star}, Y^{\star}, K) \end{equation*}

Basically, we hash all of the variable length fields to make them fixed length to avoid collisions between values. [5]

python-spake2 uses SHA256 as the hash algorithm for this step. I’ve got no idea why this and not, say, HKDF.

And this is the session key. SPAKE2 is done!

Did SPAKE2 solve our problem?

We wanted a way of authenticating a remote connection using a password, without having to share that password, and without using that password to encrypt known plaintext. We’ve done that.

The SPAKE2 protocol above will result in two sides negotiating a shared session key, sending only randomly generated data over the wire.

Is it vulnerable to offline dictionary attacks? No. The value we send over the wire is just a random group element encrypted with the password. Even if an eavesdropper gets that value and runs a dictionary against it, they’ll have no way of determining whether they’ve cracked it or not. After all, one random value looks very much like another.

Where to now?

I’m looking forward to learning about elliptic curves, and to writing about what it was like to use Haskell in particular to implement SPAKE2.

I learned a lot implementing SPAKE2, then learned a lot more writing this post, and have much to learn still.

But perhaps the biggest thing I’ve learned is that although maths isn’t easy, it’s at least possible, and that sometimes, if you want to send a file over the Internet, what you really need is a huge pile of math.

Let me know if I’ve got anything wrong, or if this inspires you do go forth and implement some crypto papers yourself.

Thanks

This post owes a great deal to Brian Warner’s “Magic Wormhole” talk, to Jake Edge’s write-up of that talk, and to Michel Abdalla and David Pointcheval’s paper “Simple Password-Based Encrypted Key Exchange Protocols”.

Bice Dibley, Chris Halse Rogers, and JP Viljoen all read early drafts and provided helpful suggestions. This piece has been much improved by their input. Any infelicities are my own.

I wouldn’t have written this without being inspired by Julia Evans. Julia often shares what she’s learning as she learns it, and does a great job at making difficult topics seem approachable and fun. I highly recommend her blog, especially if you’re into devops or distributed systems.

[1]I used to know the proof for this but have since forgotten it, so I’m taking this on faith for now.
[2]With scalar multiplication, we aren’t talking about a group, but rather a \(\mathbb{Z}\)-module. But at this point, I can’t even, so look it up on Wikipedia if you’re interested.
[3]Taking this on faith too.
[4]Yup, faith again.
[5]I only sort of understand why this is necessary.

by Jonathan Lange at June 03, 2017 11:00 PM

May 26, 2017

Jonathan Lange

SPAKE2 in Haskell: the journey begins

There’s a joke about programmers that’s been doing the rounds for the last couple of years:

We do these things not because they are easy, but because we thought they would be easy.

This is about how I became the butt of a tired, old joke.

My friend Jean-Paul decided to start learning Haskell by writing a magic-wormhole client.

magic-wormhole works in part by negotiating a session key using SPAKE2: a password-authenticated key exchange protocol, so one of the first things Jean-Paul needed was a Haskell implementation.

Eager to help Jean-Paul on his Haskell journey, I volunteered to write a SPAKE2 implementation in Haskell. After all, there’s a pure Python implementation, so all I’d need to do is translate it from Python to Haskell. I know both languages pretty well. How hard could it be?

Turns out there are a few things I hadn’t really counted on.

I know next to nothing about cryptography

Until now, I could summarise what I knew about cryptography into two points:

  1. It works because factoring prime numbers is hard
  2. I don’t know enough about it to implement it reliably, I should use proven, off-the-shelf components instead

This isn’t really a solid foundation for implementing crypto code. In fact, it’s a compelling argument to walk away while I still had the chance.

My ignorance was a particular liability here, since python-spake2 assumes a lot of cryptographic knowledge. What’s HKDF? What’s Ed25519? Is that the same thing as Curve25519? What’s NIST? What’s q in this context?

python-spake2 also assumes a lot of knowledge about abstract algebra. This is less of a problem for me, since I studied a lot of that at university. However, it’s still a problem. Most of that knowledge has sat unused for fifteen or so years. Dusting off those cobwebs took time.

My rusty know-how was especially obvious when reading the PDFs that describe SPAKE2. Mathematical notation isn’t easy to read, and every subdiscipline has its own special variants (“Oh, obviously q means the size of the subgroup. That’s just convention.”)

For example, I know that what’s in spake2/groups.py is the multiplicative group of integers modulo n, and I know what “the multiplicative group of integers modulo n” means, but I understand about 2% of the Wikipedia page on the subject, and I have even less understanding about how the group is relevant to cryptography.

The protocol diagrams that appear in the papers I read were a confusing mess of symbols at first. It took several passes through the text, and a couple of botched explanations to patient listeners on IRC before I really understood them. These diagrams now seem clear & succinct to me, although I’m sure they could be written better in code.

python-spake2 is idiosyncratic

The python-spake2 source code is made almost entirely out of object inheritance and mutation, which makes it hard for me to follow, and hard to transliterate into Haskell, where object inheritance and mutation are hard to model.

This is a very minor criticism. With magic-wormhole and python-spake2, Warner has made creative, useful software that solves a difficult problem and meets a worthwhile need.

crypto libraries rarely have beginner-friendly documentation

python-spake2 isn’t alone in assuming cryptographic knowledge. The Haskell library cryptonite is much the same. Most documentation I could find about various topics on the web pointed to cr.yp.to pages, which either link to papers or C code.

I think this is partly driven by a concern for user safety, “if you don’t understand it, you shouldn’t be using it”. Maybe this is a good idea. The problem is that it can be hard to know where to start in order to gain that understanding.

To illustrate, I now sort of get how an elliptic curve might form a group, but have forgotten enough algebra to not know about what subgroups there are, how that’s relevant to the implementation of ed25519, how subgroups and groups relate to fields, to say nothing of how elliptic curve cryptography actually works.

I don’t really know where to go to remedy this ignorance, although I’m pretty sure doing so is within my capabilities, I just need to find the right book or course to actually teach me these things.

Protocols ain’t protocols

The mathematics of SPAKE2 are fairly clearly defined, but there is a large gap between “use this group element” and sending some bits over the wire.

python-spake2 doesn’t clearly distinguish between the mathematics of SPAKE2 and the necessary implementation decisions it makes in order to be a useful networked protocol.

This meant that when translating, it was hard to tell what was an essential detail and what was accidental detail. As Siderea eloquently points out, software is made of decisions. When writing the Haskell version, which decisions do I get to make, and which are forced upon me? Must this salt be the empty string? Can I generate the “blind” any way I want?

Eventually, I found a PR implementing SPAKE2 (and SPAKE2+, SPAKE2-EE, etc.) in Javascript. From the discussion there, I was able to synthesize a rough standard for implementing.

Jean-Paul helped by writing an interoperability test harness, which gave me an easy way to experiment with design choices.

Conclusion

Happily, as of this weekend, I’ve been able to overcome my lack of knowledge of cryptography, the idiosyncracies of python-spake2, the documentation quirks of crypto libraries, and the lack of a standard for SPAKE2 on the network to implement SPAKE2 in Haskell, first with NIST groups, then with Ed25519.

No doubt much could be better—I would very much welcome feedback, whether it’s about my Haskell, my mathematics, or my documentation—but I’m pretty happy with the results.

This has been a fun, stretching, challenging exercise. Even though it took more time and was more difficult than I expected, it has been such a privilege to be able to tackle it. Not only have I learned much, but I also feel much more confident in my ability to learn hard things.

I hope to follow up with more posts, covering:

  • just what is SPAKE2, and why should I care?
  • how can I use SPAKE2 (and especially, haskell-spake2)?
  • what was it like to write a Haskell version of a Python library?
  • what’s up with Ed25519? (this is somewhat ambitious)

by Jonathan Lange at May 26, 2017 11:00 PM

January 31, 2017

Jonathan Lange

Announcing haskell-cli-template

Last October, I announced servant-template, a cookiecutter template for creating production-ready Haskell web services.

Almost immediately after making it, I wished I had something for building command-line tools quickly. I know stack comes with a heap of them, but:

  • it’s hard to predict what they’ll do
  • adding a new template requires submitting a PR
  • cookiecutter has existed for ages and is pretty much better in every way

So I made haskell-cli-template. It’s very simple, it just makes a Haskell command-line project with some tests, command-line parsing, and a CircleCI build.

I wanted to integrate logging-effect, but after a few months away from it my tired little brain wasn’t able to figure it out. I like command-line tools with logging controls, so I suspect I’ll add it again in the future.

Let me know if you use haskell-cli-template to make anything cool, and please feel free to fork and extend.

by Jonathan Lange at January 31, 2017 12:00 AM

January 30, 2017

Jonathan Lange

Announcing graphql-api: Haskell library for GraphQL

Late last year, my friend Tom tried to convince me that writing REST APIs was boring and repetitive and that I should give this thing called GraphQL a try.

I was initially sceptical. servant, the REST library that I’m most familiar with, is lovely. Its clever use of Haskell’s type system means that all the boring boilerplate I’d have to write in other languages just goes away.

However, after watching Lee Byron’s Strange Loop talk on GraphQL I began to see his point. Being able to get many resources with the same request is very useful, and as someone who writes & runs servers, I very much want clients to ask only for the data that they need.

The only problem is that there isn’t really a way to write GraphQL servers in Haskell—until now.

Introducing graphql-api

Tom and I put together a proof-of-concept GraphQL server implementation called graphql-api, which we released to Hackage today.

It lets you take a GraphQL schema and translate it into a Haskell type that represents the schema. You can then write handlers that accept and return native Haskell types. graphql-api will take care of parsing and validating your user queries, and Haskell’s type system will make sure that your handlers handle the right thing.

Using graphql-api

Say you have a simple GraphQL schema, like this:

type Hello {
  greeting(who: String!): String!
}

which defines a single top-level type Hello that contains a single field, greeting, that takes a single, required argument who.

A user would query it with something like this:

{
  greeting("World")
}

And expect to see an answer like:

{
  "data": {
    "greeting": "Hello World!"
  }
}

To do this in Haskell with GraphQL, first we’d define the type:

type Hello = Object "Hello" '[]
  '[ Argument "who" Text :> Field "greeting" Text ]

And then a handler for that type:

hello :: Handler IO Hello
hello = pure greeting
 where
   greeting who = pure ("Hello " <> who <> "!")

We can then run a query like so:

queryHello :: IO Response
queryHello = interpretAnonymousQuery @Hello hello "{ greeting(who: \"World\") }"

And get the output we expect.

There’s a lot going on in this example, so I encourage you to check out our tutorial to get the full story.

graphql-api’s future

Tom and I put graphql-api together over a couple of months in our spare time because we wanted to actually use it. However, as we dug deeper into the implementation, we found we really enjoyed it and want to make a library that’s genuinely good and helps other people do cool stuff.

The only way to do that, however, is to release it and get feedback from our users, and that’s what we’ve done. So please use graphql-api and tell us what you think. If you build something cool with it, let us know.

For our part, we want to improve the error messages, make sure our handling for recursive data types is spot on, and smooth down a few rough edges.

Thanks

Tom and I want to thank J. Daniel Navarro for his great GraphQL parser and encoder, which forms the basis for what we built here.

About the implementation

graphql-api is more-or-less a GraphQL compiler hooked up to type-based executing (aka “resolving”) engine that’s heavily inspired by Servant and uses various arcane type tricks from GHC 8.

We tried to stick to implementing the GraphQL specification. The spec is very well written, but implementing it has taught us that GraphQL is not at all as simple as it looks at first.

I can’t speak very well to the type chicanery, except to point you at the code and at the Servant paper.

The compiler mostly lives in the GraphQL.Internal.Validation module. The main idea is that it takes the AST and translates it into a value that cannot possibly be wrong.

All the syntax stuff is from the original graphql-haskell, with a few fixes, and a tweak to guarantee that Name values are guaranteed to be correct.

by Jonathan Lange at January 30, 2017 12:00 AM

January 12, 2017

Jonathan Lange

Announcing grafanalib

Late last year, as part of my work at Weaveworks, I published grafanalib, a Python DSL for building Grafana dashboards.

We use it a lot, and it’s made our dashboards much nicer to maintain. I’ve written a blog post about it that you can find it on the Weaveworks blog.

by Jonathan Lange at January 12, 2017 12:00 AM

December 15, 2016

Elliot Murphy

Dec 15, 2016

Another morning run, and more dolphin. Wish there was a way to identify them. Got a picture of the sailboat named floating point. Can't remember the name of this typeface.

by Elliot Murphy at December 15, 2016 05:47 PM

Dec 14, 2016

There was some extra time before Darcy had to leave for the day, so we jumped in the boat a few minutes after sunrise. Best day yet for dolphin viewing, several dolphins were chasing fish toward us in the intersection of several canals. Darcy got some great photos of the dolphins playing, and observed how great it is to be able to enjoy your food.

by Elliot Murphy at December 15, 2016 05:46 PM

December 14, 2016

Elliot Murphy

Dec 13, 2016

Dec 13, 2016

Notified a client about an impeding outage that I noticed during an audit today, and did some studying on the Elm language. After the rainstorm finished I took a break from studying and hopped in the boat. Full moon, and it was bright enough to read. The river was a black mirror. Dolphins were out hunting fish with barely a ripple. Saw the largest shooting star I've ever seen, and the clouds over the barrier island in the Atlantic were full of heat lightning.

by Elliot Murphy at December 14, 2016 08:49 PM

December 12, 2016

Elliot Murphy

Dec 12, 2106

Headed out shortly after sunup today. Water was absolutely glassy, the winds from the weekend were gone. Dodged half a dozen large coconuts as we headed out.

Went past Sampson island into the main river because it was so calm. After clearing the mangroves could see straight to the bottom.

On the way back in the water was so smooth you could see a distinct V ahead on the surface as if some invisible boat. I guessed more dolphins, and sure enough there were two more herding fish back into the docks, lazily circling and diving.

by Elliot Murphy at December 12, 2016 02:42 PM

Dec 11, 2016

Another late night run after spending a fun afternoon at the zoo. Headed to the end of the canal and was delighted by a couple of Atlantic bottlenose dolphins chasing fish.

by Elliot Murphy at December 12, 2016 02:37 PM

Dec 8, 2016

Tonight I headed through the west canal,and it turned out to run the length of the next island. Saw a couple of amusing boat names. A ketch named "floating point" with a legacy computer typeface, and another sailboat named "satellite office".

by Elliot Murphy at December 12, 2016 02:36 PM

Dec 10, 2016

Woke up with energy today, and dressed quickly for the day. Even though I had to wear a suit for today, decided to risk a quick inspection of the island during the early morning light to assess whether we could bring anyone over. Heading back I saw Darcy sitting outside enjoying her morning coffee substitute, and I can't think of a nicer sight.

In the afternoon most of the family took naps, and then I pleaded with everyone to go for a quick island visit before dusk. We all piled on, including the two dogs. Some dogs on the canal bank started barking at some ducks, and daisy hopped in the water to join in. Cut the motor, the boat went over the dog, everyone stayed calm and we got turned around. Daisy was paddling in circles and we made it over and got her back in the boat.

Arriving at the island everyone enjoyed walking around on the trails and seeing the mangroves and pines and campsites. Daisy dried off by rolling in the white sand.

by Elliot Murphy at December 12, 2016 02:28 PM

Dec 9, 2016

Quite windy today, 12 to 15 mph gusts on the river. Small craft advisory and whitecaps. Headed out to the main channel around midnight to see how the Boston whaler skiff handled compared to our old proline with higher freeboard. Waves were 2-3 feet and the skiff definitely couldn't take anything bigger. Little boat on a following swell is always a thrill as it goosenecks and swerves sliding into the trough and you look directly into the side of the waves. Wish I could see some flying fish.

by Elliot Murphy at December 12, 2016 02:18 PM

Dec 7, 2016

Late at night before I made it out in the boat. Quiet, half moon and cloudy. The river was absolutely slippery with fish. Headed south, and saw the bioluminescence was stronger. Killed the lights and watched streaks of light through the water all around the boat as the schools darted and churned.

by Elliot Murphy at December 12, 2016 02:13 PM

December 07, 2016

Elliot Murphy

Dec 6, 2016

Intermittent heavy rains today. Vey glad I bought the hand operated bilge pump, handy to empty the bilge twice during the day. After we returned in the evening I finished provisioning the ubiqity network gear for our mobile network and went out in the boat just before midnight. Then moon was just setting, and the mangroves were still dripping from rain. Headed past the nature preserve into the river, shut off the stern light and just looked at the stars. Stirring a stick in the water and still captivated by the bioluminescence.

by Elliot Murphy at December 07, 2016 05:26 AM

December 06, 2016

Elliot Murphy

Dec 5, 2016

5 day streak. Tonight after getting home everyone agreed to go with me, and we set out on the boat again north on the great canal. Incredibly calm night with a bright moon but many clouds. All the boat lights are so pretty and we saw several docks with underwater lights. One even had a heron standing near the light fishing.

by Elliot Murphy at December 06, 2016 06:23 AM

Dec 4, 2016

Woke up at 6AM and recorded a time lapse of the sunrise at the beach. Then rushed back and headed out in the boat. Rushed too much and realized I left the keys behind after casting off. Tried to lasso a cleat and missed as the wind caught the bimini top. Happily there was an emergency paddle in the anchor locker, so 15 minutes of paddling later I made it back to the dock and headed out to our Sunday appointments. After lunch tried another boat run with rhaiza. Brought the dogs and headed north. Many folks putting lights on their boats. After passing under the bridge one of the dogs got lost their footing and went into the water. Showed rhaiza the overboard procedure, threw out a floating cushion, and got the dog back aboard none the worse for wear.

by Elliot Murphy at December 06, 2016 06:20 AM