API Tokens: A Tedious Survey

Image by Annie Ruygt

We’re Fly.io. This post isn’t about Fly.io, but you have to hear about us anyways, because my blog, my rules. Our users ship us Docker containers and we transmute them into Firecracker microvms, which we host on our own hardware around the world. With a working Dockerfile, getting up and running will take you less than 10 minutes.

This is not really a post about Fly.io, though I’ll talk about us a little up front to set the scene.

The last several weeks of my life have been about API security. I’m working on a new permissions system for Fly.io, and did a bunch of research into my options. We even recorded a podcast about it. I won’t leave you hanging and tell you right up front: we’re working on rolling out a Macaroon-based scheme, which you’ll read more about later.

This post is long. You may be interested in just one kind of token. I’ll make it easy for you: here’s a table of contents:

  1. Simple Random Tokens
  2. Platform Tokens
  3. OAuth 2.0
  4. JWT
  5. PASETO
  6. Protobuf Tokens
  7. Authenticated Requests
  8. Facebook CATs
  9. Macaroons
  10. Biscuits

Fly.io is an application hosting platform. Think of us as having a control plane that applications running on Fly.io interact with, and an API that our users interact with — mostly through our CLI, flyctl. It’s that flyctl piece we’re talking about here.

Today, Fly.io API access is all-or-nothing. Everyone has root credentials. What we want is fine-grained permissions. Here are two big problems we want to solve:

  • Our API publishes Prometheus metrics. You can point Grafana at it and start building dashboards. You’d like to give Grafana Cloud a credential that lets them host dashboards without mucking with your apps.
  • You’d like to give a contractor access to an app without letting them get access to secrets from other applications.

This is the API job people generally refer to as IAM. There are a bunch of different ways to do the IAM job, and they’re all fun to nerd out about.

Let’s First Clarify Some Stuff

What I’m interested in here is API security, for end-users; “retail” security.

There’s a closely related API security problem I’m not talking about: inter-service authentication. Modern applications are composed of ensembles of small services. Ideally, there’s a security layer between them. But nobody does retail API IAM with Kerberos or mTLS. If you want to read more about these approaches, I wrote a long post about them elsewhere.

Another related problem is federated authentication and single sign-on. Google, Apple, and Okta will give you tokens that map requests to identities on their platforms. Those token formats are relevant here, but I want to be clear that federated identity is not what I’m after.

Most API security schemes boil down to a token that accompanies API requests. The tokens are somehow associated with access rules. The API takes the request, extracts the token, finds the access rules, and decides how to proceed.

Some questions to keep in your brain as you read through this:

  • How do we revoke tokens? Credentials get compromised. If you can’t revoke a token, you can’t recover from a compromise.
  • How often are we hitting the database to satisfy requests? In a microservice contraption, it’s painful to give every service direct access to the database.
  • Are we introducing vulnerabilities? Software developers will make every possible mistake. Muzzle discipline: don’t point the guns at our feet.

Let’s Take Passwords Off The Table

It’s 2021 and so I don’t need to tell you that having your API pass a username and password through HTTP basic authentication is a bad idea. Your tokens should look large and random, whatever they are.

Simple Random Tokens: Unsung Heroes

Here is a token generator that, from a security perspective, is pretty hard to beat:

>>> binascii.hexlify(os.urandom(16))
b'46d684a052c29cdce14c7e03e19da0f9'

Keep a table of random tokens, associate them with a table of users, and associate those users with allowed actions. You don’t need me to tell you how to do this; it’s simply how CRUD apps work.
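The whole scheme fits in a few lines of Python. This is a sketch, with a hypothetical in-memory dict standing in for the database table; storing a hash of the token rather than the token itself means a leaked table doesn’t leak live credentials:

```python
import hashlib
import secrets

tokens = {}  # sha256(token) -> user_id; a stand-in for a real database table

def issue_token(user_id: str) -> str:
    # same entropy as os.urandom(16) in the snippet above
    token = secrets.token_hex(16)
    # store only a hash of the token, never the token itself
    tokens[hashlib.sha256(token.encode()).hexdigest()] = user_id
    return token

def user_for(token: str):
    # look the presented token up by its hash; None means "no such token"
    return tokens.get(hashlib.sha256(token.encode()).hexdigest())

t = issue_token("alice")
assert user_for(t) == "alice"
assert user_for("bogus") is None
```

Revocation is just a row delete, and the permissions logic attached to the user lookup is ordinary application code.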

What you might need me to tell you is that this is a good way, even over the long term, to handle the IAM problem. Random tokens aren’t cryptographically scary. They’re easily revoked and expired. The accompanying permissions logic is clean and expressive; it’s just your API code.

Frankly, the biggest knock against simple random tokens is that they’re boring. If you can get away with using them — and most applications can — you probably should. Give yourself permission by saying you’re doing it for security reasons. Security is a problem for all the fancy tokens I’m going to talk about from now on.

Platform Tokens

Assume we’re trying to minimize the fraction of requests that have to hit the database. Mainstream web application frameworks tend to already have features that help with this.

Rails, for instance, has MessageVerifier and MessageEncryptor. Give them a bag of attributes and get back a tamper-proof (optionally encrypted) string, using HMAC-SHA2 and encrypt-then-MAC AES-CBC. Put the string in a cookie. The server only remembers a root secret, and can pull user data out of the cookie instead of the database. This is how Rails sessions work.

Python frameworks have similar features; there’s also the excellent pyca/cryptography library, whose Fernet recipe provides the same functionality, optimized for tokens.
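To make the construction concrete, here’s a minimal stdlib-only sketch of a MessageVerifier-style signed token. The real Rails and Fernet implementations differ in wire format (and also offer encryption); the secret and payload here are hypothetical:

```python
import base64
import hashlib
import hmac
import json

ROOT_SECRET = b"server-side root secret"  # hypothetical; the only server state

def sign(attrs: dict) -> str:
    # serialize the attribute bag, then MAC it with the root secret
    payload = base64.urlsafe_b64encode(json.dumps(attrs).encode())
    tag = hmac.new(ROOT_SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "--" + tag

def verify(token: str):
    payload, _, tag = token.rpartition("--")
    expected = hmac.new(ROOT_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return None  # tampered or forged
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = sign({"user_id": 42})
assert verify(cookie) == {"user_id": 42}
assert verify(cookie + "x") is None  # any tampering breaks the MAC
```

The server pulls user data out of the verified payload instead of the database, which is the whole point.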

You generally can’t reuse user sessions as API tokens (the defining feature of an API token is that it doesn’t log out). But you can use the same features to build API tokens. Share the root secret among multiple services — maybe that’s fine — and microservices don’t have to rely on a central service.

Platform tokens are relatively simple, and can be stateless. What’s the catch? Well, you’re effectively using tokens as a database cache, and cache consistency is frustrating.

Right off the bat, you’ve lost the simplest form of token revocation. The whole premise is that you’re not validating tokens against the database, so now you have to come up with another way to tell if they’ve been revoked. Without a standard protocol for renewing them, short-expiry tokens don’t work either.

A pattern I’ve seen a bunch here, and one that I kind of like, is to “version” the users. Stick a token version in the user table, have tokens bear the current version. To revoke, bump the version in the database; outstanding tokens are now invalid. Of course, you need to keep state to do that, but the state is very cheap; a Redis cache of user versions, falling back to the database, does the trick.
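The versioning pattern is tiny. In this sketch a plain dict stands in for the Redis cache, and the token is shown as a bare attribute bag; in practice the payload would be signed or encrypted as a platform token:

```python
user_versions = {"alice": 1}  # in a real app: a Redis cache over the users table

def mint(token_sub: str) -> dict:
    # token bears the user's current version at issuance time
    return {"sub": token_sub, "ver": user_versions[token_sub]}

def check(token: dict) -> bool:
    # a token is live only while its version matches the current one
    return token["ver"] == user_versions.get(token["sub"])

tok = mint("alice")
assert check(tok)
user_versions["alice"] += 1  # revoke everything outstanding: bump the version
assert not check(tok)
```

One integer per user is all the state you keep, and a version bump revokes every outstanding token at once.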

OAuth 2

All exhibits and addenda attached previously to the section on “Platform Tokens” are hereby incorporated into this section and made a part thereof.

By design, OAuth is a federation protocol. Canonically, OAuth lets a 3rd party post a tweet with your account. That’s not the problem we’re trying to solve.

But OAuth 2.0 is popular and has bumped into every tedious problem you’re likely to encounter with tokens, and they’ve come up with solutions of varying levels of grossness. You can, and lots of people do, draft off that work in your own API IAM situation.

For instance, OAuth 2.0 has a built-in solution for short-expiry tokens: the “Refresh Token”, which you exchange for “Access Tokens”. Access Tokens are the ones you actually do stuff in the API with, and they expire rapidly. OAuth 2.0 libraries know how to use Refresh Tokens. And Refresh Tokens are easy to revoke, because they’re used infrequently and don’t punish the database.
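A toy sketch of the refresh/access split. The names and TTL are hypothetical, and real OAuth 2.0 flows involve grant types and endpoints this elides; the point is just which step touches the database:

```python
import secrets
import time

refresh_tokens = {}  # refresh token -> user_id; revocable, database-backed
ACCESS_TTL = 600     # access tokens expire rapidly

def issue_refresh(user_id: str) -> str:
    rt = secrets.token_hex(16)
    refresh_tokens[rt] = user_id
    return rt

def exchange(rt: str):
    # the only step that hits the "database"; access tokens can then be
    # validated statelessly (e.g., as signed platform tokens)
    user_id = refresh_tokens.get(rt)
    if user_id is None:
        return None  # revoked or never issued
    return {"sub": user_id, "exp": time.time() + ACCESS_TTL}

rt = issue_refresh("alice")
at = exchange(rt)
assert at is not None and at["sub"] == "alice"
del refresh_tokens[rt]  # revocation: delete the refresh token
assert exchange(rt) is None
```

Revoking the refresh token strands the holder as soon as their short-lived access token expires.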

OAuth 2.0 Access Tokens are opaque strings, so you can do the same things with them that you would with a Platform Token (or just stuff a Platform Token in there).

The “cryptography” in OAuth 2.0, such as it is, is simple. It gets tricky in standalone single-page applications, but so does everything else. I used to snark about people cargo-culting OAuth into simple client-server apps. Not anymore.

JSON Web Tokens

A brief history lesson. We got OAuth, and apps could tweet on behalf of users, and God saw what he had made and it was good. Then someone realized that if you could post a tweet on behalf of a user, you could use that capability as a proof of identity and “log users in with Twitter”. The tweet itself became extraneous and people just used OAuth tokens that could, like, read your user profile as an identity proof.

This is a problem because the ability to read your user profile isn’t a good identity proof. You might grant that capability to applications for reasons having nothing to do with whether they can “log in with Twitter” to a dating app. People found a bunch of vulnerabilities.

Enter OpenID Connect (OIDC). OIDC is the demon marriage of OAuth 2.0 and a cryptographic token standard called JWT. OIDC’s identity claim is unambiguous: it gives you an “Identity Token”, JWT-encoded, that tells you who’s logging in.

We’re not so much interested in OIDC here, but the eldritch rituals that brought OIDC into being unleashed a horde of JWTs into the world, and that’s now a thing we have to think about.

From a purely functional perspective, JWT isn’t doing much more than a Platform Token embedded in OAuth 2.0. But JWT is standardized, and “JSON encrypted with Fernet and embedded in OAuth Access Token” isn’t, and so a whole lot of dev UX has sprung up around JWT. So, unfortunately, JWT has really good ergonomics.

What makes that unfortunate? JWT is bad.

This is not a post about why JWT is bad, though I do hope you come away from this agreeing with me. So I’ll be brief.

First, JWT is a design-by-committee cryptographic kitchen sink. JWTs can be protected with a MAC, like HMAC-SHA2. Or with RSA digital signatures. Or encrypted, with static-ephemeral P-curve elliptic curve Diffie Hellman. This isn’t so much a footgun as it is the entire Rock Island Arsenal deployed against your feet. If you’re an aficionado of crypto vulnerabilities, you almost have to love it. Where else outside of TLS are you going to find invalid curve point attacks?

Next, the JSON semantics of JWT are not thoughtfully designed. JWT doesn’t bind purpose or even domain parameters to keys, and JWT libraries are written with the assumption that RSA and HMAC-SHA2 are just interchangeable solutions to the same problem. So you get bugs where people take RSA-signed JWTs and switch the JWT header from RS256 to HS256 (don’t even get me started on these names), and the libraries obliviously treat public signing keys as private MAC keys. Also, there’s alg=none.
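Here’s roughly what the RS256/HS256 confusion looks like, sketched stdlib-only. The RS256 branch is stubbed out, since the point is just that a verifier that lets the (attacker-controlled) header pick the algorithm will happily use a public key as an HMAC secret:

```python
import base64
import hashlib
import hmac
import json

def b64(b: bytes) -> bytes:
    return base64.urlsafe_b64encode(b).rstrip(b"=")

def b64d(b: bytes) -> bytes:
    return base64.urlsafe_b64decode(b + b"=" * (-len(b) % 4))

RSA_PUBLIC_KEY = b"-----BEGIN PUBLIC KEY-----..."  # public by definition

def vulnerable_verify(token: bytes, key: bytes) -> bool:
    header_b64, payload_b64, sig_b64 = token.split(b".")
    header = json.loads(b64d(header_b64))
    if header["alg"] == "HS256":
        # the bug: the header chooses the algorithm, and `key` gets used as
        # an HMAC secret even when it's actually an RSA public key
        expected = b64(hmac.new(key, header_b64 + b"." + payload_b64,
                                hashlib.sha256).digest())
        return hmac.compare_digest(sig_b64, expected)
    # (real RS256 verification, treating `key` as an RSA public key, goes here)
    return False

# The attacker knows the public key, so they can forge an "HS256" token:
hdr = b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
pay = b64(json.dumps({"sub": "admin"}).encode())
sig = b64(hmac.new(RSA_PUBLIC_KEY, hdr + b"." + pay, hashlib.sha256).digest())
forged = hdr + b"." + pay + b"." + sig

assert vulnerable_verify(forged, RSA_PUBLIC_KEY)  # forgery accepted
```

Strongly typed keys (an RSA key object that simply can’t be fed to HMAC) kill this bug dead, which is why the “Algorithm Lucidity” framing keeps coming up.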

JWT is so popular that it has become synonymous with the concept of stateless authentication tokens, despite the fact that stateless tokens are straightforward without (and were in wide use prior to) JWT.

There’s a sense in which complaining about JWT is just howling at the moon, because it’s non-optional in OIDC, and OIDC is how Google and Apple implement single sign-on. Friend-of-our-dumb-podcast Jonathan Rudenberg has a good observation about this: if your application retains direct connectivity to (say) Apple, you can somewhat safely use OIDC JWT simply by dint of trusting the TLS connection you have to Apple’s servers; you don’t so much even need to care about the cryptographic misfeatures of the token itself.

Aside: Never, Ever SAML

There are rituals even demons won’t stomach. OIDC’s competitor is SAML, which is based on XML DSIG, which is a way of turning XML documents into signed tokens. You should not turn XML documents into signed tokens. You should not sign XML. XML DSIG is the worst cryptographic format in common use on the Internet. Take all the flaws of JWT, including the extensive parsing of untrusted data just to figure out how to verify stuff. Mix in a DOM model where a single document can have dozens of different signed subtrees, then add a pluggable canonicalization layer that transforms documents before they’re signed. Make it complicated enough that there is essentially a single C-language implementation of the spec that every SAML library wraps. You’re obviously not going to use SAML to authenticate your API, but, in case you can’t tell, I’m getting some stuff out of my system here.

PASETO

PASETO (rhymes with “potato”) is hipster JWT. I mean that in the nicest way. It has essentially the same developer UX as JWT, but tries to lock the token into modern cryptography.

JWT is a cryptographic kitchen sink. PASETO is the smaller bathroom vanity sink. I’m comfortable being critical here, because PASETO has, for some good reasons, done well among token nerds and doesn’t need my help.

There are today four versions, each of which defines two kinds of token, a symmetric “local” and an asymmetric “public”. Version 1 uses “NIST-compliant” AES-CTR, HMAC-SHA2, and RSA. Version 2 has XChaPoly and Ed25519. Version 3 replaces RSA with P-384 ECDSA. Version 4 replaces XChaPoly with XChaCha and a Blake2 KMAC. You can swap v4 with v4c to use CBOR instead of JSON. It’s a lot.

My issue with PASETO is that it’s essentially the same thing as JWT. You could almost build it from JWT, by adding some algorithms and banning some others.

PASETO advocates for the now-accepted practice of versioning whole protocols rather than negotiating parameters on the fly. That should be a powerful advantage. But PASETO has 8 versions, 4 of which are “current”, and I think part of the idea of protocol versioning that PASETO misses is that you’re not supposed to keep multiple versions flying around. Versions 3 and 4 are partly the result of a vulnerability (not a super serious one) Thai Duong found. PASETO libraries support multiple versions, in some cases dynamically. Kill the old versions!

The IRTF CFRG is the IETF’s cryptography review board. For reasons I will never understand, the PASETO authors submitted it to the CFRG for consideration. Never do this. In the thread, Neil Madden pointed out that PASETO had managed to inherit JWT’s RSA/HMAC problem; all the PASETO versions now have an “Algorithm Lucidity” warning telling people to make sure they’re strongly typing their keys.

I don’t think this is PASETO’s fault so much as I think that the fundamental idea is an impossible trinity: cryptographic flexibility, cryptographic misuse-resistance, and Javascript-y developer UX.

Also: the “NIST-compliant” PASETO versions were an unforced error.

I’m peeved by JSON tokens that authenticate bags of random user attributes alongside token metadata like issuance dates and audiences. Cryptography engineers hear me rant about this and scratch their heads, but I think they’re mostly thinking about OIDC JWTs that carry minimal data, and not all the weird JWTs developers cook up, where user data mingles with metadata. This, too, seems like an unforced error to me. So does the fact that a lot of this metadata is optional. Why? It’s important!

Still, you’re far better off using PASETO than JWT. My take regarding PASETO is that if you use it, you should find real lucidity about whether you want symmetric or asymmetric tokens; they’re two different things with different use cases. Support just one version.

Protocol Buffer Tokens: The Anti-PASETO

You can get essentially the thing PASETO tries to do, without any of the downsides, just by defining your own strongly typed protocol format. David Adrian calls these “Protobuf Tokens”.

All you do is, define a Protocol Buffer schema that looks like this:

syntax = "proto3";

message SignedToken {
  bytes signature = 1;
  bytes token = 2;
}

message Token {
  string userId = 1;
  uint64 not_before = 2;
  uint64 not_after = 3;
  // and other stuff
}

Push all your token semantics into the Token message, and marshal it into a string with a first pass of Protobuf encoding. Sign it with Ed25519 (concatenate a version string like “Protobuf-Token-v1” into the signature block), stick the token byte string in the token field of a SignedToken, and populate the signature. Marshal again, and you’re done.

This two-pass encoding gives you two things. First, there’s only one way to decode and verify the tokens. Second, everything in the token is signed, so there’s no ambiguity about metadata being signed. The tokens are compact, easy to work with, and can be extended (Protocol Buffers are good at this) to carry arbitrary optional claims.
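A sketch of the two-pass encode-sign-encode flow. To keep it runnable with just the stdlib, JSON stands in for the Protobuf wire format and HMAC stands in for Ed25519; the two-pass structure is what matters, not the primitives:

```python
import hashlib
import hmac
import json

KEY = b"signing key"            # stand-in for the Ed25519 private key
VERSION = b"Protobuf-Token-v1"  # version string bound into the signature

def mint(user_id: str, nbf: int, naf: int) -> bytes:
    # pass 1: marshal the inner Token into a byte string
    token = json.dumps({"userId": user_id, "not_before": nbf,
                        "not_after": naf}).encode()
    # sign version-string || token-bytes
    sig = hmac.new(KEY, VERSION + token, hashlib.sha256).digest()
    # pass 2: marshal the outer SignedToken
    return json.dumps({"signature": sig.hex(), "token": token.hex()}).encode()

def verify(blob: bytes):
    outer = json.loads(blob)
    token = bytes.fromhex(outer["token"])
    sig = bytes.fromhex(outer["signature"])
    expected = hmac.new(KEY, VERSION + token, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    # only after verification do we decode the inner token: there's exactly
    # one way to decode and verify
    return json.loads(token)

blob = mint("alice", 0, 2**32)
assert verify(blob)["userId"] == "alice"
```

Because the signature covers the marshaled inner token wholesale, there’s no question of which fields are authenticated: all of them are.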

Authenticated Requests

You don’t need tokens at all. You can instead just have keys. Use them to authenticate requests. That’s how the AWS API works.

We tend to send normal HTTP requests to our APIs, and pass an additional header carrying a “bearer token”. Bearer tokens are like bearer bonds, in that if you have your bear paws on them, it’s game over. Authenticated requests don’t have this problem.

To do this, you need a canonicalization scheme for your HTTP requests. The same HTTP request has multiple representations; we need to decide on just one to compute a MAC tag. This seems easy but was a source of vulnerabilities in early implementations of the AWS scheme. Just use AWS’s Version 4.

Compute an HMAC over the canonicalized request (with AWS, you’d just use your AWS_SECRET_ACCESS_KEY) and attach the resulting tag as a parameter.
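A drastically simplified sketch of the idea. This is not SigV4; the canonicalization below is hypothetical and far less careful than AWS’s, but it shows the shape: one unambiguous byte string per request, then an HMAC over it:

```python
import hashlib
import hmac

SECRET = b"hypothetical API secret key"  # plays the role of AWS_SECRET_ACCESS_KEY

def canonicalize(method, path, query: dict, headers: dict, body: bytes) -> bytes:
    # pick exactly one representation: uppercase method, sorted query params,
    # lowercased and sorted headers, and a hash of the body
    q = "&".join(f"{k}={v}" for k, v in sorted(query.items()))
    h = "\n".join(f"{k.lower()}:{v.strip()}"
                  for k, v in sorted(headers.items(), key=lambda kv: kv[0].lower()))
    return "\n".join([method.upper(), path, q, h,
                      hashlib.sha256(body).hexdigest()]).encode()

def sign(method, path, query, headers, body) -> str:
    return hmac.new(SECRET, canonicalize(method, path, query, headers, body),
                    hashlib.sha256).hexdigest()

# equivalent representations of the same request produce the same tag
tag = sign("GET", "/v1/apps", {"page": "2"}, {"Host": "api.example.com"}, b"")
assert tag == sign("get", "/v1/apps", {"page": "2"}, {"host": "api.example.com"}, b"")
```

The server recomputes the tag from the request it actually received; the key itself never travels with the request.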

God help you, you could use X509 here and issue people certificates and keys they can use to sign requests, which is a thing Facebook apparently did internally.

There are nice things about authenticated requests. No bearer tokens, no bears. The biggest problem is logistical: it’s a pain to build request authenticating code, so, unless your app gets huge, the only way to talk to it will be with your official SDK that does all the request signing work.

Facebook’s CATs

So, here’s a cool trick. You’ve got a bunch of services, like Messages and Photos and Presence and Ivermectin Advocacy. And you’ve got a central Authentication service, to which both your services and your users can talk.

Authentication holds a root key. Messages comes on line, and makes (say) an identity-proving mTLS connection to Authentication. It’s issued a service key, which is HMAC(k=root, v=“Messages”).

Now a user “Alice” arrives. Authentication issues her a key. It’s HMAC(k=HMAC(k=root, v=“Messages”), v=“Alice”).

CAT diagram

See what we did there? Messages doesn’t have Alice’s key. But her key is simply the HMAC of her username under the Messages key, so the service can reconstruct it and verify the message.

You can use a CATS-like construction to sign requests, or to sign a Protobuf Token (with HMAC or an AEAD, rather than Ed25519). You’re getting some of the decoupling advantage of public key cryptography. Messages requires only sporadic contact with Authentication, to enroll themselves and periodically rotate keys. That’s enough to authenticate requests from all comers, trusting that the only way Alice got her key was if Authentication OK’d it.
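The key-derivation trick is tiny in code. A sketch, with hypothetical key material and the service/user names from above:

```python
import hashlib
import hmac

def kdf(key: bytes, value: str) -> bytes:
    # HMAC as a key-derivation step: derived key = HMAC(k=key, v=value)
    return hmac.new(key, value.encode(), hashlib.sha256).digest()

root = b"authentication service root key"  # held only by Authentication

# Authentication derives and distributes:
messages_key = kdf(root, "Messages")    # sent to the Messages service
alice_key = kdf(messages_key, "Alice")  # sent to Alice

# Messages never received Alice's key, but can reconstruct it from her
# username and its own service key:
assert kdf(messages_key, "Alice") == alice_key

# a key derived under a different service key doesn't check out:
assert kdf(kdf(root, "Photos"), "Alice") != alice_key
```

Alice’s key only verifies at the service it was derived for, and only Authentication could have minted it.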

If this was a post about deploying on Fly.io, you’d have been done 22 minutes ago.

We don’t need 5,000 words to tell you how to get an application running close to users around the world, from Sydney to Amsterdam. All it takes is a working Dockerfile.


Macaroons

We can go for a walk where it’s quiet and dry and talk about Macaroons.

Imagine a golden ticket for your service, an authenticated token permitting any action. It’s much too dangerous to pass around as a bearer token.

Now imagine adding caveats to that golden token. You’re allowed only to read, not to write. Only for a single document. Only on a request from a specific IP, or on a session independently authenticated to a specific user ID. This attenuated token is much less dangerous. In fact, you might get it so locked down that it’s not even sensitive.

We exploit the same trick CATs use to derive user keys. Start with your golden ticket and HMAC it under a root key. Now you want to make it read-only, so you add another message layer to the token, and you MAC that new layer, using the MAC tag of the previous layer as the key. The holder of the new token can’t work out the original MAC tag of the golden ticket; the token carries only the new chained MAC tag. But services have the root key and can re-derive all the intermediate values.

Macaroon diagram
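The chained-MAC construction can be sketched in a few lines. Caveats here are just opaque strings; a real Macaroon implementation also handles serialization, caveat evaluation, and third-party caveats:

```python
import hashlib
import hmac

ROOT = b"service root key"  # held only by the service

def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def mint(identifier: bytes) -> dict:
    # the "golden ticket": no caveats, MACed under the root key
    return {"id": identifier, "caveats": [], "sig": mac(ROOT, identifier)}

def attenuate(m: dict, caveat: bytes) -> dict:
    # anyone holding the token can add a caveat: chain the MAC forward,
    # using the previous tag as the key for the new layer
    return {"id": m["id"], "caveats": m["caveats"] + [caveat],
            "sig": mac(m["sig"], caveat)}

def verify(m: dict) -> bool:
    # only the service, holding ROOT, can replay the whole chain
    sig = mac(ROOT, m["id"])
    for c in m["caveats"]:
        sig = mac(sig, c)
    return hmac.compare_digest(sig, m["sig"])

golden = mint(b"user = alice")
readonly = attenuate(golden, b"action = read")
assert verify(golden) and verify(readonly)

readonly["caveats"] = []  # stripping a caveat breaks the chain
assert not verify(readonly)
```

Holders can always add caveats, but can never remove one, because they can’t recover the earlier tags in the chain.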

Macaroons are a token format built around this idea. They do three big things.

Attenuation: users can restrict tokens without talking to the issuing service. All the caveats must evaluate true. You can’t undo previous caveats with new ones. The service just knows about the basic caveat types and doesn’t need special-case code for all the goofy combinations users might want.

Confinement: If you have the right caveat types, you can set it up so there are useful Macaroons that are safe to pass around, because they’re only (say) valid on a session under a particular mTLS client certificate, or at a particular time of day.

Delegation: Macaroons have “third-party caveats”, which delegate logic to other systems. Third-party caveats are encrypted; users can see only the URL of a third-party service they can talk with to resolve them. The third-party system issues a “discharge Macaroon”, which is submitted alongside the original Macaroon to resolve the caveat.

These ideas synergize. You can delegate authentication to an IAM service, and then add additional service-specific access rules as first-party caveats. A revocation service verifies a user’s tokens; the rest of your system doesn’t need to know how revocation is implemented. The same goes for audit logging, and for anti-abuse.

Sometimes there’s so much beauty in the world I feel like I can’t take it, like my heart’s going to cave in.

But Macaroons are unpopular for good reasons.

First: there’s a library ecosystem for Macaroons and it’s not great. No library can support all or even most of the caveats developers will want. So “standard” Macaroons use an untyped string DSL as their caveat format and ask relying services to parse them.

They’re also clunky. With most of the previous formats, you can imagine slotting them into OAuth 2.0. But third-party caveats break that. Your Macaroon API will be fussy. Users might have to make and store the results of a bunch of queries to issue a real request.

Macaroons rely on symmetric cryptography. This is good and bad. It radically simplifies the system, but means you have to express relationships between your services with shared keys. You have to do that with HS256 JWT too, of course, but unless you depart pretty radically from the Macaroon paper, you can’t get the public-key wins without coming up with something like CAT-caroons.

In practice, caveats can be tricky to reason about. It’s easy to write a loop over a set of caveats that bombs as soon as one evaluates false. But you can accidentally introduce semantics that produce caveats that expand instead of contract authority. You’ve got code that wants to answer “can I do this?” questions by asking the database about a user ID, and you can write caveat constructions that do similar things, which is never what you want in a coherent Macaroon design.

I have more to say about these problems! For now, though, it suffices to say that I spent many years beating the drum for Macaroons, and then I went and implemented them, and I probably won’t be beating that drum anymore. But where they work well, I think they probably work really well. My take is: if all three of attenuation, confinement, and delegation resonate with your design, Macaroons will probably work fine. If you skip any of the three, consider something else.

Biscuits

Finally, there’s Geoffroy Couprie’s Biscuits. Biscuits are what you’d get if you sat down to write an over-long blog post like this one, did all the research, and then decided instead to write a cryptographic token to address the shortcomings of every other token.

Biscuits are heavily influenced by Macaroons (Couprie claims they’re JWT-influenced as well, but I don’t see it). Like Macaroons, users can attenuate Biscuits. But unlike Macaroons:

  1. Biscuits rely on public key signatures instead of HMAC, which somewhat dampens the need for third-party caveats.

  2. Rather than simple boolean caveats, Biscuits embed Datalog programs to evaluate whether a token allows an operation.

Biscuits are incredibly ambitious.

To begin with, swapping out the simple cryptography in Macaroons for public key signatures isn’t an easy task. The cryptographic process of adding a caveat to a Macaroon is trivial: you just feed the MAC tag from the previous caveat forward as the HMAC key for the new caveat. But there’s no comparably straightforward operation for signatures.

The cryptography proposed for Biscuits started with pairing curve moon math. Keller Fuchs pulled them back to low-earth orbit with curve VRFs. Then they took a detour into blockchainia with aggregated Gamma-Signatures. Ultimately, though, Biscuit’s core cryptography came back to Earth with a pretty straightforward chaining of Ed25519 signatures.

The caveat structure of Biscuit tokens is flexible, probably to a fault, but formally rigorous, which is an interesting combination. It works by evaluating a series of signed programs (compiled and marshaled with Protocol Buffers). Services derive fact patterns from requests, like “you’re asking for cats2.webp” or “the operation you’re requesting is WRITE”. The tokens themselves include rules that derive new fact patterns, and checkers that test those patterns against predicates.

Honestly, when I first read about Biscuits, I thought it was pretty nuts. If the proposal hadn’t lost me at “pairing curves”, it had by the time it started describing Datalog. But then I implemented Macaroons for myself, and now, I kind of get it. One thing Biscuits get you that no other token does is clarity about what operations a token authorizes. Rendered in text, Biscuit caveats read like policy documents.

That, I think, is the only big concern I have about them. I wonder whether taking real advantage of Biscuits requires you to move essentially all your authorization logic into your tokens. Even with Macaroons, which previously held the title for “most expressive token”, the host services were still making powerful choices about what caveats could be expressed in the first place. Biscuits strip the service’s contribution to authorization policy down to what seems like its constituent atoms, and derive all security policy in Datalog. I see how that could be powerful, but also how you’d kind of have to buy into it wholesale to use it.

Now What?

Here’s a scorecard:

token scorecard

Believe it or not, with the exception of passwords and SAML, I think there’s something to like in all of these schemes.

I continue to believe that boring, trustworthy random tokens are underrated, and that people burn a lot of complexity chasing statelessness they can’t achieve and won’t need, because token databases for most systems outside of Facebook aren’t hard to scale.

A couple months ago, I’d have said that Macaroons are underrated in a different way, the way Big Star’s “#1 Record” is. Now I think they’re merely underrated like the first Sex Pistols show; everyone who read about them created their own token format. We’re moving forward with Macaroons, and I’m psyched about that, but I’d hesitate to recommend them for a typical CRUD application.

But, don’t use JWT.