Globally Distributed Object Storage with Tigris

A cartoon hot air balloon with a bindle and a bucket of files walking down a sidewalk.
Image by Annie Ruygt

We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. That’s pretty cool, but we want to talk about something someone else built, that you can use today to build applications.

There are three hard things in computer science:

  1. Cache invalidation
  2. Naming things
  3. Doing a better job than Amazon of storing files

Of all the annoying software problems that have no business being annoying, handling a file upload in a full-stack application stands apart, a universal if tractable malady, the plantar fasciitis of programming.

Now, the actual act of clients placing files on servers is straightforward. Your framework has a feature that does it. What’s hard is making sure that uploads stick around to be downloaded later.

Enter object storage, a pattern you may know by its colloquial name “S3”. Object storage occupies a funny place in software architecture, somewhere between a database and a filesystem. It’s like malloc(), but for cloud storage instead of program memory.

S3 (err, object storage) is so important that it was the second AWS service ever introduced (EC2 was not the first!). Everybody wants it. We know, because they keep asking us for it.

So why didn’t we build it?

Because we couldn’t figure out a way to improve on S3. And we still haven’t! But someone else did, at least for the kinds of applications we see on Fly.io.

But First, Some Back Story

S3 checks all the boxes. It’s trivial to use. It’s efficient and cost-effective. It has redundancies that would make a DoD contractor blush. It integrates with archival services like Glacier. And every framework supports it. At some point, the IETF should just take a deep sigh and write an S3 API RFC, XML signatures and all.

There’s at least one catch, though.

Back in, like, ‘07 people ran all their apps from a single city. S3 was designed to work for those kinds of apps. The data, the bytes on the disks (or whatever weird hyperputer AWS stores S3 bytes on), live in one place. A specific place. In a specific data center. As powerful and inspiring as The Architects are, they are mortals, and must obey the laws of physics.

This observation feels banal, until you realize how apps have changed in the last decade. Apps and their users don’t live in one specific place. They live all over the world. When users are close to the S3 data center, things are amazing! But things get less amazing the further away you get from the data center, and even less amazing the smaller and more frequent your reads and writes are.

(Thought experiment: you have to pick one place in the world to route all your file storage. Where is it? Is it Loudoun County, Virginia?)
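To put a rough number on "less amazing", here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the coordinates for Ashburn and Sydney are approximate, the signal speed is the usual rule of thumb of about two thirds the speed of light in fiber, and the path is a perfect great circle, which no real network route is. Treat the result as a floor, not a measurement.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: signals in optical fiber travel at roughly 2/3 the speed of light.
FIBER_KM_PER_S = 200_000.0

# Approximate (latitude, longitude) pairs, chosen for illustration only.
ASHBURN = (39.04, -77.49)
SYDNEY = (-33.87, 151.21)


def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_round_trip_ms(distance_km):
    """Lower bound on one round trip: there and back at fiber speed."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


distance = great_circle_km(ASHBURN, SYDNEY)
print(f"{distance:,.0f} km, at least {min_round_trip_ms(distance):.0f} ms per round trip")
```

Under these assumptions the floor lands on the order of 150 milliseconds per round trip, before any routing detours, TLS handshakes, or server time. An upload made of many small requests pays that cost once per request, which is why small, frequent reads and writes suffer the most.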

So, for many modern apps, you end up having to write things into different regions, so that people close to the data get it from a region-specific bucket. Doing that pulls in CDN-caching things that complicate your application and put barriers between you and your data. Before you know it, you’re wearing custom orthotics on your, uh, developer feet. (I am done with this metaphor now, I promise.)

Personally, I know this happens, because I had to build one! I run a CDN backend that’s a caching proxy for S3 on six continents. All so that I can deliver images and video efficiently for the readers of my blog.

What if data was really global? For some applications, it might not matter much. But for others, it matters a lot. When a sandwich lover in Australia snaps a picture of a hamdog, the people most likely to want to see that photo are also in Australia. Routing those uploads and downloads through one building in Ashburn is no way to build a sandwich reviewing empire.

Localizing all the data sounds like a hard problem. What if you didn’t need to change anything on your end to accomplish it?

Show Me A Hero

Building a miniature CDN infrastructure just to handle file uploads seems like the kind of thing that could take a week or so of tinkering. The Fly.io unified theory of cloud development is that solutions are completely viable for full-stack developers only when they take less than 2 hours to get working.

AWS agrees, which is why they have a SKU for it, called CloudFront, which will, at some variably metered expense, optimize the read side of a single-write-region bucket: they’ll set up a simple caching CDN for you. You can probably get S3 and CloudFront working within 2 hours, especially if you’ve set it up before.

Our friends at Tigris have this problem down to single-digit minutes, and what they came up with is a lot cooler than a cache CDN.

Here’s how it works. Tigris runs redundant FoundationDB clusters in our regions to track objects. They use Fly.io’s NVMe volumes as a first level of cached raw byte store, and a queuing system modelled on Apple’s QuiCK paper to distribute object data to multiple replicas, to regions where the data is in demand, and to 3rd party object stores… like S3.

If your objects are less than about 128 kilobytes, Tigris makes them instantly global. By default! Things are just snappy, all over the world, automatically, because they’ve done all the work.

But it gets better, because Tigris is also much more flexible than a simple caching CDN. It’s globally distributed from the jump, with inter-region routing baked into its distribution layer. Tigris isn’t a CDN, but rather a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge, and relay regions.

There’s a lot going on in this architecture, and it’d be fun to dig into it more. But for now, you don’t have to understand any of it. Because Tigris ties all this stuff together with an S3-compatible object storage API. If your framework can talk to S3, it can use Tigris.

fly storage

To get started with this, run the fly storage create command:

$ fly storage create
Choose a name, use the default, or leave blank to generate one: xe-foo-images
Your Tigris project (xe-foo-images) is ready. See details and next steps with: https://fly.io/docs/reference/tigris/

Setting the following secrets on xe-foo:
AWS_REGION
BUCKET_NAME
AWS_ENDPOINT_URL_S3
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Secrets are staged for the first deployment

All you have to do is fill in a bucket name. Hit enter. All of the configuration for the AWS S3 library will be injected into your application for you. And you don’t even need to change the libraries that you’re using. The Tigris examples all use the AWS libraries to put objects into Tigris and delete them again, using the same calls that you use for S3.
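As a rough illustration of what "the same calls" can look like, here is a minimal Python sketch using boto3, the AWS SDK for Python. The environment variable names are the secrets listed in the output above; the helper names (`load_settings`, `client_kwargs`, `round_trip`) are hypothetical and exist only for this example, and this is not taken from the Tigris examples themselves. Recent boto3 releases may pick up some of these variables on their own, but passing them explicitly keeps the sketch independent of SDK version.

```python
import os

# The secrets that `fly storage create` reports setting on the app.
REQUIRED_VARS = (
    "AWS_REGION",
    "BUCKET_NAME",
    "AWS_ENDPOINT_URL_S3",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def load_settings(env=None):
    """Collect the secrets from the environment, failing loudly if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def client_kwargs(settings):
    """Translate the secrets into keyword arguments for boto3.client("s3", ...)."""
    return {
        "endpoint_url": settings["AWS_ENDPOINT_URL_S3"],
        "region_name": settings["AWS_REGION"],
        "aws_access_key_id": settings["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": settings["AWS_SECRET_ACCESS_KEY"],
    }


def round_trip(key, body, env=None):
    """Put an object, read it back, then delete it. Needs boto3 and network access."""
    import boto3  # third-party dependency: pip install boto3

    settings = load_settings(env)
    bucket = settings["BUCKET_NAME"]
    s3 = boto3.client("s3", **client_kwargs(settings))
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    fetched = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    s3.delete_object(Bucket=bucket, Key=key)
    return fetched
```

Inside a deployed app, calling `round_trip("hello.txt", b"hello")` should hand back the same bytes it stored. The only Tigris-specific part is the endpoint URL; everything else is the ordinary S3 client code your framework already knows how to write.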

I know how this looks for a lot of you. It looks like we’re partnering with Tigris because we’re chicken, and we didn’t want to build something like this. Well, guess what: you’re right!

Compute and networking: those are things we love and understand. Object storage? We already gave away the game on how we’d design a CDN for our own content, and it wasn’t nearly as slick as Tigris.

Object storage is important. It needs to be good. We did not want to half-ass it. So we partnered with Tigris, so that they can put their full resources into making object storage as ✨magical✨ as Fly.io is.

This also mirrors a lot of the Unix philosophy of Days Gone Past: you have individual parts that each do one thing very well, chained together to create a composite result. I mean, come on, would you seriously want to buy your servers the same place you buy your shoes?

One bill to rule them all

Well, okay, the main reason why you would want to do that is because having everything under one bill makes it really easy for your accounting people. So we’ve wrapped your compute, your block storage, your databases, your networking, and your object storage under one bill. You don’t have to create separate accounts with Supabase or Upstash or PlanetScale or Tigris. Everything gets charged to your Fly.io bill and you pay one bill per month.

This is our Valentine’s Day gift to you all. Object storage that just works. Stay tuned, because we have a couple of exciting features that build on the integration of Fly.io and Tigris and allow some really unique things, such as truly global static website hosting and turning your bucket into a CDN in 5 minutes at most.

Here’s to many more happy developer days to come.