What if S3 could be a fast, globally synced, Key Value Database? That's Tigris

Author

Name: Jason Stiebs
@peregrine

We’re Fly.io and we transmute containers into VMs, running them on our hardware around the world. We have fast-booting VMs, so why not take advantage of them?

That said… building applications that span the globe is a hard problem. Heck, syncing data between two machines is not trivial. It might require writing to a “primary” and reading from a “secondary”. Or using a CRDT to “sync” state, combined with pub-sub to communicate changes.

There’s a wide selection of databases and libraries that claim to solve this class of problems, with the caveat that we need to operate the system underneath our application, or pay someone else to operate it for us. While Elixir/Phoenix has tools to make this easier, it’s no free lunch. You need to consider the many failure modes that could corrupt your data.

One solution: give up and choose a central actor to store data. Most apps today already use S3-style object storage this way. Instead of storing files in a database, we accept that we have a globally-accessible location (bucket) that stores values (file data) identified by keys (file names).

So, why don’t we treat S3 as a simple key-value store? We have cheap, strongly-consistent, bottomless storage, with wide language support.

This comes down to the speed of light. S3 makes it trivial to store as much data as you want in a single region, in a single bucket, which is what almost every user of S3 does. This leads to a “US-EAST Privilege” for users: the closer you are to US-EAST, the better your experience with the web will be, and the further away you are, the slower every website feels. This can be ameliorated with a CDN, but now we’ve added more cost and services to our stack.

Enter Tigris

Tigris is an S3-compatible object storage service built on Fly.io, for Fly.io users, with interesting properties:

- Objects are initially stored in the region closest to the uploader, say Chicago, where they are most likely to be served.
- If that object is requested from Singapore, Tigris will transparently move it there and then cache it, like a CDN you don’t have to configure.
- Every file under 128kb is cached globally.

How they do that is beyond the scope of this post, but we recommend reading through their architecture documentation. It’s very good!

I am very interested in that last bullet point. If every object under 128kb is cached and synced globally, that means we have a globally spanning key-value store that should be crazy fast to read and write to! So let’s build a simple KV store against Tigris.

Experiment

Building off my previous post where I broke down and built our own AWS client, I’ll be using a similar extension to Req that I built, as shown in this gist. Our entire API will be something like this:

    KV.get("/path/to/key")                    # get key contents
    KV.put("/path/to/key", value)             # update/create
    KV.put_transaction("/path/to/key", value) # update/create with global lock on key
    KV.delete("/path/to/key")                 # delete
    KV.get("/path/to/")                       # list keys

  

I will spare you the details here as this is literally just reading and writing to a file and encoding the value using :erlang.term_to_binary.
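For the curious, here is a minimal sketch of the encoding half. It is not the code from the gist: `KVCodec` and its functions are hypothetical names, and the trailing-slash rule for telling a listing from a single key is an assumption read off the API shape above.

```elixir
defmodule KVCodec do
  @moduledoc "Hypothetical helper: turns Elixir terms into object bodies and back."

  # Serialize any Elixir term into a binary suitable for an object body.
  def encode(term), do: :erlang.term_to_binary(term)

  # Decode a body read back from the bucket. The :safe option refuses
  # payloads that would create new atoms, which matters if anything
  # other than our own app can write to the bucket.
  def decode(binary) when is_binary(binary) do
    {:ok, :erlang.binary_to_term(binary, [:safe])}
  rescue
    ArgumentError -> {:error, :invalid_term}
  end

  # Assumption: keys ending in "/" are prefixes to list, everything else is one object.
  def kind(key) do
    if String.ends_with?(key, "/"), do: :list, else: :object
  end
end
```

A `KV.put/2` built on this would send `KVCodec.encode(value)` as the request body, and `KV.get/1` would run the response body through `KVCodec.decode/1`.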

Keen observers might realize we have one glaring issue: data races. If we have a globally spanning cluster and two regions write to the same value at the same time, we’ll have a conflict.

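To solve this, there is one function in our API that’s not like the others: put_transaction. Incredibly, the BEAM comes with a function that helps us out here called :global.trans/4. It takes a “global lock on an id” across the connected nodes and runs your callback only while it holds that lock, so only a single node executes it at a time. Our put_transaction code literally looks like:

    def put_transaction(key, value) do
      # The lock id is {resource_id, requester_id}; lock on every connected node.
      nodes = [node() | Node.list()]
      # Retries = 0: return :aborted right away if someone else holds the lock.
      :global.trans({key, self()}, fn -> put(key, value) end, nodes, 0)
    end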

  

Now we have globally distributed transactions on our KV store. Normally the trans function will retry forever until it gets the global lock. In our code we tell it not to retry, so we’ll fail here with :aborted and let our users figure out what to do.
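What "figure out what to do" looks like is up to the caller. One option is a small, bounded retry with backoff. The sketch below is an illustration, not part of the gist: `LockedWrite` and `put_with_retry` are hypothetical names, and `put_fun` stands in for the real write. It calls `:global.trans/4` with an explicit `{resource, requester}` id and node list.

```elixir
defmodule LockedWrite do
  @moduledoc "Hypothetical caller-side retry around a fail-fast global lock."

  # put_fun stands in for the real write, for example fn -> KV.put(key, value) end.
  # attempts is how many times we try to take the lock before giving up.
  def put_with_retry(key, put_fun, attempts \\ 3, backoff_ms \\ 50)

  def put_with_retry(_key, _put_fun, 0, _backoff_ms), do: {:error, :locked}

  def put_with_retry(key, put_fun, attempts, backoff_ms) when attempts > 0 do
    nodes = [node() | Node.list()]

    # Retries = 0: :global.trans/4 returns :aborted at once if the lock is held.
    case :global.trans({key, self()}, put_fun, nodes, 0) do
      :aborted ->
        # Wait, then try again with one fewer attempt and double the backoff.
        Process.sleep(backoff_ms)
        put_with_retry(key, put_fun, attempts - 1, backoff_ms * 2)

      result ->
        {:ok, result}
    end
  end
end
```

One caveat of this shape: if `put_fun` itself returned `:aborted`, the caller could not tell it apart from a failed lock, so the real write should return something more specific.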

And that is that! We’re using the key-based “file system” of Tigris as a KV store, and so long as our values are under 128kb, we’re syncing as fast as the internet allows!

Setup Tigris

Assuming you are using Fly.io already, adding Tigris to an existing app is as easy as:

$ fly storage create
? Choose a name, use the default, or leave blank to generate one: demo-bucket
Your  project (demo-bucket) is ready. See details and next steps with:

Setting the following secrets on app:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
BUCKET_NAME
AWS_ENDPOINT_URL_S3

Secrets are staged for the first deployment

And that is it: we have full access to the Tigris API, with the common configuration variables already set. As simple as that, we have a functional KV store that’s durable, globally cached, and cheaper than AWS.
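On the Elixir side, those four secrets show up as environment variables. A minimal sketch of collecting them in one place could look like the following. `TigrisConfig` is a hypothetical module, not something Fly.io or Tigris ships, and the key names in the returned map are my own choice.

```elixir
defmodule TigrisConfig do
  @moduledoc "Hypothetical helper that collects the secrets set by `fly storage create`."

  @required ~w(AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY BUCKET_NAME AWS_ENDPOINT_URL_S3)

  # env defaults to the process environment; pass a map to test without real secrets.
  def load(env \\ System.get_env()) do
    case Enum.reject(@required, &Map.has_key?(env, &1)) do
      [] ->
        {:ok,
         %{
           access_key_id: env["AWS_ACCESS_KEY_ID"],
           secret_access_key: env["AWS_SECRET_ACCESS_KEY"],
           bucket: env["BUCKET_NAME"],
           endpoint: env["AWS_ENDPOINT_URL_S3"]
         }}

      missing ->
        # Report every missing variable at once instead of failing on the first.
        {:error, {:missing_env, missing}}
    end
  end
end
```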

Brainstorming

Let’s brainstorm what else we could achieve with this, noting that this is a brainstorming session: no bad ideas, and no production promises.

Durable Processes

Since the read/write latency is low, we can simply write our process state out to Tigris every few seconds and/or on terminate. The next time the process or LiveView is loaded, we have our last known good state.
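In the same brainstorming spirit, here is a rough sketch of that idea as a GenServer. Everything in it is an assumption for illustration: `DurableCounter` is a made-up module, the 5 second interval is arbitrary, and the `:load` and `:save` functions stand in for `KV.get` and `KV.put` so the example runs without a bucket.

```elixir
defmodule DurableCounter do
  @moduledoc "Hypothetical GenServer that checkpoints its state to a KV-style store."
  use GenServer

  @checkpoint_every :timer.seconds(5)

  # opts: :key (storage key), :load and :save (functions standing in for KV.get/KV.put)
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def increment(pid), do: GenServer.cast(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  @impl true
  def init(opts) do
    # Trap exits so terminate/2 also runs when a supervisor shuts us down.
    Process.flag(:trap_exit, true)
    key = Keyword.fetch!(opts, :key)
    load = Keyword.fetch!(opts, :load)
    save = Keyword.fetch!(opts, :save)

    # Resume from the last checkpoint if one exists, otherwise start fresh.
    count =
      case load.(key) do
        {:ok, saved} -> saved
        _ -> 0
      end

    schedule_checkpoint()
    {:ok, %{key: key, save: save, count: count}}
  end

  @impl true
  def handle_cast(:increment, state), do: {:noreply, %{state | count: state.count + 1}}

  @impl true
  def handle_call(:value, _from, state), do: {:reply, state.count, state}

  @impl true
  def handle_info(:checkpoint, state) do
    # Periodic write of the current state, then schedule the next one.
    state.save.(state.key, state.count)
    schedule_checkpoint()
    {:noreply, state}
  end

  def handle_info(_other, state), do: {:noreply, state}

  # Final write on shutdown, so a clean stop loses nothing.
  @impl true
  def terminate(_reason, state), do: state.save.(state.key, state.count)

  defp schedule_checkpoint, do: Process.send_after(self(), :checkpoint, @checkpoint_every)
end
```

A hard crash between checkpoints still loses up to one interval of updates, which is the trade-off of writing "every few seconds" rather than on every change.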

Tables and Indexes Oh My!

The API is just a file system, so we could use key paths as a sort of table space, Req.get("s3://kv-bucket/users/"), or create indexes of data we need to query often, Req.get("s3://kv-bucket/users/idaho"). Keeping those indexes up to date via a worker is an exercise for the reader.

Single Tenancy

Maybe your users need exclusive single tenancy. You could put_transaction a SQLite DB into a folder with a .lock key pointing to the Fly machine ID that owns it. When the first user opens the website, download the DB and put_transaction the .lock file. When the next user opens the page in a new region, check the lock and redirect them to the machine that owns the DB with fly-replay. Sync everything periodically, or when the last user closes the page, and you’ve got a globally distributed SQLite. Obviously this would fall over if you have too much concurrency racing to read/write the lock file, but for smaller users this could work well!
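The routing decision in that flow is small enough to sketch as a pure function. This is only an illustration: `TenantRouter` is a hypothetical module, the lock value is assumed to be the owning machine's ID as a string, and the caller is assumed to read its own ID from the `FLY_MACHINE_ID` environment variable and to turn the returned tuple into a `fly-replay` response header.

```elixir
defmodule TenantRouter do
  @moduledoc "Hypothetical routing decision for a per-tenant SQLite file guarded by a .lock key."

  # lock is the result of reading "<tenant>/.lock": {:ok, machine_id}, or :error
  # when no lock exists yet. me is this machine's own id.
  # Same id on both sides: we already own the DB.
  def route({:ok, me}, me), do: :serve_locally

  # Someone else owns it: ask the Fly proxy to replay the request on that machine.
  def route({:ok, owner}, _me), do: {:replay, {"fly-replay", "instance=" <> owner}}

  # Nobody owns it: the caller should download the DB and put_transaction the lock.
  def route(:error, _me), do: :claim_lock
end
```

Note that `:claim_lock` is only a hint. Two machines can both see a missing lock, so the claim itself still has to go through put_transaction and handle `:aborted`.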

Append-only Tables

Every machine can create new files with their data in a common directory. Another machine could query that list of files and collate them into a common thread.
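A sketch of how that could hang together, with all names hypothetical: each machine writes entries under keys that sort by time and include its own ID so they never collide, and a collator sorts whatever a listing returns. Machine clocks are not synchronized, so the resulting order is approximate.

```elixir
defmodule AppendLog do
  @moduledoc "Hypothetical append-only log where every entry is its own object."

  # Build a key that sorts by time and cannot collide across machines.
  # Numbers are zero-padded so lexicographic order matches numeric order.
  def entry_key(dir, machine_id, unix_ms, seq) do
    ts = unix_ms |> Integer.to_string() |> String.pad_leading(15, "0")
    n = seq |> Integer.to_string() |> String.pad_leading(6, "0")
    "#{dir}/#{ts}-#{machine_id}-#{n}"
  end

  # Collate entries fetched from a listing: a list of {key, value} pairs in any order.
  def collate(entries) do
    entries
    |> Enum.sort_by(fn {key, _value} -> key end)
    |> Enum.map(fn {_key, value} -> value end)
  end
end
```

Because no two machines ever write the same key, this pattern needs no lock at all, unlike put_transaction.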

Wrap-up

Tigris has given us a rare gift: a new way to look at an old tool, S3. It’s globally distributed, close to our users, and offers fast global sync for files smaller than 128kb. Not only can we simply replace S3 in our existing stacks, we can use it in new and fresh ways!

I can’t wait to see what you build with Tigris!

