Playing Traffic Cop with Fly-Replay

The fly.io balloon traffic router
Image by Annie Ruygt

Fly.io is a platform for compute. You can do a bunch more than just run your average web app! Check out the Machines platform and see how your business might run on Fly.io.

The Fly Replay header is deceptively simple. All your app has to do is respond with a header, and the HTTP request gets re-run somewhere else.

It’s behind the scenes of some pretty interesting apps on Fly.io (we wrote about using it with Globally Distributed Postgres).

We often bring it up when answering questions from those enamored with the Machines platform.

So, here’s a use case I think is pretty neat.

But first: What is it?

All public network traffic headed into Fly.io goes through the Fly Proxy. The proxy has features! One of those features involves looking for a fly-replay header in responses.

The fly-replay header tells the Fly Proxy to replay an HTTP request somewhere else. This gives your applications some power.

Depending on the value your app gives the fly-replay header, the Fly Proxy can replay the initial HTTP request on another app, in a different region, on a specific VM, or a mix of those things. Replaying to another app only works when both apps are in the same Fly.io organization.

Here’s what that looks like.

Replay in a Different Region:

I’m going to steal from the Globally Distributed Postgres article (and the corresponding docs).

If you have a “leader” database with a bunch of read-replicas, you typically need write queries to go to the leader.

If an HTTP request (e.g. POST /foo) results in writes to your database, then sending that request to a VM near the leader database has benefits - it’s way faster than opening a DB connection across the globe.

To do this, your application can return a header that looks like this:

fly-replay: region=sjc
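Here's a minimal sketch of how that decision might look as Go middleware. The helper names (replayRegion, withPrimaryWrites) are made up for illustration, and it assumes any request that isn't a GET, HEAD, or OPTIONS might write to the database. In a real app I'd expect the current region to come from the FLY_REGION environment variable and the leader's region from configuration you set yourself; here they're hardcoded so the example runs anywhere.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// replayRegion reports which region a request should be replayed in, if any.
// Assumption: only GET, HEAD and OPTIONS requests are safe to serve from a replica.
func replayRegion(method, current, primary string) (string, bool) {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodOptions:
		return "", false // reads can use the nearby replica
	}
	if primary == "" || current == primary {
		return "", false // already next to the leader, so handle it here
	}
	return primary, true
}

// withPrimaryWrites (hypothetical helper) wraps a handler and answers
// write requests with a fly-replay header pointing at the leader's region.
func withPrimaryWrites(current, primary string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if region, ok := replayRegion(r.Method, current, primary); ok {
			w.Header().Set("fly-replay", "region="+region)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "handled locally")
	})
	// Hardcoded regions for the demo: we pretend to run in ams with the leader in sjc.
	h := withPrimaryWrites("ams", "sjc", app)

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/foo", nil))
	fmt.Printf("POST /foo -> fly-replay: %q\n", rec.Header().Get("fly-replay"))
}
```

Keeping the decision in a plain function makes it easy to test without spinning up a server.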

Replay in Other Apps:

You may have a bunch of apps - perhaps because each of your customers gets an app, or you have some microservices, or whatever crazy scheme you trapped yourself into.

You can route requests to specific apps:

fly-replay: app=some-app

Replay in Specific VMs:

Maybe you want requests to go to specific VMs! I’ve used this to make sure requests after a file upload landed on the same server.

The fly-replay header was a quick way to accomplish that:

fly-replay: instance=00bb33ff
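As a rough sketch of the file upload case: the app could remember which instance took the upload (here in a hypothetical upload_instance cookie) and replay follow-up requests there. The cookie name and helper names are invented for this example, and the instance's own ID is passed in by hand; I'm assuming you'd read it from an environment variable such as FLY_ALLOC_ID in a real deployment.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// uploadCookie is a hypothetical cookie that stores the ID of the
// instance that received the upload.
const uploadCookie = "upload_instance"

// pinToInstance returns the fly-replay value needed to reach the wanted
// instance, or "" when this instance should handle the request itself.
func pinToInstance(wanted, self string) string {
	if wanted == "" || wanted == self {
		return ""
	}
	return "instance=" + wanted
}

// stickyUploads (hypothetical helper) replays requests to the instance
// named in the cookie when that instance is not this one.
func stickyUploads(self string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		wanted := ""
		if c, err := r.Cookie(uploadCookie); err == nil {
			wanted = c.Value
		}
		if v := pinToInstance(wanted, self); v != "" {
			w.Header().Set("fly-replay", v)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "the file is on this instance")
	})
	// Pretend this instance is 11cc44aa and the upload landed on 00bb33ff.
	h := stickyUploads("11cc44aa", app)

	req := httptest.NewRequest(http.MethodGet, "/uploads/status", nil)
	req.AddCookie(&http.Cookie{Name: uploadCookie, Value: "00bb33ff"})
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	fmt.Printf("fly-replay: %q\n", rec.Header().Get("fly-replay"))
}
```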

Since Machines can scale down to zero (stop on exit), you can also use this as a tricky way to wake them up - just ship them an HTTP request!

There’s more you can do than just these examples, so definitely RTFM.
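For the "mix of those things" case, my understanding is that the fields go in a single header value separated by semicolons (for example region=sjc;app=some-app), but check the docs before relying on that. Under that assumption, a tiny builder keeps the string-wrangling in one place. The Replay type below is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Replay is a hypothetical helper type describing where a request
// should be replayed. Empty fields are left out of the header.
type Replay struct {
	Region   string
	App      string
	Instance string
}

// Header builds the fly-replay value.
// Assumption: multiple fields are joined with semicolons.
func (r Replay) Header() string {
	var parts []string
	if r.Region != "" {
		parts = append(parts, "region="+r.Region)
	}
	if r.App != "" {
		parts = append(parts, "app="+r.App)
	}
	if r.Instance != "" {
		parts = append(parts, "instance="+r.Instance)
	}
	return strings.Join(parts, ";")
}

func main() {
	fmt.Println(Replay{Region: "sjc"}.Header())                  // region=sjc
	fmt.Println(Replay{Region: "sjc", App: "some-app"}.Header()) // region=sjc;app=some-app
}
```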

Something about a traffic cop?

We’re going to make a “proxy” - a little app that just responds with a fly-replay header. It’ll tell the Fly Proxy to replay the HTTP request on a different app.

This is useful if you, for example, point *.example.org to that router and have a specific app respond to a request - perhaps based on the hostname.

This particular use case of mine is a bit like a load balancer - a “reverse proxy”, but with some code instead of configuration.

I like Go for HTTP plumbing, so let’s do some of that. We’re going to write the type of “toy” app that accidentally stays in production for 14 years.

This “proxy” app will check the request hostname against a database of known apps, and route the request as needed.

The full(ish) code is here.

It’s basically just this

The important logic is this bit of standard Go HTTP stuff:

http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    // We'll find a customer based on `r.Host`
    customer, err := Find(r.Host)

    if err != nil {
        // Customer not found
        w.Header().Set("fly-replay", "app=our-default-app")
    } else {
        // Replay the request on customer's app
        w.Header().Set("fly-replay", fmt.Sprintf("app=%s", customer.App))
    }
})

Go’s HTTP library treats a pattern ending in a slash as a prefix match on the URL path, so "/" will match anything, which is just what we want.

All we do is find a customer (based on hostname) and respond with a replay header.
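If you want to poke at that logic without a database, here's a self-contained sketch that swaps the lookup for an in-memory map and exercises the handler with Go's httptest package. The map contents, the app name customer-one-app, and the helper names are placeholders I made up. It also lowercases the hostname and strips any port before the lookup, which is my own addition and not something the code above does.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

const defaultApp = "our-default-app"

var errNotFound = errors.New("no customer found")

// customers is an in-memory stand-in for the database lookup.
// The app name is a placeholder.
var customers = map[string]string{
	"c1.fideloper.com": "customer-one-app",
}

// normalizeHost lowercases the host and drops a trailing ":port" if present.
func normalizeHost(hostport string) string {
	host := hostport
	if h, _, err := net.SplitHostPort(hostport); err == nil {
		host = h
	}
	return strings.ToLower(host)
}

// findApp returns the app registered for a hostname.
func findApp(host string) (string, error) {
	app, ok := customers[normalizeHost(host)]
	if !ok {
		return "", errNotFound
	}
	return app, nil
}

// replayTarget picks the fly-replay value for a hostname,
// falling back to the default app for unknown hosts.
func replayTarget(host string) string {
	app, err := findApp(host)
	if err != nil {
		return "app=" + defaultApp
	}
	return "app=" + app
}

// route is the whole "traffic cop": it only sets the replay header.
func route(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("fly-replay", replayTarget(r.Host))
}

func main() {
	for _, host := range []string{"c1.fideloper.com", "fake.fideloper.com"} {
		req := httptest.NewRequest(http.MethodGet, "/", nil)
		req.Host = host
		rec := httptest.NewRecorder()
		route(rec, req)
		fmt.Printf("%s -> %s\n", host, rec.Header().Get("fly-replay"))
	}
}
```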

This is great when paired with a SQLite database, as (trigger warning) reads from the local disk are pretty quick relative to network stuff.

The Find function is just a SQL query (but super verbose, because Golang):

func Find(host string) (*Customer, error) {
    // Look up the customer registered for this hostname
    row := db.QueryRow(`SELECT id, host, app, instance
      FROM customers
      WHERE host = ?`, host)

    customer := Customer{}
    err := row.Scan(
        &customer.Id,
        &customer.Host,
        &customer.App,
        &customer.Instance,
    )

    if err != nil {
        // Scan returns sql.ErrNoRows when no row matched
        return nil, fmt.Errorf("no customer found: %w", err)
    }

    return &customer, nil
}
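One thing worth noticing: because Find wraps the underlying error with %w, the caller can tell "no such customer" (sql.ErrNoRows) apart from "the database fell over". The handler in this post treats both the same and falls back to the default app. If you'd rather fail loudly on real database errors, a sketch like this could work. The decide function and the trimmed-down Customer struct are mine, for illustration only.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"net/http"
)

// Customer is trimmed down to the one field this sketch needs.
type Customer struct {
	App string
}

// Example errors shaped like the ones Find would return.
var (
	errMissing = fmt.Errorf("no customer found: %w", sql.ErrNoRows)
	errBroken  = fmt.Errorf("no customer found: %w", errors.New("database is locked"))
)

// decide (hypothetical helper) maps the result of a lookup to a
// fly-replay value and an HTTP status code.
func decide(c *Customer, err error) (replay string, status int) {
	switch {
	case err == nil:
		return "app=" + c.App, http.StatusOK
	case errors.Is(err, sql.ErrNoRows):
		// Unknown hostname: send it to the default app.
		return "app=our-default-app", http.StatusOK
	default:
		// A real database problem: don't replay, report an error instead.
		return "", http.StatusInternalServerError
	}
}

func main() {
	replay, status := decide(&Customer{App: "some-app"}, nil)
	fmt.Println(replay, status)

	replay, status = decide(nil, errMissing)
	fmt.Println(replay, status)

	replay, status = decide(nil, errBroken)
	fmt.Printf("%q %d\n", replay, status)
}
```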

Locally, the whole round trip of the HTTP request + database lookup took ~4ms. In the real world, it added ~100ms to hit this proxy and replay the request against another Fly.io app (my crufty blog).

To test this out, I ran a few curl requests:

# Get replayed against the default app
curl -i -H "Host: fake.fideloper.com" https://proxycentral.fly.dev

# Get replayed against an app that is registered
curl -i -H "Host: c1.fideloper.com" https://proxycentral.fly.dev

Preventing Direct Access

In this scenario, we want the “proxy” app to be available publicly, while keeping customer apps private.

However, the Fly Proxy needs to know where apps are listening when it directs HTTP requests to them. Therefore, we need to define services in the fly.toml file.

You also might be dynamically creating apps, in which case you don’t need a fly.toml file, but will be defining services via Machine API calls.

Luckily, we can keep the apps private while still telling the Fly Proxy how to reach them. The easiest way is to create the app without any public IP addresses via the fly launch command:

cd /path/to/my/app

# Ensure we're using appsv2 (Machines) platform
# Older accounts default to an older platform
fly orgs apps-v2 default-on <org-slug>

fly launch --no-public-ips

The flag --no-public-ips is the key there. However, it requires the newer Machines-based apps platform. Also, if you’re creating apps via the Machines API, having no public IPs is the default.

Now the customer apps are private, and the Fly Proxy can still replay requests against them.

App Discovery

I used a SQLite database to map domains to apps. If this proxy ran globally, I could have used LiteFS for distributed SQLite across multiple regions.

Another fun possibility is (ab)using Fly’s .internal addresses to check for the existence of apps (or application instances) via DNS.

Perhaps we could have queried this occasionally and created/updated an in-memory map of apps and hostnames! Here are two DNS queries that would have been useful for that:

# List apps in the same org
dig TXT _apps.internal

# List apps and their VM instances
# in the same org
dig TXT _instances.internal
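From Go, the first of those queries could be made with the standard library resolver. This is only a sketch: it assumes the TXT answer for _apps.internal is a comma-separated list of app names, which you should verify against a real response, and parseApps is a made-up helper. Outside Fly.io's private network the lookup simply fails, which the example reports and moves on.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// parseApps turns TXT records into a list of app names.
// Assumption: each record is a comma-separated list of names.
func parseApps(records []string) []string {
	var apps []string
	for _, rec := range records {
		for _, name := range strings.Split(rec, ",") {
			if name = strings.TrimSpace(name); name != "" {
				apps = append(apps, name)
			}
		}
	}
	return apps
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Equivalent of: dig TXT _apps.internal
	records, err := net.DefaultResolver.LookupTXT(ctx, "_apps.internal")
	if err != nil {
		fmt.Println("lookup failed (expected outside Fly.io's private network):", err)
		return
	}
	fmt.Println(parseApps(records))
}
```

Run on a timer, the result could refresh the in-memory map of apps that the proxy consults.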

So this is pretty neat! The fly-replay header is a simple solution that lets you do some really interesting stuff - particularly within globally distributed apps.