We’re building something ambitious at Fly.io: a public cloud, running on our own hardware all over the world.
Fly Machines are the engine underneath everything we build at Fly.io. They’re containers, but they run under hardware virtualization in our cloud. Machines are lightweight enough that we can stop and start them in response to incoming HTTP requests.
Fly Machines have two primary customers inside Fly.io: MPG and Sprites.
MPG is Fly.io’s Managed Postgres product, which runs clusters of Fly Machines to provide single-leader multi-replica globally distributed Postgres that our users can enable with a single command. MPG is the most difficult systems engineering problem we work on.
Sprites are semi-disposable computers for agents like Claude Code and OpenClaw to run inside. They’re Fly Machines that create instantaneously and that come up with 100GB durable filesystems, which run on an entirely new storage stack.
These are both very neat projects and the Machines team will stick you right in the middle of both of them, along with all the stuff our users do with Fly Machines directly.
This Role
We’re looking for engineers to join the team working on Fly Machines and their orchestrator, flyd.
Most of this is Golang code. It has an elegant structure. On thousands of beefy “worker” servers in our fleet, each flyd is solely responsible for its own state: every server is the source of truth for its own workloads, without a global top-down orchestrator. Under the hood, flyd is a specialized database server that durably tracks the steps in a series of finite state machines, like “create a Fly Machine” or “cordon off an existing Fly Machine”.
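To give a feel for the idea of durably tracking state machine steps, here is a minimal sketch in Go. It is not flyd's actual code: the `Step`, `Journal`, and `Advance` names are hypothetical, and an in-memory map stands in for the on-disk database a real orchestrator would use. The point it illustrates is that progress is recorded after each successful step, so a retry resumes where the last attempt stopped instead of starting over.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTransient simulates a failure that goes away on retry (illustrative only).
var ErrTransient = errors.New("transient failure")

// Step is one named unit of work in a state machine (hypothetical type).
type Step struct {
	Name string
	Run  func() error
}

// Journal maps a run ID to the number of steps already completed.
// Assumption: a real system would persist this durably; a map stands in here.
type Journal map[string]int

// Advance runs the remaining steps for runID, recording progress after each
// success. On error it returns immediately, leaving the journal pointing at
// the failed step so the next call retries from there.
func Advance(j Journal, runID string, steps []Step) error {
	for i := j[runID]; i < len(steps); i++ {
		if err := steps[i].Run(); err != nil {
			return fmt.Errorf("step %q: %w", steps[i].Name, err)
		}
		j[runID] = i + 1
	}
	return nil
}

func main() {
	failOnce := true
	var ran []string

	// A toy "create a Machine" flow with a step that fails the first time.
	steps := []Step{
		{Name: "reserve", Run: func() error {
			ran = append(ran, "reserve")
			return nil
		}},
		{Name: "boot", Run: func() error {
			if failOnce {
				failOnce = false
				return ErrTransient
			}
			ran = append(ran, "boot")
			return nil
		}},
	}

	j := Journal{}
	fmt.Println("first attempt:", Advance(j, "create-1", steps))
	fmt.Println("second attempt:", Advance(j, "create-1", steps))
	fmt.Println("steps that did work:", ran)
}
```

In this sketch the "reserve" step runs only once even though `Advance` is called twice, because the journal already shows it as done when the retry begins.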
Fly Machines connect up with our network infrastructure, which is written primarily in Rust, through a state distribution system called Corrosion, which is an open source project you can check out for yourself.
What You’ll Be Doing
The Machines team owns the control plane and runtime that make Fly Machines work. Engineering challenges this role owns include:
- We have a large and growing hardware fleet. Hardware is demonic and networks are outright Satanic and both fail, routinely, at exactly the worst possible times. Making flyd and our overall platform resilient to failure is the primary task of this role.
- Owing to a fateful decision we made back in 2021, which another team inside of Fly is developing a time machine to send someone back to 2021 to prevent, Fly Volumes are backed directly by attached NVMe storage, which anchors Machines that have durable filesystems to specific hardware. This makes Machine migration a Fun problem (note caps). Rebalancing workloads across our fleet: another crucial job.
- We have enormous amounts of telemetry (logs, metrics, and oTel) that we are using 2% of. When things go wrong with Fly Machines, Platform Engineers can practically always fix them quickly, but our support team, who can’t execute code on our servers, can’t. Using our telemetry to predict, diagnose, and resolve problems is a huge thing for us.
At Fly.io, the jury is out on agent-based development. We read everything, with human nervous system input, before it’s merged to main. But you’re going to want to be comfortable in an agent to work here: we rely on them intensively for keeping up with the codebase.
We think these are fun problems. We can’t promise they won’t be stressful problems. If that’s a kind of bittersweet you’re interested in, let’s see if we’d work well together.
How We Hire
This is a mid-level to senior, remote, full-time position. We’re currently hiring people who live in the United States or Canada.
In order to optimize for pay equity, Fly.io doesn’t negotiate salaries. We have standardized salaries for each employee level. The salary for this role is $190k to $225k USD, depending on level. We offer competitive equity grants with a long exercise window. We provide health care benefits, flexible vacation time (with a minimum), hardware/phone allowances, the standard stuff.
Our hiring process may be a little different from what you’re used to. We respect career experience but we aren’t hypnotized by it, and we’re thrilled at the prospect of discovering new talent. So instead of resumes and interviews, we’re going to show you the kind of work we’re doing and then see if you enjoy actually doing it, with “work-sample challenges”. Unlike a lot of places that assign “take-home problems”, our challenges are the backbone of our whole process; they’re not pre-screeners for an interview gauntlet. (We’re happy to talk, though!)
There’s more about us than you probably want to know at our hiring documentation.
If you’re interested, mail jobs+platform-machines@fly.io. In the body of your email, please include your location (US state or Canadian province), your GitHub username for work sample access, and a statement about your favorite food. We probably won’t respond to emails that don’t include all three items.