Platform Product Engineer

Now Hiring: Intern; Level 1; Level 2; Senior

Fly.io takes Docker containers from users and converts them to Firecracker micro-VMs running on our hardware around the world, hooked up via WireGuard to a global Anycast network, managed Postgres databases, LVM2 volumes, an industrial-scale metrics cluster, and a high-performance messaging-based logging system, all coordinated by a global orchestration system. You don’t have to know any of this to use Fly.io: if you have a working container, it can be running in Singapore, São Paulo, and Sunnyvale in just a couple of minutes.

We’re usually more concise when describing Fly.io. But we’re not just trying to convince you to boot your apps up on Fly.io. If that last paragraph sounded interesting to you, what we really want to do is talk to you about working here.

The job is Platform Product Engineer. It’s software development for the core engine of our product.

The team at Fly.io includes a bunch of different roles and there are great things about all of them. Full-stack developers get to work up close and personal with our users, shipping immediately visible features. The networking team is building an ambitious high-speed Rust-based proxy network. We’ve got a Kurt. We’ve got developers pushing the state of the art on platforms like Elixir, and SREs rolling out a fleet of servers around the world.

Platform Product Engineer, though: it’s the kernel of the platform. And not just because there’s kernel code involved. This job is all about sculpting infrastructure into a product customers want. The work is deeply technical, but UX is the forcing function and it goes all the way down to the metal. You gotta know what customers need and what makes a good UX.

Here’s some of what Platform is working on right now:

A new orchestration system that makes it easy, predictable, and fast to spin up instances of apps anywhere in the world in response to scaling signals and user requests, one flexible enough to be the compute engine for everything from managed databases to remote developer environments.
Volume and state management systems that can seamlessly move compute loads, along with their data, around different machines in our fleet.
Private networking that just works, without anyone writing a line of Terraform code, to connect any Docker image (and any libc) to any service anybody thinks to run on our platform, and makes peering between companies trivial and secure.
Always-on metrics, logging, and visibility features that scale to hundreds of thousands of apps and can answer questions about app behavior our users didn’t think of before deploying their apps.

This is sort of a golden moment to come work with us at Fly.io. We’ve hammered out our basic service and have a base of enthusiastic users shipping super cool stuff on it. Our team is growing but still at a point where everyone knows who everyone else is and what they’re working on. Nobody’s off in a corner on a solitary death march. It’s still easy to have a good idea here, float it to the team, and have it take off. We are having fun. For some of us, this kind of environment is why we work in startups.

It’s not all sunshine and marshmallows. Platform Engineering is a not-messing-around serious role. It doesn’t do anybody any good to sell you a bill of goods about what the work is like. So, if you’re going to be comfortable working in this role, here are some messy things we want you to know:

We’re a small team working on something ambitious. There’s a lot going on. There is some chaos. We work hard to tame it, but we don’t let its existence paralyze us or keep us from shipping. It’s a whole thing, and you’d want to be on board with it.
We’ve got a lightweight management style. At times, we feel more like a large open source project than an industrial software development team. We do 1:1’s and we keep track of what’s being worked on, but there isn’t a board you can go look at to know exactly what you’re going to be working on next week.
We’re ruthless about working on stuff that our users will see and care about. We are not ruthless about shaping and polishing our code into a radiant-cut gem of perfection. We have a “no refactoring for your first several months” rule, and you’ll remember that rule is there as you bounce around our code; there’s a lot that could be refactored.
We’re on call, 24/7. Everyone shares a rotation. We’ve chosen a cortisol-intensive domain to work in: when our stuff breaks, our users notice. We’re a chill bunch of people (many of us with families; nobody’s pulling 80-hour weeks), but our problem space is unmerciful.
We’re a helpful bunch, but all of us are learning stuff as we go along and we expect you to do the same. You’d want to be comfortable diving deep into the details of complicated systems and teaching yourself enough to solve problems. There’s a scary amount of technical complexity (and some own-goal complexity we brought along for the ride), and we need people who won’t freeze up.
We’re not running Kubernetes, or whichever database the cool kids are using. We’re addicted to code that works, right away, with minimal ceremony. We like SQLite, and we get nervous when people talk about Raft. The engineering culture here is pragmatic to what HN would consider a fault.
Your enjoyment of a job here will hinge less on your technical skills, and more on your ability to make decisions that benefit end users. We don’t have a rigid product roadmap, we don’t have detailed issues for you to implement, and it’s gonna take some persistence to figure out what’s important to build.

We like this project very much and it’s hard to write negatively about it. We could take another 3-4 editing passes on those last bullets and make them more honest and clear about how these things might creep up and annoy you. But instead, you can do that for yourself: give them another read, and just extrapolate all the bad implications you can from them. Then ask us about them, and we’ll be candid.

It’s such a cool job, though.

More Details

This is a mid to senior level job. The salary ranges from $120k to $200k USD. We also offer competitive equity grants. Hopefully that’s enough to keep you intrigued; here’s what you should really care about:

We’re a small team, almost entirely technical.
Most of our platform code is in Go. Our networking code is in Rust. We like both languages and you very much need to be on board with both of them yourself (you don’t need to be a fluent Rust programmer for this gig, but you can’t be allergic to the idea of picking it up - or, for that matter, allergic to Go).
We are active in developer communities, including our own at community.fly.io.
Virtually all customer communication, documentation and blog posts are in writing. We are a global company, but most of our communication is in English. Clear writing in English is essential.
We are remote, with team members in Colorado, Quebec, Chicago, London, Mexico, Spain, Virginia, Brazil, and Utah. Most internal communication is written, and often asynchronous. You’ll want to be comfortable with not getting an immediate response for everything, but also know when you need to get an immediate response for something.
We are an unusually public team; you’d want to be comfortable working in open channels rather than secretively over in a dark corner.
We’re a real company - hopefully that goes without saying - and this is a real, according-to-Hoyle full-time job with health care for US employees, flexible vacation time, hardware/phone allowances, the standard stuff.

How We Hire People

We’re weird about hiring. We’re skeptical of resumes and we don’t trust interviews (we’re happy to talk, though). We respect career experience but we aren’t hypnotized by it, and we’re thrilled at the prospect of discovering new talent.

The premise of our hiring process is that we’re going to show you the kind of work we’re doing and then see if you enjoy actually doing it. We call these “work-sample challenges”. Unlike a lot of places that assign “take-home problems”, our challenges are the backbone of our whole process; they’re not pre-screeners for an interview gauntlet.

For this role, we’re asking people to write us a small proxy that does just a couple of interesting things (we’ll tell you more). We’re looking for people who are super-comfortable with Go and network programming in general, but we’re happy to bring people up to speed with the domain-specific stuff in Fly.io.

If you’re interested, mail jobs+platform@fly.io. You can tell us a bit about yourself, if you like. Please also include your GitHub username and a sentence (yes, just one) about your least favorite Linux system call. ioctl doesn’t count.

Work From: Anywhere