Platform Engineering: Private Networking

Now Hiring: Intern; Level 1; Level 2; Senior

Fly.io runs apps close to their users, by converting Docker containers into VMs running on our hardware around the world.

Networking at Fly.io is pretty funky.

Our hosts are racked hardware that we manage, and they’re all linked together in a WireGuard mesh. Once traffic hits our network, everything is WireGuard, until we spit the response back out.

Every app running on Fly.io is linked to a customer-specific IPv6 private network (we call them 6PNs). That’s how apps talk to each other. Fly Postgres, for instance, just boots up a Postgres VM that only knows how to talk on its attached 6PN.

Customers can talk directly to their apps’ 6PN services. They do that by bringing up a WireGuard tunnel to one of our gateways, where they’re bridged into their 6PN. We use WireGuard for everything, so much so that we built the WireGuard client and a full TCP/IP stack into our flyctl CLI so that we could bring up on-demand tunnels without asking the OS for permission.

Underneath all of this is a bit of eBPF code, which makes the routing and access control simple and easy to reason about.

If this stuff sounds interesting, we’re building a new team that will own all of it, and take it in new directions. Here’s some of what we want to do:

Loading WireGuard peers on demand when connections for them come in, rather than keeping everything installed in the kernel all the time.
Allowing WireGuard peers to “float” across multiple gateways, rather than having them locked to particular regions, which is how it works today.
Figuring out what it means to Anycast WireGuard, and then doing that.
Building a state-sharing scheme (or something else!) that will allow us to run failover pairs for particular gateways. This is tricky! We don’t run a routing protocol for this stuff.
Integrating our private networks with our new scoped Macaroon tokens.
Improving the UX for flyctl, the only developer CLI in the industry that runs its own TCP/IP stack.
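
To make the first item on that list a little more concrete, here is a minimal Go sketch of on-demand peer loading: install a peer the first time traffic for it shows up, and remove it again once it has been idle for a while. This is not Fly.io's code. Every name in it (PeerCache, Kernel, PeerConfig, demoLookup, and the fake kernel and clock) is hypothetical, and it assumes something upstream has already worked out which peer an incoming connection belongs to, which is a hard part this sketch skips.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// PeerConfig is a hypothetical, stripped-down peer description.
type PeerConfig struct {
	PublicKey string
	AllowedIP string
}

// Kernel is a hypothetical stand-in for whatever actually programs
// WireGuard peers (netlink, a daemon, and so on).
type Kernel interface {
	Install(PeerConfig) error
	Remove(publicKey string) error
}

// PeerCache installs peers lazily and forgets the ones that go quiet.
type PeerCache struct {
	mu       sync.Mutex
	kernel   Kernel
	lookup   func(publicKey string) (PeerConfig, bool)
	now      func() time.Time
	maxIdle  time.Duration
	lastSeen map[string]time.Time
}

func NewPeerCache(k Kernel, lookup func(string) (PeerConfig, bool),
	maxIdle time.Duration, now func() time.Time) *PeerCache {
	return &PeerCache{kernel: k, lookup: lookup, now: now,
		maxIdle: maxIdle, lastSeen: map[string]time.Time{}}
}

// Touch records activity for a peer, installing it first if needed.
func (c *PeerCache) Touch(publicKey string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, loaded := c.lastSeen[publicKey]; !loaded {
		cfg, ok := c.lookup(publicKey)
		if !ok {
			return fmt.Errorf("unknown peer %q", publicKey)
		}
		if err := c.kernel.Install(cfg); err != nil {
			return fmt.Errorf("install peer %q: %w", publicKey, err)
		}
	}
	c.lastSeen[publicKey] = c.now()
	return nil
}

// EvictIdle removes peers that have been idle for longer than maxIdle
// and returns how many were removed.
func (c *PeerCache) EvictIdle() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	evicted := 0
	now := c.now()
	for key, seen := range c.lastSeen {
		if now.Sub(seen) <= c.maxIdle {
			continue
		}
		if err := c.kernel.Remove(key); err != nil {
			continue // leave it tracked; try again on the next sweep
		}
		delete(c.lastSeen, key)
		evicted++
	}
	return evicted
}

// fakeKernel and manualClock exist only to make the sketch runnable.
type fakeKernel struct{ installed map[string]bool }

func (f *fakeKernel) Install(p PeerConfig) error { f.installed[p.PublicKey] = true; return nil }
func (f *fakeKernel) Remove(key string) error    { delete(f.installed, key); return nil }

type manualClock struct{ t time.Time }

func (m *manualClock) Now() time.Time          { return m.t }
func (m *manualClock) Advance(d time.Duration) { m.t = m.t.Add(d) }

// demoLookup stands in for a real peer database.
func demoLookup(key string) (PeerConfig, bool) {
	if key != "peer-a" {
		return PeerConfig{}, false
	}
	return PeerConfig{PublicKey: key, AllowedIP: "fd00::1/128"}, true
}

func main() {
	kernel := &fakeKernel{installed: map[string]bool{}}
	clock := &manualClock{}
	cache := NewPeerCache(kernel, demoLookup, 3*time.Minute, clock.Now)

	fmt.Println(cache.Touch("peer-a"), kernel.installed["peer-a"])
	clock.Advance(5 * time.Minute)
	fmt.Println(cache.EvictIdle(), kernel.installed["peer-a"])
}
```

The clock is injected so the eviction logic can be exercised without sleeping. A real version would also have to decide what happens to a connection that arrives while its peer is mid-eviction, which this sketch does not attempt.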

This work is pretty low level. There’s packet parsing involved. We’re not afraid to crack open WireGuard packets if we have to. There’s some routing involved. Lots of distributed state.
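
For a taste of what “cracking open WireGuard packets” can look like, here is a small Go sketch that reads the unencrypted header of a WireGuard message. The layout it relies on comes from the WireGuard protocol itself: a one-byte message type, three reserved zero bytes, then little-endian 32-bit sender and receiver indices, with fixed lengths of 148, 92, and 64 bytes for handshake initiation, handshake response, and cookie reply. The names (Header, ParseHeader, and the constants) are made up for illustration and are not taken from any Fly.io codebase.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Message types, carried in the first byte of every WireGuard message.
const (
	typeInitiation  = 1
	typeResponse    = 2
	typeCookieReply = 3
	typeTransport   = 4
)

// Message lengths in bytes. Transport messages vary in size; the minimum
// is a 16-byte header plus a 16-byte authentication tag.
const (
	lenInitiation   = 148
	lenResponse     = 92
	lenCookieReply  = 64
	minLenTransport = 32
)

var errMalformed = errors.New("malformed WireGuard message")

// Header holds the cleartext fields that are useful for demultiplexing.
// Sender is set for initiation and response messages; Receiver is set
// for response, cookie reply, and transport messages.
type Header struct {
	Type     uint8
	Sender   uint32
	Receiver uint32
}

// ParseHeader validates the type, reserved bytes, and length of pkt,
// then extracts whichever indices that message type carries.
func ParseHeader(pkt []byte) (Header, error) {
	if len(pkt) < 4 || pkt[1] != 0 || pkt[2] != 0 || pkt[3] != 0 {
		return Header{}, errMalformed
	}
	h := Header{Type: pkt[0]}
	le := binary.LittleEndian
	switch {
	case h.Type == typeInitiation && len(pkt) == lenInitiation:
		h.Sender = le.Uint32(pkt[4:8])
	case h.Type == typeResponse && len(pkt) == lenResponse:
		h.Sender = le.Uint32(pkt[4:8])
		h.Receiver = le.Uint32(pkt[8:12])
	case h.Type == typeCookieReply && len(pkt) == lenCookieReply:
		h.Receiver = le.Uint32(pkt[4:8])
	case h.Type == typeTransport && len(pkt) >= minLenTransport:
		h.Receiver = le.Uint32(pkt[4:8])
	default:
		return Header{}, errMalformed
	}
	return h, nil
}

func main() {
	// Build an empty transport message addressed to receiver index 0xdeadbeef.
	pkt := make([]byte, minLenTransport)
	pkt[0] = typeTransport
	binary.LittleEndian.PutUint32(pkt[4:8], 0xdeadbeef)

	h, err := ParseHeader(pkt)
	fmt.Printf("%+v %v\n", h, err)
}
```

Note what the header does not give you: a handshake initiation carries only the sender's index in the clear, and the initiator's static public key is encrypted, so figuring out who is knocking takes more than header parsing.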

The codebase you’d be dropping into is Go, BPF-flavored C, and Rust, in that order. The C will become Rust, and we’re not religious about languages, so new serverside code can be Rust as well. You’ll need to be pragmatic and open-minded about languages though: you can’t hate Rust or Go and be comfortable in this role. We have all-Rust and all-Go roles if that’s what you’re looking for; this won’t be one of those.

Some things you should know about us:

We’re ruthless about working on stuff that our users will see and care about, to the exclusion of a lot of engineering formalism. “How will this immediately help users?” is a standard we hold ourselves to, even when it makes us uncomfortable.
We’re on call, 24/7. Everyone shares a rotation (a couple days every 6 weeks or so, right now). We’ve chosen a cortisol-intensive domain to work in: when our stuff breaks, our users notice, and because we’re global, they notice in every time zone.
We don’t care what the cool kids are using. We’re addicted to code that works, right away, with minimal ceremony. We like SQLite, and we get nervous when people talk about Raft. The engineering culture here is pragmatic to what Hacker News would consider a fault.

This is a mid- to senior-level job. The salary ranges from $120k to $200k USD. We also offer competitive equity grants.

We’re remote-first, with team members in Colorado, Quebec, Chicago, London, Mexico, Spain, Virginia, Brazil, and Utah. Most internal communication is written, and often asynchronous. You’ll want to be comfortable with not getting an immediate response for everything.

How We Hire

We’re weird about hiring. We’re skeptical of resumes and we don’t trust interviews (we’re happy to talk, though). We respect career experience but we aren’t hypnotized by it, and we’re thrilled at the prospect of discovering new talent.

The premise of our hiring process is that we’re going to show you the kind of work we’re doing and then see if you enjoy actually doing it. We call these “work-sample challenges”. Unlike a lot of places that assign “take-home problems”, our challenges are the backbone of our whole process; they’re not pre-screeners for an interview gauntlet.

For this role, we’re asking people to write us a small proxy in Go or Rust that does just a couple of interesting things (we’ll tell you more). We’re looking for people who are super-comfortable with Go or Rust and network programming in general, but we’re happy to bring people up to speed with the domain-specific stuff in Fly.io.

If you’re interested, mail jobs+6pn@fly.io. Tell us a bit about yourself, if you like, and also tell us your least favorite IPv6 feature, just so we know you’re not a bot.

Work From: Anywhere