Infrastructure Engineering

This is not a typical infra role.

Infrastructure engineers are our human-to-metal interface. They manage the foundation that everything else is built on. It’s one of the most important roles at Fly.io and it has evolved over time.

Over the years, infra at Fly.io has taken on more responsibility for platform engineering. In the long long ago, the dividing line between “Infra Engineer” and “Platform Engineer” was that Infra owned ops, monitoring, and tooling, while Platform owned the code for the services we operate. Today, Infra has direct responsibility for a lot of that code as well; the new dividing line is between Infra “Making things run better” and Platform “Making new features”.

So, while Infra still does the bread-and-butter work of monitoring and deployment and alerting, it also does core software engineering work, hacking on our orchestrator and building out our networking fabric.

This role is a good fit for you if:

  • You care about users. You can empathize with the problems they encounter using our products and build solutions that treat reliability as a feature.
  • You’re comfortable building software, using software development as a tool to solve user problems, and working with a team on large software projects.
  • You’re excited about building tools and systems to safely operate a fleet of bare metal servers that outnumber your team 1000:1.
  • You like working with all the weird Linux things, from eBPF to LVM2 to WireGuard to policy routing.
  • You’re comfortable building things that are imperfect and solve the problems we’re facing right now; our infrastructure is evolving rapidly as we scale.
  • You want to help the team develop green-field automation, infrastructure-as-code, observability and alerting work; we’re up to our eyeballs in fun infra projects.

You know you’re succeeding in this job if:

  • You bring experience, pragmatism, composure, and positivity to interactions with teams outside of the Infra Ops organization.
  • You get that code doesn’t matter unless the rest of the team knows how to use it. You have a habit of writing guides, docs, and frameworks so the wheels keep turning when you’re not online.
  • You never waste a crisis. You treat each failure as a chance to learn and improve so it doesn’t happen again. You consider it a crisis when users notice a problem before we do.
  • You know how to prioritize when everything is vying for your attention, even if it means letting some fires burn while you put out more important ones. We’re building something big and ambitious with a very tight team; small fires are common!
  • You know the helpless feeling when us-east-1 is on fire while the status page stays green. You communicate with users clearly and transparently, the way you’d want as a user.
  • You seek out ways to take the systems and best practices we use to run our own infrastructure and bring them to life in ways that benefit our users.

More Details

This is a mid-career to senior-level, fully remote, full-time position. You can live anywhere in the world; your work hours and holidays observed are up to you. The salary ranges from $120k to $200k USD. We offer competitive equity grants with a long exercise window. Hopefully that’s enough to keep you intrigued; here’s what you should really care about:

  • We’re a small team that is deeply technical.
  • Engineers at Fly.io have an unusual amount of autonomy and decision-making power. You will be making real product decisions that directly impact users, on a daily basis, without anyone standing over your shoulder telling you what to do.
  • We are active in developer communities, including our own at community.fly.io.
  • Virtually all customer communication, documentation and blog posts are in writing. We are a global company, but most of our communication is in English. Clear writing in English is essential.
  • The infrastructure work we encounter is varied. One day we might be debugging a kernel issue and the next we’re looking at how to make our platform more observable for our users. You’re not expected to be amazing at everything, but an enthusiasm for learning is critical.

How We Hire People

We’re weird about hiring. We’re skeptical of resumes and we don’t trust traditional interviews. We respect career experience but we’re more excited about potential.

The premise of our hiring process is that we’re going to give you a series of challenges that each simulate the kind of work you’ll actually be doing here. Unlike a lot of places that assign “take-home problems”, our challenges are the backbone of our whole process; they’re not pre-screeners for an interview gauntlet. Check out our hiring documentation to learn more than you probably ever wanted to know.

If you’re interested, send a message to jobs+infra@fly.io. You can tell us a bit about yourself, if you like. Please also include (1) your GitHub username (so we can create a private work sample repo for you), (2) your location (so we know what timezone you’re in for scheduling), and (3) a sentence about your favorite food (so we know you’re not a bot).