Introducing Fly Kubernetes

[Illustration: Frankie or Linda (we can't tell who's who), the Fly.io hot-air balloon mascot, calmly at the helm of a sailing ship while two shark fins follow in pursuit and some unreasonably happy-looking fish leap around the hull. Image by Annie Ruygt]

We’re Fly.io, and if you’ve been following us awhile you probably just did a double-take. We’re building a new public cloud that runs containerized applications with virtual machine isolation on our own hardware around the world. And we’ve been doing it without any K8s. Until now!

Update, March 2024: FKS does more stuff now, and you can read about it in “Fly Kubernetes does more now.”

We’ll own it: we’ve been snarky about Kubernetes. We are, at heart, old-school Unix nerds. We’re still scandalized by systemd.

To make matters more complicated, the problems we’re working on have a lot of overlap with K8s, but just enough impedance mismatch that it (or anything that looks like it) is a bad fit for our own platform.

But, come on: you never took us too seriously about K8s, right? K8s is hard for us to use, but that doesn’t mean it’s not a great fit for what you’re building. We’ve been clear about that all along, right? Sure we have!

Well, good news, everybody! If K8s is important for your project, and that’s all that’s been holding you back from trying out Fly.io, we’ve spent the past several months building something for you.

Fly.io For Kubernetians

Fly.io works by transmogrifying Docker containers into filesystems for lightweight hypervisors, and running them on servers we rack in dozens of regions around the world.

You can build something like Fly.io with “standard” orchestration tools like K8s. In fact, that’s what we did to start, too. To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system based on eBPF). But the ideas are the same.

The way we look at it, the signature feature of a “standard” orchestrator is the global scheduler: the global eye in the sky that keeps track of vacancies on servers and optimizes placement of new workloads. That’s the problem we ran into. We’re running over 200,000 applications, and we’re doing so on every continent except Antarctica. The speed of light (and a globally distributed network of backhoes) has something to say about keeping a perfectly consistent global picture of hundreds of thousands of applications, and it’s not pleasant.

The other problem we ran into is that our Nomad scheduler kept trying to outsmart us, and, worse, our customers. It turns out that our users have pretty firm ideas of where they’d like their apps to run. If they ask for São Paulo, they want São Paulo, not Rio. But global schedulers have other priorities, like optimally bin-packing resources, and sometimes GIG looks just as good as GRU to them.

To escape the scaling and DX problems we were hitting, we rethought orchestration. Where orchestrators like K8s tend to work through distributed consensus, we keep state local to workers. Each racked server in our fleet is a source of truth about the apps running on it, and provides an API to a market-style “scheduler” that bids on resources in regions. You can read more about it here, if you’re interested. We call this system the Fly Machines API.

An important detail to grok about how this all works – a reason we haven’t, like, beaten the CAP theorem by doing this – is that Fly Machines API calls can fail. If Nomad or K8s tries to place a workload on some server, only to find out that it’s filled up or thrown a rod, it will go hunt around for some other place to put it, like a good little robot. The Machines API won’t do this. It’ll just fail the request. In fact, it goes out of its way to fail the request quickly, to deliver feedback; if we can’t schedule work in JNB right now, you might want instead to quickly deploy to BOM.
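
To make that concrete, here’s a rough sketch of what fail-fast placement with a caller-side fallback can look like against the Machines API. This is not our client code: the base URL, token handling, and request fields are simplified, and the jnb-then-bom fallback policy is just an example.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

// createMachine asks the Machines API to place one Machine in a specific
// region. No retries, no hunting: if the region can't take the work right
// now, the request fails and the error comes straight back to the caller.
func createMachine(appName, region, image string) error {
    payload, _ := json.Marshal(map[string]any{
        "region": region,
        "config": map[string]any{"image": image},
    })
    url := fmt.Sprintf("https://api.machines.dev/v1/apps/%s/machines", appName)
    req, _ := http.NewRequest("POST", url, bytes.NewReader(payload))
    req.Header.Set("Authorization", "Bearer "+os.Getenv("FLY_API_TOKEN"))
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("create machine in %s: %s", region, resp.Status)
    }
    return nil
}

func main() {
    // The fallback decision lives in the caller, not in a scheduler:
    // try JNB first, and if that fails fast, try BOM instead.
    if err := createMachine("my-app", "jnb", "registry.fly.io/my-app:latest"); err != nil {
        fmt.Println("jnb is full or unhappy:", err)
        if err := createMachine("my-app", "bom", "registry.fly.io/my-app:latest"); err != nil {
            panic(err)
        }
    }
}

The point is that the “what do we do when a region says no” decision belongs to whatever is driving the API, which is exactly where the next section picks up.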

Pluggable Orchestration and FKS

In a real sense, what we’ve done here is extract a chunk of the scheduling problem from our orchestrator and hand it off to other components. For most of our users, that component is flyctl, our intrepid CLI.

But Fly Machines is an API, and anything can drive it. A lot of our users want quick answers to requests to schedule apps in specific regions, and flyctl does a fine job of that. But it’s totally reasonable to want something that works more like the good little robots inside of K8s.

You can build your own orchestrator with our API, but if what you’re looking for is literally Kubernetes, we’ve saved you the trouble. It’s called Fly Kubernetes, or FKS for short.

FKS is an implementation of Kubernetes that runs on top of Fly.io. You start it up using flyctl, by running flyctl ext k8s create.

Under the hood, FKS is a straightforward combination of two well-known Kubernetes projects: K3s, the lightweight CNCF-certified K8s distro, and Virtual Kubelet.

Virtual Kubelet is interesting. In K8s-land, a kubelet is a host agent; it’s the thing that runs on every server in your fleet that knows how to run a K8s Pod. Virtual Kubelet isn’t a host agent; it’s a software component that pretends to be a host, registering itself with K8s as if it was one, but then sneakily proxying the Kubelet API elsewhere.

In FKS, “elsewhere” is Fly Machines. All we have to do is satisfy the various APIs that Virtual Kubelet exposes. For example, the interface for the lifecycle of a Pod:

import (
    "context"

    corev1 "k8s.io/api/core/v1"
)

// PodLifecycleHandler is what a Virtual Kubelet provider implements: the
// create/update/delete/inspect operations for Pods on the backing platform.
type PodLifecycleHandler interface {
    CreatePod(ctx context.Context, pod *corev1.Pod) error
    UpdatePod(ctx context.Context, pod *corev1.Pod) error
    DeletePod(ctx context.Context, pod *corev1.Pod) error
    GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error)
    GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)
    GetPods(context.Context) ([]*corev1.Pod, error)
}

This interface is easy to map to the Fly Machines API. For example:

CreatePod -> POST /apps/{app_name}/machines
UpdatePod -> POST /apps/{app_name}/machines/{machine_id}
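
Here’s roughly what that mapping can look like in practice. To be clear, this is a sketch rather than the FKS source: the MachinesClient interface, the MachineSpec and MachineConfig types, and the Pod-to-Machine naming scheme are all made up for illustration, and only CreatePod is shown.

package fks

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
)

// MachineConfig and MachineSpec are simplified stand-ins for the body of
// POST /apps/{app_name}/machines; the real API accepts many more fields.
type MachineConfig struct {
    Image string            `json:"image"`
    Env   map[string]string `json:"env,omitempty"`
}

type MachineSpec struct {
    Name   string        `json:"name"`
    Region string        `json:"region"`
    Config MachineConfig `json:"config"`
}

// MachinesClient is a hypothetical thin wrapper over the Machines HTTP API,
// along the lines of the createMachine sketch earlier in this post.
type MachinesClient interface {
    CreateMachine(ctx context.Context, app string, spec MachineSpec) error
}

// FlyProvider is a toy Virtual Kubelet provider backed by Fly Machines.
type FlyProvider struct {
    appName string
    region  string
    client  MachinesClient
}

// CreatePod translates a Pod into a Machine create request: the first
// container's image becomes the Machine image, and its env vars carry over.
func (p *FlyProvider) CreatePod(ctx context.Context, pod *corev1.Pod) error {
    if len(pod.Spec.Containers) == 0 {
        return fmt.Errorf("pod %s/%s has no containers", pod.Namespace, pod.Name)
    }
    c := pod.Spec.Containers[0]

    env := make(map[string]string, len(c.Env))
    for _, e := range c.Env {
        env[e.Name] = e.Value
    }

    return p.client.CreateMachine(ctx, p.appName, MachineSpec{
        Name:   pod.Namespace + "-" + pod.Name,
        Region: p.region,
        Config: MachineConfig{Image: c.Image, Env: env},
    })
}

DeletePod, GetPodStatus, and friends fall out the same way: thin translations onto the corresponding Machines endpoints, which is a big part of why the provider stays small.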

K3s, meanwhile, is a stripped-down implementation of all of K8s that fits into a single binary. K3s does a bunch of clever things to be as streamlined as it is, but the most notable of them is kine, an API shim that swaps etcd out for databases like SQLite. Because of kine, K3s can manage multiple servers, but it also runs gracefully on a single server, without distributed state.

So that’s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine. We compile a kubeconfig, with which you can talk to your K3s via kubectl. We set the whole thing up to run Pods on individual Fly Machines, so your cluster scales out directly using our platform, but with K8s tooling.

One thing we like about this design is how much of the lifting is already done for us by the underlying platform. If you’re a K8s person, take a second to think of all the different components you’re dealing with: etcd, specially provisioned nodes, the kube-proxy, a CNI binary with its configuration and host-network integration, containerd, registries. But Fly.io already does most of those things. So this project was mostly chipping away components until we found the bare minimum: CoreDNS, SQLite persistence, and Virtual Kubelet.

We ended up with something significantly simpler than K3s, which is saying something.

Fly Kubernetes has some advantages over plain flyctl and fly.toml:

  • Your deployment is more declarative than it is with the fly.toml file. You declare the exact state of everything down to replica counts, autoscaling rules, volume definitions, and more.
  • When you deploy with Fly Kubernetes, Kubernetes will automatically make your definitions match the state of the world. Machines go down? Kubernetes will whack them back online.

This is a different way to do orchestration and scheduling on Fly.io. It’s not what everyone is going to want. But if you want it, you really want it, and we’re psyched to give it to you: Fly.io’s platform features, with Kubernetes handling configuration and driving your system to its desired state.

We’ve kept things simple to start with. There are K8s use cases we’re a strong fit for today, and others we’ll get better at in the near future, as K8s users drive the underlying platform (and particularly our proxy) forward.

Interested in getting early access? Email us at sales@fly.io and we’ll hook you up.

Not invested in K8s?

Nothing has to change for you! You can deploy apps on Fly.io today, in a matter of minutes, without talking to Sales.


What It All Means

One obvious thing it means is that if you’ve got an investment in Kubernetes tooling, you can keep it while running things on top of Fly.io. So that’s pretty neat. Buy our cereal!

But the computer science story is interesting, too. We placed a bet on an idiosyncratic strategy for doing global orchestration. We replaced global consensus, which is how Borg, Kubernetes, and Nomad all work, with a market-based system. That system was faster and, importantly, dumber than the consensus system it replaced.

This had costs! Nomad’s global consensus would do truly heroic amounts of work to make sure Fly Apps got scheduled somewhere, anywhere. Like a good capitalist, Fly Machines will tell you in no uncertain terms how much work it’s willing to do for you (“less than a Nomad”).

But that doesn’t mean you’re stuck with the answers Fly Machines gives by itself. Because Fly Machines is so simple, and tries so hard to be predictable, we hoped you’d be able to build more sophisticated scheduling and orchestration schemes on top of it. And here you go: Kubernetes scheduling, as a plugin to the platform.

More to come! We’re itching to see just how many different ways this bet might pay off. Or: we’ll perish in flames! Either way, it’ll be fun to watch.