Accident Forgiveness

A ghost shocked at his cloud bill.
Image by Annie Ruygt

We’re Fly.io, a new public cloud with simple, developer-friendly ergonomics. Try it out; you’ll be deployed in just minutes, and, as you’re about to read, with less financial risk.

Public cloud billing is terrifying.

The premise of a public cloud — what sets it apart from a hosting provider — is 8,760 hours/year of on-tap deployable compute, storage, and networking. Cloud resources are “elastic”: they’re acquired and released as needed; in the “cloud-iest” apps, without human intervention. Public cloud resources behave like utilities, and that’s how they’re priced.

You probably can’t tell me how much electricity your home is using right now, and may only come within tens of dollars of accurately predicting your water bill. But neither of those bills is all that scary, because you assume there’s a limit to how much you could run them up in a single billing interval.

That’s not true of public clouds. There are only so many ways to “spend” water at your home, but there are indeterminably many ways to land on a code path that grabs another VM, or to miskey a configuration, or to leak long-running CI/CD environments every time a PR gets merged. Pick a practitioner at random, and I bet they’ve read a story within the last couple months about someone running up a galactic-scale bill at some provider or other.

Implied Accident Forgiveness

For people who don’t do a lot of cloud work, what all this means is that every PR push sets off a little alarm in the back of their heads: “you may have just incurred $200,000 of costs!” The alarm is quickly silenced, though it’s still subtly extracting a cortisol penalty. But by deadening the nerves that sense the danger of unexpected charges, those people are nudged closer to being the next story on Twitter about an accidental $200,000 bill.

The saving grace here, which you’ll learn if you ever become that $200,000 story, is that nobody pays those bills.

See, what cloud-savvy people know already is that providers have billing support teams, which spend a big chunk of their time conceding disputed bills. If you do something luridly stupid and rack up costs, AWS and GCP will probably cut you a break. We will too. Everyone does.

If you didn’t already know this, you’re welcome; I’ve made your life a little better, even if you don’t run things on Fly.io.

But as soothing as it is to know you can get a break from cloud providers, the billing situation here is still a long ways away from “good”. If you accidentally add a zero to a scale count and don’t notice for several weeks, AWS or GCP will probably cut you a break. But they won’t definitely do it, and even though your odds are good, you’re still finding out at email- and phone-tag scale speeds. That’s not fun!

Explicit Accident Forgiveness

Charging you for stuff you didn’t want is bad business.

Good business, we think, means making you so comfortable with your cloud you try new stuff. You, and everyone else on your team. Without a chaperone from the finance department.

So we’re going to do the work to make this official. If you’re a customer of ours, we’re going to charge you in exacting detail for every resource of ours you intentionally use, but if something blows up and you get an unexpected bill, we’re going to let you off the hook.

Not So Fast

This is a Project, with a capital P. While we’re kind of kicking ourselves for not starting it earlier, there are reasons we couldn’t do it back in 2020.

The Fully Automated Accident-Forgiving Billing System of the Future (which we are in fact building and may even one day ship) will give you a line-item veto on your invoice. We are a long ways away. The biggest reason is fraud.

Sit back, close your eyes, and try to think about everything public clouds do to make your life harder. Chances are, most of those things are responses to fraud. Cloud platforms attract fraudsters like ants to an upturned ice cream cone. Thanks to the modern science of cryptography, fraudsters have had a 15-year head start on turning metered compute into picodollar-granular near-money assets.

Since there’s no bouncer at the door checking IDs here, an open-ended and automated commitment to accident forgiveness is, with iron certainty, going to be used overwhelmingly in order to trick us into “forgiving” cryptocurrency miners. We’re cloud platform engineers. They’re our primary pathogen.

So, we’re going to roll this out incrementally.

Why not billing alerts? We’ll get there, but here too there are reasons we haven’t yet: (1) meaningful billing alerts were incredibly difficult to do with our previous billing system, and building the new system and migrating our customers to it has been a huge lift, a nightmare from which we are only now waking (the billing system’s official name); and (2) we’re wary about alerts being a product design cop-out; if we can alert on something, why aren’t we fixing it?

Accident Forgiveness v0.84beta

All the same subtextual, implied reassurances that every cloud provider offers remain in place at Fly.io. You are strictly better off after this announcement, we promise.

Now: for customers that have a support contract with us, at any level, there’s something new: I’m saying the quiet part loud. The next time you see a bill with an unexpected charge on it, we’ll refund that charge, (almost) no questions asked.

I added the “almost” right before publishing, because I’m chicken.

That policy is so simple it feels anticlimactic to write. So, some additional color commentary:

We’re not advertising a limit to the number of times you can do this. If you’re a serious customer of ours, I promise that you cannot remotely fathom the fullness of our fellow-feeling. You’re not annoying us by getting us to refund unexpected charges. If you are growing a project on Fly.io, we will bend over backwards to keep you growing.

How far can we take this? How simple can we keep this policy? We’re going to find out together.

To begin with, and in the spirit of “doing things that won’t scale”, when we forgive a bill, what’s going to happen next is this: I’m going to set an irritating personal reminder for Kurt to look into what happened, once right away and then again the day before your next bill, so we can see what’s going wrong. He’s going to hate that, which is the point: our best feature work is driven by Kurt-hate.

Obviously, if you’re rubbing your hands together excitedly over the opportunity this policy presents, then, well, not so much with the fellow-feeling. We reserve the right to cut you off.

Support For Developers, By Developers

Explicit Accident Forgiveness is just one thing we like about Support at Fly.io.

Go find out!  

What’s Next: Accident Protection

We think this is a pretty good first step. But that’s all it is.

We can do better than offering you easy refunds for mistaken deployments and botched CI/CD jobs. What’s better than getting a refund is never incurring the charge to begin with, and that’s the next step we’re working on.

We built a new billing system so that we can do things like that. For instance: we’re in a position to catch sudden spikes in your month-over-month bills, flag them, and catch weird-looking deployments before we bill for them.

More to come on that billing system.
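To make “catch sudden spikes in your month-over-month bills” a little more concrete, here’s a minimal sketch of one way such a check could work. To be clear, this is not how our billing system does it: the function name, the median baseline, and the thresholds are all assumptions made up for illustration.

```python
from statistics import median


def flag_spike(monthly_totals, ratio_threshold=3.0, min_history=3):
    """Return True if the newest bill looks like a sudden spike.

    monthly_totals: invoice totals in dollars, oldest first, newest last.
    ratio_threshold and min_history are hypothetical knobs, not real
    Fly.io settings.
    """
    # Without enough history there is no baseline to compare against.
    if len(monthly_totals) < min_history + 1:
        return False

    *history, latest = monthly_totals

    # The median keeps one earlier weird month from skewing the baseline.
    baseline = median(history)
    if baseline <= 0:
        # Nothing was ever billed before, so any charge is worth a look.
        return latest > 0

    return latest / baseline >= ratio_threshold


if __name__ == "__main__":
    print(flag_spike([40, 42, 39, 41]))   # an ordinary month
    print(flag_spike([40, 42, 39, 410]))  # somebody added a zero
```

A ratio against a median is about the simplest thing that could work. A real system would also want to look at what changed (a new region, a scale count, a leaked CI environment) before deciding what to flag.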

Another thing we rebuilt billing for is reserved pricing. Already today you can get a steep discount from us reserving blocks of compute in advance. The trick to taking advantage of reserved pricing is confidently predicting a floor to your usage. For a lot of people, that means fighting feelings of loss aversion (nobody wants to get gym priced!). So another thing we can do in this same vein: catch opportunities to move customers to reserved blocks, and offer backdated reservations. We’ll figure this out too.
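Here’s the “predict a floor” arithmetic as a small sketch. The rates below are invented numbers, not our actual pricing, and the function is hypothetical: it assumes you reserve exactly the lowest monthly usage you’ve seen so far, pay the reserved rate on that block every month, and pay the on-demand rate on anything above it.

```python
def reservation_savings(monthly_usage_hours, on_demand_cents, reserved_cents):
    """Compare pure on-demand billing with reserving the usage floor.

    monthly_usage_hours: compute hours used in each past month.
    on_demand_cents / reserved_cents: made-up per-hour rates in cents.
    Returns (floor_hours, savings_in_cents) over the whole history.
    """
    # The floor is the usage you hit every single month, so a block of
    # this size would never have gone unused.
    floor = min(monthly_usage_hours)

    on_demand_total = sum(h * on_demand_cents for h in monthly_usage_hours)

    # With a reservation: the block is paid at the reserved rate each
    # month, and only the hours above the floor are billed on demand.
    reserved_total = sum(
        floor * reserved_cents + (h - floor) * on_demand_cents
        for h in monthly_usage_hours
    )

    return floor, on_demand_total - reserved_total


if __name__ == "__main__":
    floor, saved = reservation_savings([700, 900, 720], 10, 6)
    print(f"reserve {floor} hours/month, save ${saved / 100:.2f}")
```

Since the reserved block never exceeds what was actually used, the savings can’t go negative for any discounted rate. That is the case where nobody gets gym priced.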

Someday, when we’re in a monopoly position, our founders have all been replaced by ruthless MBAs, and Kurt has retired to farm coffee beans in lower Montana, we may stop doing this stuff. But until that day this is the right choice for our business.

Meanwhile: like every public cloud, we provision our own hardware, and we have excess capacity. Your messed-up CI/CD jobs didn’t really cost us anything, so if you didn’t really want them, they shouldn’t cost you anything either. Take us up on this! We love talking to you.