Never Miss a Webhook

golfer missing a shot
Image by Annie Ruygt

Fly.io converts Docker images to fast-booting VMs and runs them globally. This is great for ingesting incoming data! Check out how to scale Laravel on Fly.io.

If your app ingests data from users (webhooks or similar), then it’s important not to miss any of that data.

This is actually a bit scary. We need, like, a lot of uptime! The typical way to increase uptime smells like “scaling” and often feels like expensive premature optimization. How do we do this!?

Here’s how I thread the needle between complex and, well, cheap.

We’ll cover a few problem areas and how to think about them:

  1. Handling Bugs
  2. Handling Spikey Traffic
  3. Handling Availability

The Bug Problem

Who among us has not deployed bugs into production? What often happens is we adjust something way over here but accidentally cause errors over there. This is bad when “over there” is your ingest code.

We want to reduce our chances of adding bugs to our ingest code.

The Bug Solution

We are not about to write fewer bugs. Instead, we need to protect ourselves from our bugs. Here are a few ideas.

KISS

Our ingest code should, in all cases, be simple. Ideally the ingest data we care about (or perhaps the entire HTTP request?) is stored as quickly as possible, and then a queue job is fired off to do the actual work needed.

The code path should be simple, and preferably have very little “hidden” complexity.

To that end, consider which middleware can be stripped out of the hot path of ingestion. CSRF protection is one, and perhaps even rate limiting and/or authentication can be moved out of PHP and into, like, Nginx + Lua (check out OpenResty).
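Here's a rough sketch of skimming those middleware off at the route level. This assumes the classic App\Http\Middleware\VerifyCsrfToken setup; the route path and the IngestWebhookController name are made up (that's the invokable controller shown below):

// routes/web.php - keep the ingest route's middleware stack as thin as possible
use App\Http\Controllers\IngestWebhookController;
use App\Http\Middleware\VerifyCsrfToken;
use Illuminate\Routing\Middleware\ThrottleRequests;
use Illuminate\Support\Facades\Route;

Route::post('/ingest', IngestWebhookController::class)
    ->withoutMiddleware([
        VerifyCsrfToken::class,   // external services can't send CSRF tokens anyway
        ThrottleRequests::class,  // rate limit upstream (Nginx/OpenResty) instead
    ]);

And the ingest controller itself stays small: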

<?php

namespace App\Http\Controllers;

use App\Jobs\ProcessWebhook;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Storage;
use Illuminate\Support\Str;

// An example invokable controller for ingest
class IngestWebhookController extends Controller
{
    public function __invoke(Request $request)
    {
        // TODO: Validate the data if needed (e.g. validate a signed request)

        // Save webhook to storage (likely S3 or similar)
        $webhook = json_encode([
            'method' => $request->method(),
            'headers' => $request->headers->all(),
            // String representation of the body.
            // I'm assuming no files are sent.
            // An alternative would be base64'ing
            // a binary format of the body.
            'body' => $request->getContent(),
            'uri' => $request->getUri(),
        ]);

        $reqId = $request->header('fly-request-id')
            ?: Str::uuid()->toString();

        $file = sprintf("%s.json", $reqId);

        // An S3 (compatible) storage disk
        Storage::disk('webhook')->put($file, $webhook);

        // Queue a job to process the webhook
        ProcessWebhook::dispatch($file);
    }
}

Even this code relies on external services, albeit reliable ones (S3 + SQS for me). Choose your dependencies carefully!
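For reference, the "webhook" disk used above is just a named disk in config/filesystems.php. Something like this, assuming an S3 (or S3-compatible) bucket and made-up env var names:

// config/filesystems.php

'disks' => [

    // ...

    // The "webhook" disk used by the ingest controller
    'webhook' => [
        'driver' => 's3',
        'key' => env('AWS_ACCESS_KEY_ID'),
        'secret' => env('AWS_SECRET_ACCESS_KEY'),
        'region' => env('AWS_DEFAULT_REGION'),
        'bucket' => env('WEBHOOK_BUCKET'),
        // For S3-compatible storage, also set an endpoint:
        'endpoint' => env('AWS_ENDPOINT'),
    ],

],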

Use a Separate Code Base

A great way to avoid bugs is to create a separate code base for ingest. This is attractive as the code base will likely be much smaller - it has a reduced surface area.

We can deploy this separately (and perhaps much more rarely) - letting it sit around in glorious, reliable boredom for years.

The drawback of this is that you don’t have all of your business logic, models, and other supporting code hanging around. Probably. There are creative ways around that, too, but why add complexity?

This means it’s generally “harder” to do logic in the ingest application. If you don’t mind me sitting in my ivory tower for a second, I’d simply suggest not having logic in the ingest application. Preferably, any other processing can be done by the “main” application via a queue worker.
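To make that concrete, here's a minimal sketch of what the ProcessWebhook job (dispatched by the ingest controller above) might look like on the main application's side. The interesting business logic is left to your imagination:

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Storage;

class ProcessWebhook implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        public string $file,
    ) {}

    public function handle(): void
    {
        // Pull the stored request back out of the webhook disk
        $webhook = json_decode(
            Storage::disk('webhook')->get($this->file),
            true
        );

        // Do the real work here: models, relationships,
        // business logic - all the stuff we kept out of ingest
    }
}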

Reusing Our Code Base

If you need (or want) to re-use your code base, perhaps host it as a second, separate app (e.g. the same code base but at a subdomain like ingest.my-app.com).

This has a bunch of benefits. All of our code is there with our models, relationships, business logic, and so on. We can re-use that as-needed when ingesting data.

We can also smoke-test the code base before deploying updates to the ingest app. Since we’re most likely going to break something after a deployment (bugs aren’t bugs until they’re shipped to production), we can update our main application first. Once we have a known stable deployment, we can deploy updates to the ingest app.
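On Fly.io, that ordering might look like this, assuming the ingest app is the same repository deployed as a second Fly app with its own config file (app names and file name here are made up):

# Deploy (and verify) the main application first
fly deploy --app my-app

# Once that's stable, deploy the same code base
# to the separate ingest app
fly deploy --config fly.ingest.toml --app my-app-ingest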


The Spikey Traffic Problem

Perhaps the most common (but often overlooked) problem is how easily spikes in traffic can overload our PHP servers!

The most common PHP setup is Nginx -> PHP-FPM -> your PHP code. In this setup, Enemy Number 1™ is request concurrency, and it’s all because of PHP-FPM’s max_children setting.

PHP-FPM spins up multiple child processes to handle requests. Each child process handles requests in series (one at a time). We only get request concurrency by spinning up more processes! However, once we reach a certain concurrency (max_children), FPM won’t create additional processes and instead returns errors.

It turns out the max_children setting is pretty low by default.
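For reference, a stock www.conf pool looks something like this (exact values vary by distribution and PHP version):

; /etc/php/8.x/fpm/pool.d/www.conf (typical defaults)
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3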

So, our job is to deal with that most-likely scenario (and others if we can).

The Spikey Traffic Solution

There are a few solutions to spikey traffic. One is to just remove PHP-FPM and therefore its (artificial) limits. Another is to over-provision your server(s) so additional traffic can be absorbed.

We’ll talk about using more servers next; first, let’s talk about handling spikey traffic within each app instance.

Octane

For Laravel, you may want to use Octane for ingest.

The main benefit is ditching PHP-FPM. This way we get around the max_children setting, which can artificially impose limits. You could set it to some super high value tho, but I like taking it out of the equation if I can.

The other benefit to Octane is its increased throughput - higher requests per second. This handles spikey traffic better!
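If you go the Octane route, starting it is a one-liner. The worker count and max-requests values here are made up; tune them to your CPU count and workload:

# Start Octane (Swoole shown; other supported servers work too)
php artisan octane:start --server=swoole --workers=8 --max-requests=500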

If you’re running on Fly.io, you want to be sure your fly.toml concurrency settings aren’t too high. If the Fly Proxy thinks the server can handle more concurrent requests than it really can, it will happily send enough traffic to overwhelm the VM.

That’s a good problem to have: people are actually using your app! One simple way around this is “vertical scaling” - give yourself a larger server.

# look, a larger server!
# increase the size, and run `fly deploy`
[[vm]]
  memory = "4gb"
  cpus = 4
  cpu_kind = "performance"
  processes = ["app"]

The Availability Problem

Most projects start life on a single server. “We’ll scale out when we get enough users”, we declare.

Many projects live (and die) there, but some few are lucky enough to have users. Then a new fear creeps in: What if my server goes down? We’ll miss that precious user data.

Now we’re looking at adding extra servers. That means more complexity! We need a load balancer, we need to deploy to multiple servers, and we need our database/redis/whatever to live somewhere else.

How do we do all of that!?

The Availability Solution

Let’s tackle what high availability (HA) means. It comes in a few flavors:

  1. Server redundancy
  2. Auto-scaling (in and out)
  3. Zero-downtime Deployments

Here are some ideas!

Scale Out

We want multiple instances of our ingest application. This gives us 2 things:

  1. Higher availability (if a host fails, or something equally wonky)
  2. More “throughput” - more requests per second when load balancing between instances

However, once we get more servers, we need a load balancer. In these situations, I usually reach for something managed. On Fly, it’s just…kinda there for you from the beginning. Everything goes through the Fly Proxy, which is a “reverse proxy” already - a load balancer (and more).

To scale our application, we can run a few commands:

# Scale to 10 instances,
# spread across possible regions
# Boston, Sydney, London, South Africa, Sao Paulo
fly scale count -y -r bos,syd,lhr,jnb,gru 10

Fly.io will do its best to spread the 10 instances evenly across the 5 regions there. Anycast routing via BGP (aka “fancy DNS”, don’t @ me) will get requests to the closest region, and the Fly Proxy will load balance across app instances within a region.

This is faster for users sending data to our application across the globe, and we gain higher availability by having more than one instance per region (each Fly.io region has a bunch of physical hosts, and Fly tries its best to distribute app instances across different physical hosts).

Auto Shut Down

A nice feature of Fly.io is that Machines can turn off while idle. To handle traffic spikes, we should carefully set up our fly.toml configuration to get the most out of this. Here’s what I would use, with the caveat that I totally made up the hard/soft limit concurrency settings (idk how many concurrent requests your app can handle):

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 2
  processes = ['app']
  [http_service.concurrency]
    type = "requests"
    soft_limit = 100
    hard_limit = 150

[[http_service.checks]]
  grace_period = "10s"
  interval = "30s"
  method = "GET"
  timeout = "5s"
  path = "/up"

We’ve kept the default auto_(stop|start)_machines configuration (set to true). However, we also set min_machines_running to 2, so some Machines will always be on, even during low-traffic periods.

We’ve also set concurrency settings - a soft and hard limit. These help the Fly Proxy know when to start up stopped Machines and provide additional capacity. The values for these depend on how many concurrent requests your application can handle.

Lastly, we set up a health check so servers with issues can be taken out of rotation. This also helps with deployments.

Deployment

The default deployment strategy (with multiple Machines) is “rolling” - one Machine gets replaced at a time. Since we defined health checks, each new Machine has to pass them before the deploy moves on to the next.

However, we can also use blue/green deployments! This creates a whole new set of Machines, checks their health, and then promotes them if/when they all pass. The old servers are then destroyed.

fly deploy --strategy=bluegreen

Blue/green is faster when you have many servers. While rolling swaps out servers one at a time, blue/green creates all of the new Machines concurrently. However, it may fail deployments more often, since the whole deployment reverts if any one of the servers fails a health check.

Blue/green is my preferred way to deploy.

Fly.io ❤️ Laravel

Fly your servers close to your users—and try scaling on Fly.io!

Deploy your Laravel app!  

Another Language?

This small use case (ingesting traffic and throwing the request into a queue) is actually a great opportunity to expand out of PHP (if that’s your thing).

A great language for this is Golang. Its strict typing and relatively simple built-in HTTP support make this type of use case really stable and fast. We’d get a faster bootup time and more throughput (requests per second) overall. Since it’s a small use case, the code shouldn’t (in theory!) get overly complex.

An example of Golang doing the same thing as our __invoke() controller method above is here.