Rails Background Jobs with Fly Machines

DALL-E-generated image of 'whimsical painting of robots stretching taffy in a factory'
Image by DALL-E

Fly Machines can boot a VM in 500ms, run a Rails background job, then turn off when it’s done. That means you don’t have to pay for a server to sit idle when there are no jobs to process, and you get a much more scalable pool of on-demand workers when your application starts to get busy.

When a Rails application needs to do some heavy lifting, like processing lots of data or making a calculation that takes a long time, a common approach is to spin up an ActiveJob and run it asynchronously in a background worker.

According to The Ruby Toolbox, the most popular background job framework for Rails is Sidekiq. Sidekiq is a great framework for creating background jobs, retrying them when they fail, managing queue priorities and topologies, and monitoring their status from the application or an admin panel.

Like many background workers, Sidekiq runs in a separate process from the Rails application, which requires additional CPU and memory resources. That’s all great when there are jobs to process, but it’s not fun paying for a server that sits around doing nothing when the queue is empty.

Additionally, when a bunch of jobs start rolling in, it would be great if the pool of workers could scale up temporarily to handle the increased load, then scale back down when things are less busy.

Fly Machines could solve that problem by running zero background workers or processes when there are no jobs. When a job comes in, the Rails ActiveJob processor spins up a Fly Machine to process it. Here’s what that looks like as a really basic ActiveJob adapter.

module ActiveJob
  module QueueAdapters
    # == Fly Machine adapter for Active Job
    #
    # Boots a VM in 500ms, runs a Rails background job, and shuts the VM down when it's all done.
    #
    #   Rails.application.config.active_job.queue_adapter = :fly_machine
    class FlyMachineAdapter
      def enqueue(job) # :nodoc:
        Fly.app.machine.fork init: {
          cmd: [
            "/app/bin/rails",
            "runner",
            # ActiveJob::Base.execute deserializes the job data and performs it.
            "ActiveJob::Base.execute(#{job.serialize})"
          ]
        }
      end

      def enqueue_at(*) # :nodoc:
        raise NotImplementedError, "Does not yet support queueing background jobs in the future"
      end
    end
  end
end

When the job is done and exits, the Fly Machine also shuts down. If the machine took 3 seconds to run, you pay for 3 seconds of Fly Machine time. The important part is that you do not have to keep paying for resources when there are no jobs on the queue.

The Fly.app.machine.fork looks like magic, but it’s not. All it does is get the image name and ENV of the currently running Rails application and stuff them into a Fly Machines API call that boots a Firecracker VM. It’s almost like forking, and close enough to call it a fork for the purposes of this demo.
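To make that less magical, here is a minimal sketch of the kind of request body such a fork could assemble. The helper name build_fork_request is hypothetical, and the field names (image, env, init.cmd, auto_destroy) and the FLY_IMAGE_REF variable are assumptions based on my reading of the public Machines API, not the demo's actual code. Check the repository and the current API reference before leaning on any of it.

```ruby
require "json"

# Hypothetical helper, not the demo's actual implementation: builds the body
# of a "create machine" request that reuses the running app's image and ENV.
# Field names are assumptions; verify them against the Machines API reference.
def build_fork_request(cmd:, env: ENV.to_h)
  {
    config: {
      # Image of the currently running app (assumed to be in FLY_IMAGE_REF).
      image: env.fetch("FLY_IMAGE_REF"),
      # Forward app-level ENV only; skip the platform's own FLY_* variables.
      env: env.reject { |key, _| key.start_with?("FLY_") },
      # Command to run instead of booting the web server.
      init: { cmd: cmd },
      # Assumed flag: remove the VM once the command exits.
      auto_destroy: true
    }
  }
end

fake_env = {
  "FLY_IMAGE_REF" => "registry.fly.io/my-app:deployment-123",
  "FLY_APP_NAME"  => "my-app",
  "DATABASE_URL"  => "postgres://example.internal/app"
}

request = build_fork_request(
  cmd: ["/app/bin/rails", "runner", "puts 'hello from a forked machine'"],
  env: fake_env
)

puts JSON.pretty_generate(request)
```

The sketch stops short of sending the request, since authentication and the endpoint are deployment details the post doesn't cover.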

You can see the source code at https://github.com/fly-apps/rails-machine-workers and the running application at https://rails-machine-workers.fly.dev, which includes a really big security warning you should know about if you try doing this today.

But wait, there’s more!

Embracing the monolith

It’s tempting to reach for edge function services from $BIG_CLOUD_COMPANY, but those usually require managing functions as a “separate thing” from the Rails app, which means as a developer you have to build it differently and manage it differently, like deploying the functions separately from the application code. There’s tooling that makes this a little easier, but it’s still another thing you’d have to worry about.

When using Fly Machines for background workers, you don’t have to do anything special. You just treat your background jobs as background jobs, and there’s nothing extra to manage once you get it all set up. This makes the fly deploy command even more powerful, which is insane because the thing can already deploy your single application to a fleet of servers around the world. Now it can also deploy your background workers. 🤯

Limitations to this proof-of-concept

This Fly Machines background worker proof-of-concept is still very basic and comes with a few issues depending on the needs of your application.

Lower processing latency

The basic concept of “fire-and-forget-a-background-job-into-a-machine” comes with latency that may or may not be acceptable for your application. Here’s what the back-of-napkin math looks like:

  Fly Machine Boot Time     500ms
+ Rails Boot Time          1500ms..5000ms+
--------------------------------------
  Total latency            2000ms..5500ms+

This latency could be negligible if the background job takes a few minutes to complete, but if the background job should only take half a second, it doesn’t make sense to spend five seconds booting up to do it.
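To put rough numbers on that trade-off, the sketch below computes boot time as a fraction of job time using the worst-case figures from the estimate above. The function name boot_overhead and the three-minute job duration are made up for illustration.

```ruby
# Ratio of boot time to useful work, using the back-of-napkin figures above.
# Defaults are the 500ms machine boot and the 5000ms worst-case Rails boot.
def boot_overhead(job_ms, machine_boot_ms: 500, rails_boot_ms: 5_000)
  (machine_boot_ms + rails_boot_ms).to_f / job_ms
end

short_job = boot_overhead(500)      # half-second job
long_job  = boot_overhead(180_000)  # three-minute job

puts format("0.5 s job: boot takes %.1fx as long as the job", short_job)
puts format("3 min job: boot adds %.1f%% to the runtime", long_job * 100)
```

In the worst case the half-second job spends 11 times longer booting than working, while the three-minute job pays roughly 3% extra.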

This problem is very solvable. The most straightforward way would be to dump the background job into a queue and boot the machine, which then phones home to the Rails app to say “I’m alive and processing jobs”. The machine then processes all the jobs on the queue until it’s empty. When the queue is empty, the worker could be configured either to shut down right away or to wait a configurable amount of time for another job before shutting down.

Here’s what that might look like:

./bin/fly-machine-worker --wait 600

If the value of 600 is passed, the worker would wait for 600 seconds after the last job was processed before terminating. The first job would have to boot the Rails application, but subsequent jobs would get picked up by the already booted VM. When things slow down for 10 minutes, the worker would terminate.

To guarantee one worker is always running so there’s no wait time, forever could be passed to the worker.

./bin/fly-machine-worker --wait forever
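The loop inside such a worker might look something like the sketch below. It is only an illustration: run_worker is a hypothetical name, an in-memory Queue stands in for whatever shared queue the Rails app would really use, and Float::INFINITY plays the role of "forever".

```ruby
# Hypothetical worker loop: drains the queue, then waits up to `wait` seconds
# for more work before returning, which is where the VM would shut down.
# Pass Float::INFINITY to mimic `--wait forever`.
def run_worker(queue, wait:, poll_interval: 0.01)
  processed = 0
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  idle_deadline = clock.call + wait

  loop do
    job = begin
      queue.pop(true) # non-blocking pop; raises ThreadError when empty
    rescue ThreadError
      nil
    end

    if job
      job.call
      processed += 1
      idle_deadline = clock.call + wait # reset the idle timer after each job
    elsif clock.call >= idle_deadline
      break # nothing arrived during the whole wait window
    else
      sleep poll_interval
    end
  end

  processed
end

queue = Queue.new
results = []
3.times { |i| queue << -> { results << "job #{i} done" } }

count = run_worker(queue, wait: 0.05)
puts "processed #{count} jobs, then idled out"
```

A real worker would poll something shared, such as Redis or a database table, rather than a queue that lives in its own memory.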

Handle retries for jobs that fail

If a Fly Machine job fails, the worker wouldn’t retry it because the state of the job isn’t being persisted or updated anywhere. It would be great to track the state of a job and its current attempt count so failed jobs could be retried.
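As a rough sketch of that bookkeeping, the snippet below tracks status and attempts in a plain Hash and retries until a limit is reached. The name run_with_retries and the record's fields are hypothetical. In a real setup the record would live somewhere durable, like a database row, and each attempt would presumably run in a fresh machine rather than in a loop.

```ruby
# Hypothetical retry wrapper: records state and attempt count in `record`
# (a plain Hash here; a real app would persist it somewhere durable).
def run_with_retries(record, max_attempts: 3)
  while record[:attempts] < max_attempts
    record[:attempts] += 1
    record[:status] = :running
    begin
      yield
      record[:status] = :succeeded
      return record
    rescue StandardError => e
      # Remember why the attempt failed so it can be inspected later.
      record[:last_error] = e.message
      record[:status] = :failed
    end
  end
  record
end

record = { status: :pending, attempts: 0, last_error: nil }
calls = 0

run_with_retries(record, max_attempts: 5) do
  calls += 1
  raise "upstream timeout" if calls < 3 # fail twice, then succeed
end

puts record
```

This version retries immediately. Established frameworks usually add a backoff between attempts, which is left out here to keep the sketch short.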

Too many workers

The funny thing about this approach is the problem of too many workers spinning up and potentially bringing down your application. Let’s say somebody loaded up your application with a gazillion requests that spin up a gazillion background jobs. If those jobs are competing for the same resource in your application, like trying to write to the same row in a database, your application could start to have some problems.

There are lots of approaches to solving these problems. One way is to name queues and set limits on the number of workers that can be spun up per queue. This approach has plenty of precedent in existing Rails job queue frameworks.
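Here is a minimal sketch of per-queue limits, assuming a hypothetical QueueLimiter class that the adapter would consult before booting a machine. It keeps its counts in memory, so it only works within a single process. Anything spanning several app servers would need the counts in a shared store.

```ruby
# Hypothetical limiter: caps how many machines may run per named queue.
class QueueLimiter
  def initialize(limits, default_limit: 1)
    @limits = limits
    @default_limit = default_limit
    @running = Hash.new(0)
    @mutex = Mutex.new
  end

  # Returns true and reserves a slot if the queue is under its limit.
  def try_acquire(queue_name)
    @mutex.synchronize do
      limit = @limits.fetch(queue_name, @default_limit)
      return false if @running[queue_name] >= limit

      @running[queue_name] += 1
      true
    end
  end

  # Frees a slot when a machine exits.
  def release(queue_name)
    @mutex.synchronize do
      @running[queue_name] -= 1 if @running[queue_name] > 0
    end
  end
end

limiter = QueueLimiter.new({ "default" => 2, "reports" => 1 })

# Five jobs arrive at once, but only two machines are allowed to start.
started = 5.times.map { limiter.try_acquire("default") }
puts started.inspect

# Once a machine exits, a slot opens up again.
limiter.release("default")
puts limiter.try_acquire("default")
```

Jobs that are refused a slot would stay on the queue until a running machine finishes and releases one.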

ActiveJob might not be the right place to solve these problems

The limitations listed above have mostly been solved by all the great background job frameworks in Ruby. It might make more sense to perform the “Fly Machine Fork” from within a worker process itself.

Does this mean you’d then be back to paying for a worker that sits mostly idle monitoring a queue? Not necessarily. Sidekiq 7.0 introduces a concept called “embedding”, which means Sidekiq runs its worker threads inside another process, such as the Puma process that serves the Rails app. If an embedded worker is monitoring a queue and sitting mostly idle, you don’t care as much because you’re not paying for a dedicated server. When a job comes rolling in, Sidekiq would kick off a job in a new Fly Machine, getting back to the world where you only have to pay for what you use.

“Compress the complexity of modern web apps”

The ability to “fork” a Rails app on an entirely different machine without worrying about managing “cloud functions” is valuable to developers because they can spend more time on their application and less time “managing stuff”.

It makes the monolith even more majestic since you don’t have to worry as much about capacity planning for your background jobs and can think about doing some really hefty work with the full might of Fly Machines at your disposal.

The approach still needs a lot of work before it’s production-ready, but the basic building blocks are there to make Rails applications do even more incredible things.

Additional resources