Print on Demand

A drone delivering a printer
Image by Annie Ruygt

Save money by using appliance machines to only allocate memory and other machine resources when you actually need them.

Scaling discussions often lead to recommendations to add more memory, more CPU, more machines, more regions, more, more, more.

This post is different. It focuses instead on the idea of decomposing parts of your applications into event handlers, starting up Machines to handle the events when needed, and stopping them when the event is done. Along the way we will see how a few built-in primitives make this easy.

To make the discussion concrete, we are going to focus on a common requirement: generation of PDFs from web pages. The code that we will introduce isn’t merely an example produced in support of a blog post; it is code that was extracted from a production application and packaged up into an appliance that you can deploy in minutes to add PDF generation to your existing application.

But before we dive in, let’s back up a bit.


Normally, PDF generation starts with a tool like Puppeteer, Grover, Playwright, ChromicPDF, or BrowserShot. These and other tools ultimately launch a headless browser such as Chrome.

Now a few things about Chrome itself:

  • It likely is bigger than your entire web server.
  • It likely uses more memory than you see with a typical load on your server.
  • In total, people using your server likely spend much less time generating PDFs than they do using the rest of your application.

Taken together, this makes splitting PDF generation into a completely separate application an easy win. With a smaller image, your application will start faster. Memory usage will be more predictable, and the memory needed to generate PDFs will only be allocated when needed and can be scaled separately.

Diving in

Without further ado, the entire application is available on GitHub as fly-apps/pdf-appliance. Installation is a simple matter of cloning the repository, creating an app, adjusting the config, deploying, and scaling.

Next, you will need to integrate this into your application. All that is needed is to reply to requests that are intended to produce a PDF with a fly-replay response header. This can either be done on individual application routes / controller actions, or it can be done globally via either middleware or a front end like NGINX. You can find a few examples in the README.
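
As a rough sketch of the per-route approach, the helper below decides whether a request should be handed to the appliance and, if so, which response header to send. The app name pdf-appliance, the 307 status, and the function names are assumptions made for illustration; the README has the examples that are known to work.

```javascript
// Hypothetical name of the deployed appliance; use whatever name you
// chose when creating the app.
const APPLIANCE_APP = "pdf-appliance";

// Return the headers that ask Fly's proxy to replay the request to the
// appliance, or null when the request should be handled locally.
function pdfReplayHeaders(pathname, applianceApp = APPLIANCE_APP) {
  if (!pathname.toLowerCase().endsWith(".pdf")) return null;
  return { "fly-replay": `app=${applianceApp}` };
}

// Example handler using the Fetch API Request/Response types (available
// in Bun and recent Node.js). `renderLocally` stands in for your app.
function handle(request, renderLocally) {
  const { pathname } = new URL(request.url);
  const headers = pdfReplayHeaders(pathname);
  if (headers === null) return renderLocally(request);
  // The status code is an assumption of this sketch.
  return new Response("", { status: 307, headers });
}
```

Because the appliance asks your app for the URL without the .pdf extension, that second request falls through to renderLocally and is not replayed again.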

And that’s it. The most you might consider doing is issuing an additional HTTP request in anticipation of the user selecting what they want to print, as this will preload the machine.
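
One way to issue that extra request, sketched under the assumption that any request replayed to the appliance is enough to wake it: fire a request in the background and ignore the result. The warm-up URL is hypothetical, and fetchImpl is injectable only to keep the sketch easy to test.

```javascript
// Fire-and-forget request that wakes the appliance before the user asks
// for a PDF. `warmUrl` is a hypothetical URL in your app that answers
// with a fly-replay header.
function preloadAppliance(warmUrl, fetchImpl = fetch) {
  try {
    // Resolve to true on any response and false on failure; a failed
    // warm-up should never break the page that triggered it.
    return Promise.resolve(fetchImpl(warmUrl)).then(
      () => true,
      () => false
    );
  } catch {
    return Promise.resolve(false);
  }
}
```

You might call this when the user opens the page that contains the print links, so the machine is already running by the time they click one.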

If you don’t have an application handy, you can try a demo. On the demo site, click on Demo, then on Publish, and finally on Invoices to see a PDF. The PDF you see will likely be underwhelming, as you would need to enter students, entries, packages, and options to fill out the page. But click refresh anyway and see how fast it responds. If you want to explore further, links to the documentation and code can be found on the front page.

Implementation Details

The basic flow starts when a request for a PDF comes into your app. That request is replayed to the PDF appliance. A Chrome instance in that app then issues a second request to your app for the same URL minus the .pdf extension, and converts the HTML it receives in response into a PDF. That PDF is then returned as the response to the original request.
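
The "same URL minus the .pdf extension" step can be sketched as a small pure function. This is an illustration rather than the appliance's actual code, and the function name is made up for the example.

```javascript
// Given the URL of the original PDF request, work out which HTML page
// Chrome should fetch: same host, path, and query, without ".pdf".
function htmlUrlFor(pdfUrl) {
  const url = new URL(pdfUrl);
  url.pathname = url.pathname.replace(/\.pdf$/i, "");
  return url.toString();
}
```

For example, a request for https://example.com/invoices/42.pdf?year=2024 would lead Chrome to fetch https://example.com/invoices/42?year=2024.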

A single Google Chrome instance per machine will be reused across all requests, which itself is faster than starting a new instance per request. As all HTTP headers will be passed back to your application, this will seamlessly work with your existing session, cookies, and basic authentication.
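
Reusing one instance across requests is commonly done by caching the promise returned by the launcher. The sketch below shows that pattern in isolation; the helper name is hypothetical, and the launcher you pass in (for example a function that calls puppeteer.launch()) is an assumption rather than the appliance's actual code.

```javascript
// Wrap a launcher so that it runs at most once. Every caller receives
// the same promise, so concurrent requests share a single instance.
function shared(launch) {
  let instance = null;
  return function get() {
    if (instance === null) {
      instance = Promise.resolve(launch()).catch((error) => {
        instance = null; // forget the failure so a later request can retry
        throw error;
      });
    }
    return instance;
  };
}

// Usage (commented out because it needs a real browser library):
// const getBrowser = shared(() => puppeteer.launch());
// const browser = await getBrowser();
```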

Stopping idle machines and starting them again on demand is handled by the auto_stop_machines and auto_start_machines settings in your fly.toml. With these in place, machines can confidently exit when idle, secure in the knowledge that they will be restarted when needed. See the README for more information on scaling.
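
To illustrate the "exit when idle" half of that arrangement, here is a minimal sketch of an idle timer that a server could reset on every request. The helper name, the 15 minute figure in the usage comment, and the injectable timers argument are all assumptions made for this example, not details taken from the appliance.

```javascript
// Call onIdle once no request has been seen for idleMs milliseconds.
// `timers` is injectable so the logic can be exercised without waiting.
function createIdleTimer(idleMs, onIdle, timers = { setTimeout, clearTimeout }) {
  let handle = null;
  return {
    // Call on every incoming request to push the deadline back.
    touch() {
      if (handle !== null) timers.clearTimeout(handle);
      handle = timers.setTimeout(onIdle, idleMs);
    },
    // Call during shutdown to drop any pending deadline.
    cancel() {
      if (handle !== null) timers.clearTimeout(handle);
      handle = null;
    },
  };
}

// Usage (commented out so the sketch does not keep a process alive):
// const idle = createIdleTimer(15 * 60 * 1000, () => process.exit(0));
// idle.touch(); // then call idle.touch() again in each request handler
```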

Note that different machines can use different languages and frameworks. This code is written in JavaScript and runs on Bun. It was designed to support a Ruby on Rails app, but can be used with any app.

A Reusable Pattern

If your app is small and your usage is low, scaling may not be much of a concern. But as your needs grow, your first instinct shouldn’t merely be to throw more hardware at the problem, but rather to partition the problem so that each machine has a somewhat predictable capacity.

Do this by taking a look at your application and looking for requests that are somehow different from the rest. Streaming audio and video files, handling websockets, converting text to speech or performing other AI processing, long running “background” computation, fetching static pages, producing PDFs, and updating databases all have different profiles in terms of server load.

It might even be helpful – purely as a thought experiment – to think of replacing your main server with a proxy that does nothing more than route requests to separate machines based on the type of workload performed.
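
That thought experiment can be written down as a routing table. In the sketch below every pool name, app name, and path rule is hypothetical; the point is only that each kind of workload maps to its own pool of machines.

```javascript
// Hypothetical pools: each entry maps one kind of request to a separate app.
const POOLS = [
  { name: "pdf", app: "pdf-appliance", matches: (path) => path.endsWith(".pdf") },
  { name: "media", app: "media-streamer", matches: (path) => path.startsWith("/media/") },
  { name: "websocket", app: "socket-hub", matches: (path) => path.startsWith("/cable") },
];

// Pick the first pool whose rule matches. Requests that match nothing
// stay with the main application (replay is null).
function routeFor(pathname, pools = POOLS) {
  const pool = pools.find((candidate) => candidate.matches(pathname));
  return pool
    ? { pool: pool.name, replay: `app=${pool.app}` }
    : { pool: "default", replay: null };
}
```

A proxy built this way would send the replay value as a fly-replay response header whenever it is not null.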

Once you have come up with an allocation of functions performed to pools of machines, Fly-Replay is but one tool available to you. There is also a Machines API that will enable you to orchestrate whatever topology you can come up with. Cost-Effective Queue Workers With Machines gives a preview of what that would look like with Laravel.
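
As a taste of the Machines API, the sketch below builds the HTTP request for starting or stopping a single machine. The base URL and paths reflect our understanding of the public REST API and should be checked against the API reference; the app name, machine ID, and token in the usage comment are placeholders.

```javascript
const MACHINES_API = "https://api.machines.dev/v1";

// Build (but do not send) the request that starts or stops one machine.
function machineActionRequest(app, machineId, action, token) {
  if (action !== "start" && action !== "stop") {
    throw new Error(`unsupported action: ${action}`);
  }
  const path = `/apps/${encodeURIComponent(app)}/machines/${encodeURIComponent(machineId)}/${action}`;
  return {
    url: `${MACHINES_API}${path}`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${token}` },
    },
  };
}

// Usage (placeholders throughout):
// const { url, init } = machineActionRequest("my-workers", "MACHINE_ID", "start", API_TOKEN);
// const response = await fetch(url, init);
```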