Around the World With SQLite3 and Rsync

Author

Name: Sam Ruby
bsky: intertwingly.net

Michael Jackson's anti-gravity lean from Smooth Criminal — Image by Annie Ruygt

Fly.io runs apps close to users around the world. This same infrastructure can be used to route requests to where the data resides. Give us a whirl and get up and running quickly.

Take a typical Rails application, run fly launch, then fly deploy, say yes a few times and you will have a Dockerfile provided for you, with two instances of your application up and running, a PostgreSQL database, and an Upstash Redis database.

This is a great default, but you are welcome to configure your application however you want. You can do this by editing your Dockerfile yourself, but you will rarely need to, as dockerfile-rails provides many options to help with this process.

This blog post will take you through the configuration of the showcase application. Highlights:

Each event is a separate instance of the same Rails application, with a separate sqlite3 database, and a separate active storage directory, both on a mounted volume.
Multiple events in the same region run on the same machine using Phusion Passenger and share the same Action Cable process and redis instance.
Requests are dynamically routed to the machine nearest to the event, and data is synchronized between machines using rsync.
I’m not sure I’m going to keep it, but each machine is also running sshd, enabling me to synchronize the content with machines outside of fly.

This application is running at smooth.fly.dev. You are welcome to see the index, but access to the individual event pages is password protected as they contain customer names, invoicing information, and scores.

A large number of techniques are shown below. Your needs are undoubtedly different, but hopefully these examples will inspire you to make your own customizations.

Starting with a single event

We are going to start with one event, but in the process we will prepare for multiple events and multiple regions. This can be done using the following steps:

Create a volume, and mount it. fly launch will do this automatically for Rails applications unless you select PostgreSQL.
Set DATABASE_URL to sqlite3:///data/db/2022-harrisburg.sqlite3. This places the database on the volume, with a unique name per event. This can be done via bin/rails generate dockerfile env=DATABASE_URL:sqlite3:///data/db/2022-harrisburg.sqlite3
Set RAILS_STORAGE to /data/storage/2022-harrisburg, and update config/storage.yml as follows:
```
   local:
       service: Disk
       public: true
       root: <%= ENV.fetch('RAILS_STORAGE',
                 Rails.root.join("storage")) %>
```
This will place active storage files on the volume, again in a separate location per event.

Install, configure, and launch redis. This requires a number of sub-steps:

Create a Procfile.fly with the following contents:
```
   web: bin/rails server
   redis: redis-server /etc/redis/redis.conf 
```
This will launch the rails and redis servers in separate processes.

Create config/deploy.fly with the following contents:

   # configure redis
   RUN sed -i 's/^daemonize yes/daemonize no/' /etc/redis/redis.conf &&\
     sed -i 's/^bind/# bind/' /etc/redis/redis.conf &&\
     sed -i 's/^protected-mode yes/protected-mode no/' /etc/redis/redis.conf &&\
     sed -i 's/^logfile/# logfile/' /etc/redis/redis.conf &&\
     echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf

Configuration of redis is beyond the scope of this blog post, but hopefully the above example makes clear that any Dockerfile instructions can be included.

Run the following command to update your Dockerfile:

   bin/rails generate dockerfile --add=redis \
     --procfile=Procfile.fly --instructions=config/deploy.fly

While the above is indeed a fair amount of preparation, it illustrates how you can set environment variables, add packages, run multiple processes, and even add custom instructions to your Dockerfile in a way that retains the ability to regenerate the remaining portions of your Dockerfile at any time.
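
Since the remaining sections add more events, it may help to see how the two per-event settings above follow from an event name. The following is a minimal sketch, not code from the showcase application; the `event_env` helper and its `volume` parameter are hypothetical names introduced for illustration.

```ruby
# Hypothetical helper: derive the per-event settings described above
# from an event name. The /data mount point matches the paths used in
# this post; the method name and signature are illustrative only.
def event_env(event, volume: "/data")
  {
    "DATABASE_URL"  => "sqlite3://#{volume}/db/#{event}.sqlite3",
    "RAILS_STORAGE" => "#{volume}/storage/#{event}"
  }
end

env = event_env("2022-harrisburg")
puts env["DATABASE_URL"]   # sqlite3:///data/db/2022-harrisburg.sqlite3
puts env["RAILS_STORAGE"]  # /data/storage/2022-harrisburg
```

Keeping the event name as the only input means a second event needs nothing more than a second name.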

Add additional events in the same region

While the Phusion Passenger documentation for developing multiple applications and microservices with Passenger + Nginx isn’t written yet, it really is just a matter of following the instructions for deploying an app to a sub-URI or subdirectory, and repeating this step as many times as necessary.

Once again, this involves multiple discrete steps. In the showcase application it starts with a showcases.yml file containing the list of events and an nginx-config script which generates the nginx configuration from this data and places the results into the /etc/nginx/sites-enabled directory.

This script also has another responsibility: it runs db:prepare (and therefore db:migrate) against each of the databases in this region.

This also implies that the right time to run this script is in place of the rails db:prepare script. Doing both, namely installing passenger and reconfiguring what script is run at startup, can be done with a single command:

bin/rails generate dockerfile --passenger \
  --migrate=config/tenant/nginx-config.rb

One last change needs to be made. In Procfile.fly we need to run nginx instead of bin/rails server:

web: nginx
redis: redis-server /etc/redis/redis.conf
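
As a rough illustration of the two jobs the nginx-config script performs, the sketch below generates one Passenger location block per event and lists the `db:prepare` command for each database. It is not the actual script: the YAML layout, the `/rails` application root, and the `location_block` and `prepare_command` helpers are all assumptions made for this example. The Passenger directives follow the pattern in Passenger's sub-URI deployment instructions.

```ruby
require "yaml"

# Assumed layout for the list of events; the real showcases.yml may differ.
EVENTS = YAML.safe_load(<<~YML)
  - name: 2022-harrisburg
    path: /showcase/2022/harrisburg
  - name: 2023-chicago
    path: /showcase/2023/chicago
YML

APP_ROOT = "/rails" # assumed location of the app inside the image

# Per-event database URL, following the naming scheme used earlier.
def database_url(name)
  "sqlite3:///data/db/#{name}.sqlite3"
end

# One Passenger sub-URI location block per event.
def location_block(event)
  <<~CONF
    location ~ ^#{event["path"]}(/.*|$) {
      alias #{APP_ROOT}/public$1;
      passenger_base_uri #{event["path"]};
      passenger_app_root #{APP_ROOT};
      passenger_document_root #{APP_ROOT}/public;
      passenger_enabled on;
      passenger_env_var DATABASE_URL "#{database_url(event["name"])}";
      passenger_env_var RAILS_STORAGE "/data/storage/#{event["name"]}";
    }
  CONF
end

# Environment and argv for preparing one event's database.
def prepare_command(event)
  [{ "DATABASE_URL" => database_url(event["name"]) }, "bin/rails", "db:prepare"]
end

puts EVENTS.map { |event| location_block(event) }.join("\n")

# A real startup script would run each command, for example with
# system(*prepare_command(event), exception: true).
EVENTS.each { |event| p prepare_command(event) }
```

Here the commands are only printed, so the sketch can be run outside a Rails checkout.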

Events in multiple regions

Much of this step builds on concepts in previous steps.

We already have a script that builds an nginx configuration file. Fly machines have a number of environment variables set. We will make use of FLY_REGION and PRIMARY_REGION.

In that generated nginx configuration, events hosted in other regions produce HTTP 409 responses that add the Fly-Replay header, thus:

# Charlotte
location /showcase/2023/charlotte {
  return 409 "wrong region\n";
  add_header Fly-Replay region=atl always;
}

# Chicago
location /showcase/2023/chicago {
  return 409 "wrong region\n";
  add_header Fly-Replay region=ord always;
}

# Clearwater - Glass Slipper Gala
location /showcase/2023/clearwater/glassslipper {
  return 409 "wrong region\n";
  add_header Fly-Replay region=mia always;
}

Events that are in the current FLY_REGION continue to contain passenger directives. This means that the nginx configuration file is different in each region. showcase.conf contains an example of a full generated configuration.
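
The per-region decision can be sketched in a few lines of Ruby. This is an illustration under stated assumptions rather than the generator used by the showcase application: the event list, the `replay_block`, `local_block`, and `nginx_config` helpers, and the `"ord"` fallback are hypothetical, and the local block is reduced to a placeholder.

```ruby
# Assumed event list: each entry names the region that hosts it.
EVENTS = [
  { title: "Charlotte", path: "/showcase/2023/charlotte", region: "atl" },
  { title: "Chicago",   path: "/showcase/2023/chicago",   region: "ord" }
].freeze

# Location block that asks the proxy to replay the request elsewhere.
def replay_block(event)
  <<~CONF
    # #{event[:title]}
    location #{event[:path]} {
      return 409 "wrong region\\n";
      add_header Fly-Replay region=#{event[:region]} always;
    }
  CONF
end

# Placeholder for the Passenger directives used for local events.
def local_block(event)
  <<~CONF
    # #{event[:title]}
    location #{event[:path]} {
      passenger_enabled on;
    }
  CONF
end

# Serve events hosted in this region; replay everything else.
def nginx_config(events, current_region)
  events.map do |event|
    event[:region] == current_region ? local_block(event) : replay_block(event)
  end.join("\n")
end

# FLY_REGION is set on Fly machines; fall back to "ord" for local runs.
puts nginx_config(EVENTS, ENV.fetch("FLY_REGION", "ord"))
```

Run with `FLY_REGION=atl`, the Charlotte block becomes local and the Chicago block becomes a replay.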

Perhaps more interesting is the use of rsync and openssh to initially load and synchronize data. Installation is done by passing --add rsync openssh-server to the generate dockerfile command. Configuration is done by config/deploy.fly. Next, the migration script is changed to bin/deploy, which loads the volume with data from the primary region during startup. A few notes:

Configuration is a matter of mapping directories to user ids and allowing all access. As the port is not exposed to the public internet, this is safe.
As the rsync server is started as a daemon it can’t be launched by a procfile. Instead it is launched by the deploy/migrate script.
Rsync is called on both the db and storage subdirectories, passing the --update flag indicating that only files that are newer in the primary region are to be copied.
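
The startup pull described in these notes might look roughly like the following. It is a sketch, not the contents of bin/deploy: the `pull_commands` helper, the rsync module names `db` and `storage`, and the fallback values for the environment variables are assumptions. The host name follows the `<region>.<app>.internal` pattern used elsewhere in this post.

```ruby
# Build (but do not run) the rsync commands that pull data from the
# primary region. The module names "db" and "storage" are assumed to be
# what the rsync daemon exports; they are not taken from the real config.
def pull_commands(app:, primary:, volume: "/data")
  host = "#{primary}.#{app}.internal"
  %w[db storage].map do |dir|
    ["rsync", "-av", "--update", "rsync://#{host}/#{dir}/", "#{volume}/#{dir}/"]
  end
end

# These variables are set on Fly machines; the fallbacks are
# illustrative values for running the sketch locally.
region  = ENV.fetch("FLY_REGION", "mia")
primary = ENV.fetch("PRIMARY_REGION", "iad")
app     = ENV.fetch("FLY_APP_NAME", "smooth")

if region == primary
  puts "primary region: nothing to pull"
else
  pull_commands(app: app, primary: primary).each { |cmd| puts cmd.join(" ") }
end
```

Returning the commands as arrays keeps the sketch side-effect free; a deploy script would pass each one to `system`.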

Finally, a detached_process passenger hook is defined which is called after 300 seconds of idle time. This hook script checks how many processes remain and when there are no non-cable processes left it will use dig commands to identify other app instances and call rsync to copy all files to them. Again the --update flag is specified so that only files that are newer on the source will be copied to the receiver.
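
The peer-discovery half of that hook can be approximated as below. This sketch leaves out the process-count check and only prints the rsync commands it would run. It assumes that peers are found by querying AAAA records for the app's `.internal` name, that the machine's own address is available to exclude, and that the rsync modules are named `db` and `storage`; `peer_addresses` and `push_commands` are hypothetical helpers.

```ruby
# Parse `dig +short aaaa <app>.internal` output (one IPv6 address per
# line) and drop this machine's own address.
def peer_addresses(dig_output, own_address)
  dig_output.lines.map(&:strip).reject(&:empty?) - [own_address]
end

# rsync commands that push the local volume to one peer. The module
# names "db" and "storage" and the /data mount point are assumptions.
def push_commands(peer, volume: "/data")
  %w[db storage].map do |dir|
    ["rsync", "-av", "--update", "#{volume}/#{dir}/", "rsync://[#{peer}]/#{dir}/"]
  end
end

# Sample dig output with made-up addresses; a real hook would capture
# the output of the dig command instead.
sample = <<~OUT
  fdaa:0:1:a7b:1::2
  fdaa:0:1:a7b:2::3
OUT

peer_addresses(sample, "fdaa:0:1:a7b:1::2").each do |peer|
  push_commands(peer).each { |cmd| puts cmd.join(" ") }
end
```

Because `--update` skips files that are newer on the receiver, running the same push from every idle machine converges on the latest copy of each file.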

Rsync between fly machines runs quite fast, and running it when machines are idle means that eventually all machines have complete copies of all databases. In the future I may refine this strategy, but as the databases are small (typically one megabyte or less), the comfort of knowing that there are multiple backups and that the regions can be reconfigured at will outweighs the concern over the additional disk space required.

Separately, ssh is also set up to enable machines outside of the fly network to automatically rsync data to and from volumes. Placing lines like the following in ~/.ssh/config makes it easy:

Host smooth
  HostName smooth.fly.dev
  Port 2222
  User rails
Host mia.smooth
  HostName mia.smooth.internal
  ProxyJump rails@smooth.fly.dev:2222
  Port 2222
  User rails

Recap

The joy of this setup is that provisioning a new region is as easy as running flyctl machine clone and passing the name of the new region. Adding events to existing or new regions is merely a matter of updating showcases.yml and running fly deploy.

I’m also continuing to host this application both at home and on Hetzner. The experience I have gained from each has led to numerous improvements in making dockerfile-rails work with fly.io. I still need to work out synchronization of authentication and do more testing before I make fly.io the primary for future events, but that will happen soon.

As with everything distributed, a number of engineering trade-offs are involved, specifically:

all static files (css, js, images) as well as the index page are all served from the machine closest to the requester.
accesses (both read and write) from users near to events will be fast.
accesses (even read-only) from users distant from events will be routed to the machine hosting the event.
while scaling CPUs and RAM is possible, creating a second machine in the same region is not supported.
deploying changes will require momentary downtime.

This blog post covered a lot of ground. It contained a mix of step-by-step instructions, descriptions, and pointers to running code. It is not expected that others will mimic this setup exactly, but hopefully seeing how this application was set up will inspire others to configure their own network of machines using dockerfile-rails. Should there be interest, similar functionality can be added to dockerfile-node and, over time, spread to other frameworks.

Finally, take a peek at config/dockerfile.yml to see the complete list of options I use, as well as the resulting Dockerfile. If you have an interesting use case or set of options that you believe may be useful to others, start a discussion, open an issue, or make a pull request.

And you can always use community.fly.io for more general or fly.io-specific questions.
