Multi-region Postgres (Legacy)
Most read-heavy, PostgreSQL-backed applications work natively across regions on Fly.io with no architectural changes required. Deploying an app and database in multiple regions takes advantage of two Fly features:
- Regional read replicas
- The fly-replay header
With regional read replicas configured, the fly-replay header lets you specify exactly which requests need to be serviced by the primary, writable database. When we detect this header, we replay the entire request to the region you specify.
In most runtimes, it's straightforward to catch a read-only database error in a replica region and serve a response with the appropriate replay header.
This guide is all about PostgreSQL, but the deployment topology will work with MySQL, MongoDB, and any other database with read replica support.
Create a PostgreSQL Cluster
If you don't already have a PostgreSQL cluster running, you can create one with the fly CLI:
fly pg create --name chaos-postgres --region scl
Choose a high-availability configuration to create a two-node PostgreSQL cluster in Santiago, Chile: one leader for writes and one replica for redundancy.
Add Read Replicas
Adding read replicas is simple. Just create more volumes to match the existing one(s):
fly volumes create pg_data -a chaos-postgres --size 10 --region ord
fly volumes create pg_data -a chaos-postgres --size 10 --region ams
fly volumes create pg_data -a chaos-postgres --size 10 --region syd
If you already have a Postgres cluster called chaos-postgres running, you can check the volume name and size using fly volumes list -a chaos-postgres.
Then, add one new VM for each new volume:
fly scale count 5 -a chaos-postgres
The chaos-postgres cluster will now have read replicas in Chicago, Amsterdam, and Sydney. When you run fly status -a chaos-postgres you should see output like this:
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS
240eb1cd app 2 ams run running (replica) 3 total, 3 passing
83b849fa app 2 ord run running (replica) 3 total, 3 passing
d8e8a317 app 2 syd run running (replica) 3 total, 3 passing
4c27cd52 app 2 scl run running (replica) 3 total, 3 passing
987f4b41 app 2 scl run running (leader) 3 total, 3 passing
Configure Connection Strings
Attach Database to Application
To hook your app up to your cluster, run the attach command from your application directory:
fly pg attach chaos-postgres
This installs a DATABASE_URL secret in your application, which is available to your app processes as an environment variable. The command also prints the connection string to the console.
Connect to Regional Replicas
The generated connection string uses port 5432 to connect to PostgreSQL. This port always forwards you to a writable instance. Port 5433 connects directly to the PostgreSQL member, and is used to reach read replicas.
You can use the proxy port (5432) to connect from every region, but it will be quite slow, especially since we put our PostgreSQL cluster in Santiago. Connecting to "local" replicas is much quicker, but does take some app logic.
The basic logic to connect is:
- Set a PRIMARY_REGION environment variable on your app: scl for our chaos-postgres cluster.
- Check the FLY_REGION environment variable at connect time; use DATABASE_URL as-is when FLY_REGION=scl.
- When running in other regions, modify the DATABASE_URL: change the port to 5433.
This is what it looks like in Ruby:
class Fly
  # Returns a connection string that targets the local read replica
  # (port 5433) outside the primary region, and the writable primary
  # (port 5432) inside it. `blank?` comes from ActiveSupport.
  def self.database_url
    primary = ENV["PRIMARY_REGION"]
    current = ENV["FLY_REGION"]
    db_url = ENV["DATABASE_URL"]

    if primary.blank? || current.blank? || primary == current
      return db_url
    end

    u = URI.parse(db_url)
    u.port = 5433
    u.to_s
  end
end
Running this in scl will use the built-in DATABASE_URL and connect to port 5432:
postgres://<user>:<password>@top1.nearest.of.chaos-postgres.internal:5432/rails_on_fly?sslmode=disable
In the other regions, the app will connect to port 5433:
postgres://<user>:<password>@top1.nearest.of.chaos-postgres.internal:5433/rails_on_fly?sslmode=disable
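If you're on Rails, one way to wire this helper into your database configuration is through config/database.yml. This is an illustrative sketch; your adapter options and pool settings will differ:

```yaml
# config/database.yml -- illustrative; assumes the Fly.database_url
# helper from above is loaded before the config is read.
production:
  url: <%= Fly.database_url %>
```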
Connecting External Services
Sometimes we need to allow external services to connect to our Postgres instance. While we don't open any external ports by default, we can achieve this through some simple configuration changes.
Allocating an IP Address
If you haven't already, you will need to allocate an IP address to your application. You can view your list of IPs by running the following command from your application directory:
fly ips list
You can allocate an IPv4 address by running the following:
fly ips allocate-v4
If your network supports IPv6:
fly ips allocate-v6
If you're not sure which one to use, just provision one of each and you should be good to go.
External Port Configuration
Now that we have an IP address, let's configure our app to expose an external port and direct incoming requests to our Postgres instance.
If you haven't already pulled down your fly.toml configuration file, you can do so by running:
fly config save --app <app-name>
Now, let's open up our fly.toml file and configure our port mappings by defining a new Service.
[[services]]
internal_port = 5432 # Postgres instance
protocol = "tcp"
# Open port 10000 for plaintext connections.
[[services.ports]]
handlers = []
port = 10000
For additional information on services and service ports, see The services sections of the fly.toml reference.
Deploying Configuration Changes
Once your Service has been specified, it's time to deploy the new configuration.
Before running the command below, be sure to verify the version of Postgres you are running. For example, if you are running Postgres 12.x you would specify flyio/postgres:12 as your target image.
fly deploy . --app <app-name> --image flyio/postgres:<major-version>
After the deploy completes, you can verify your Service configuration by running the info command:
fly info
...
Services
PROTOCOL PORTS
TCP 10000 => 5432 []
...
Establishing External Connection
Now that you have your Service and port mappings in place, you should be able to establish new connections to your Postgres using the <app-name>.fly.dev hostname along with your external-facing port.
psql postgres://postgres:<password>@<app-name>.fly.dev:10000
Restoring a PostgreSQL Cluster
Fly.io performs daily storage-based snapshots of each of your provisioned volumes. These snapshots can be used to restore your dataset into a new Postgres application.
Listing Snapshots
Snapshots are volume-specific, so you will need to first identify a volume to target. You can list your volumes by running the volumes list command with your Postgres app name:
fly volumes list -a chaos-postgres
ID NAME SIZE REGION ATTACHED VM CREATED AT
vol_x915grn008vn70qy pg_data 10GB syd b780ce3d 2 weeks ago
vol_ke628r677pvwmnpy pg_data 10GB syd 359d0e24 2 weeks ago
Once you have identified which volume to target, you can go ahead and list your snapshots by running the following command:
fly volumes snapshots list <volume-id>
ID SIZE CREATED AT
vs_2AjJ4lGqQwDbRfxm 29 MiB 2 hours ago
vs_BAARBQxZKl6JKU04 27 MiB 1 day ago
vs_OPQXXna6kA2Qnhz8 26 MiB 2 days ago
Restoring From a Snapshot
To restore a Postgres application from a snapshot, simply specify the --snapshot-id argument when running the create command as shown below:
fly postgres create --snapshot-id <snapshot-id>
Detect Write Requests
Catch Read-only Errors
PostgreSQL conveniently sends a "read only transaction" error when you attempt to write to a read replica. All you need to do to detect write requests is catch this error.
Replay the Request
Once caught, just send a fly-replay header specifying the primary region. For chaos-postgres, send fly-replay: region=scl, and we'll take care of the rest.
If you're working in Rails, just add this to your ApplicationController:
class ApplicationController < ActionController::Base
rescue_from ActiveRecord::StatementInvalid do |e|
if e.cause.is_a?(PG::ReadOnlySqlTransaction)
r = ENV["PRIMARY_REGION"]
response.headers["fly-replay"] = "region=#{r}"
Rails.logger.info "Replaying request in #{r}"
render plain: "retry in region #{r}", status: 409
else
raise e
end
end
end
Library Support
We would like to build libraries to make this seamless for most application frameworks and runtimes. If you have a particular app you'd like to distribute with PostgreSQL, post in our community forums and we'll write some code for you.
Consistency Model
This is a fairly typical read replica model. Read replicas are usually eventually consistent, and can fall behind the leader. Running read replicas across the world can exacerbate this effect and make read replicas stale more frequently.
Request With Writes
Requests to the primary region are strongly consistent. When you use the replay header to target a particular region, the entire request runs against the leader database. Your application will behave like you expect.
Read Only Requests
Most apps accept a POST or PUT, do a bunch of writes, and then redirect the user to a GET request. In most cases, the database will replicate the changes before the user makes the second request. But not always!
Most read-heavy applications aren't especially sensitive to stale data on subsequent requests. A lagging read replica might result in an out-of-date view for users, but this might be reasonable for your use case.
If your app is sensitive to this (meaning you never, under any circumstances, want to show users stale data), you should be careful using read replicas.
Managing Eventual Consistency
For apps that are sensitive to consistency issues, you can add a counter or timestamp to user sessions that indicates what "version" of the database a particular user is expecting. When the user makes a request and the session's data version differs from the replica's, you can use the same fly-replay header to redirect the request to the primary region; then you'll know it's not stale.
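One way to sketch this versioning check in Ruby (the class, method names, and the way you obtain the replica's replicated version are all illustrative assumptions, not a Fly.io API):

```ruby
# Hypothetical sketch: compare the "database version" a user's session
# expects against the version the local replica has replicated, and
# emit a fly-replay header when the replica is behind.
class ConsistencyGuard
  def initialize(primary_region:)
    @primary_region = primary_region
  end

  # session_version: a counter bumped each time this user performs a write.
  # replica_version: how far the local replica has replicated (in a real
  # app, derived from something like pg_last_wal_replay_lsn()).
  def headers_for(session_version, replica_version)
    if session_version > replica_version
      { "fly-replay" => "region=#{@primary_region}" }
    else
      {}
    end
  end
end
```

A request that finds the replica behind gets replayed to the primary; all other requests are served locally with no extra latency.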
In theory, you could run PostgreSQL with synchronous replication and block until replicas receive writes. This probably won't work well for far-flung read replicas.
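For reference, synchronous replication is controlled in postgresql.conf by settings like these (the standby names are illustrative; each round trip to a distant replica adds its full network latency to every commit):

```
# postgresql.conf -- illustrative values
synchronous_commit = on
# Block each commit until at least one of the named replicas confirms it:
synchronous_standby_names = 'ANY 1 (replica_ams, replica_syd)'
```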
This Is Wrong for Some Apps
We built this set of features for read-heavy apps that are primarily HTTP request-based. That is, most requests only perform reads and only some requests include writes.
Write-heavy Workloads
If you write to the database on every request, this will not work for you. You will need to make some architectural changes to run a write-heavy app in multiple regions.
Some apps write background info, like metrics or audit logs, on every request but are otherwise read-heavy. If you're running an application like this, you should consider using something like nats.io to send that information to your primary region asynchronously.
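The shape of that async forwarding looks something like this sketch. A real deployment would publish to NATS (or similar) rather than an in-process queue; the class and callback here are illustrative, not a real library API:

```ruby
require "json"

# Sketch: instead of writing audit logs synchronously on every request
# (which fails on a read-only replica), enqueue them and let a background
# thread ship them toward the primary region.
class AsyncAuditLog
  def initialize(&shipper)
    @queue = Queue.new
    @shipper = shipper
    @worker = Thread.new do
      while (event = @queue.pop) # a nil sentinel stops the worker
        @shipper.call(JSON.generate(event))
      end
    end
  end

  # Called from the request path; returns immediately.
  def record(event)
    @queue << event
  end

  # Flush any remaining events and stop the worker.
  def close
    @queue << nil
    @worker.join
  end
end
```

The request thread never blocks on the shipping step, so a slow or distant primary doesn't add latency to reads.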
Truly write-heavy apps require latency aware data partitioning, either at the app level or in a database engine. There are lots of interesting new databases that have features for this, try them out!
Long Lived Connections
If your app makes heavy use of long-lived connections with interspersed writes, like websockets, this will not work for you. This technique is specific to HTTP request/response apps that bundle writes into specific requests.