High Availability & Global Replication
Fly Postgres uses repmgr to manage replication clusters of three or more Postgres servers. You’ll know your app uses repmgr if it uses a postgres-flex image. Run fly status --app <app name> or fly image show --app <app name> to view the image name and version.
Older Fly Postgres apps use stolon for leader election and streaming replication between two or more Postgres servers.
Adding replicas
The easiest way to add replicas (standbys) is with the fly machine clone command:
fly machine clone <machine ID> --region <region> --app <app name>
For example:
fly machine clone 148e306c77e089 --region iad --app my-app-name
Cloning machine 148e306c77e089 into region iad
Volume 'pg_data' will start empty
Provisioning a new machine with image registry-1.docker.io/flyio/postgres-flex:15...
Machine 17814e3b990389 has been created...
Waiting for machine 17814e3b990389 to start...
Waiting for 17814e3b990389 to become healthy (started, 3/3)
Machine has been successfully cloned!
The fly machine clone command clones the spec from the source Machine and uses it to create the new replica in the specified region.
Performing a failover
To perform a failover, first make sure you have three or more servers in your primary region. Then run the following command:
fly postgres failover --app <app name>
For example:
fly postgres failover --app my-app-name
Performing a failover
Connecting to fdaa:2:45b:a7b:174:f553:db80:2... complete
Stopping current leader... 17811073b07918
Starting new leader
Promoting new leader... 4d891247b0d698
Connecting to fdaa:2:45b:a7b:174:f553:db80:2... complete
NOTICE: promoting standby to primary
DETAIL: promoting server "fdaa:2:45b:a7b:174:f553:db80:2" (ID: 665772506) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "fdaa:2:45b:a7b:174:f553:db80:2" (ID: 665772506) was successfully promoted to primary
NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
Waiting 30 seconds for the old leader to stop...
Restarting old leader... 17811073b07918
Waiting for leadership to swap to 4d891247b0d698...
Waiting for 17811073b07918 to become healthy (started, 3/3)
Waiting for 4d891247b0d698 to become healthy (started, 3/3)
Waiting for 3d8d9e15c9de58 to become healthy (started, 3/3)
Failover complete
Note: Only healthy servers residing in your primary region will be considered for leadership.
Performing a regional failover
There may be situations where you want to move leadership into a completely new region.
Add at least three replicas into the new region to maintain quorum when switching leadership. See Adding replicas.
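For example, repeating the clone command from Adding replicas once per new replica, with iad as this example’s target region:
fly machine clone <machine ID> --region iad --app <app name>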
Run fly status --app <app name> to check the number of servers in each region. For example:
fly status --app my-app-name
ID STATE ROLE REGION CHECKS IMAGE CREATED UPDATED
e78423d1a33148 started replica iad 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T15:35:20Z 2024-01-03T15:35:32Z
e2866756a37148 started replica iad 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T15:45:44Z 2024-01-03T15:45:54Z
d891116b6505e8 started primary ewr 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T14:17:32Z 2024-01-03T15:36:19Z
784eee9a216118 started replica ewr 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T14:25:01Z 2024-01-03T16:01:31Z
784e2edc425348 started replica iad 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T15:48:59Z 2024-01-03T15:49:10Z
2874443f054548 started replica ewr 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T14:24:15Z 2024-01-03T15:34:29Z
The output shows the primary and two replicas in ewr, and three replicas in iad, which is the new target region for this example.
If you haven’t already pulled down your fly.toml configuration file, you can do so by running:
fly config save --app <app name>
Open the fly.toml file and set the primary_region option and the PRIMARY_REGION environment variable to your new target region. For example, to move leadership into the iad region:
primary_region = "iad"

[env]
  PRIMARY_REGION = "iad"
Before deploying this change, identify which Postgres image you are currently running. Use fly status to get the image info. For example:
fly status --app my-app-name
ID STATE ROLE REGION CHECKS IMAGE CREATED UPDATED
784eee9a216118 started replica ewr 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T14:25:01Z 2024-01-03T14:25:13Z
2874443f054548 started primary ewr 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T14:24:15Z 2024-01-03T14:24:24Z
...
Copy the image name, in this example flyio/postgres-flex:15.3; you’ll use it for the next command.
Deploy the app to pick up the changes you made to the fly.toml file, specifying the image to use. For example:
fly deploy . --image flyio/postgres-flex:15.3 --strategy=immediate
Warning: The deploy process will result in a small amount of downtime. Once the fly.toml file change has been deployed, your cluster will become read-only until the failover process completes.
Once the deploy process has completed, you might need to wait a moment for all the servers to become healthy.
Run the fly pg failover command to move the primary server to the new region:
fly pg failover
Example output for a successful failover:
Performing a failover
Connecting to fdaa:2:45b:a7b:22b:ea40:9fed:2... complete
Stopping current leader... d891116b6505e8
Starting new leader
Promoting new leader... e2866756a37148
Connecting to fdaa:2:45b:a7b:22b:ea40:9fed:2... complete
NOTICE: promoting standby to primary
DETAIL: promoting server "fdaa:2:45b:a7b:22b:ea40:9fed:2" (ID: 453196900) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "fdaa:2:45b:a7b:22b:ea40:9fed:2" (ID: 453196900) was successfully promoted to primary
NOTICE: executing STANDBY FOLLOW on 4 of 4 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
Waiting 30 seconds for the old leader to stop...
Restarting old leader... d891116b6505e8
Waiting for leadership to swap to e2866756a37148...
Waiting for 784eee9a216118 to become healthy (started, 3/3)
Waiting for d891116b6505e8 to become healthy (started, 3/3)
Waiting for e2866756a37148 to become healthy (started, 3/3)
Waiting for 2874443f054548 to become healthy (started, 3/3)
Waiting for 784e2edc425348 to become healthy (started, 3/3)
Waiting for e78423d1a33148 to become healthy (started, 3/3)
Failover complete
That’s it! Run fly status to verify your changes. Continuing the example, the output shows that the primary server has switched from ewr to iad:
fly status -a my-app-name
ID STATE ROLE REGION CHECKS IMAGE CREATED UPDATED
e78423d1a33148 started replica iad 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T15:35:20Z 2024-01-03T16:13:53Z
e2866756a37148 started primary iad 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T15:45:44Z 2024-01-03T16:13:21Z
d891116b6505e8 started replica ewr 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T14:17:32Z 2024-01-03T16:15:57Z
784eee9a216118 started replica ewr 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T14:25:01Z 2024-01-03T16:12:44Z
784e2edc425348 started replica iad 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T15:48:59Z 2024-01-03T16:12:47Z
2874443f054548 started replica ewr 3 total, 3 passing flyio/postgres-flex:15.3 (v0.0.46) 2024-01-03T14:24:15Z 2024-01-03T16:12:09Z
Connecting to read replicas
The generated connection string uses port 5432 to connect to PostgreSQL. This port always forwards you to a writable instance. Port 5433 connects directly to the PostgreSQL member, and is used to connect to read replicas.
You can use the proxy port (5432) to connect from every region, but it will be quite slow. Connecting to “local” replicas is much quicker, but does take some app logic.
The basic logic to connect is:
- Set a PRIMARY_REGION environment variable on your app.
- Check the FLY_REGION environment variable at connect time. Use DATABASE_URL as is when FLY_REGION is the same as PRIMARY_REGION.
- Modify the DATABASE_URL when running in other regions: change the port to 5433.
This is what it looks like in Ruby:
require "uri"  # stdlib; already loaded in a Rails app

class Fly
  # Returns the DATABASE_URL to use in the current region: the writable
  # primary (port 5432) when running in the primary region, otherwise the
  # same URL pointed at port 5433, which connects directly to the nearest
  # read replica. `blank?` comes from ActiveSupport.
  def self.database_url
    primary = ENV["PRIMARY_REGION"]
    current = ENV["FLY_REGION"]
    db_url = ENV["DATABASE_URL"]

    if primary.blank? || current.blank? || primary == current
      return db_url
    end

    u = URI.parse(db_url)
    u.port = 5433
    u.to_s
  end
end
Running this in the primary region will use the built-in DATABASE_URL and connect to port 5432:
postgres://<user>:<password>@top1.nearest.of.<postgres cluster app name>.internal:5432/rails_on_fly?sslmode=disable
In the other regions, the app will connect to port 5433:
postgres://<user>:<password>@top1.nearest.of.<postgres cluster app name>.internal:5433/rails_on_fly?sslmode=disable
Detecting write requests
Catch read-only errors
PostgreSQL conveniently sends a “read only transaction” error when you attempt to write to a read replica. All you need to do to detect write requests is catch this error.
Replay the request
Once caught, just send a fly-replay header specifying the primary region, for example fly-replay: region=scl, and we’ll take care of the rest.
If you’re working in Rails, just add this to your ApplicationController:
class ApplicationController < ActionController::Base
  rescue_from ActiveRecord::StatementInvalid do |e|
    # PG::ReadOnlySqlTransaction means this request tried to write to a read replica.
    if e.cause.is_a?(PG::ReadOnlySqlTransaction)
      r = ENV["PRIMARY_REGION"]
      # Ask the Fly proxy to replay the entire request in the primary region.
      response.headers["fly-replay"] = "region=#{r}"
      Rails.logger.info "Replaying request in #{r}"
      render plain: "retry in region #{r}", status: 409
    else
      raise e
    end
  end
end
Library support
We would like to build libraries to make this seamless for most application frameworks and runtimes. If you have a particular app you’d like to distribute with PostgreSQL, post in our community forums.
Consistency model
This is a fairly typical read replica model. Read replicas are usually eventually consistent, and can fall behind the leader. Running read replicas across the world can exacerbate this effect and make read replicas stale more frequently.
Requests with writes
Requests to the primary region are strongly consistent. When you use the replay header to target a particular region, the entire request runs against the leader database. Your application will behave like you expect.
Read-only requests
Most apps accept a POST or PUT, do a bunch of writes, and then redirect the user to a GET request. In most cases, the database will replicate the changes before the user makes the second request. But not always!
Most read-heavy applications aren’t especially sensitive to stale data on subsequent requests. A lagging read replica might result in an out-of-date view for users, but this might be reasonable for your use case.
If your app is sensitive to stale data (meaning, you never, under any circumstances want to show users stale data), you should be careful using read replicas.
Managing eventual consistency
For apps that are sensitive to consistency issues, you can add a counter or timestamp to user sessions that indicates what “version” of the database a particular user is expecting. When the user makes a request and the session’s data version differs from the replica’s, you can use the same fly-replay header to redirect their request to the primary region – and then you’ll know it’s not stale.
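Here’s a rough sketch of what that could look like in Rails, using Postgres WAL positions (LSNs) as the “version” and assuming the port-5433 replica connections described above; the callback names are illustrative, not part of any Fly API:
class ApplicationController < ActionController::Base
  # After handling a request in the primary region, remember how far along
  # the primary's WAL was, so we know how fresh a replica needs to be.
  after_action :remember_db_version,
    if: -> { ENV["FLY_REGION"] == ENV["PRIMARY_REGION"] }

  # Before serving a request from a replica region, replay it to the primary
  # if the local replica hasn't replayed up to the position this user last saw.
  before_action :require_fresh_replica,
    unless: -> { ENV["FLY_REGION"] == ENV["PRIMARY_REGION"] }

  private

  def remember_db_version
    # pg_current_wal_lsn() is only valid on the primary.
    session[:min_lsn] = ActiveRecord::Base.connection
      .select_value("SELECT pg_current_wal_lsn()::text")
  end

  def require_fresh_replica
    return if session[:min_lsn].blank?

    lsn = ActiveRecord::Base.connection.quote(session[:min_lsn])
    caught_up = ActiveRecord::Base.connection.select_value(
      "SELECT pg_last_wal_replay_lsn() >= #{lsn}::pg_lsn"
    )
    return if caught_up

    r = ENV["PRIMARY_REGION"]
    response.headers["fly-replay"] = "region=#{r}"
    render plain: "retry in region #{r}", status: 409
  end
end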
In theory, you could run PostgreSQL with synchronous replication and block until replicas receive writes. This probably won’t work well for far-flung read replicas.
This is wrong for some apps
We built this set of features for read-heavy apps that are primarily HTTP request based. That is, most requests only perform reads and only some requests include writes.
Write-heavy workloads
If you write to the database on every request, this will not work for you. You will need to make some architectural changes to run a write-heavy app in multiple regions.
Some apps write background info like metrics or audit logs on every request, but are otherwise read-heavy. If you’re running an application like this, you should consider using something like nats.io to send information to your primary region asynchronously.
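As a rough sketch of that approach in Ruby, using the nats-pure gem (the server address and subject name below are placeholders, not anything Fly provisions for you):
require "json"
require "nats/client"  # from the nats-pure gem

# Connect once at boot to a NATS server that's reachable from every region
# (placeholder address).
NATS_CONN = NATS.connect("nats://my-nats-app.internal:4222")

# Instead of writing an audit row to Postgres on every request, publish the
# event and let a worker in the primary region consume the subject and do
# the actual insert.
def record_audit_event(event)
  NATS_CONN.publish("audit.events", JSON.generate(event))
end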
Truly write-heavy apps require latency-aware data partitioning, either at the app level or in a database engine. There are lots of interesting new databases that have features for this; try them out!
Long-lived connections
If your app makes heavy use of long-lived connections with interpolated writes, like WebSockets, this will not work for you. This technique is specific to HTTP request/response-based apps that bundle writes up into specific requests.