Migration Recipes


Fly.io runs apps close to users by transmuting Docker containers into micro-VMs that run on our own hardware around the world. This post is part of the Safe Ecto Migrations series. If you just want to ship your Phoenix app, the easiest way to learn more is to try it out; you can be up and running in just a couple of minutes.

This is part 3 in a 4-part series on designing and running Safe Ecto Migrations.

This is a non-exhaustive guide on common migration scenarios and how to avoid trouble.

These migration recipes may evolve over time, so be sure to check the git edition of these recipes at https://github.com/fly-apps/safe-ecto-migrations with up-to-date information.

Adding an index

Creating an index blocks writes to the table, though reads may continue. This scenario is used as an example in the “How to inspect locks in a query” section.

Bad

def change do
  create index("posts", [:slug])
  # This obtains a ShareLock on "posts" which will block writes to the table
end

Good

Instead, have Postgres create the index concurrently, which does not block writes. To do this, you have to disable the DDL transaction and the migration lock:

@disable_ddl_transaction true
@disable_migration_lock true

def change do
  create index("posts", [:slug], concurrently: true)
end

While the migration may still take some time to run, it does not block reads or writes while it runs. For example, indexing 100,000,000 rows took 165 seconds (2.75 minutes), but selects and updates on the table kept working the whole time!

When disabling transactions in a migration like this, do not make other changes in the same migration! Only create the index concurrently, and use separate migrations for any other changes. Separating changes into their own migrations makes failures clear and helps prevent leaving the database in a mangled state. For example, if half of the steps in a migration succeed but the remaining half fail, re-running the migration will be confusing and cause problems.
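Put together, the whole migration file might look like this minimal sketch (the module name is illustrative):

defmodule MyApp.Repo.Migrations.AddPostsSlugIndex do
  use Ecto.Migration

  # Concurrent index builds cannot run inside a transaction, so the
  # DDL transaction and the migration lock both need to be disabled.
  @disable_ddl_transaction true
  @disable_migration_lock true

  def change do
    create index("posts", [:slug], concurrently: true)
  end
end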

Adding a reference or foreign key constraint

Adding a foreign key blocks writes on both tables.

When a foreign key constraint is added, two things happen:

  1. It creates a new constraint for changing records going forward
  2. It validates the new constraint for existing records

When these happen in the same command, Postgres obtains a lock on the table while it validates the constraint, scanning the entire table. To avoid the full table scan, we can separate the operations.

Bad

def change do
  alter table("posts") do
    add :group_id, references("groups")
  end
end

Good

In the first migration:

def change do
  alter table("posts") do
    add :group_id, references("groups", validate: false)
  end
end

In the next migration:

def change do
  # Ecto's default constraint name is "posts_group_id_fkey".
  # The second argument to execute/2 is the down migration (a no-op here).
  execute "ALTER TABLE posts VALIDATE CONSTRAINT posts_group_id_fkey", ""
end

Adding a column with a default value

Adding a column with a default value to an existing table may cause the table to be rewritten. While that happens, Postgres blocks both reads and writes, and MySQL and MariaDB block writes.

Bad

Note: This becomes safe in Postgres 11+, MySQL 8.0.12+, and MariaDB 10.3.2+.

def change do
  alter table("comments") do
    add :approved, :boolean, default: false
    # This took 34 seconds for 10 million rows with no fkeys.
    # This took 10 minutes for 100 million rows with no fkeys.

    # Obtained an AccessExclusiveLock on the table, which blocks reads and
    # writes.
  end
end

Good

Add the column first and then alter it to include the default.

First migration:

def change do
  alter table("comments") do
    add :approved, :boolean
    # This took 0.27 milliseconds for 100 million rows with no fkeys.
  end
end

Second migration:

def change do
  alter table("comments") do
    modify :approved, :boolean, default: false
    # This took 0.28 milliseconds for 100 million rows with no fkeys.
  end
end

Schema change to read the new column:

schema "comments" do
+ field :approved, :boolean, default: false
end

In the Ecto schema, the default: false isn’t necessary, but it mirrors the database default, so structs built in memory behave consistently with rows the database fills in.

Changing the type of a column

Changing the type of a column may cause the table to be rewritten. While that happens, Postgres blocks both reads and writes, and MySQL and MariaDB block writes.

Some type changes don’t require a table rewrite and are safe to run directly.

Safe in Postgres:

  • increasing length on varchar or removing the limit
  • changing varchar to text
  • changing text to varchar with no length limit
  • Postgres 9.2+ - increasing precision (NOTE: not scale) of decimal or numeric columns. For example, increasing 8,2 to 10,2 is safe. Increasing 8,2 to 8,4 is not safe.
  • Postgres 9.2+ - changing decimal or numeric to be unconstrained
  • Postgres 12+ - changing timestamp to timestamptz when the session time zone is UTC

Safe in MySQL/MariaDB:

  • increasing length of varchar from under 255 up to 255
  • increasing length of varchar from over 255 up to max

Any other type change, like the following, may rewrite the table:

Bad

def change do
  alter table("posts") do
    modify :views, :bigint, from: :integer
  end
end

Good

Multi-deployment strategy:

  1. Create a new column
  2. In application code, write to both columns (a sketch follows this list)
  3. Backfill data from the old column to the new column
  4. In application code, move reads from the old column to the new column
  5. In application code, remove the old column from Ecto schemas
  6. Drop the old column
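Here is a minimal sketch of step 2 for the example above, assuming a bigint views_big column was added alongside the integer views column in step 1 (all names are illustrative):

defmodule MyApp.Post do
  use Ecto.Schema
  import Ecto.Changeset

  schema "posts" do
    field :views, :integer      # old integer column
    field :views_big, :integer  # new bigint column from step 1
  end

  def changeset(post, attrs) do
    post
    |> cast(attrs, [:views])
    |> mirror_views()
  end

  # Step 2: mirror every write into the new column until the backfill
  # and read cutover are complete.
  defp mirror_views(changeset) do
    case fetch_change(changeset, :views) do
      {:ok, value} -> put_change(changeset, :views_big, value)
      :error -> changeset
    end
  end
end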

Removing a column

Removing a column happens when we are “cleaning up” or restructuring the database as it evolves to solve different problems.

If your deployment rolls out new versions while old versions are still running, then migrations that restructure tables out from under the older running app create problems: its queries may fail because the database is no longer structured the way its code expects. This scenario is often forgotten or ignored, but there are safe ways to remove columns that don’t temporarily break things during the rollout.

Bad

# Without a code change to the Ecto Schema
def change do
  alter table("posts") do
    remove :no_longer_needed_column

    # Obtained an AccessExclusiveLock on the table, which blocks reads and
    # writes, but was instantaneous.
  end
end

Good

A “multi-stage” deployment makes removing a column go smoothly. First, remove references to the column from the application so it’s no longer loaded or queried. Then a second deploy safely removes the column from the table.

First deploy, edit the Ecto schema:

defmodule MyApp.Post do
  schema "posts" do
-   field :no_longer_needed_column, :string
  end
end

Second deploy, run the migration:

def change do
  alter table("posts") do
    remove :no_longer_needed_column
  end
end

Renaming a column

Ask yourself: “Do I really need to rename a column?” Probably not, but if you must, read on and be aware it requires time and effort.

Any running instances of your application that expect the field to still have the old name will fail when the database structure changes. This happens when you have multiple application instances running and you roll out new versions.

There is a shortcut: don’t rename the database column. Instead, rename the schema’s field name and configure it to point to the database column.

Bad

In your schema:

schema "posts" do
  field :summary, :text
end

In your migration:

def change do
  rename table("posts"), :title, to: :summary
end

Good

Strategy 1

Rename the field in the Ecto schema only. Configure the schema field to point to the unchanged column name. Ensure all calling code referencing the old field name is updated to reference the new field name.

defmodule MyApp.MySchema do
  use Ecto.Schema

  schema "weather" do
    field :temp_lo, :integer
    field :temp_hi, :integer
-   field :prcp, :float
+   field :precipitation, :float, source: :prcp
    field :city, :string

    timestamps(type: :naive_datetime_usec)
  end
end

Update references in other parts of the codebase:

   my_schema = Repo.get(MySchema, "my_id")
-  my_schema.prcp
+  my_schema.precipitation

Strategy 2

Take a “multi-stage” approach:

  1. Create a new column
  2. In application code, write to both columns
  3. Backfill data from the old column to the new column (a sketch follows this list)
  4. In application code, move reads from the old column to the new column
  5. In application code, remove the old column from Ecto schemas
  6. Drop the old column
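Step 3 might then be a data migration along these lines, reusing the weather example from Strategy 1 (for large tables, run the update in batches; see “Backfilling Data”):

def up do
  # Copy values from the old column into the new one.
  execute "UPDATE weather SET precipitation = prcp WHERE precipitation IS NULL"
end

def down, do: :ok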

Renaming a table

Ask yourself: “Do I really need to rename a table?” Probably not, but if you must, read on and be aware it requires time and effort.

Any running instances of your application that expect the table to still have the old name will fail when the database structure changes. This happens when you have multiple application instances running and you roll out new versions.

There is a shortcut: rename the schema only, and do not change the underlying database table name.
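For example, a minimal sketch of a renamed schema still backed by the old table (fields assumed):

defmodule MyApp.Article do
  use Ecto.Schema

  # The module is named Article in code, but it still reads from
  # and writes to the existing "posts" table.
  schema "posts" do
    field :title, :string
    field :body, :string
  end
end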

Bad

def change do
  rename table("posts"), to: table("articles")
end

Good

  1. Create the new table. This should include creating new constraints (checks and foreign keys) that mimic the behavior of the old table.
  2. In application code, write to both tables, continuing to read from the old table (a sketch follows this list).
  3. Backfill data from the old table to the new table.
  4. In application code, move reads from the old table to the new table.
  5. In application code, remove the old table from Ecto schemas.
  6. Drop the old table.
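As a sketch of step 2, assuming hypothetical Post (old table) and Article (new table) schemas with matching changeset functions:

defmodule MyApp.Blog do
  alias Ecto.Multi
  alias MyApp.{Repo, Post, Article}

  # Step 2: insert into both tables in one transaction, while reads
  # continue to come from the old "posts" table.
  def create_post(attrs) do
    Multi.new()
    |> Multi.insert(:post, Post.changeset(%Post{}, attrs))
    |> Multi.insert(:article, Article.changeset(%Article{}, attrs))
    |> Repo.transaction()
  end
end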

Adding a check constraint

Adding a check constraint blocks reads and writes to the table in Postgres, and blocks writes in MySQL/MariaDB while every row is checked.

When a check constraint is added, two things happen:

  1. It creates a new constraint for changing records going forward
  2. It validates the new constraint for existing records

When these happen at the same time, Postgres obtains a lock on the table as it validates the constraint, fully scanning the entire table. To avoid the full table scan, we can separate the operations.

Bad

def change do
  create constraint("products", :price_must_be_positive, check: "price > 0")
  # Creating the constraint with validate: true (the default when unspecified)
  # performs a full table scan and acquires a lock that prevents updates
end

Good

First migration, add the constraint but don’t let it validate:

def change do
  create constraint("products", :price_must_be_positive, check: "price > 0", validate: false)
  # Setting validate: false prevents the full table scan, so the
  # migration commits immediately.
end

Second migration, validate the constraint:

def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT price_must_be_positive", ""
  # Acquires SHARE UPDATE EXCLUSIVE lock, which allows updates to continue
end

These can be in the same deployment, but ensure they are two separate migrations.

Setting NOT NULL on an existing column

Setting NOT NULL on an existing column blocks reads and writes while every row is checked. Just like the Adding a check constraint scenario, two things are happening:

  1. It creates a new constraint for changing records going forward
  2. It validates the new constraint for existing records

To avoid the full table scan, we can separate these two operations.

Bad

def change do
  alter table("products") do
    modify :active, :boolean, null: false
  end
end

Good

Add a check constraint without validating it, backfill data to satisfy the constraint, and then validate it. This is functionally equivalent to a NOT NULL constraint on the column.

In the first migration:

def change do
  create constraint("products", :active_not_null, check: "active IS NOT NULL"), validate: false
end

This enforces the constraint in all new rows, but does not care about existing rows until a row is updated. You’ll likely also need a data migration to ensure that the constraint is satisfied.
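That data migration could be as simple as this sketch (the false fallback is an assumption; pick whatever value is correct for your data):

def change do
  # Backfill existing rows that would otherwise fail validation.
  # For large tables, run this in batches; see "Backfilling Data".
  execute "UPDATE products SET active = false WHERE active IS NULL", ""
end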

Then in the next deployment’s migration, enforce the constraint on all rows:

def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT active_not_null", ""
end

If you’re using Postgres 12+, you can add the NOT NULL constraint to the column after validating the constraint. From the Postgres 12 docs:

SET NOT NULL may only be applied to a column provided none of the records in the table contain a null value for the column. Ordinarily this is checked during the ALTER TABLE by scanning the entire table; however, if a valid check constraint is found which proves no null can exist, then the table scan is skipped.

# **Postgres 12+ only**
def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT active_not_null", ""

  alter table("products") do
    modify :active, :boolean, null: false
  end

  drop constraint("products", :active_not_null)
end

If the constraint fails, then first consider backfilling data to cover the gaps in your desired data integrity, then revisit validating the constraint.

Adding a JSON column

In Postgres, there is no equality operator for the json column type, so existing SELECT DISTINCT queries in your application will fail with an error like “could not identify an equality operator for type json”.

Bad

def change do
  alter table("posts") do
    add :extra_data, :json
  end
end

Good

Use jsonb instead. Some say it’s just like “json” but “better.”

def change do
  alter table("posts") do
    add :extra_data, :jsonb
  end
end

Where to next?

Next, finish up our journey by seeing how to safely fill the holes in our data created by changing the structure of our tables in “Backfilling Data”!

References

These recipes took a lot of inspiration from Andrew Kane and his library strong_migrations.

PostgreSQL at Scale by James Coleman

Strong Migrations by Andrew Kane

Adding a NOT NULL CONSTRAINT on PG Faster with Minimal Locking

Postgres Runtime Configuration

Automatic and Manual Ecto Migrations by Wojtek Mach

Reference Material

Postgres Lock Conflicts

| Requested Lock ↓ \ Current Lock → | ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE |
|-----------------------------------|--------------|-----------|---------------|------------------------|-------|---------------------|-----------|------------------|
| ACCESS SHARE                      |              |           |               |                        |       |                     |           | X                |
| ROW SHARE                         |              |           |               |                        |       |                     | X         | X                |
| ROW EXCLUSIVE                     |              |           |               |                        | X     | X                   | X         | X                |
| SHARE UPDATE EXCLUSIVE            |              |           |               | X                      | X     | X                   | X         | X                |
| SHARE                             |              |           | X             | X                      |       | X                   | X         | X                |
| SHARE ROW EXCLUSIVE               |              |           | X             | X                      | X     | X                   | X         | X                |
| EXCLUSIVE                         |              | X         | X             | X                      | X     | X                   | X         | X                |
| ACCESS EXCLUSIVE                  | X            | X         | X             | X                      | X     | X                   | X         | X                |
  • SELECT acquires an ACCESS SHARE lock
  • SELECT FOR UPDATE acquires a ROW SHARE lock
  • UPDATE, DELETE, and INSERT acquire a ROW EXCLUSIVE lock
  • CREATE INDEX CONCURRENTLY and VALIDATE CONSTRAINT acquire a SHARE UPDATE EXCLUSIVE lock
  • CREATE INDEX acquires a SHARE lock