SQLite3 Full Text Search with Phoenix

Illustration of a database searching documents
Image by Annie Ruygt

This is a start of a series using SQLite3 Full Text Search and Elixir. Fly.io is a great place to run Elixir applications! Check out how to get started!

One of the benefits of SQLite is that you can query lightning quick because your database is colocated next to your server. And a bonus it also comes with built-in Full Text search capabilities! But how does this work with Ecto?

In this Series we will walk through

  • Use SQLite Full Text with Ecto.
  • Building a LiveView autocomplete using the new index.
  • How to set up LiteFS and Fly.io to distribute your search index close to the users.

Setup

Before we begin we will need to set up our existing or new application to use SQLite3, luckily we have a guide to help with just that: SQLite3!

In this guide we will be using the built-in FTS5 plugin. This is compiled and shipped with the ecto_sqlite3 library, so we should be all set. If you have problems here because you have a custom setup check here first.

Background

The FTS5 plugin will only work on FTS5 specified virtual tables. In SQLite3 we have the concept of a Virtual Table which SQLite sees as a normal table with columns. This virtual table can be queried normally but plugin’s can manipulate the data before it’s written to disc or returned in a query. In this case FTS5 maintains an index and will build the correct result set and give each result row a rank based on its content. This plugin also gives us special syntax to query the virtual table and rank them.

In this guide we’ll first start with the phoenix generators, and we’ll change the generated code as we go. So lets begin

mix phx.gen.context Data Document documents title:string body:string  author:string url:string
* creating lib/app/data/document.ex
* creating priv/repo/migrations/20230118183954_create_documents.exs
* creating lib/app/data.ex
* injecting lib/app/data.ex
* creating test/app/data_test.exs
* injecting test/app/data_test.exs
* creating test/support/fixtures/data_fixtures.ex
* injecting test/support/fixtures/data_fixtures.ex

Remember to update your repository by running migrations:

    $ mix ecto.migrate

This will generate everything we need to get started. Before running the migrations lets replace them wholesale to look like so:

defmodule App.Repo.Migrations.CreateDocuments do
  use Ecto.Migration

  def change do
    execute("""
      CREATE VIRTUAL TABLE documents USING fts5(
        updated_at UNINDEXED,
        inserted_at UNINDEXED,
        url UNINDEXED, 
        title, 
        author,
        body
      );
      """,
      """
      DROP TABLE documents;
      """
    )
  end
end

Ecto SQL does not come with facilities to allow us to define a virtual table using custom functions so we will be using execute/2 migration function which accepts an up string and down string.

In the up we are defining a virtual table named documents using the fts5 function which accepts a list of columns and column attributes. We included the default timestamps and URL, marking them as UNINDEXED this tells the index to ignore those columns.

And now by default SQLite3 will index the title, author and body column and allow us to directly search them. Notice we don’t have a primary key and that is cause FTS5 has a built-in primary key of rowid.

Next up we need to update our schema in Elixir to handle the new table structure so lets change the lib/app/data/document.ex like so:

defmodule App.Data.Document do
   use Ecto.Schema
   import Ecto.Changeset

+  @primary_key {:id, :id, autogenerate: true, source: :rowid}
   schema "documents" do
     field :author, :string
     field :body, :string
     field :title, :string
     field :url, :string
+    field :rank, :float, virtual: true

     timestamps()
   end
   ...
end

Here we are telling ecto that our primary key is called :id but when queried it’s called :rowid this lets us use the built-in SQLite functions seamlessly with Ecto Queries. And we’re also adding a virtual :rank column that lets us query and order by the special column :rank provided by FTS5.

And finally lets query it using the most basic query syntax:

  @doc """
  Searchs our documents based on a simple query.

  ## Examples

      iex> search_documents("hello")
      [%Document{}, ...]

  """
  def search_documents(q) do
    from(d in Document, 
      select: [:title, :url, :rank, :id],
      where: fragment("documents MATCH ?", ^q),
      order_by: [asc: :rank]
    )
    |> Repo.all()
  end

The bulk of the magic happens in the where where we use a fragment("documents MATCH ?", ^q) which is the syntax for querying an FTS5 virtual table. It is a little strange since we’re matching an entire table name and not a specific column but the intuition of it is all indexed columns are searched as one string.

With the order_by returning the results ordered by rank, which is required otherwise results with be ordered unpredictably.

Finally when we are ready we can insert, update and delete documents normally using our Data context functions generated for us.

And there we go, we now have the basic structure and setup required to do full text searching with Ecto and SQLite3. But this is just scratching the surface here on what’s possible with some extra effort.

Further Considerations

Keep your data in Sync

No matter what you are responsible for building your and keeping your index fresh.

If this is a specific search index for external data we might run a nightly job to rebuild it, or we might kick off a build every time something changes. We might even set up an Oban job to do this periodically or asynchronously whenever our content changes.

If this search works on other content already in the database we have other options:

  • We could manually update our index within the Context whenever the base data changes.
  • We could set up triggers to automagically change the underlying data whenever we need it to change. That is left up as a challenge to the reader.
  • Or we could use the built-in External Content and Contentless Tables functionality. This can be helpful if the content you are searching is large and you need to reduce your SQLite database size. This will build the index in the FTS5 table, and then query the original table to show the data as needed. We won’t deep dive into this as the guides do a good job of it, and it is almost entirely something that lives in migrations anyway.

Advanced Queries

The basic query syntax can work if your content is text heavy like documents or blog posts but, sometimes you need to specify certain columns more directly. The FTS5 documentation go directly into this with the Full Text Query Syntax section of the documentation. Suffice to say we will need to build out our query string and then shove them into fragments. I won’t go into super detail here as the steps are specific to each use case.

Fly.io ❤️ Elixir

Fly.io is a great way to run your Phoenix LiveView app close to your users. It’s really easy to get started. You can be running in minutes.

Deploy a Phoenix app today!