Tag All the Things!

Author

Name: Mark Ericksen
@brainlid: @brainlid

Illustration with the text "words words words list list list fly, fly, fly" — Image by Annie Ruygt

This is about a simple way of tagging database records using Postgres and Ecto that fits a number of situations. Fly.io is a great place to run Elixir applications! Check out how to get started!

UPDATED: This was updated to support clearing a list of tags from a UI input component.

Whether you link labels to a Github issue, track the different muscle groups strengthened by an exercise, or describe a book by the set of genres that apply to it, it’s all the same thing, tagging. Developers define tags that apply to things like cars listings, a job posting, or a property listing and it adds richness and meaning to the tagged item. It often isn’t called a “tag”, it may be referred to as “categories”, “labels”, “traits”, “genres”, whatever!

There are numerous ways to add tagging to a system and the “right way” depends on the system and the problem being solved. Here we’ll explore a very easy way to add tagging to our Phoenix applications that also uses of a cool feature in PostgreSQL.

Problem

We have a book tracking system. In our system, we let trusted users create new book entries. A book can belong to multiple genres. There is a finite set of pre-defined genres we support. The application provides the full set of supported and possible genres.

A book entry should support being tagged as “Sci-Fi”, “Mystery” and “Young Adult” all at the same time. In our system, we call it “genres”, but it’s really just tagging, isn’t it?

How do we add the ability to “tag” a database entry (i.e. book) with a set of tags (i.e. genres)?

Solution

For this discussion, a “tag” is a custom descriptor added to a database record.

This solution assumes the list of possible tags are defined by the system and not user managed. For instance, when talking about book genres, we don’t allow users to create a new genre called “stupid”. A genre has specific meaning in the book tracking system and users don’t get a vote.

Similarly, this applies to exercises and the impacted muscle groups. People don’t get to create new muscle groups! They just exist and as users, we tag an item with one or more of the pre-defined tags.

This solution is for that type of system.

Given that constraint, this solution covers how to do the following:

enable an item to be tagged in our Postgres DB
add the tag field to our Ecto schema
process tags through a changeset
query items using the tags

Database Migration

Our solution starts in the database. We’re using Postgres for our application (the default DB for Phoenix applications). We’ll take advantage of Postgres’ array column type. This let’s us create a single field on a table that holds an array of values. For our tags, it will be an array of strings.

Here’s the relevant part of a DB migration to add a books table with a genres field that holds an array of strings.

    create table(:books) do
  add :title, :string, null: false
  # ...
  add :genres, {:array, :string}
  # ...
end

# add an index on the genres
execute("create index books_genres_index on books using gin (genres);")

  

GIN indexes are “inverted indexes” which are appropriate for data values that contain multiple component values, such as arrays.

Notice that the field type used is {:array, :string}. Also note, the last line executes an SQL statement to create a special GIN index on our array field. The GIN index will keep our look ups fast!

With our database table ready, let’s move on to Ecto.

Ecto Schema

Next, we need to mirror the database change in our Ecto Schema. Here’s the relevant part of our schema definition:

    schema "books" do
  field :title, :string
  # ...
  field :genres, {:array, :string}, default: []
  # ...
end

  

Again, the field is described as an array of strings. Yes, that means Ecto natively supports working with an array of strings. Nice! That will come in handy!

We have the structure part handled. Now we want to control how data is stored there.

Validate the Array

In this section, we’re talking about the changeset functions used on our data. We will parse, cast, validate, and process user values.

Let’s start with an abbreviated list of the valid genres. Since the allowed genres are built-in to our application, we can code it with our schema.

    defmodule MyApp.Books.Book do
  use Ecto.Schema
  import Ecto.Query, warn: false
  import Ecto.Changeset

  schema "books" do
    field :name, :string

    field :genres, {:array, :string}, default: []
    # ...
  end

  @genre_options [
    "fantasy",
    "sci-fi",
    "dystopian",
    "adventure",
    "romance",
    "mystery",
    "horror",
    "thriller",
    "historical-fiction",
    "young-adult",
    "children-fiction",
    "autobiography",
    "biography",
    "cooking",
    # ...
  ]

  # ...

  

The changeset steps to address here are validating the array of values and sorting them. Why sort? We’ll come back to that later. We can use the @genre_options module attribute in our changeset function to validate the user’s data.

    defp common_validations(changeset) do
  changeset
  # ...
  |> validate_required([:name])
  |> trim_array(:genres)
  |> validate_array(:genres, @genre_options)
  |> sort_array(:genres)
end

  

Instead of creating our own validate_array, we could leverage Ecto’s existing validate_subset/4 function to ensure only our listed @genres_options are accepted.

Note that the trim_array/2, validate_array/3 and sort_array/2 functions don’t exist in Ecto, we will create them. We want trim_array to handle data from our UI and validate_array to enforce that only values in our @genre_options list are accepted.

We’ll talk more about trim_array in a bit. First, here’s one way to build validate_array:

    @doc """
Validate that the array of string on the changeset are all in the
set of valid values.
"""
def validate_array(changeset, field, valid_values) do
  validate_change(changeset, field, fn ^field, new_values ->
    if Enum.all?(new_values, &(&1 in valid_values)) do
      []
    else
      unsupported = new_values -- valid_values
      [{field, "Only the defined values are allowed. Unsupported: #{inspect(unsupported)}"}]
    end
  end)
end

  

Our validate_array uses Ecto.Changeset.validate_change/3 to run our custom validation function only when the field value changes. It checks each item in the new array to ensure they are part of the approved set of valid_values. If any are not valid, an error is added to the changeset.

Sort the Array

In this step, we’ll process the tag values to sort them. Why sort it here? There are a couple reasons:

We want the tag values to appear in a consistent order. For instance, when listing our books, we might display different books of a series together and they all share the same genre. It looks wrong for the genres to appear in different order when viewed this way!
1. Harry Potter 1 - Genres: young-adult, fantasy, adventure
2. Harry Potter 2 - Genres: adventure, young-adult, fantasy
3. Harry Potter 3 - Genres: fantasy, adventure, young-adult
It is more efficient to sort the values when setting them so later, when it’s displayed, we won’t be repeatedly doing the work to sort them.

How to sort them?

The values could be sorted logically or according to a pre-determined order. However, here we’ll take the simple approach of sorting them alphabetically.

    @doc """
When working with a field that is an array of strings, this
function sorts the values in the array.
"""
def sort_array(changeset, field) do
  update_change(changeset, field, &(Enum.sort(&1)))
end

  

The sort_array function leans heavily on the nifty Ecto.Changeset.update_change/3 function. When the field value is changed it runs it through the Enum.sort/1 function.

Trimming the Array

We want to be able to clear the array, or replace it with an empty array. It works fine to do that in Elixir code, but this last step supports the UI needs of our code. This article doesn’t deal with the UI, but if we peer into the future, here’s the problem we need to address.

HTML forms can’t easily send an “empty” array. A simple way to deal with that is to have it send something that indicates a blank value. Then, after removing the blank value, we’ll be left with the array value we need. In our case, we’ll assume that the blank value is "". So, if the array value looks like this:

    %{"genres" => [""]}

  

Then after trimming the list, it ends up like this:

    %{"genres" => []}

  

Here’s a function to do that for us. Additionally, we’ll let the “blank” value be specified.

    @doc """
Remove the blank value from the array.
"""
def trim_array(changeset, field, blank \\ "") do
  update_change(changeset, field, &Enum.reject(&1, fn item -> item == blank end))
end

  

Combining the changeset functions

Our helper functions validate_array/3, sort_array/2, and trim_array/3 can be moved to a separate module for easy reuse when needed using an import MyApp.EctoHelpers.

While we’re refactoring, we’ll create a single function that does all three operations at once for us.

    @doc """
Clean and process the array values and validate the selected
values against an approved list.
"""
def clean_and_validate_array(changeset, field, valid_values, blank \\ "") do
  changeset
  |> trim_array(field, blank)
  |> sort_array(field)
  |> validate_array(field, valid_values)
end

  

Now our schema’s common_validations can be cleaned up. Instead of 3 changeset functions for every field that works with some type of tags, it is instead a single line:

    defp common_validations(changeset) do
  changeset
  # ...
  |> validate_required([:name])
  |> clean_and_validate_array(:genres, @genre_options)
end

  

This is one of the powerful benefits of changeset functions that I love! 😍 They are composable and easy to reuse!

See this gist for the full code including the UI component.

Setting the Array

It helps to see how the values are set. Writing tests is a great way to explore the behavior and ensure it works as we expect. For now, we can play with setting the array in an IEx session:

    alias MyApp.Books.Book
changeset = Book.changeset(%Book{}, %{genres: ["historical-fiction", "thriller"]})

Nice! We can play with sending both valid and invalid tags and see the changeset validate and sort our array.

Because Ecto natively supports the array of string field type, we don’t need to do anything to get the data back. It’s just up to us to represent and display the tag data in a way that makes sense in our application.

With the ability to store tags in the database, we’re ready to query our data using those tags. Let’s see how that works!

Query Using Tags

When it comes to querying our records using the tags, recall that we added the index to the field in the migration. This makes filtering by a tag something we can do without any hassle.

Let’s try it out!

How can we query for only books tagged as “historical-fiction”?

    import Ecto.Query
alias MyApp.Books.Book
books = from(b in Book, where: "historical-fiction" in b.genres)

  

Nice!

How do we query for multiple tags at the same time?

We don’t want to say where b.genres == ^genre_list because that returns only the books that match the exact set of genres. We want to be able to find books where the genres include “historical-fiction” and “thriller” but may also include “mystery” or other genres.

Here’s how we write that query:

    import Ecto.Query
alias MyApp.Books.Book
books =
  from(b in Book,
    where: "adventure" in b.genres,
    where: "fantasy" in b.genres
  )

  

How do we query for “adventure” and “fantasy” but exclude all tagged as “young-adult”?

    import Ecto.Query
alias MyApp.Books.Book
books =
  from(b in Book,
    where: "adventure" in b.genres,
    where: "fantasy" in b.genres,
    where: "young-adult" not in b.genres
  )

  

Makes sense right? We can build up the conditions to help us filter down our results.

Armed with the knowledge of how to query and filter using tags, we can write that logic into some handy helper functions to make querying easier.

    def where_in_array(query, column, tag_value) do
  from(q in query, where: ^tag_value in field(q, ^column))
end

def where_not_in_array(query, column, tag_value) do
  from(q in query, where: ^tag_value not in field(q, ^column))
end

  

Theses query helpers let us write more composable queries like this:

    books =
  from(b in Book)
  |> where_in_array(:genres, "adventure")
  |> where_in_array(:genres, "fantasy")
  |> where_not_in_array(:genres, "young-adult")

  

Great! We built the tagging feature we needed for our application! Users can tag a book with one or more of the allowed genre options and we validate the data and sort it when it’s written.

We also saw how to query our list of books using the tagged genres!

Discussion

We can tag lots of things this way! Keep in mind that this approach works for systems where the possible tags are defined in code and the list does not come from the database. That happens to still apply to a lot of situations!

Once you start tagging records and you realize how easy it is, you may be tempted to tag all the things!

Tag all the things meme image

But don’t go too crazy with it!

Having said that, I have added “categories” and “tags” to the same table because they had different meanings and uses in the system. Still, the approach covered here works just great for that!

Happy tagging!

Want more? Grab the code from the gist and check out Making a CheckboxGroup Input to build a custom checkbox group input component for setting your tags with a Phoenix rendered UI!

Fly.io ❤️ Elixir

Fly.io is a great way to run your Phoenix LiveView app close to your users. It’s really easy to get started. You can be running in minutes.
Deploy a Phoenix app today! →

Next post ↑: SQLite3 Full Text Search with Phoenix
Previous post ↓: Copy to Clipboard with Phoenix LiveView