The problem

Let’s say you have a simple blog system. It’s written in Elixir, uses Ecto (v3.x) and you have the following schemas: Post, Author, Comment, Tag and Tagging.

  • Each Post has one Author.
  • Each Post has many Commnents.
  • Each Comment has one Author.
  • Each Post has many Tags, through join table Tagging.

In multiple places in your application, you will have to list the posts. First, on the blog index page, you will likely want to see all of the most recent posts. Similar in RSS feed. In the blog sidebar, you may want to display some subset of these.

Then you may want to have Author page where a short bio of author appears, and a list of posts they wrote. You may also need to implement search, filtering by tag, pagination etc.

In all of these use cases you are likely to re-use significant portion of the same code, so at some point you might want to de-duplicate your code base and provide some sort of generic Posts finder, that allows you to preload some associations if you know you’ll need them, but also makes it possible to filter - by say specific Tag.

You obviously want to fetch the data and filter it as efficiently as possible. How do you achieve this with Elixir & Ecto?

If you ware using Ecto, you may use Ecto query composition with it’s functional API.

Where Ecto really shines is, however, it’s Ecto.Query domain-specific language (DSL).

The DSL is not only arguably nicer to use than chaining functions, but also is more powerful if you know how to use it correctly, allowing to make complicated joins, filters and preloads.

Our desired API

We want to have a single function, which takes two optional maps as parameters: the first map will tell us which associations to preload, the second one will specify optional filters.

In either case, our top-level function will return list of Posts.

We want something like this for preloading:

# return list of posts with no filtering and no
# associations preloaded:
list_posts()

# return list of posts with Author association preloaded
list_posts(%{author: true})

# return list of posts with Comment and it's Author preloaded:
list_posts(%{comment: %{author: true}})

# return list of posts with everything related to it preloaded:
list_posts(%{author: true, comment: %{author: true}, tags: true})

And then we want to optionally apply some filters, let’s say to filter list of comments with author by given tag, you’d do:

# return list of posts with Comment and it's Author preloaded and
# filtered by tag "elixir":
list_posts(%{comment: %{author: true}}, %{tag: "elixir"})

And if you want to filter by tag and Post author, you would go with:

list_posts(%{comment: %{author: true}}, %{tag: "elixir", author: "Author 1"})

The API is simple and flexible enough for us to re-use in multiple places in our code base, but how would one go about implementing it?

tl;dr just give me the code

You can find the example to this blog post on GitHub with module implementing all the preloading and filtering API defined here and relevant tests here

Ecto.Query DSL compositions

You can compose Ecto.Query queries before executing them. In all of the examples below we will use the pattern, where we first build a simple query, append additional conditions, joins and preloads, and then execute it whenever it’s ready.

Preloading Ecto associations with SQL JOINs

We can start with a simple query which loads up all the Posts:

base_query = from(posts in Post)

Then, we expand the query with a JOIN, and use the data it returns to populate Post’s author association:

base_query = from(posts in Post)

query_with_author_preloaded = from(posts in base_query,
  left_join: author in Author,
  on: author.id == posts.author_id,
  preload: [author: author]
)

If you then execute this code, you will get the list of Posts with their Authors already loaded, but importantly you will perform that operation using single SQL command behind the scenes:

Repo.all(query_with_author_preloaded)

# 12:42:53.206 [debug] QUERY OK source="posts" db=1.1ms decode=2.2ms queue=1.1ms
# SELECT p0."id", p0."title", p0."body", p0."author_id", a1."id", a1."name" FROM "posts" AS p0 LEFT OUTER JOIN "authors" AS a1 ON a1."id" = p0."author_id" []

[
  %Post{
    __meta__: #Ecto.Schema.Metadata<:loaded, "posts">,
    author: %Author{
      __meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
      comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
      id: 309,
      name: "Author 1",
      posts: #Ecto.Association.NotLoaded<association :posts is not loaded>
    },
    author_id: 309,
    body: "Post body 1",
    comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
    id: 305,
    taggings: #Ecto.Association.NotLoaded<association :taggings is not loaded>,
    tags: #Ecto.Association.NotLoaded<association :tags is not loaded>,
    title: "Post 1"
  },
  %Post{
    __meta__: #Ecto.Schema.Metadata<:loaded, "posts">,
    author: %Author{
      __meta__: #Ecto.Schema.Metadata<:loaded, "authors">,
      comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
      id: 310,
      name: "Author 2",
      posts: #Ecto.Association.NotLoaded<association :posts is not loaded>
    },
    author_id: 310,
    body: "Post body 2",
    comments: #Ecto.Association.NotLoaded<association :comments is not loaded>,
    id: 306,
    taggings: #Ecto.Association.NotLoaded<association :taggings is not loaded>,
    tags: #Ecto.Association.NotLoaded<association :tags is not loaded>,
    title: "Post 2"
  }
]

Nice, we got the list of Posts with authors preloaded. Let’s think on how to build our original list_posts/2 function:

def list_posts(params \\ %{}, filters \\ %{}) do
  build_query(params, filters)
  |> Repo.all()
end

defp build_query(params, _filters) do
  query = base_query()

  query
  |> maybe_preload_author(params[:author])
end

defp maybe_preload_author(query, nil), do: query
defp maybe_preload_author(query, _) do
  from(posts in query,
    left_join: author in Author,
    on: author.id == posts.author_id,
    preload: [author: author]
  )
end

What we are doing here is always composing our end result query of output of base_query/0 function, and optionally append join and preload clause to fetch author of post for us in one go.

We achieve conditional join and preload, by pattern match in maybe_preload_author/2 function. Only if params[:author] evaluates to true, we preload author in one go.

We want to use the same pattern many times, so our function can grow in number of preloads and filters, and finally provide desired functionality:

defp build_query(params, filters) do
  query = base_query()

  query
  |> maybe_preload_author(params[:author])
  # uncomment below as you implement the rest:
  # |> maybe_preload_comments(params[:comments])
  # |> maybe_preload_tags(params[:tags])
  # |> maybe_filter_by_tag(filters[:tag])
end

Preloading Ecto associations with SQL JOINs

Preloading Ecto associations with single SQL command, as tempting as it may sound, is not always a good idea.

Join-preloading something like comments is a really bad idea - if your post has 100 comments this means the body of the post will be sent 100 times over the wire. Joins create a huge n*m table and send it to the application. Ecto dedups the rows, so you don’t see it, but it’s still extremely wasteful - especially with something like posts and comments that are very data-intesive records. That’s the primary reason why ecto does separate queries for preloads

Michał Muskała (@michalmuskala) April 16, 2019

Bummer. We will have to use different strategy here, and execute second SQL query to preload our comments. Luckily, the resulting code is only slightly different from preloading using JOINs:

defp build_query(params, filters) do
  query = base_query()

  query
  |> maybe_preload_author(params[:author])
  |> maybe_preload_comments(params[:comments]) # add this line
end

...

defp maybe_preload_comments(query, nil), do: query
defp maybe_preload_comments(query, true), do: maybe_preload_comments(query, %{})

defp maybe_preload_comments(query, params) do
  comments_query = from(comment in Comment)

  from(posts in query, preload: [comments: ^comments_query])
end

With the above code we can now preload Posts and it’s Authors in one SQL command, and Ecto will load up Comments using separate SQL command as it will be more efficient in this case:

# return list of Posts with Comment and Posts's Author
list_posts(%{author: true, comment: true})

Super cool!

Mixing both types of preloads

What if we want to preload Comment’s Authors too? Since these are :belongs_to association, we could load it in the same SQL query as the one which preloads Comments. The function call we want to handle looks like this:

# return list of posts with Comment and it's Author preloaded:
list_posts(%{author: true, comment: %{author: true}})

To achieve this, we only slightly need to amend our maybe_prealod_comments/1 function, and add one condition to preload Comment’s Authors:

defp maybe_preload_comments(query, params) do
  comments_query =
    from(comment in Comment)
    |> maybe_preload_comment_author(params[:author])

  from(posts in query, preload: [comments: ^comments_query])
end

defp maybe_preload_comment_author(query, nil), do: query

defp maybe_preload_comment_author(query, _) do
  from(comment in query,
    left_join: author in Author,
    on: author.id == comment.author_id,
    preload: [author: author]
  )
end

Filtering the data

We also want to fitler the data: by Tag, and by Author’s name. Note that preloading Tags is different to filtering by tag.

When preloading the Tags, we want to go through Taggings first, so our query to fetch all the Tags will involve JOIN clause:

defp build_query(params, filters) do
  query = base_query()

  query
  |> maybe_preload_author(params[:author])
  |> maybe_preload_comments(params[:comments])
  |> maybe_preload_tags(params[:tags])
  |> maybe_filter_by_tag(filters[:tag])
  |> maybe_filter_by_author(filters[:author])
end

...

defp maybe_preload_tags(query, nil), do: query

defp maybe_preload_tags(query, _) do
  tags_query =
    from(tagging in Tagging,
      inner_join: tag in Tag,
      on: tag.id == tagging.tag_id,
      preload: [tag: tag]
    )

  query = from(posts in query, preload: [{:taggings, ^tags_query}, :tags])
end

Similar, filtering by Tag’s name will require having to go through Taggings:

defp maybe_filter_by_tag(query, nil), do: query

defp maybe_filter_by_tag(query, tag_name) do
  from(posts in query,
    inner_join: tagging in Tagging,
    on: tagging.post_id == posts.id,
    inner_join: tag in Tag,
    on: tag.id == tagging.tag_id,
    where: tag.name == ^tag_name
  )
end

And finally, remaining piece of functionality is to filter by Author’s name:

defp maybe_filter_by_author(query, nil), do: query

defp maybe_filter_by_author(query, author_name) do
  from(posts in query,
    inner_join: author in Author,
    on: author.id == posts.author_id,
    where: author.name == ^author_name
  )
end

We did it! We can now filter & preload our list of Posts in very flexible, efficient manner!

Example project with implementation & tests

You can find the example to this blog post on GitHub with module implementing all the preloading and filtering API defined here and relevant tests here

Post by Hubert Łępicki

Hubert is partner at AmberBit. Rails, Elixir and functional programming are his areas of expertise.