This is a blog of AmberBit, an Elixir and Ruby web development company. Hire us for your project!

How Elixir’s Ecto differs from Ruby’s ActiveRecord


Posted by Hubert Łępicki

Hubert is partner at AmberBit. Rails, Elixir and functional programming are his areas of expertise.
hubert.lepicki@amberbit.com | @hubertlepicki

Introduction

At AmberBit we still mostly develop Ruby on Rails applications for our clients. For about a year now, however, we have been looking at the Elixir programming language and its ecosystem to help us solve problems that Ruby (and Rails) fail to address. We have already used Elixir in client projects, and the experience was generally positive.

The issues we have when using Ruby are usually a result of limitations imposed on us by the language itself (performance, scaling, handling failures and recovery), but also of architectural decisions made by library and framework authors. Generally, as a development company that specializes in solving difficult problems for our clients, we take a best-tool-for-the-job approach. We still love Ruby and Rails, but it has been a bittersweet relationship for quite some time now. We try to be pragmatic, and we have used Java, C++ or even PHP in place of Ruby where it was appropriate. I believe Elixir and its libraries allow us to solve clients' problems in a similar fashion to Rails, yet with some advantages that Ruby does not provide. On the other hand, there are plenty of reasons to choose Ruby/Rails over Elixir/Phoenix, so the choice has to be made carefully.

To help make this decision for ourselves, and for you, my dear readers, I have put together this post describing Ecto, Elixir's flagship library for talking to databases.

The issues with ActiveRecord

As a Ruby programmer, your first glimpse of Ecto may give you a feeling of familiarity. "Oh, it's just like ActiveRecord" were my first thoughts. These were not entirely happy thoughts. I consider ActiveRecord problematic in many ways, and often a cause of bad code, bugs, and performance bottlenecks. On the other hand, it is also easy to use and straight to the point, which lets a programmer build solutions faster. This holds, of course, only until a critical mass is reached, at which point it tends to fall apart pretty badly.

To minimize the risk of failure when developing reasonably large Rails applications, our brave team came up with a set of rules for dealing with ActiveRecord. In many projects we proactively decided to:

  • never put business logic into ActiveRecord models - methods that perform operations on models, calculate results and so on. Instead we use service objects, decorators, serializers and the like
  • never use validations in models - extract “form objects” using Virtus.model and ActiveModel::Validations or a similar solution
  • never use callbacks - to avoid “callback hell”

My personal opinion is that an ActiveRecord model looks best naked, which may take a form like this:

class User < ActiveRecord::Base
  has_many :roles
  has_many :organizations, through: :roles

  def self.active
    where(deleted_at: nil)
  end
end

These precautions go a long way towards avoiding the usual mess in the model layer, but a few problems remain unsolved. One of the issues we have been facing is unexpected performance problems, and tracking them down. Since ActiveRecord provides objects that are related to other objects via associations, it is extremely easy to write code that looks good on the surface but performs badly. Moreover, ActiveRecord actively encourages us to execute SQL queries in unexpected places, such as views. How? Consider the following code:

class PostsController < ApplicationController
  ...
  def index
    @posts = Post.where(deleted_at: nil)
  end
end

It looks like we are passing a list of posts to the view, while in fact the actual SQL query will be executed when the view starts rendering the list of posts. More confusingly, if we execute the same query in the Rails console (rails c), it will look like the query executes instantly. How is that possible? ActiveRecord tries to delay execution of SQL queries as long as possible. In the example above, the SQL will hit the database when the template starts iterating over the list of records. In rails c, it will be executed when the console tries to inspect the result and print it to the terminal.

The delayed execution is good and bad at the same time. It is good because we can chain the queries, joining scopes and adding additional conditions to @posts as we need them. The bad side of things has two faces:

  1. The view executes SQL queries, which means all the exceptions will be reported as happening in the view. There is no easy and nice way to recover from those errors. The SQL execution will also be mixed with execution of the template's code (and stuff like helpers, if we use them), so figuring out where the real performance bottleneck is becomes more difficult.

  2. The database connection needs to be kept open throughout the request life cycle, because we have absolutely no idea when the SQL will get sent to the database.

The second point above is important for applications that have multiple workers, as in Unicorn workers, but also for applications that do a lot of work in the background. The hard constraint you will hit when using ActiveRecord in such an environment is the number of connections in the pool. While this may seem like an artificial constraint to you, we hit this very issue recently for one of our larger clients.

When you spawn a rake task via a Cron job, you need to check out a connection from the pool manually, using ActiveRecord::Base.connection_pool.with_connection. This takes a block of code that will be able to use ActiveRecord models to talk to the database. While you may think it is easy to wrap only the portions of your code that need to talk to the database in with_connection, the reality is that you often need to wrap whole jobs in the block. This is precisely because of point 1) above: SQL execution is delayed, and it can happen in a part of the code you did not intend.

The issue of hitting the connection limit has no really good resolution when using ActiveRecord. You cannot simply increase the pool of connections forever. For one, there may be hard limits imposed on you by the provider, such as Heroku Postgres; on the other hand, increasing the number of connections at the database level can significantly increase the overall memory footprint and load on the database server.

Let's have a look at how Ecto helps us deal with the issues above, and some more.

Supported databases

Ecto primarily focuses on PostgreSQL; it is the database Ecto supports best. You can choose, however, to use MySQL/MariaDB/whatever it is called nowadays, or even attempt to use MongoDB if you don’t mind writing 10000 documents and reading all 99997 back with Ecto later ;>. You can also use SQLite and MSSQL.

ActiveRecord, on the other hand, supports a slightly larger number of SQL databases, but it is tailored specifically towards SQL. ActiveRecord does not support any NoSQL databases at this point, and probably never will.

No objects, just data

ActiveRecord is an ORM. This means it converts data that sits in database tables into Ruby's native objects, and the other way around. These objects are mutable and provide access to the data, its changes, functions to manipulate the data, validations and callbacks.

Ecto is not an ORM, but it looks like one on the surface. When we query the database, we get back a list of %Post{} structs. When we attempt to insert or update records in the database, we get back a special Ecto.Changeset struct that, on the surface, looks like an ActiveRecord object: it contains valid? and errors fields. But these are fields in an already generated struct, not methods to call:

iex(8)> { :error, changeset } = Repo.insert Post.changeset(%Post{}, %{title: "Hello"})
{:error,
 %Ecto.Changeset{action: :insert, changes: %{title: "Hello"}, constraints: [],
  errors: [body: "can't be blank"], filters: %{},
  model: %HelloEcto.Post{__meta__: #Ecto.Schema.Metadata<:built>, body: nil,
   id: nil, inserted_at: nil, title: nil, updated_at: nil}, optional: [],
  opts: [], params: %{"title" => "Hello"}, prepare: [], repo: HelloEcto.Repo,
  required: [:title, :body],
  types: %{body: :string, id: :id, inserted_at: Ecto.DateTime, title: :string,
    updated_at: Ecto.DateTime}, valid?: false, validations: []}}
iex(9)> changeset.valid?
false
iex(10)> changeset.errors
[body: "can't be blank"]

Repository and queries

Ecto uses the concept of repositories to write and read data from the database. These are in fact database wrappers, and Ecto does not use the repository pattern as you may know it. The fact that Ecto.Repo will not be renamed does not concern me much, however: it is pretty easy to rename it yourself in your generated app.

A repository is more similar to the global DB object you may know from Sequel in Ruby. It provides insert, all, get_by and many similar functions. Check out the full documentation for a list.
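To give a feel for the surface area, here is a sketch of a few common Repo calls (it assumes the Post schema used later in this post; field values are illustrative):

```elixir
# All of these hit the database immediately and return plain data:
Repo.all(Post)                     # list of %Post{} structs
Repo.get(Post, 1)                  # fetch by primary key, nil if missing
Repo.get_by(Post, title: "Hello")  # fetch by arbitrary fields
Repo.insert(Post.changeset(%Post{}, %{title: "Hello", body: "world"}))
```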

A repository is the place where previously prepared queries are executed. You actually have two syntaxes to choose from when preparing queries: a fancy SQL-like format (keyword queries), and traditional chained functions for assembling a query:

import Ecto.Query

query = from w in Post,
          where: w.title == "Hello Ecto"

Repo.all(query)

or

import Ecto.Query
query = Post
  |> where(title: "Hello Ecto")

Repo.all(query)

The important note here is that you can prepare and chain queries as much as you like. You can even mix both versions if you want to create more complex queries (please don't). The query will be sent to the database and executed only when you call Repo.all/1, however. Nothing is delayed. You get back a list of %Post{} structs that cannot be chained further.

This is important from the predictability point of view: you are sure that your queries will be executed in this place, and this place only. No unexpected database calls will be issued from the view or template.
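To illustrate, here is a sketch of composing a query in steps before a single, explicit execution point (the published_at field is made up for this example):

```elixir
import Ecto.Query

# Build up the query in stages; nothing is sent to the database yet.
visible   = from p in Post, where: is_nil(p.deleted_at)
published = from p in visible, where: not is_nil(p.published_at)

# The one and only place the SQL actually executes:
posts = Repo.all(published)
```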

Validations

Ecto's validation mechanism is remarkably similar to Rails'. By convention, our Post module will have a changeset function defined. This function takes a %Post{} struct and a map of changed fields (taken from params, for example), and performs validations.

def changeset(model, params \\ :empty) do
  model
  |> cast(params, ~w(title body), [])
  |> validate_length(:title, min: 1, max: 200)
end

There is a fundamental difference in philosophy here between Ecto and ActiveRecord. In ActiveRecord, you put the validations into the class definition. In every context where a model instance is used and save is called, the validations will run. This can be undesirable when you have a different set of requirements for registration than for a later update. You often end up with a mess of conditional validations, or you are forced to extract form objects that share some validations and differ in others.

In Ecto you deal with this situation by simply creating different changeset functions. You can create a registration_changeset/2 function that performs some extra validations, and an update_changeset/2 that has a completely different set of validations. These cases are remarkably common in my experience: different levels of access, different use contexts, all require different validations on your forms.
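A sketch of what this can look like for a hypothetical User schema (the field names are illustrative, and the code follows the cast style used in the Post example above):

```elixir
# Registration requires and validates a password.
def registration_changeset(model, params \\ :empty) do
  model
  |> cast(params, ~w(email password), [])
  |> validate_format(:email, ~r/@/)
  |> validate_length(:password, min: 8)
end

# Updates only touch the email, with no password requirements.
def update_changeset(model, params \\ :empty) do
  model
  |> cast(params, ~w(email), [])
  |> validate_format(:email, ~r/@/)
end
```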

Ecto handles database index errors as validations - ActiveRecord does not. If an insert fails because of a unique index, the changeset returned from Repo.insert will have an appropriate error message associated with the field. This means you do not have to catch exceptions, as you do in ActiveRecord, or write custom code to handle database index errors differently from validations.
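A sketch of how this looks in the changeset (it assumes a unique index on posts.title exists in the database):

```elixir
def changeset(model, params \\ :empty) do
  model
  |> cast(params, ~w(title body), [])
  |> unique_constraint(:title)
end
```

With that in place, a violation of the index comes back from Repo.insert as {:error, changeset}, with the message attached to :title in changeset.errors, instead of an exception being raised.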

Handling database connections

Ecto handles database connections differently than ActiveRecord does. We generally do not need to keep a connection open and assigned to the current process all the time. Since we are sure we will not execute queries from our templates, only from our controllers, it is a reasonable approach to return the connection to the pool as soon as we no longer need it. This allows Elixir applications (and Phoenix apps) to spawn more workers sharing a smaller number of database connections in a pool and still get work done, since connections are checked in and out of the pool at shorter intervals. There is, however, a way in Ecto 2.0 to perform operations in a transaction, where the same connection needs to be reused.
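A sketch of such a transaction; within the function passed to Repo.transaction/1, the same connection is held for the whole block and is returned to the pool once it finishes:

```elixir
Repo.transaction(fn ->
  post = Repo.insert!(Post.changeset(%Post{}, %{title: "Hello", body: "world"}))
  Repo.update!(Post.changeset(post, %{body: "updated"}))
end)
```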

Callback hell

There are no callbacks in Ecto 2.0 anymore. Period.

Summary

I hope I got you interested in Ecto with this post. Yes, it is still young, but that also means you can help it grow. In my experience it is very much usable already. The amazingly friendly Elixir community also makes it really easy to contribute to Ecto, in case it lacks features you may require.


Hi there!

I hope you enjoyed the blog post. Can we help you with Elixir or Ruby work? We are looking for new opportunities at the moment, and we have a team available just for you.

Email me at: contact@amberbit.com or use the contact form below.

- Hubert Łępicki

