Close
Glad You're Ready. Let's Get Started!

Let us know how we can contact you.

Thank you!

We'll respond shortly.

LABS
Maintainable State Machines Part 2 – don't store state names in the database

In relational databases it’s common to use foreign keys to reference other tables so that when you make a change to values in the table you maintain referential integrity without paying a performance cost. For some reason, many Rails programmers don’t apply this concept to state machines in ActiveRecord. In this post I’ll describe the benefits of not storing state names in the database.

Let’s say you have a community site that allows users to signup. Further, you need to verify their email addresses. You might start with a class like this:

class User
  state_machine :status do
    state :unverified
    state :verified
  end
end

Over time your community grows and some users stop going to the site. You decide that you are going to retire inactive accounts so that the usernames can be claimed by new users. You decide to add the concept of “active” and “inactive”. You update your class to look like this:

class User
  state_machine :status do
    state :unverified
    state :active
    state :inactive
  end
end

Unfortunately, you decided to store the states as strings in the database, which means that you need to run a migration to update existing users:

class AddActiveAndInactiveToUsers < ActiveRecord::Migration
  def self.up
    execute "update users set status = 'active' where status = 'verified';"
  end
end

But with 27 million users in the database, this query takes the site down for over 15 minutes so you have to schedule it for 3am. Even still, you get bad press for being down. And to think, all this can be avoided by just storing integers in the database!

Some state machine implementations provide easy ways to accomplish this. For example if you use PluginAWeek’s state_machine gem you could write:

class User
  state_machine :status do
    state :unverified, :value => 0
    state :verified, :value => 1
  end
end

Since you store integers in the database, you can easily add a new state and change the name of an existing state without having to migrate data. Having state names in the database is a classic example of dependency inversion, where your code, which doesn’t change very often, relies on a value in a database table, which is likely to change more often.

If your state machine doesn’t support storing values in the database as integers (or guids), I suggest you look into upgrading to one that does, or patching your existing state machine to support integers. You may end up getting one less sleepless night because of it!

Comments
  1. Evan Farrar says:

    I kind of hate lookup tables, mostly when the key and value are in different places. This is sort of like having a bunch of magic values. Having a lookup table for referential integrity is great…when you that table is actually in your database and you can query and get a semantically meaningful name.

    When the lookup is defined in ruby but stored in SQL it’s a little painful for maintainability. Next thing you know your DBA will replace a 3 with a 4 somewhere and update 27 million users to be inactive with no clue what he was actually doing.

    Why, oh why, would you do this pre-emptively when you have 27 million users to contend with? Why not set the :matcher option for a release or so (or 15 minutes) while you run the script to update those 27 million users in spare cycles at 3am:

    state :active, :matcher => labmda{ |val| val =~ /(verified|active)/ }

  2. Jeff Dean says:

    Magic values are always an issue – and in the case I described above there are at least 2 easy solutions to magic values. The first is to store the number values in constants (like Product::IN_STOCK) and the other is to use a gem like active_hash to store the key/value pairs in one place and simulate an ActiveRecord class.

    I can’t speak to whether or not storing integers will increase the probability of human error, but I would prefer to use pairing and test-driving to mitigate those concerns.

    The matcher you mentioned is pretty particular to state_machine, and wouldn’t help you deal with other parts of code that touch these states, such as named scopes. Adding matching scaffolding like that just to ease the pain of a migration that wouldn’t have been painful in the first place if you had just used integer keys baffles me.

    To me storing named states in the database seems like a classic case of dependency inversion, where code depends on values in the database that are likely to change more often than the code itself. More often what I’ve witnessed in real code is that the concepts change, but the state names stay the same (because of the pain of migrations) and the code quickly becomes very difficult to understand.

  3. Zach Moazeni says:

    From a raw data perspective, it sounds like you’re wasting a lot of code/logic readability to save 15 minutes on a migration for a table with 30 million rows.

    When I’m debugging objects/logic, I personally like seeing the literal state value instead of having to mentally translate it (e.g. Object#inspect). I’ve found boiling away those mental translations offers big wins when trying to narrow down issues.

    While I don’t have any perf numbers, you could always drop the index, update the values, then re-apply the index. That’s a lot speedier than updating with the index on.

    It’s also important to note that when you have tables with 30+ million rows and a user base that will notice an extended 3 a.m. downtime, you are going to have to alter your perspective about a lot of things including schema/data migrations, deployments, and user communication around deployments. Until then, it feels like YAGNI.

  4. Jeff Dean says:

    I agree that readability goes down slightly, but I think the tradeoffs are well worth that slight decrease. At the end of the day, storing state names increases the cost of responsible change, because a migration is required. If you think that cost is lower than the cost of debugging objects with `inspect` and have data to support that in your app, that’s the right choice.

    In theory, the point of SOLID code is line up your dependencies such that you can quickly and easily make changes with low risk and low cost.

    In practice, I’ve seen several apps where there are a mix of status columns and booleans that both denote state because the cost of introducing new states or renaming existing ones was too high. I’ve also worked on an app where some state names in the database did not match the state names in code, which was particularly painful to work with.

    You can call YAGNI on any refactoring to make code more SOLID, and most Rails developers I know do, but in my experience business pay 3 or 4 times the cost in the long run because developers called YAGNI too soon.

Post a Comment

Your Information (Name required. Email address will not be displayed with comment.)

* Copy This Password *

* Type Or Paste Password Here *