Close
Glad You're Ready. Let's Get Started!

Let us know how we can contact you.

Thank you!

We'll respond shortly.

LABS
Basic Ruby Webapp Performance Tuning (Rails or Sinatra)

My company launched our app, Cohuman, a few weeks ago. The rush of finishing features, fixing bugs, and responding to user feedback has subsided a bit, and it’s time to go back and give the little baby a tune-up. I find that a good development process will ebb and flow, and as long as you don’t let something slide for too long, it’s perfectly acceptable to let bugs, or performance issues, or development chores pile up for a bit and then attack them concertedly for an entire day or two. A bug-fest or chore-fest or tuning-fest can actually increase efficiency as you get in a rhythm… and it feels really good at the end of the day when you see all the bugs you slayed or all the milliseconds you shaved.

In this article I’d like to describe some of my techniques. I make no claim of originality or great expertise; I just want to share what I know, and hear (in comments) what other people have learned. I’m using Sinatra and ActiveRecord, but not Rails; hopefully this discussion will help people no matter what framework they’re using.

Metrics and Logs

The first step, and often the most overlooked, is to gather metrics. Without knowing how it’s working now, how are you going to know what to improve? And how are you going to know whether you made things better or worse? Frequently I’ll make a change that I’m sure will improve performance, only to discover that it’s made no change, or helped in one place but hurt in another.

Where to begin? We’re using New Relic for live performance monitoring, so my decision of what to optimize was easy: I went to their Web Transactions panel and looked at the Most Time Consuming and Slowest Average Response Time reports. If you don’t have a flashing signpost like that, it’s easy enough to decide on a path to work on: either go with user reports, or click around your app and see what feels slow, or choose the most popular request (which is usually the home page).

I always pick a single path to work on, from request to controller to database to view, and work on the slowest parts. This demands more metrics! It’s a common mistake to jump in and start tuning the database when the view is actually taking twice as long. What’s the use of cutting the database access from 400 to 200 msec when the view is taking 1200 msec to render?

I also like to grab a copy of the production DB and bring it to my development machine so I can be sure I’m profiling real cases, and not being fooled by artifacts of generated data. We’re lucky that our app is currently small enough to do this; when the app gets bigger we’ll have to write a script that grabs only selected users’ data as a slice of the whole enchilada. (Note that there are some privacy concerns here: we are careful to only log in locally using our own accounts, and only to gather statistics in aggregate, not to look at details of user-entered data unless it’s to diagnose a specific user-reported issue or bug.)

Lots of in-app metrics tools exist (e.g. ruby-prof, benchmark), but I prefer the simple approach: I rolled my own Marker class that spits out basic msec timing information to the logs. In single-request performance tuning, what matters is relative timing between sections of code, so any objections to this technique on grounds of accuracy or detail are outweighed by its advantages: it’s simple, it shows where your bottlenecks are, and it divides the logs into sections so you can get a sense of who’s making what calls.

class Marker
  def self.mark(msg, logger = ActiveRecord::Base.logger)
    start = Time.now
    logger.info("#{start} --> starting #{msg} from #{caller[2]}:#{caller[1]}")
    result = yield
    finish = Time.now
    logger.info("#{finish} --< finished #{msg} --- #{"%2.3f sec" % (finish - start)}")
    result
  end
end

Usage is simple: pick a block you’re interested and wrap it in Marker.mark("foo") do...end. You can then scan the logs using “less” (or a text editor) and search for the name you gave the block. Marking your controller and your view is a natural place to start; later you can insert marks inside interesting blocks of domain code. In Sinatra, you can do something like this:

get '/foo/:id' do
  foo = Marker.mark("loading foo") do
    Foo.find(params[:id])
  end
  Marker.mark("rendering foo") do
    FooWidget.new(:foo => foo).to_s # Erector
  end
end

I’ve also got a nice little Rack middleware component that marks the time spent inside each request. Note here that you can put lots of fun information in the name that can be helpful for debugging.

class Marking
  def initialize(app)
    @app = app
  end

  def call(env)
    response = nil
    Marker.mark("#{env['REQUEST_METHOD']} #{env['SCRIPT_NAME']}#{env['PATH_INFO']}") do
      response = @app.call(env)
    end
    response
  end
end

Figuring out where a particular log message (especially a DB query) is coming from is essential. It’s important not to make assumptions. If you think you know where the call is coming from, put in a stack trace to make sure, and rerun the request to confirm. That’s why Marker is outputting caller — caller[0] is the code that names the mark, so you already know where that is; caller[1] is the line that called it, and caller[2] is the line that called caller[1]. If that’s not enough context, drop in a logger.info(caller.join(”nt”)) so you can scan the entire stack trace back up to the application code that you understand.

I’ve found that while ActiveRecord (2.3.5) tries to show where a result is coming from, it doesn’t always get it right, especially if you’re using plugins or gems that insert themselves into the call chain. So I monkey-patched AR to be a little smarter about its tracing:

module ActiveRecord
  module ConnectionAdapters
    class AbstractAdapter
      # strip library file pathnames from logged stack traces
      def log_info(sql, name, ms)
        if @logger && @logger.debug?
          c = caller.detect{|line| line !~ /(activerecord|active_support|__DELEGATION__|vendor|new_?relic)/i}
          c.gsub!("#{File.expand_path(File.dirname(RAILS_ROOT))}/", '') if defined?(RAILS_ROOT)
          name = '%s (%.1fms) %s' % [name || 'SQL', ms, c]
          @logger.debug(format_log_entry(name, sql.squeeze(' ')))
        end
      end
    end
  end
end

All this leads to log entries that look like this:

Thu Apr 15 11:09:17 -0700 2010 --> starting GET /app from /Users/cohumancomputer27inmac/dev/cohuman/lib/query_caching.rb:15:in `call':/Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/connection_adapters/abstract/query_cache.rb:34:in `cache'
  User Load (0.8ms) domain/user.rb:362:in `authenticate_from_login_token'   SELECT * FROM "users" WHERE ("users"."login_token" = E'abc123xyz') LIMIT 1
Thu Apr 15 11:09:17 -0700 2010 --> starting rendering ApplicationPage from /Users/cohumancomputer27inmac/dev/cohuman/controllers/app_controller.rb:4:in `GET /app':/Library/Ruby/Gems/1.8/gems/sinatra-0.9.4/lib/sinatra/base.rb:779:in `call'  Project Load (13.4ms) domain/user.rb:146:in `projects'   SELECT "projects".* FROM "projects" INNER JOIN "memberships" ON "projects".id = "memberships".project_id WHERE (("memberships".user_id = 2))
  User Load (3.2ms) domain/user.rb:135:in `coprojectmates'   SELECT "users".* FROM "users" INNER JOIN "memberships" ON memberships.user_id = users.id WHERE (memberships.project_id in (4,129,122,1,66,82,102,684,533,139,3,155,624,106,90,394,399,153) AND memberships.user_id != 2)   Email Load (2.1ms) domain/user.rb:135:in `coprojectmates'   SELECT "emails".* FROM "emails" WHERE ("emails".user_id IN (1,3,5,7,8,11,12,9,6,14,15,16,17,18,22,27,26,35,45,32,30,79,37,109,80,504,507,508,39,165,521,725,727,729,730,731,734,735,736,105,28,58,240,381,51,40,36,785,834,839,844,847,850,842,840,841,843,889))
  User Load (11.7ms) domain/user.rb:128:in `cohumans'   SELECT "users".* FROM "users" INNER JOIN "cohumanities" ON "users".id = "cohumanities".cohuman_id WHERE (("cohumanities".actor_id = 2))
  Email Load (19.4ms) domain/user.rb:128:in `cohumans'   SELECT "emails".* FROM "emails" WHERE ("emails".user_id IN (1,6,108,8,22,35,509,852,853,854,862,864,866,3,895,896,897,929,930,931,30,944,165,827,976,977,978,735,1024,2003,2004,59))
  SQL (0.5ms) domain/user.rb:177:in `temporary?'   SELECT count(*) AS count_all FROM "emails" WHERE ("emails".user_id = 2)
  Email Load (0.3ms) domain/user.rb:173:in `verified?'   SELECT * FROM "emails" WHERE ("emails".user_id = 2)
Thu Apr 15 11:09:17 -0700 2010 --< finished rendering ApplicationPage --- 0.307 sec
Thu Apr 15 11:09:17 -0700 2010 --< finished GET /app --- 0.311 sec

I know it can look daunting, but when scanning logs, it’s important to keep a clear head. Let’s examine this little burst of gibberish and try to make sense of it.

Line 1 says “–> starting GET /app” which means that the user has made a GET request for our main URL. We can skip ahead (search for “–< GET /app”) and see that the entire request took 0.311 seconds. This isn’t bad, but it could be better.

Line 3 says “–> starting rendering ApplicationPage” which means that all the other queries are happening from inside the rendering view code.

Note that the database queries are only taking 49.3 msec out of 311 msec, which means 84% of the time is spent either processing DB results or rendering them. This request is probably not a good candidate for DB-level tuning.

(How’d I add up all those scary milliseconds without an abacus? Piped my log text into this bad boy:

  ruby -e 'x = 0; STDIN.each do |line| if line =~ /(([0-9.]*)ms)/; then x += $1.to_f; end; end; puts x'

)

Indexes

Most (if not all) databases add an index for the primary key of a table. But a quick scan of the database logs will show many fields that are used in queries, and chances are you haven’t added indexes for them. (In fact, you probably shouldn’t add an index for a field until it shows up in the logs, since indexing slows down writes and takes up extra disk space. Not a lot, but it might add up.) In the above example, look at the User Load — every time a user hits the site we check to see if he’s logged in by querying the database for his login cookie. Adding an index for the “login_token” field in the users table sped up this query by a factor of 10. (Yes, that violates my “don’t fix what ain’t slow” dictum, since going from 10 ms to 1 ms isn’t really fixing much, but I figure it adds up over time since it happens on every single app request.)

Avoidance

The only perfect program is the one with zero lines of code. And the fastest code is that which is not run.

Sometimes you can optimize a section of code by removing unnecessary calls from your app layer. One nice trick these days is to move stuff behind an Ajax call. In Cohuman, we do this with some of our tabs: if you switch to a tab, and it hasn’t been loaded yet, it shows a spinny and starts an Ajax call to load it in. As long as we can keep each Ajax call under a second in length, the user-perceived delay is negligible.

Query Caching

ActiveRecord maintains a query cache, so if you run the same query (and I mean the same SQL), it won’t hit the database again. But if you’re not using Rails, query caching is disabled by default. So I wrote yet another Rack middleware so I don’t have to remember to wrap all my controllers in a ActiveRecord::Base.cache do block:

# a Rack middleware component that enables ActiveRecord query caching
# To use, put "use QueryCaching" in your Sinatra app.

class QueryCaching
  def initialize(app)
    @app = app
  end

  def call(env)
    if is_static_file?(env)
      @app.call(env)
    else
      response = nil
      ActiveRecord::Base.cache do
        response = @app.call(env)
      end
      response
    end
  end

  def is_static_file?(env)
     # if the path end with a dot-extension (e.g. 'foo.jpg') then we assume
     # it's a static file and don't enable the query cache. (This will only
     # work for some application URL schemes, naturally.)
    env['PATH_INFO'] =~ //[^/]*.[^/.]+$/
  end

end

Note that this is a query cache, not an object cache (see below).

Query Tuning

Once you’ve identified some troublesome queries, you need to decide how to optimize them. You’ve basically got two choices here; which to choose should be obvious from the logs. Are there many low-latency queries, or a few high-latency queries? High-latency queries are an obvious target, and you should do your best (with indexes and SQL) to cut them down to size, but don’t let them distract you. There are two hidden costs to low-latency queries:

They actually take longer than they say they do – the AR log line only displays the time for the database connector to return the raw data. It doesn’t show the time to create AR instances, build association “classes” (which takes an annoyingly long time, since all their methods are built on the fly for each instance), and run post-load initialization code. (I just did a little experiment loading ~3000 of our User objects, which have a fair number of associations; SELECT * FROM users took 21 msec but User.all took 547 msec. That’s about 25x as long!)

They stack up, and I’m not talking pancakes – chances are you’ve got a lot of webapp processes hitting a single database (or a small number of slaves). As traffic increases, the queries will stack up like airplanes requesting permission to land. At a certain point you’ll hit a cliff (sorry for the mixed metaphor — it’s not fun to imagine a plane hitting a cliff) and per-request latency will rise dramatically. Lowering the number of queries per web request will, um, raise the ceiling? Lengthen the runway? Lower the cliff? Anyway, it’ll make this problem, uh, less worse. It’s kind of counterintuitive, but the limiting factor for modern webapps is really the number of queries, not the amount of data returned by each query.

ActiveRecord associations (like has_many and belongs_to) are great for getting an app up and running, but as you peruse your logs you’ll notice some things they’re doing that aren’t very efficient. Our app loads a lot of objects, each of which has lots of associated objects, some of which associate to other objects. If we’re displaying a list of Users, and each user has associated Emails (via has_many :emails), and we want to render a list of users and their email addresses, we’ll probably see one query that loads all users, and then one query for each user loading his or her emails.

Adding an :include to the declaration is a good way to reduce these from N+1 to 2, but it doesn’t always work. I have never been able to comprehend AR’s alien logic, so my logs are often littered with queries despite my best efforts fiddling with the association declaration. Furthermore, AR is quite naive about object graphs: for example, user.emails.first.user will make an extra query and return a different user than the one you started with, even though they have the same id and you loaded the emails via :include.

So I’ve gotten good performance boosts by moving away from ActiveRecord and doing some nested queries by hand. Not by writing literal SQL, but by doing one query, extracting the necessary ids, and then doing the next query, and saving or plugging in values directly. This led naturally to Treasury (see below).

Sometimes, of course, writing SQL is unavoidable; fortunately, AR allows it, and there are many people who are much better at that than I, so I won’t embarrass myself by discussing it further here.

Object Caching and the Repository Pattern

During my first pass at tuning Cohuman several months ago, I took a little time to write a library that implements a Repository Pattern. The Treasury is a work in progress that sits in front of ActiveRecord (and eventually, other ORMs) and caches object instances as they pass through. If you then request an object via the Treasury, it will check in its cache and return a pointer to the existing object instead of making a query; if you specify a list of ids, then it will only query for the ones it doesn’t yet have. (There are other features I won’t go into here, including a DSL for building queries… expect an upcoming article to officially introduce Treasury to the world.)

I’ve heard that DataMapper has an object cache, but I haven’t yet dug into the details of how it works, so I don’t know if Treasury is redundant with it, or if it would make sense to plug in DM behind it. (I’ve also heard it solves the N+1 query problem gracefully. Anyone want to proselytize DM in the comments?)

All the caches I’ve mentioned only persist within a single request. This is probably a good thing, since allowing instances to persist between requests would open a can of data integrity, thread safety, and multi-host worms. But I can’t shake this vision I have of a sort of in-process memcache for Ruby objects, where multiple processes communicate changes to each other via TCP wormholes… Does anyone else share this vision, or am I doomed to wander the Ruby blog desert, mumbling incoherently at strangers?

Comments
  1. Joseph Palermo says:

    I’ve never seen a :include not preload the association, but it does frequently fall back on the old style of using JOINS rather than the new (and faster) one query per table.

    ActiveRecord doesn’t give you any way of disabling the old style (despite our efforts). What I typically do is reopen ActiveRecord::Associations::ClassMethods and redefine references_eager_loaded_tables?(options) to always return false.

    This is a check where it tries to see if you are selecting or putting conditions on a table that you are expecting to get joined into the query because of a (old style) :include. Really though, if you want that table in the query, you should be using a :join statement in your finder call.

    references_eager_loaded_tables? is very aggressive about falling back onto the old style by default since they didn’t want to break anybody who was relying on the joined tables.

  2. Brian Doll says:

    Instead of writing and using your own Marker class to gain insight into transactions on your development environment, you could also use New Relic RPM in developer mode (http://www.newrelic.com/demos/developer-mode.html). It keeps the last 100 or so requests in memory and gives you much of the same goodies as you see in the online tool.

    You can actually use developer mode for free, so this applies to anyone, not just those with an RPM account.

  3. Eric says:

    ActiveRecord::QueryCache is a middleware that comes with AR — so unless Sinatra doesn’t play nicely with it you could use that as well. Looking forward to the post on Treasury!

Post a Comment

Your Information (Name required. Email address will not be displayed with comment.)

* Copy This Password *

* Type Or Paste Password Here *