Last week, I was tasked with diving into Pivotal's allocations application to figure out why it was running so slowly and, hopefully, make it a bit faster. The application was written as a side project about 4 years ago, and it clearly showed its age. It's not every day I run into an application that uses RJS! Anyway, I was able to use an incremental, refactoring-based approach to improve the speed by about 80%. Edward and Josh Knowles suggested that I write up a bit about what I saw and how I improved it, in the hope that other engineers can make use of these performance tuning and refactoring concepts on other projects.
So, without further ado...
Set up - Measurement measurement measurement
Any time I look into a performance issue, I try to focus on getting a clear piece of functionality to optimize (in this case, the main project matrix page) and measure the heck out of its performance both on production and locally. So, the first place to go was NewRelic. NewRelic told me a couple of things - first off, there were a lot of database queries hogging most of the time. Secondly, most of those database queries involved an ActiveRecord object called 'PersonRange'.
Adding some manual benchmarking (it's easy! Just add a 'stamp' function and a before/after filter to generate a report of your timestamps) told me that many of the database queries were actually happening during view processing - a big no-no. I had a direction of investigation.
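For anyone who hasn't rolled their own timing report before, here is a minimal plain-Ruby sketch of the 'stamp' idea. The class name StampReport and its methods are my own illustration, not the code from the allocations app; in a Rails controller you would call stamp from a before/after filter and from a few points inside the view.

```ruby
# Minimal timing recorder: call stamp(label) at points of interest,
# then report the elapsed time between consecutive stamps.
# StampReport is a hypothetical name used only for this sketch.
class StampReport
  def initialize
    @stamps = []
  end

  # Record a label together with the current monotonic clock reading.
  def stamp(label)
    @stamps << [label, Process.clock_gettime(Process::CLOCK_MONOTONIC)]
  end

  # Returns [[label, seconds since the previous stamp], ...].
  def intervals
    @stamps.each_cons(2).map do |(_, t0), (label, t1)|
      [label, t1 - t0]
    end
  end

  # One line per interval, in milliseconds.
  def to_s
    intervals.map { |label, secs| format("%-20s %8.2f ms", label, secs * 1000) }.join("\n")
  end
end

report = StampReport.new
report.stamp(:start)
sleep 0.01 # stands in for controller work
report.stamp(:controller_done)
sleep 0.01 # stands in for view rendering
report.stamp(:view_done)
puts report
```

If the interval that ends at the view stamp dwarfs the others, that is your cue to go looking for queries hiding in partials.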
So why did the view take so long?
Like a lot of Rails projects, the view on allocations relies on a series of partials to generate a large matrix. All of these partials are looped - and with the main matrix page looping over projects, each project looping over weeks, each week looping over allocation tiles, you can imagine how the numbers add up quickly.
My first line of improvement was to streamline the innermost partial as much as possible. First off, I replaced the partial with a helper method. In general, looping over a partial is slower than calling the partial as a collection, which is in turn slower than using a helper method to generate markup. The innermost partial was very simple and easily lent itself to being a helper method. For good measure, I turned the 'loop over allocation tiles' code snippet into a helper method as well.
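To give a feel for the partial-to-helper move, here is a rough plain-Ruby sketch. The method names and the markup are made up for illustration (the real tile markup isn't shown in this post), and ERB::Util.html_escape stands in for the `h` helper you would use inside a Rails view.

```ruby
require 'erb'

# Hypothetical helper replacing the innermost partial: builds the markup
# for one allocation tile as a string. The markup itself is an assumption.
def allocation_tile(person_name, role)
  name = ERB::Util.html_escape(person_name)
  title = ERB::Util.html_escape(role)
  %(<div class="allocation_tile" title="#{title}">#{name}</div>)
end

# Hypothetical helper replacing the 'loop over allocation tiles' snippet.
# Each tile is a [person_name, role] pair.
def allocation_tiles(tiles)
  tiles.map { |person_name, role| allocation_tile(person_name, role) }.join("\n")
end

puts allocation_tiles([["Alice", "Developer"], ["Bob", "Designer"]])
```

The point is only that a method call per tile avoids the partial lookup and rendering overhead that a partial per tile incurs.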
When I did this, I naturally started looking at the parameters this method needed. One strangeness - a random lookup hash named 'roles' was passed to this method/partial. The partial then looked up the person's role from this hash. The lookup hash itself was built by a helper method in the next outer partial (project_week_cell), which ran a DB query each time it was called, so it generated a DB query per project per week.
On a parallel note, we also needed to look up people's locations on a week by week basis, and there were some on the fly DB calls happening in the view layer for this as well.
So where did role and location come from? Lo and behold, both of these properties were methods of a single PersonRange object.
My direction was clear. In the controller layer, I made one set of queries to figure out the relevant person range for each person for each week. This cache was used for all location and role questions from that point forward. The cache was plopped into the main ProjectAllocationMatrix object along with the other preexisting caches, of which there were many. Boom, 50% speed improvement.
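Here is a sketch of what such a cache can look like, in plain Ruby so it runs on its own. The PersonRange fields, the PersonRangeCache class and its method names are assumptions for illustration; in the real app the ranges would come from one set of ActiveRecord queries in the controller.

```ruby
require 'date'

# Stand-in for the PersonRange ActiveRecord model (the fields are assumptions).
PersonRange = Struct.new(:person_id, :start_date, :end_date, :role, :location)

# Builds a {[person_id, week_start] => PersonRange} lookup from one
# preloaded list of ranges, so the views never need a query per cell.
class PersonRangeCache
  def initialize(ranges, weeks)
    @by_person_week = {}
    ranges.each do |range|
      weeks.each do |week|
        next unless (range.start_date..range.end_date).cover?(week)
        @by_person_week[[range.person_id, week]] = range
      end
    end
  end

  def role_for(person_id, week)
    range = @by_person_week[[person_id, week]]
    range && range.role
  end

  def location_for(person_id, week)
    range = @by_person_week[[person_id, week]]
    range && range.location
  end
end

weeks = [Date.new(2011, 2, 7), Date.new(2011, 2, 14)]
ranges = [
  PersonRange.new(1, Date.new(2011, 1, 1), Date.new(2011, 2, 10), "Developer", "SF"),
  PersonRange.new(1, Date.new(2011, 2, 11), Date.new(2011, 12, 31), "Developer", "NYC")
]
cache = PersonRangeCache.new(ranges, weeks)
puts cache.location_for(1, weeks[0])
puts cache.location_for(1, weeks[1])
```

Because the lookup is keyed by person and week, the same structure answers both the role and the location question, including the case where someone moves mid-quarter.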
Before I could tackle a bigger refactor, I needed to simplify and organize the code a bit. The codebase had some obvious things to organize that wouldn't affect the performance much but would clarify flows and responsibilities.
- There were some helper methods that didn't have anything to do with view logic. The easy refactor for these kinds of methods is to figure out which of their arguments is the 'important object', and move the method into becoming an instance method of that object.
- I found a few repeated patterns embedded in the views - an expression of 'billable + unbillable + overhead' that took three separate collections and added them together in this specific order. These three collections were only used to be added together in this way. An easy refactor consolidated them into one collection method. In the process, I simplified the calculations from three separate selects to a single sort.
- One of the caches stored a person's last unique set of initials. This cache was then postprocessed to generate an abbreviated name for that person. It seemed more useful to make this cache store the entire abbreviated name, to reduce postprocessing.
- Some of the caches were exposed so that other code (mostly view logic) used them directly as a hash, knowing exactly how the cache was organized. I reduced the coupling between the caches and the view layer, adding access methods to the matrix object that held the caches.
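The 'billable + unbillable + overhead' consolidation from the list above can be sketched roughly like this. The Project struct, the :kind attribute and the method names are hypothetical; the real collections and their contents aren't shown in this post.

```ruby
# Stand-in record; :kind is assumed to be :billable, :unbillable or :overhead.
Project = Struct.new(:name, :kind)

KIND_ORDER = { billable: 0, unbillable: 1, overhead: 2 }.freeze

# Before: three selects, concatenated in a fixed order.
def ordered_projects_with_selects(projects)
  projects.select { |p| p.kind == :billable } +
    projects.select { |p| p.kind == :unbillable } +
    projects.select { |p| p.kind == :overhead }
end

# After: one sort keyed on the kind's rank. The index is a tie-breaker so
# that projects of the same kind keep their original relative order
# (Ruby's sort_by is not guaranteed to be stable).
def ordered_projects(projects)
  projects.each_with_index
          .sort_by { |project, index| [KIND_ORDER.fetch(project.kind), index] }
          .map(&:first)
end

projects = [
  Project.new("Ops", :overhead),
  Project.new("Acme", :billable),
  Project.new("Labs", :unbillable),
  Project.new("Zeta", :billable)
]
puts ordered_projects(projects).map(&:name).inspect
```

Callers now ask for one ordered collection instead of knowing that three collections exist and must be added in a particular order.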
Taking another passthrough for performance
Even after these changes, I was still seeing some database calls in the view layer. I decided to track them down and get rid of them.
- One of them was a 'find location by id' lookup. As it turned out, locations were an association from PersonRange, so making the person range cache join with location in the query was a no-brainer and removed all those queries.
- The time spent calculating allocations for a week was a bit long. Looking at this code, it was running a SQL query, then using Ruby to first group allocations by project ID, then sort by person for each project. I couldn't imagine that doing an individual sort for each project was terribly efficient. I decided to do the sort first, then group by project. This shaved a few milliseconds - no big deal.
- Some of the matrix caches weren't all that useful any more. There was a 'locations by people' cache that was used a bit. The person ranges cache could do everything this cache cared about and more - it also knew if a person's location changed from week to week! I killed the other cache and converted to using the better information.
- The next closest partial was 'project_week_cell', which was surrounded by a loop to render individual copies of this partial. I switched it to a collection partial.
- I also found out about a code branch that stored abbreviated names on a person by person basis in the database rather than calculating it for each matrix. I incorporated it and removed another cache calculation from the matrix.
- Around this time, I realized I could tune the 'resource_target' cache to cache what we actually cared about, target counts. Shifting the cache shaved a bit of time off of matrix generation in general.
All of these changes chopped another 25% or so off of the load time.
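The 'sort first, then group' change from the list above looks roughly like this in plain Ruby. The Allocation struct and method names are stand-ins, not the app's actual code.

```ruby
# Stand-in for an allocation row (the fields are assumptions).
Allocation = Struct.new(:project_id, :person_name)

# Before: group by project, then run a separate sort for every project.
def group_then_sort(allocations)
  grouped = allocations.group_by(&:project_id)
  grouped.each_with_object({}) do |(project_id, allocs), result|
    result[project_id] = allocs.sort_by(&:person_name)
  end
end

# After: one sort up front. group_by keeps each group's elements in the
# order it encounters them, so every group comes out already sorted.
def sort_then_group(allocations)
  allocations.sort_by(&:person_name).group_by(&:project_id)
end

allocs = [
  Allocation.new(2, "Zed"),
  Allocation.new(1, "Bob"),
  Allocation.new(2, "Amy"),
  Allocation.new(1, "Al")
]
puts sort_then_group(allocs)[2].map(&:person_name).inspect
```

Both versions return the same hash of project ID to sorted allocations; the second just does it with one sort call instead of one per project.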
Now the basic matrix page render was in fairly decent shape, and I moved onto the other mandate - making the drag/drop operate more smoothly. The mechanism was basically that it would perform the allocation change, recalculate the matrix for the changed projects, and then render RJS that refreshed the project rows for these changed projects.
When I looked at performance, I discovered that most of the time was spent generating two copies of the allocation matrix. The first matrix was the one mentioned in the controller (just calculating for those two projects); the second matrix was a full matrix calculated to generate refreshed billable percentages.
The interesting thing was that both matrices took almost the same time as each other. Restricting projects did not matter for performance.
As a first pass, I decided that the 'restricted projects' matrix was useless. If we used the bigger matrix for everything, we only had to calculate everything once and we would have everything we needed. I made this conversion and I cut server time in roughly half.
We have to refresh large chunks of HTML? Really?
I was now ready to do another performance pass (beyond the minor improvements from having smaller markup).
So why do we need to recalculate the whole matrix, anyways?
I returned to some of the oddities of the first drag/drop pass.
First oddity - making an allocation matrix with only 2 projects of interest was just about as expensive as making an allocation matrix with all projects. I discovered that the reason was buried in the caches - there was a low level shared cache of allocation information used both to calculate project allocations and unallocated people.
Getting rid of this cache and making these two calculations retrieve just the information they needed was the way to go. When I did this, the full matrix stayed at about the same performance level it was before. The smaller matrix, however, became much faster.
This led to the question of whether I could get away with only using the smaller matrix. The answer appeared to be 'yes', provided I figured out how to keep the billable percentages over all projects up to date.
So where to, now?
Every refactor leads to a new refactoring idea. Even though my week was finished, I saw plenty of other things that could be tightened up. Among my random thoughts:
- YUI drag/drop requires element IDs to identify elements, and jQuery uses selectors instead. I know which technology I'd rather use. Guess which one allocations currently uses, and how many HTML IDs needed to be added to the markup because of this.
- Some of the HTML classes were extremely duplicated because they were on the wrong elements. For example, I would see a billable project row with a whole bunch of allocation cells, each with the class 'in_billable_project'. Why not give that class to the project row as a whole?
- I ended up with drag/drop returning chunks of HTML markup. Chunks of basic allocation data would be a lot more lightweight, and managing cells on the client side would open up some nice possibilities for visual effects.
Ask for Help
"bundle install seems very slow every time, but bundle check seems fast. Why doesn't bundle install use bundle check before doing its thing?"
Consensus was that this seemed like a good idea.
"When setting up a cc.rb box, the box could not connect to Github, yielding the 'You don't exist, go away!' message. How do we fix this situation? We can get to github through the command line without any issues."
- One thing to check is your protocol. The git protocol is closest to SSH and obeys most of the settings SSH does.
- Also check your agent forwarding settings. Is your box explicitly doing everything that it should?
"In Rails 2.3, we tried mocking a has_one association. However, it looks like the association isn't mocking. Why?"
Rails 2.3 associations have a proxy object that delegates to lower level objects. This proxy isn't mockable, but the target (proxy_target) is.
"What is the current best of breed passenger config beyond what you get from the passenger site?"
Recommendations were given for mod_speed.
"What are some easy ways to implement CSS spriting on my site?"
For a quick definition of CSS sprites, look here. Recommendations included Compass/SASS.
- jQuery 1.5.1 is out! It is the first jQuery that explicitly supports IE 9, so it's recommended for next generation web site development.
Ask for Help
"When I tried to clear cookies on IE8, the cookies stuck around anyway. I was only able to delete them through the developer toolbar. What's going on?"
The consensus theory was that the developer toolbar might be affecting IE8's cookie behavior (IE8 is not known for its robust extensions). More investigation seems in order.
- After upgrading to Bundler 1.0.10/Rubygems 1.5.2, build time on one of our projects shrank by 2 minutes. Hurrah for caching!
- jQuery 1.5 changed its AJAX implementation, causing us to upgrade our mock Jasmine library. Our jQuery 1.5 fork is here.
- One of our clients had a large production issue due to a long-standing bug in Rails with case sensitivity.
Here's the situation: Rails validates_uniqueness_of has a flag called :case_sensitive. This flag defaults to 'true', but can be flipped.
MySQL's default collation is case-insensitive. As a result, queries will, in general, ignore case unless specifically overridden.
So one might imagine that setting :case_sensitive to false would be completely harmless in a standard MySQL application.
One would be wrong. Setting case_sensitive to false changes the query to lowercase the field in question, causing the MySQL database to ignore any indices it may have and turning the validates_uniqueness_of operation from something cheap and quick to something requiring a full table scan.
The open Lighthouse ticket on this issue is: https://rails.lighthouseapp.com/projects/8994/tickets/2503-validates_uniqueness_of-is-horribly-inefficient-in-mysql
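To make the difference concrete, here is a rough illustration of the two WHERE clause shapes. This is my own approximation in plain Ruby, not Rails' actual SQL generation (which varies by version and adapter); the helper name and the table and column names are made up.

```ruby
# Illustration only: approximates the shape of the uniqueness condition.
# uniqueness_condition is a hypothetical helper, not a Rails method.
def uniqueness_condition(table, column, value, case_sensitive: true)
  escaped = value.gsub("'", "''")
  if case_sensitive
    # A plain column comparison, which an index on the column can serve.
    "#{table}.#{column} = '#{escaped}'"
  else
    # Wrapping the column in LOWER() means MySQL has to evaluate the
    # function for every row, so a plain index on the column can't be used.
    "LOWER(#{table}.#{column}) = '#{escaped.downcase}'"
  end
end

puts uniqueness_condition("users", "email", "Bob@Example.com")
puts uniqueness_condition("users", "email", "Bob@Example.com", case_sensitive: false)
```

Running EXPLAIN on the second style of query against a large table is a quick way to confirm whether you are affected.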
Ask for Help
"Any clever ways to catch out of bounds exceptions from Solr?"
This is a follow-up to yesterday's Solr question. After some investigation, it looks like none of the major providers catch out of bound exceptions for very large numbers. Rather than instrumenting every Ruby call with validations to prevent these numbers from getting into Solr, are there any other brilliant ideas?
Follow-up to the help from 5/19/2010's SEO routing question. The latest hotness appears to be FriendlyId (http://github.com/norman/friendly_id). This plugin makes human-friendly slugs and comes with a variety of interesting features, including versioning and slug scoping.
Power RubyMine commands:
Goto File + line #: If you use ctrl-shift-N to go to a file, try typing in a line number after a colon, something like "my_file:30". You'll end up on that line.
Analyze stack trace: This tool lets you paste in an external stack trace, and gives you the ability to browse to all of the pieces of that stack trace.
Ask for Help
"Our site requires crafting URLs in a very particular SEO-friendly way. Rails doesn't seem to give us a good solution for our URLs. Any suggestions?"
One of our clients needs to make their app accept and generate compound URLs that look something like the following:
where author, series, and book are all different domain concepts. Rails RESTful resources don't really support this format. There wasn't an immediate solution, but among the peanut gallery of ideas:
Hyphens are better than slashes in URL crafting, but Rails routing splits on slashes and doesn't separate on hyphens at all.
to_param solutions - Overriding to_param to something that starts with an integer ID generates URLs that look very slug-like, but can use standard Rails Domain.find mechanisms. For example, a book.to_param might be overridden to be "1-bookname", which works for all purposes. The problem with this solution is that it doesn't quite fit the requirements here, and doesn't cover the compound needs.
Custom routes are always a possibility. You can hook up a special (non-resource) controller that understands flexible browse-y routes like the one above, parses them, and delegates to the more standard resource controllers. The problem here is that you have to figure out a decent delegate pattern and route generation pattern.
In general, URL crafting is a separate art from domain model crafting, and Rails doesn't really cater to this. You will have to design URL-centric code to suit your URL crafting.
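For reference, the to_param idea mentioned above can be sketched like this. Book is a plain Struct standing in for an ActiveRecord model, and the slug rules are my own assumption.

```ruby
# Stand-in for a model; a real ActiveRecord class would override to_param
# in the same way.
Book = Struct.new(:id, :title) do
  # Slug-like URL fragment that still begins with the integer ID.
  def to_param
    slug = title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
    "#{id}-#{slug}"
  end
end

book = Book.new(1, "Book Name")
puts book.to_param       # 1-book-name
puts book.to_param.to_i  # 1
```

The trick works because converting the param to an integer keeps only the leading digits, which is why a standard ID lookup still finds the right record. It does nothing for the compound author/series/book case, though.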
"Any ideas on ways to performance test IE7?"
No immediate ideas, but potentially more later.
"When users enter very large search parameters for numbers we get the following exception out of RSolr:"
RSolr::RequestError: Solr Response: For_input_string_11111111111111111111__javalangNumberFormatException
Is there an elegant solution to this aside from validating that all input parameters aren't larger than max int?
When using named scope methods that refer to other named scope methods, you may discover that your SQL has some redundant condition clauses. This is a bug in Rails 2, and has been true for several versions. However, it's a harmless bug - MySQL will understand the extraneous condition clauses just fine, without performance implications.
Mocking Paperclip for tests is a careful art. See our other blog post: Stubbing out Paperclip ImageMagick in Tests
Ask for Help
"Anyone have good strategies for using S3 as a content delivery network for static files?"
Using S3 as a CDN is pretty common. S3 is certainly cheap, and fairly easy to set up. However, latency can be large - S3 isn't built to act as a CDN, so the performance can be lacking. In addition, you need to work out your pathing in your CSS files to find background images correctly. Relative paths are a common technique here.
The performance of files in your public directory is much better. Amazon's Cloudfront is another (expensive) option.
Note observation #4 in this blog article: link
"I can't get ImageMagick to work on Snow Leopard. What gives?"
A brief look online shows several step-by-step instructions. It's unclear what this particular problem is about.
"After upgrading to the latest version of Mocha, any_instance doesn't clean up after itself. Why?"
Mocha's any_instance stubbing is one of the few features that distinguish Mocha from other mock frameworks.
One suggestion was to update rspec as well.
"Heroku 1.5.3 isn't letting me use heroku rake commands. What can I do?"
Upgrade to Heroku 1.5.6.
EY Cloud's slave database functionality is broken right now. It's supposed to be fixed this afternoon.
Amazon restricts you to 20 EBS volumes/EC2 instances per account by default. The trick here is that deleting volumes does not immediately free up space. Volumes stay in a 'Deleting volume' state for an indefinite amount of time before they are truly free, making it hard to allot space for them. Finding these deleting instances can be a real challenge - the AWS API can find them, but not the EY cloud GUI.
If you need to manipulate AWS credentials for EY Cloud, it's fairly easy to go to the machine and find the appropriate file - /etc/.mysql.backups.yml
Ask for Help
"Ever since we upgraded to RSpec 1.2.9, we haven't seen any stack traces. What gives?"
One of our projects lost stack traces as soon as they upgraded to RSpec 1.2.9. Reverting to RSpec 1.2.8 fixed the problem. No other projects have reported the issue yet.
- Working with Rails for several years means that, as Rails advances, our testing/mocking code gets stale. We just discovered that one of our old mocks for representing the 'flash' object no longer works as designed with the current version of Rails.
Ask for Help
"What's a good design for sharing a page cache across multiple servers?"
One of our clients would like to have a distributed server environment share its page cache. At this point, they're relying on GFS to do this, but this solution appears to have problems with reliability.
Several engineers questioned the necessity for such a thing, but memcached appeared to be the solution of choice.
"Any information on RabbitMQ?"
One of our engineers is beginning to play with RabbitMQ. Anyone who has good comments about this technology, please feel free to chime in.
- load "location" and require "location" do not play nicely with Rails' automatic class reloading.
Rails maintains an internal array of files that it 'knows' about for this purpose. However, load and require bypass this mechanism, and lock files into place.
If you'd like to add a require that will class-reload, use the command 'require_dependency "location"'. This command, added by Rails, will require the file AND add it to the list of files that ActiveSupport tracks for reloading.
Ask for Help
"How do I make attachment_fu use both the file system and S3 as storage backends?"
One of our clients would like to migrate attachments from the file system to S3. They want a clever way to make attachment_fu look in S3 or the filesystem, where new files are in S3 and old files are in the filesystem.
Their current solution, which they're not super happy with, is to monkeypatch the S3 backend by extending it with file system methods. This solution doesn't really seem to work too well, since the two backends share some of the same methods, and calling "extend FileSystemBackend" doesn't give them the freedom to pick and choose their methods. In addition, their patch makes the S3 backend not be an S3 backend any more, which could cause problems for maintenance down the road.
A better solution is to define a new backend object, based on the S3 backend but falling back to file system-style methods. Attachment_fu supports defining custom backend modules; a class using :storage => :my_storage would look for a backend module called Technoweenie::AttachmentFu::Backends::MyStorageBackend.
Attachment_fu still has a design problem. The backend objects are all modules, not classes. As a result, it's not easy to make a new backend descend from one of the existing backends.
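A rough sketch of the 'new backend with a fallback' idea, runnable without attachment_fu. Only the module naming convention comes from the discussion above; MyStorageBackend's body, FakeAttachment, and the data_from_s3 / data_from_file_system methods are placeholders I made up to show the shape.

```ruby
# Maps a :storage option to the module name attachment_fu would look for,
# following the naming convention described above (plain-Ruby camelize).
def backend_module_name(storage)
  camelized = storage.to_s.split("_").map(&:capitalize).join
  "Technoweenie::AttachmentFu::Backends::#{camelized}Backend"
end

# Hypothetical fallback backend: try S3 first, then the file system.
# The two lookup methods are placeholders for whatever the S3 and file
# system code paths actually provide.
module MyStorageBackend
  def current_data
    data_from_s3 || data_from_file_system
  end
end

# Tiny fake model so the sketch runs on its own.
class FakeAttachment
  include MyStorageBackend

  def initialize(s3: nil, disk: nil)
    @s3 = s3
    @disk = disk
  end

  def data_from_s3
    @s3
  end

  def data_from_file_system
    @disk
  end
end

puts backend_module_name(:my_storage)
puts FakeAttachment.new(disk: "old file").current_data
```

Since the real backends are modules, the new module would have to copy or delegate to the pieces it needs rather than inherit them, which is exactly the design problem noted above.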
- The new version of the bundler gem, version 0.7.2, does not seem to be Ruby 1.8.6 compliant. The gem relies on Symbol#to_proc, which is part of Ruby 1.8.7 (and of Rails, via ActiveSupport). A native Ruby 1.8.6 without Rails does not support this method.
Issue tracking: http://github.com/wycats/bundler/issues/#issue/134
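If you are stuck on 1.8.6 in the meantime, a guarded definition along these lines is the usual workaround. This is a sketch in the spirit of what ActiveSupport provides, not a patch taken from the bundler issue.

```ruby
# Define Symbol#to_proc only where the runtime lacks it (e.g. Ruby 1.8.6
# without ActiveSupport). On 1.8.7 and later this block is skipped.
unless :upcase.respond_to?(:to_proc)
  class Symbol
    def to_proc
      Proc.new { |receiver, *args| receiver.send(self, *args) }
    end
  end
end

# With Symbol#to_proc available, &:upcase is shorthand for { |s| s.upcase }.
names = %w[alice bob].map(&:upcase)
puts names.inspect
```

The guard matters: redefining the method on a Ruby that already has a native version would only replace it with a slower one.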