A surprising amount of simple work can get an application over a number of speed bumps. We’re going to look up and down the whole application stack and use stories to show the simple things people have done to build a sustainable system without re-architecting.
One of my favourite stories comes from a colleague’s time consulting for a previous company. The application was struggling to scale, and the development team’s answer was vertical scaling: buying bigger servers or adding more RAM would keep the application ticking over until the next time. When my diligent colleague joined the team, they spent some time digging around for bottlenecks in the application. They discovered that there was not a single index on the database and that pretty much every transaction was doing a full table scan to get its result. The earlier answer of adding more RAM had been an attempt to improve performance by fitting the entire database in memory.
OK, so no indexes at all seems like an extreme case, but it is easy to miss an index or to over-index. Small companies don’t tend to have database administrators, and if no test is failing and no-one is complaining in production, it’s easy to skip this area without realising. There are tools such as ‘Rails Best Practices’ that can help identify missing indexes. Some simple changes and checks can drastically improve performance and delay the re-architecting we’re all afraid of (or excited about, depending on who you are).
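To see what a missing index costs, it helps to ask the database how it plans to run a query. The sketch below uses SQLite through Python’s standard `sqlite3` module purely as a stand-in; the `users` table, its columns and the index name are hypothetical. It prints the query plan for an email lookup before and after adding an index. In a Rails application the equivalent change would be a migration calling `add_index :users, :email`.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Hypothetical users table with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(10_000)],
)

lookup = "SELECT id, name FROM users WHERE email = ?"

# Without an index, SQLite has to scan every row to find a match.
before = query_plan(conn, lookup, ("user42@example.com",))
print(before)  # e.g. ['SCAN users'] (older SQLite versions say 'SCAN TABLE users')

# Index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Now SQLite can jump straight to the matching row via the index.
after = query_plan(conn, lookup, ("user42@example.com",))
print(after)  # e.g. ['SEARCH users USING INDEX idx_users_email (email=?)']
```

The exact wording of the plan varies between SQLite versions, but the change from a SCAN to a SEARCH using the index is the thing to look for. Other databases expose the same information through their own `EXPLAIN` commands.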
A different colleague started their previous job just as a major rewrite was nearing completion. There was some stress because the move from ColdFusion to Ruby was not paying the dividends the team had promised the product owners: the application’s performance wasn’t good enough to go live. Tests were green and no bugs had been reported, so nothing shed light on the troubled areas of the application. By adding instrumentation with a tool such as NewRelic, the team found slow processes and queries and refactored them. Over the course of a week, working on the problems on and off, they brought performance up to an acceptable level and the application could go live.
In a way, this was not a terrible position to be in. Performance is one of those things that isn’t a problem until it is, and just before pushing a release out of the door is an ideal time to do some performance testing. NewRelic can be used locally for this, and it can also run as a hosted service against real production requests. On teams I’ve been involved in, I like to go through the slowest requests and identify and fix any problem areas as a Friday afternoon chore. Instrumentation doesn’t have to come from a tool like NewRelic; it can be as simple as reading logs of web request times and slow database queries. Either way, taking some time to fix these can bring significant improvements.
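Low-tech instrumentation of the “read the logs” kind can be very small. Here is a minimal sketch, assuming you can wrap request handlers (or query functions) in a decorator. The `log_if_slow` decorator, the threshold and the two handlers are all hypothetical, and NewRelic itself works quite differently. The decorator times each call and logs only the ones that exceed a threshold, which gives you a list to work through later.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("slow_requests")

# Hypothetical in-memory record of slow calls, so they can be reviewed later.
SLOW_CALLS = []


def log_if_slow(threshold_ms):
    """Decorator: record and log any call that takes longer than threshold_ms."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                if elapsed_ms > threshold_ms:
                    SLOW_CALLS.append((func.__name__, elapsed_ms))
                    log.warning("%s took %.1f ms (threshold %d ms)",
                                func.__name__, elapsed_ms, threshold_ms)
        return wrapper
    return decorator


@log_if_slow(threshold_ms=20)
def fast_handler():
    return "ok"


@log_if_slow(threshold_ms=20)
def slow_handler():
    time.sleep(0.05)  # simulate a slow query
    return "ok"


fast_handler()
slow_handler()

# Review the slowest calls first, Friday-afternoon style.
for name, ms in sorted(SLOW_CALLS, key=lambda call: call[1], reverse=True):
    print(f"{name}: {ms:.1f} ms")
```

Only `slow_handler` ends up in `SLOW_CALLS`. Most web frameworks and databases offer an equivalent built-in slow-request or slow-query log, and that is usually the better place to start.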
There are a number of caching techniques I’ve heard about and seen. Some have been effective; others have created more problems than they set out to resolve. First, a cautionary tale.
Like scaling a database by putting the entire database into memory, caching can obscure underlying issues. On a recent project, the caching logic (where to cache, where to invalidate) was scattered through the code. This meant that when we changed something, whether data through the application or the code itself, we couldn’t be sure whether the caches would be affected. In one instance, our production environment showed some strange behaviour that we could not replicate. After digging around, we found that the caches were only invalidated when the cached object was updated. We had changed the template inside the cache, so some objects were presented with the old template and others with the new one.
Phil Karlton’s quote “There are only two hard things in Computer Science: cache invalidation and naming things” springs to mind. The lesson here is that caching can increase performance significantly but can also hide issues. By caching the result of slow-running code, are you hiding code that could be improved?
Rails 4 tries to address some of these issues by encouraging applications to cache mainly the results of rendered views. It also removes the need for explicit cache invalidation by building each key from the object’s name and id, its last-updated time and the MD5 digest of the template being rendered. A changed object or template simply produces a new key, and the old entries are never read again. A cache store that automatically evicts the least recently used entries should therefore be enough to clean them up.
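The key-based approach is easier to see in a few lines of code. This is a minimal sketch of the idea only, not Rails’ actual key format or API. `cache_key`, `LRUCache`, `render` and the article record are all hypothetical. The key combines the record’s identity, its last update time and an MD5 of the template, so changing the template (the bug in the earlier story) produces a new key instead of serving stale HTML. The small LRU store stands in for a cache that evicts the least recently used entries.

```python
import hashlib
from collections import OrderedDict
from datetime import datetime


class LRUCache:
    """A tiny cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def fetch(self, key, compute):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        value = compute()
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used entry
        return value


def cache_key(model_name, record_id, updated_at, template_source):
    """Key from the record's identity, its last update and the template's MD5."""
    digest = hashlib.md5(template_source.encode("utf-8")).hexdigest()
    return f"views/{model_name}/{record_id}-{updated_at:%Y%m%d%H%M%S}/{digest}"


cache = LRUCache(max_entries=1000)
template_v1 = "<h1>{title}</h1>"
template_v2 = "<h1 class='headline'>{title}</h1>"
article = {"id": 7, "title": "Match report",
           "updated_at": datetime(2013, 5, 1, 12, 0, 0)}


def render(template):
    key = cache_key("article", article["id"], article["updated_at"], template)
    return cache.fetch(key, lambda: template.format(title=article["title"]))


print(render(template_v1))  # rendered and cached under the v1 key
print(render(template_v2))  # new template, new key: the old HTML is never reused
```

Nothing ever deletes the v1 entry explicitly. It just stops being read and eventually falls out of the LRU store.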
Sometimes the responsibility for caching can be handed to another part of an application’s infrastructure. This is exactly what we did on some projects when I was working at the Guardian. The applications we were writing depended heavily on external services for data, and these services, being good citizens of the web, returned appropriate cache headers in their HTTP responses. For this particular application, we didn’t want to model the data coming back; we merely wanted to transform the response and place it in an HTML template. Installing an HTTP caching proxy like Squid on the same server as the application, and routing the outbound calls through it, meant we could rely on Squid to do the caching. There was still an HTTP request going out, but as it never left the server, it was a small hit.
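The core of what a caching proxy does with those headers can be sketched in-process. This is a deliberately simplified stand-in, not Squid’s behaviour. Squid also handles revalidation, `s-maxage`, shared-cache rules and much more. `MaxAgeCache`, `fake_fetch` and the URL are hypothetical. The sketch keeps a response for as long as the upstream service’s `Cache-Control: max-age` allows, and it uses an injectable clock so expiry can be demonstrated without waiting.

```python
import re
import time

MAX_AGE = re.compile(r"max-age=(\d+)")


class MaxAgeCache:
    """Caches GET responses for as long as the origin's Cache-Control max-age allows."""

    def __init__(self, fetch, clock=time.monotonic):
        self.fetch = fetch    # function(url) -> (body, headers dict)
        self.clock = clock
        self._store = {}      # url -> (expires_at, body)

    def get(self, url):
        cached = self._store.get(url)
        if cached and cached[0] > self.clock():
            return cached[1]  # still fresh: no upstream request
        body, headers = self.fetch(url)
        cache_control = headers.get("Cache-Control", "")
        # Simplification: skip anything marked no-store, no-cache or private.
        if not any(d in cache_control for d in ("no-store", "no-cache", "private")):
            match = MAX_AGE.search(cache_control)
            if match:
                self._store[url] = (self.clock() + int(match.group(1)), body)
        return body


# Usage with a fake upstream service and a fake clock.
calls = []


def fake_fetch(url):
    calls.append(url)
    return f"data for {url}", {"Cache-Control": "public, max-age=60"}


now = [1000.0]
cache = MaxAgeCache(fake_fetch, clock=lambda: now[0])
cache.get("http://api.example.com/scores")  # fetched from upstream
cache.get("http://api.example.com/scores")  # served from the cache
now[0] += 61
cache.get("http://api.example.com/scores")  # expired, fetched again
print(len(calls))  # 2
```

The appeal of the Squid approach is that none of this logic has to live in the application at all.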
Donald Knuth said that “premature optimization is the root of all evil”, but there is a series of small optimisations that can be worthwhile. When an application handles a request to a web server or external service, it is either changing or reading state, and sometimes both at once. When a request both changes state and reads it back, the application can respond more quickly if it performs only the work necessary for the response and puts the rest into a background job of some kind. The HTTP response codes 201 (Created) and 202 (Accepted), defined in RFC 2616, were made for this sort of operation.
The 201 response code lets the client know that a resource has been created. On one of my first projects, in telecommunications, we would send a 201 to indicate that a phone call had been started. The client would request a call between two phone numbers, but we didn’t want the request to be tied up for the duration of the call, perhaps only returning when the call had finished. A 201 with a Location header telling the client where to get the status of the call was an ideal choice: a resource (the call) had been created and had an address the client could use.
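The shape of that exchange fits in a few lines. This is a framework-free sketch. The handlers `create_call` and `get_call`, the `/calls/<id>` paths and the in-memory store are hypothetical, and the phone numbers are made up. Creating a call records it and returns immediately with `201 Created` and a `Location` header. The client then polls that address for the call’s status instead of holding the original request open.

```python
import itertools
from http import HTTPStatus

# Hypothetical in-memory store; a real system would persist calls and hand
# the actual dialling to a background worker.
_calls = {}
_ids = itertools.count(1)


def create_call(from_number, to_number):
    """Handle POST /calls: record the call and return straight away."""
    call_id = next(_ids)
    _calls[call_id] = {"from": from_number, "to": to_number, "status": "ringing"}
    # Dialling would be queued here rather than done inside the request.
    headers = {"Location": f"/calls/{call_id}"}
    return HTTPStatus.CREATED, headers, _calls[call_id]


def get_call(path):
    """Handle GET /calls/<id>: let the client poll the call's status."""
    call_id = int(path.rsplit("/", 1)[1])
    call = _calls.get(call_id)
    if call is None:
        return HTTPStatus.NOT_FOUND, {}, None
    return HTTPStatus.OK, {}, call


status, headers, body = create_call("+15550100001", "+15550100002")
print(status.value, headers["Location"])  # 201 /calls/1
print(get_call(headers["Location"]))
```

If the work had not yet produced a resource with its own address (a queued report, for example), `202 Accepted` would be the more fitting status.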
A more web-based example could be signing users up to a new website. If there are welcome emails to send and mailing lists to join, it’s not in the application’s interest to make the user wait while SMTP gateways respond and third-party services acknowledge the request. If this work is spun out into a thread or background job, the application can respond to the user and let them carry on. The less essential processes still happen, the delay is acceptable, and the user gets a more responsive experience.
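Here is a minimal sketch of that split using only Python’s standard library: a `queue.Queue` and a single worker thread. The functions `sign_up`, `send_welcome_email` and `join_mailing_list` are hypothetical, and lists stand in for the database, the SMTP gateway and the mailing-list service. An in-process queue loses jobs if the process dies. Production applications usually use a persistent job system instead, such as Sidekiq or Resque in the Ruby world.

```python
import logging
import queue
import threading

jobs = queue.Queue()
users = []        # stands in for the users table
sent_emails = []  # stands in for a real SMTP gateway
subscribers = []  # stands in for a third-party mailing-list service


def send_welcome_email(address):
    sent_emails.append(address)  # hypothetical: would talk to SMTP here


def join_mailing_list(address):
    subscribers.append(address)  # hypothetical: would call a third-party API here


def worker():
    """Run queued jobs one at a time; a failing job doesn't stop the worker."""
    while True:
        task, arg = jobs.get()
        try:
            task(arg)
        except Exception:
            logging.exception("background job %s failed", task.__name__)
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def sign_up(email):
    """Do only the essential work inside the request; defer the rest."""
    users.append(email)                    # essential: create the account
    jobs.put((send_welcome_email, email))  # non-essential: queue for later
    jobs.put((join_mailing_list, email))
    return "Thanks for signing up!"


print(sign_up("reader@example.com"))  # returns without waiting on SMTP
jobs.join()  # demo only: wait for the background jobs before inspecting them
print(sent_emails, subscribers)
```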
Steve Souders and the Performance Group at Yahoo have done great work with tools like YSlow and in highlighting the issues of perceived performance. Discussing these issues, Souders said “Optimize front-end performance first, that’s where 80% or more of the end-user response time is spent”. So much of YSlow’s advice is simple to implement that I recommend setting it up at the beginning of a project and checking its advice every now and then. Some suggestions are one-off changes that can be made at the start of a project, while others need ongoing checking, but together these techniques will give end users a faster experience.
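Two of YSlow’s long-standing recommendations are compressing components with gzip and giving static assets far-future cache headers. Both come down to a few response headers. The sketch below applies them to a static asset. The `static_response` helper and the sample CSS are hypothetical, and in practice this is usually configured in the web server or CDN rather than written by hand. The one-year `max-age` is only safe for fingerprinted files whose names change when their content does.

```python
import gzip

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def static_response(body, accept_encoding, content_type="text/css", fingerprinted=True):
    """Build headers and body for a static asset with gzip and far-future caching."""
    headers = {"Content-Type": content_type}
    if fingerprinted:
        # Safe only because the file name changes whenever the content does.
        headers["Cache-Control"] = f"public, max-age={ONE_YEAR}"
    # Simplification: a real server would also honour q-values such as "gzip;q=0".
    if "gzip" in accept_encoding:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
        headers["Vary"] = "Accept-Encoding"
    headers["Content-Length"] = str(len(body))
    return headers, body


css = b".headline { font-weight: bold; }\n" * 200
headers, payload = static_response(css, accept_encoding="gzip, deflate")
print(len(css), "bytes ->", len(payload), "bytes gzipped")
print(headers["Cache-Control"])
```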
With an empty cache, the main Guardian website downloaded 1.06MB in 212 requests. It took 2.5 seconds for the DOM content to load and 4.3 seconds before ‘onload’ fired. The mobile version of the website downloaded 25KB of data in 77 requests, with the DOM content loaded event firing after 260ms and ‘onload’ after 950ms. That’s a pretty big difference, and sometimes that effort is warranted.
Another technique used by the new Guardian mobile website is conditional loading of content. On a sports team page, say, once the main content has downloaded and the reader is celebrating or sobbing over their team news, asynchronous calls are made for extra content: in this case, fixtures, results and related content. This information isn’t required for the reader to achieve the main purpose of their visit, but it might help keep them on the page. With conditional loading, the page loads quickly and the reader can start reading the article without waiting for a bigger DOM or for the extra content to load synchronously.
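In the browser, conditional loading is done in JavaScript after the initial page has loaded. The Python `asyncio` sketch below only models the ordering involved and is not the Guardian’s implementation. The main content is shown first, then the optional panels are fetched concurrently, and any panel that fails or times out is dropped without harming the page. The fetch functions, the team name and the simulated outage are hypothetical.

```python
import asyncio


# Hypothetical data sources; the sleeps stand in for network latency.
async def fetch_fixtures(team):
    await asyncio.sleep(0.05)
    return f"{team} fixtures"


async def fetch_results(team):
    await asyncio.sleep(0.05)
    return f"{team} results"


async def fetch_related(team):
    raise ConnectionError("related-content service is down")


async def load_extras(team, timeout=1.0):
    """Fetch optional panels concurrently; failures and timeouts are dropped."""
    tasks = [fetch_fixtures(team), fetch_results(team), fetch_related(team)]
    try:
        results = await asyncio.wait_for(
            asyncio.gather(*tasks, return_exceptions=True), timeout)
    except asyncio.TimeoutError:
        return []  # the page is still useful without the extras
    return [r for r in results if not isinstance(r, BaseException)]


async def show_team_page(team, screen):
    screen.append(f"Article: {team} team news")  # main content first
    screen.extend(await load_extras(team))       # then the nice-to-haves


screen = []
asyncio.run(show_team_page("Example FC", screen))
print(screen)
```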
One important lesson I’ve personally learnt is that when asked to build something in ‘real-time’, I should ask the product owner to define real-time. Real-time can mean something very different to a developer than to a product owner or user. For the longest time I equated real-time with ‘immediately’. When I discovered that the actual requirement was ‘within a reasonable time, but not necessarily immediately’, many of the assumptions I’d made about the application changed.
At a company that performs complex calculations to show users their potential savings from switching to a different service provider, the product team had asked for the price to update in real-time. The developers knew that crunching the numbers could take a non-trivial amount of time, so they put the processing in a background job. They used the HTML meta refresh element to keep reloading a holding page, which held the user in place until the job finished, and then redirected them to a page with the crunched numbers.
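The holding-page pattern needs very little code. The sketch below is illustrative only. The `holding_page` handler, the `/savings/<id>` URL and the job states are hypothetical. While the background job is pending, the page carries a `<meta http-equiv="refresh">` element so the browser reloads it every couple of seconds. Once the job is done, the handler answers with a `303 See Other` redirect to the results page.

```python
from http import HTTPStatus


def holding_page(job_id, status, refresh_seconds=2):
    """Respond to the holding page while a hypothetical background job runs."""
    if status == "done":
        # The numbers are ready: send the browser to the results page.
        return HTTPStatus.SEE_OTHER, {"Location": f"/savings/{job_id}"}, ""
    # Still crunching: ask the browser to reload this page shortly.
    body = (
        "<!DOCTYPE html><html><head>"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        "<title>Calculating your savings</title></head>"
        "<body><p>Crunching the numbers; this page will update shortly.</p></body></html>"
    )
    return HTTPStatus.OK, {"Content-Type": "text/html; charset=utf-8"}, body


# Simulate the browser reloading while the job finishes on the third check.
for job_status in ["pending", "pending", "done"]:
    code, headers, _ = holding_page(42, job_status)
    print(code.value, headers.get("Location", "(refresh)"))
```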
This may sound crude, but when weighed against the time needed to build a better, more ‘real-time’ experience, it was an easy choice. Consider too that this pattern is often used during checkout, especially when booking something like a flight: after your credit card details are taken, you’re shown a holding page until the payment is confirmed and a seat reserved.
Meanwhile, in another part of town, a team was pleased with itself after building a ‘real-time’ application that displayed government bond data with updates every 10 seconds. They relied on a publish-subscribe message queue architecture to get the very latest information to their users. When they went out to watch the application in use, they found that the people watching these bonds would take a look every few minutes, in between other tasks. It turns out that working on bonds is not the same as working on the stock exchange. The application had been over-engineered for a problem that didn’t exist. Having realised this, the team could correct course and remove complexity from the application while remaining ‘near-time’ for their users.
We’ve looked at some simple changes that can bring big benefits, and at how users’ perception of performance is probably more important than server-side performance. We’ve also seen how asking the right questions can itself lead to simplicity.
Step two to a successful business: know the product, embrace the tools that show weaknesses and learn where best to invest your time.