To make this whole process as simple as possible, a variation of the third approach (Rack middleware + Selenium WebDriver + no caching) is available here as a Gem. Drop it into your project, install the dependencies, and may the SEO gods bless you!
Luckily, there are several approaches one can take to work around certain crawlers’ lack of faith in JavaScript – there’s obviously no “one size fits all” solution, so let’s take a minute to go through three of the most commonly used techniques, highlighting the pros and cons of each.
This strategy consists of rendering your app normally, BUT with pieces of “static” content already baked in (usually inside a
<noscript> block). In other words, the client-side templates your app serves will have at least some level of server-side logic in them – hopefully, very little.
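As a rough sketch, a Rails view taking this approach might look like the following – the instance variable and markup below are illustrative, not from any particular app:

```erb
<%# The client-side app mounts here once JavaScript boots. %>
<div id="app"></div>

<noscript>
  <%# Server-rendered fallback for crawlers (and users) without JavaScript. %>
  <%# `@articles` is a hypothetical collection set by the controller. %>
  <% @articles.each do |article| %>
    <article>
      <h2><%= article.title %></h2>
      <p><%= article.summary %></p>
    </article>
  <% end %>
</noscript>
```

The more of your real content you duplicate here, the better the crawl results – and the worse the maintenance burden, as the next paragraph notes.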
It’s worth noting that you should NOT rely on rendering just bits of content when serving pages to bots, as some of them state that they expect the full content of the page. It’s also worth pointing out that Google uses the snapshots it takes while crawling to compose the tiny thumbnails you see on search results, so you want these to be as close to the real thing as possible – which just compounds the maintenance issues of this approach.
This technique is primarily supported by Google’s bot (a few other minor crawlers offer limited support – Facebook’s bot works too, for instance) and is explained in detail here.
In short, the process happens as follows:
The search bot detects that there are hash-bang parameters in your URL (eg.: www.example.com/something#!foo=bar).
The search bot then makes a SECOND request to your server, passing a special parameter (_escaped_fragment_) back. Eg.: www.example.com/something?_escaped_fragment_=foo=bar – it’s now up to your server-side implementation to return a static HTML representation of the page.
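To make the URL mapping concrete, here is a minimal sketch (the helper name is my own, not part of any library) of how a hash-bang URL relates to its _escaped_fragment_ counterpart under the AJAX crawling scheme:

```ruby
# Hypothetical helper: rewrite a hash-bang URL into the URL a crawler
# would actually request under the _escaped_fragment_ scheme.
def escaped_fragment_url(url)
  base, fragment = url.split("#!", 2)
  return url unless fragment # nothing to rewrite

  # Per the scheme, only a handful of characters in the fragment need
  # percent-encoding (%, #, &, +); '=' stays as-is.
  escaped = fragment.gsub(/[%#&+]/) { |c| format("%%%02X", c.ord) }
  separator = base.include?("?") ? "&" : "?"
  "#{base}#{separator}_escaped_fragment_=#{escaped}"
end

puts escaped_fragment_url("http://www.example.com/something#!foo=bar")
# → http://www.example.com/something?_escaped_fragment_=foo=bar
```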
Notice that for pages that don’t have a hash-bang in their URL (eg.: your site root), this approach also requires that you add a meta tag to your pages, letting the bot know that those pages are crawlable.
<meta name="fragment" content="!">
Notice the meta tag above is mandatory if your URLs don’t use hash fragments (which is becoming the norm these days, thanks to browsers’ widespread support for the HTML5 History API) – conversely, this is probably the only one of these three techniques that will work if you depend on hash-bang URLs on your site (please don’t!).
You still have to figure out how to handle the
_escaped_fragment_ request when it arrives. Which leads us directly to the third approach…
The idea is simple: if the requester of your content is a search bot, spawn a SECOND request to your own server, render the page using a headless browser (thankfully, there are many, many options to choose from in Ruby) and return that “pre-compiled” content back to the search bot. Boom.
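The intercepting script can be sketched as a Rack middleware. Everything below is illustrative: the class name is my own, and `renderer` is an assumed object wrapping whatever headless browser you pick (something built on Selenium WebDriver, for instance), responding to a hypothetical `html_for(url)` method:

```ruby
# Sketch of an intercepting Rack middleware: bots get a pre-rendered,
# static snapshot of the page; everyone else hits the app untouched.
class BotPrerenderer
  # Illustrative, non-exhaustive list of crawler user agents.
  BOT_PATTERN = /googlebot|bingbot|facebookexternalhit/i

  def initialize(app, renderer:)
    @app = app
    @renderer = renderer # assumed headless-browser wrapper
  end

  def call(env)
    if bot_request?(env)
      # Fire a second, headless-browser render of the same URL and hand
      # the resulting static HTML straight back to the crawler.
      url = "#{env['rack.url_scheme'] || 'http'}://#{env['HTTP_HOST']}#{env['PATH_INFO']}"
      [200, { "Content-Type" => "text/html" }, [@renderer.html_for(url)]]
    else
      @app.call(env) # regular users get the normal client-side app
    end
  end

  private

  def bot_request?(env)
    env["HTTP_USER_AGENT"].to_s =~ BOT_PATTERN ||
      env["QUERY_STRING"].to_s.include?("_escaped_fragment_")
  end
end
```

Note that checking for _escaped_fragment_ here also makes this middleware a natural home for handling the second approach’s crawler requests.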
The biggest advantage of this approach is that you don’t have to duplicate anything: with a single intercepting script, you can render any page and serve it as needed. Another positive point of this approach is that the content search bots see is exactly what an end user would.
You can implement this rendering in several ways:
A before_filter on your controllers checks the user-agent making the request, then fetches the desired content and returns it. PROS: an all vanilla-Rails approach. CONS: you’re hitting the entire Rails stack TWICE.
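The user-agent check such a before_filter would run can be sketched as plain Ruby – the bot list below is illustrative, not exhaustive, and the filter name is hypothetical:

```ruby
# Illustrative, non-exhaustive list of crawler user-agent substrings.
SEARCH_BOTS = %w[googlebot bingbot yandex baiduspider facebookexternalhit].freeze

# Returns true when the User-Agent header looks like a search crawler.
# In a Rails controller this could back something like
# `before_filter :serve_prerendered_page, if: :search_bot?` (hypothetical).
def search_bot?(user_agent)
  ua = user_agent.to_s.downcase
  SEARCH_BOTS.any? { |bot| ua.include?(bot) }
end
```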
As for how to store the server-side rendered content (again, from worst to best):
There are a few obvious issues you have to keep in mind when using this approach: