Standup 4/5/2010: Spidering Web Pages

Ask for Help

“Does anyone have any recommendations for how to crawl web pages and check certain pages have certain things?”

Pivots suggested two main approaches:

  • Mechanize: Mechanize is a library for writing Ruby scripts that load pages, fill out forms, click links, and do arbitrarily sophisticated things with the DOM. Its API is very Rubyish and probably covers most needs.
  • Typhoeus: Unlike Mechanize, Typhoeus is designed for high-volume fetching of web pages, with good support for concurrent requests. It’s not designed to poke around at the content of a page, so you’ll need to pair it with a parser such as Nokogiri, LibXML, or Hpricot if you want that level of functionality.
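
As a rough illustration of the two suggestions, here is a minimal sketch of how such a check might look. It is not from the standup itself: the helper names (`missing_selectors`, `check_with_mechanize`, `check_with_typhoeus`), the use of CSS selectors to express "certain things", and the shape of the `checks` hash are all assumptions made for the example. It also assumes the `mechanize`, `typhoeus`, and `nokogiri` gems are installed and that every URL returns HTML.

```ruby
# Hypothetical helper: returns the CSS selectors from `required` that match
# nothing on `page`. `page` only needs to respond to `at(selector)`, which
# both Mechanize::Page and Nokogiri documents do.
def missing_selectors(page, required)
  required.reject { |selector| page.at(selector) }
end

# Approach 1 (sketch): Mechanize fetches and parses each page, one at a time.
# `checks` maps a URL to the selectors that page is expected to contain.
# The gem is loaded lazily so a stub agent can be passed in instead.
# Note: Mechanize raises on HTTP error statuses; this sketch does not rescue.
def check_with_mechanize(checks, agent: nil)
  require "mechanize" unless agent
  agent ||= Mechanize.new
  checks.each_with_object({}) do |(url, required), report|
    report[url] = missing_selectors(agent.get(url), required)
  end
end

# Approach 2 (sketch): Typhoeus fetches many pages concurrently through a
# Hydra, and Nokogiri parses each response body as it completes.
def check_with_typhoeus(checks, max_concurrency: 10)
  require "typhoeus"
  require "nokogiri"
  report = {}
  hydra = Typhoeus::Hydra.new(max_concurrency: max_concurrency)
  checks.each do |url, required|
    request = Typhoeus::Request.new(url, followlocation: true)
    request.on_complete do |response|
      report[url] =
        if response.success?
          missing_selectors(Nokogiri::HTML(response.body), required)
        else
          required # assumption: a failed fetch counts as every check missing
        end
    end
    hydra.queue(request)
  end
  hydra.run # blocks until all queued requests have completed
  report
end

# Example usage (placeholder URL and selectors):
#   checks = { "http://example.com/" => ["h1", "form#login"] }
#   check_with_mechanize(checks)  # => { url => [selectors not found] }
#   check_with_typhoeus(checks)
```

Both functions return a hash from URL to the selectors that were not found, so an empty array means the page passed. The Mechanize version is the simpler choice when the crawl also needs to fill out forms or follow links; the Typhoeus version is worth the extra parser dependency mainly when there are many pages to fetch.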