
Standup 4/5/2010: Spidering Web Pages

Ask for Help

“Does anyone have any recommendations for how to crawl web pages and check certain pages have certain things?”

Pivots suggested two main approaches:

  • Mechanize: Mechanize is a Ruby library for scripting a browser-like session. Your script loads pages, fills out and submits forms, follows links, and runs arbitrarily sophisticated queries against the parsed DOM. Its API is idiomatic Ruby and should cover most needs.
  • Typhoeus: Unlike Mechanize, Typhoeus is designed for high-volume fetching of web pages, with good support for concurrent requests. It isn't designed to inspect page content, so if you need that level of functionality you'll have to pair Typhoeus with an HTML parser such as Nokogiri, LibXML, or Hpricot.
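To make the question concrete, here is a minimal sketch of the basic job ("fetch a set of pages and check that each one contains something"). It uses only Ruby's standard library (Net::HTTP plus a small thread pool) rather than Mechanize or Typhoeus. The `check_pages` helper, its `fetcher:` parameter, and the URL-to-pattern hash it takes are hypothetical names for illustration. The check is a plain regex match on the raw HTML, not a DOM query.

```ruby
require "net/http"
require "uri"

# Default fetcher (an assumption for this sketch): GET the URL with Net::HTTP and
# return the body for 2xx responses, nil otherwise. Redirects are NOT followed.
DEFAULT_FETCHER = lambda do |url|
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end

# check_pages (hypothetical helper): fetch every URL in `checks` using up to
# `concurrency` threads and match each body against its pattern.
# Returns { url => :ok | :missing | :unreachable }.
def check_pages(checks, concurrency: 4, fetcher: DEFAULT_FETCHER)
  queue = Queue.new
  checks.each_key { |url| queue << url }
  results = {}
  lock = Mutex.new

  workers = Array.new(concurrency) do
    Thread.new do
      loop do
        # Non-blocking pop raises ThreadError once the queue is drained.
        url = begin
          queue.pop(true)
        rescue ThreadError
          break
        end

        # Network errors are treated the same as a non-2xx response.
        body = begin
          fetcher.call(url)
        rescue StandardError
          nil
        end

        status =
          if body.nil?
            :unreachable
          elsif body.match?(checks[url])
            :ok
          else
            :missing
          end

        # Writes to the shared results hash are serialized with a mutex.
        lock.synchronize { results[url] = status }
      end
    end
  end

  workers.each(&:join)
  results
end
```

For example, `check_pages({ "https://example.com/" => /Example Domain/ })` returns a hash mapping that URL to `:ok`, `:missing`, or `:unreachable`. With Typhoeus, the thread pool would be replaced by requests queued on a `Typhoeus::Hydra`. With Mechanize (or Nokogiri parsing Typhoeus responses), the regex match would become a CSS or XPath query on the parsed page, which is more robust than matching raw HTML. The `fetcher:` seam exists only so the check logic can be exercised without network access.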
