Ask for Help
“Does anyone have any recommendations for how to crawl web pages and check certain pages have certain things?”
Pivots suggested two main approaches:
- Mechanize: Mechanize is a library that lets you write Ruby scripts that load pages, fill out forms, click links, and inspect or traverse the parsed page (it parses HTML with Nokogiri and does not run JavaScript). Its API is very Rubyish and probably works well for most needs.
- Typhoeus: Unlike Mechanize, Typhoeus is designed for high-volume fetching of web pages, with good support for concurrent requests. It’s not designed to poke around at content on the page, so you’ll need to pair it with a parser such as Nokogiri, LibXML, or Hpricot if you want that level of functionality.
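
As a rough illustration of both suggestions, here is a minimal sketch that checks whether pages contain the elements you expect. The method names (`missing_selectors`, `check_with_mechanize`, `check_with_typhoeus`), the shape of the `expectations` hash, and the concurrency limit are all assumptions made for this example, not something from the standup. It also assumes every URL returns HTML.

```ruby
# Returns the CSS selectors that match nothing in `doc`.
# `doc` only needs to respond to #at(css), which both Mechanize::Page
# and Nokogiri documents do.
def missing_selectors(doc, selectors)
  selectors.reject { |css| doc.at(css) }
end

# Mechanize approach: fetch pages one at a time.
# `expectations` (hypothetical shape) maps URL => array of CSS selectors.
# Mechanize is required lazily so a stand-in agent can be injected.
def check_with_mechanize(expectations, agent: nil)
  agent ||= begin
    require "mechanize"
    Mechanize.new
  end
  expectations.each_with_object({}) do |(url, selectors), report|
    # Mechanize raises on HTTP error statuses; this sketch lets that propagate.
    report[url] = missing_selectors(agent.get(url), selectors)
  end
end

# Typhoeus approach: fetch pages concurrently, then parse with Nokogiri.
def check_with_typhoeus(expectations, max_concurrency: 10)
  require "typhoeus"
  require "nokogiri"
  hydra = Typhoeus::Hydra.new(max_concurrency: max_concurrency)
  report = {}
  expectations.each do |url, selectors|
    request = Typhoeus::Request.new(url, followlocation: true)
    request.on_complete do |response|
      # Treat a failed fetch as an empty page, so every selector is reported missing.
      body = response.success? ? response.body : ""
      report[url] = missing_selectors(Nokogiri::HTML(body), selectors)
    end
    hydra.queue(request)
  end
  hydra.run # blocks until every queued request has completed
  report
end
```

Either method takes something like `{ "https://example.com/" => ["h1", "form#signup"] }` and returns a hash from each URL to the selectors that were not found on it. For a handful of pages, the Mechanize version is probably enough; the Typhoeus version is worth the extra moving parts mainly when there are many URLs to fetch.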