Automate tasks on the web with Ruby and Capybara

Capybara is a tool that Ruby on Rails developers mostly use for testing their web applications. It can, however, also be used to automate boring, repetitive or long-running tasks on the web, or to scrape information from web sites that were not kind enough to provide an API.

Use cases

I am lazy and I do not like work that involves repeating actions that won’t stimulate my brain. This is why I automate as much as I can. Ruby and Capybara came in handy in a few situations. For example, when I learned that I could not attach multiple files at once on a local printing service provider’s web site, and I had hundreds of files to print, I hacked together a quick script that did the clicking and waiting for me. I also used the technique to extract information and images from web sites that are heavy on JavaScript. I also have a handy script that opens a Hangouts link and removes all the unnecessary DOM elements around the video element, so it does not waste so much space. You can also use it to write neat monitors that log into your web site as a user, perform some actions and verify that basic functionality on your live production system is not broken. Let’s do some Ruby hacking!

How does it work?

Capybara starts a browser. This can be a real browser (Firefox via selenium-webdriver) or a headless browser (like PhantomJS). The good thing is that you can use the Selenium backend while you work on your script, and switch over to PhantomJS when it’s done. This way, you can see what is going on while you do the hacking, and the finished script runs without opening any extra windows.

Selenium/Webdriver

Selenium has nice Ruby bindings for controlling real browsers. Capybara uses them to interact with the browser. To get started, make sure you have “firefox” and “java” in your $PATH, otherwise it will not work as expected. Of course, you need a Ruby installation too.

You need to install the “capybara” gem. selenium-webdriver is a dependency, and you don’t even have to require it directly. Here’s a script that checks our main page for the tagline text:

$ gem install capybara
require 'capybara'
session = Capybara::Session.new(:selenium)
session.visit "http://www.amberbit.com"

if session.has_content?("Ruby on Rails web development")
  puts "All shiny, captain!"
else
  puts ":( no tagline found, possibly something's broken"
  exit(-1)
end
$ ruby check_amberbit.rb
All shiny, captain!
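
The same idea stretches to the monitoring use case mentioned earlier: log in as a user, then check that the page says what it should. Here is a minimal sketch, assuming a login form with "Email" and "Password" fields, a "Sign in" button and a "Signed in successfully" message. All of those labels, and the `login_works?` name, are made up for illustration, so swap in whatever your site uses. The method only needs an object that behaves like a Capybara session.

```ruby
# Sketch of a login monitor. `session` is expected to be a Capybara::Session
# (or any object responding to the same methods).
# The field labels, button text and success message are hypothetical;
# adjust them to match your own site.
def login_works?(session, url:, email:, password:)
  session.visit url
  session.fill_in "Email", with: email
  session.fill_in "Password", with: password
  session.click_button "Sign in"
  # has_content? waits up to Capybara's default wait time for the text
  session.has_content?("Signed in successfully")
end
```

You could pass it `Capybara::Session.new(:selenium)` and call `exit(1)` when it returns false, just like the tagline check above.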

Poltergeist/PhantomJS

Poltergeist is a headless browser driver for Capybara. You will need to install PhantomJS first and make sure that the ‘phantomjs’ command is in your path.

Let’s change the script above to use PhantomJS:

$ gem install poltergeist
require 'capybara'

session = if ARGV[0] != 'phantomjs'
  Capybara::Session.new(:selenium)
else
  require 'capybara/poltergeist'
  Capybara::Session.new(:poltergeist)
end

session.visit "http://www.amberbit.com"

if session.has_content?("Ruby on Rails web development")
  puts "All shiny, captain!"
else
  puts ":( no tagline found, possibly something's broken"
  exit(-1)
end
$ ruby check_amberbit.rb phantomjs
All shiny, captain!

Shiny.

We can also use Capybara’s DSL instead of manually starting the session. Let’s find out which web sites use our Ruby development company’s logo, using Google Image Search:

require 'cgi'
require 'timeout'
require 'capybara'
require 'capybara/dsl' # provides the Capybara::DSL module included below

class GoogleImagesSearcher
  include Capybara::DSL

  def initialize
    Capybara.default_driver = :selenium
  end

  def find_sites_with_image(image_url)
    urls = []

    link = "http://images.google.com/searchbyimage?image_url=#{CGI.escape(image_url)}&filter=0"

    visit link

    return urls unless page.has_content?("Pages that include matching images")

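    # Collect result links page by page. The loop ends when there is no
    # "Next" link left and click_link raises Capybara::ElementNotFound.
    while true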
      page.all("h3.r a").each do |a|
        urls << a[:href]
      end
      within "#nav" do
        click_link "Next"
      end
    end

  rescue Capybara::ElementNotFound
    return urls.uniq
  end
end

images = GoogleImagesSearcher.new.find_sites_with_image ARGV[0]

puts "Found #{images.count} pages using this image:"
images.each do |img|
  puts img
end
$ ruby search_for_image.rb http://www.amberbit.com/assets/amberbit_logo_big-b1c78bb141a0fe6d092afbadf1edc7b9.png

Found 21 pages using this image:
http://amberbit.com/
http://www.amberbit.com/blog
http://amberbit.com/blog/introduction-to-rack-middleware
http://amberbit.com/blog/geospatial-search-with-ruby-and-sphinx
http://www.amberbit.com/blog/2014/2/4/postgresql-awesomeness-for-rails-developers/
https://www.google.com/search?tbs=simg:CAESXRpbCxCo1NgEGgIICgwLELCMpwgaNAoyCAESDPEH2gbfBvwH3QbbBhogv2oqc6m3Z6eTfIxHGmajz_1yTppo1-MlJqEmXGPUIaXkMCxCOrv4IGgoKCAgBEgTtRfB3DA&tbm=isch&sa=X&ei=SXL7Uoa6G-aC4AShv4HoBg&ved=0CEgQsw4
http://www.prweb.com/releases/2009/05/prweb2465382.htm
http://www.amberbit.com/blog/2013/12/20/similar-images-detection-in-ruby-with-phash/
http://www.amberbit.com/blog/2014/1/20/angularjs-templates-in-ruby-on-rails-assets-pipeline/
http://www.amberbit.com/blog/2014/1/20/torquebox-3-rails-4-zero-downtime-deployment-ubuntu-12-04/
http://www.amberbit.com/blog/2011/12/27/render-views-and-partials-outside-controllers-in-rails-3/
http://www.amberbit.com/blog/2012/2/2/building-small-sites-with-locomotivecms-and-deploying-to-heroku-and-gridfs/
https://plus.google.com/+Amberbit
http://www.amberbit.com/blog/2011/10/24/measuring-complexity-of-ruby-19-code-with-metric_abc/
http://www.amberbit.com/blog/2012/02/02/building-small-sites-with-locomotivecms-and-deploying-to-heroku-and-gridfs/
http://www.amberbit.com/blog/2011/11/28/ruby-flv-pseudostreaming-sinatra-rack-evil/
https://plus.google.com/+Amberbit/about
http://www.amberbit.com/work-for-us
https://plus.google.com/+PrzemyslawWroblewski
https://plus.google.com/+PrzemyslawWroblewski/videos
https://plus.google.com/+PrzemyslawWroblewski/about

Not bad.
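
The loop above ends only when `click_link` fails to find "Next", so a site that always offers a next page would keep it running forever. A possible variation, sketched below, checks for the link explicitly and stops after a page limit. The `collect_hrefs` name and the `max_pages` parameter are my own assumptions, not part of Capybara, and unlike the original it does not scope the search to `#nav`.

```ruby
# Collects the href of every element matching `selector`, following the
# "Next" link until it disappears or `max_pages` pages have been read.
# `page` is expected to behave like a Capybara session; `max_pages` is a
# hypothetical safety limit, pick whatever suits your site.
def collect_hrefs(page, selector, next_label: "Next", max_pages: 20)
  urls = []
  max_pages.times do
    page.all(selector).each { |a| urls << a[:href] }
    # has_link? returns false (after Capybara's wait time) when no link matches
    break unless page.has_link?(next_label)
    page.click_link(next_label)
  end
  urls.uniq
end
```

Inside `GoogleImagesSearcher` this could be called as `collect_hrefs(page, "h3.r a")`, with no need to rescue `Capybara::ElementNotFound`.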

Summary

Ruby is a perfect language for hacking this sort of script together. You do not have to use Ruby only for developing web applications with Rails; it is also a perfect scripting language. Powerful regular expressions, super easy syntax and, more than anything else, great tools and libraries (like Capybara) make Ruby an excellent choice for automating tasks.

More info

Check out the documentation for Capybara.

Browse my GitHub repository with some more examples (and please do make a pull request if you have some nice scripts too!).

by Hubert Łępicki, twitter: @hubertlepicki
