This is a blog of AmberBit - a Ruby on Rails web development company. Hire us for your project!

Geospatial search with Ruby and Sphinx

Geolocation and geospatial search are hot topics, and many people are starting to build web or mobile applications that use them. Companies like Qype are building databases of points of interest (POIs) such as shops and restaurants. With the upcoming HTML5 standard additions, building such applications will become even easier. In this article you will learn:

  • what options are available for performing geospatial search
  • what Sphinx is and how it fits into the picture
  • how to feed POI data into a Sphinx index
  • how to perform geospatial search with Sphinx and Ruby

The main options for geospatial search are:

  • PostGIS is probably the most mature implementation. This extension to the PostgreSQL database is, however, quite hard to install and configure. Sometimes you also don’t want to use an SQL database, or don’t need a database at all, when all you need is a search index.
  • MySQL is limited compared to PostGIS. It’s getting better, but is not quite there yet in terms of performance and functionality.
  • MongoDB can perform geospatial search, but its authors have not yet implemented support for spherical surfaces, so the database currently treats the Earth as if it were flat. This makes geospatial search more accurate near the equator and less accurate near the poles.
  • Local Lucene / Solr is a very promising project. The functionality of Local Lucene is currently being ported directly into Solr, but not much works yet.
  • Sphinx is a search engine that is lighter, easy to set up and use, and quite fast.
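
To see why a flat-Earth model loses accuracy away from the equator, here is a minimal sketch that computes how long one degree of longitude is at different latitudes. The method name and the mean Earth radius of 6371 km are our own assumptions, used only for illustration:

```ruby
# Length of one degree of longitude, in kilometres, at a given latitude.
# earth_radius_km is an assumed mean radius, not a value taken from any database.
def km_per_degree_of_longitude(latitude_degrees, earth_radius_km = 6371.0)
  latitude = latitude_degrees * Math::PI / 180.0
  (Math::PI / 180.0) * earth_radius_km * Math.cos(latitude)
end

[0, 30, 60, 80].each do |latitude|
  puts format("%2d degrees: %.1f km", latitude, km_per_degree_of_longitude(latitude))
end
```

A degree of longitude shrinks with the cosine of the latitude, so treating degrees as a flat grid distorts east-west distances more and more as you move towards the poles.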

Sphinx

Sphinx is a full-text search engine, and if you have already built a Rails app with full-text search, you may have used it. Sphinx also supports geospatial search, and both types of search can be combined in a single query.

Installation is simple: grab the binaries from the Sphinx downloads site, or install it with your operating system’s package manager.

I won’t describe how to connect Sphinx to an ActiveRecord-based Rails application. Instead, I will focus on using it with an XML data source and performing searches with the Riddle client library.

Getting data into Sphinx

Sphinx provides a few ways to get data into its search index. You can either point it at your SQL database (MySQL or PostgreSQL), or feed the indexer directly with XML. We will use the second approach.

The Sphinx xmlpipe2 data format is easy to understand:

File: pois.xml

<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
 <sphinx:schema>
 <sphinx:field name="name"/>
 <sphinx:attr name="lat" type="float"/>
 <sphinx:attr name="lng" type="float"/>
 </sphinx:schema>

 <sphinx:document id="1">
 <name><![CDATA[AmberBit HQ]]></name>
 <lat>0.927042715037538</lat>
 <lng>0.403538937710426</lng>
 </sphinx:document>
 <sphinx:document id="2">
 <name><![CDATA[Google HQ]]></name>
 <lat>0.656188367092825</lat>
 <lng>-2.13395902872886</lng>
 </sphinx:document>
 <sphinx:document id="3">
 <name><![CDATA[Hewlett-Packard HQ]]></name>
 <lat>0.657782603191474</lat>
 <lng>-2.13395902872886</lng>
 </sphinx:document>
</sphinx:docset> 

As you can see, we are doing two things here: first we define the schema for our documents, and then we print out the documents in the required format, according to that schema. The important thing is that latitude and longitude must be converted from degrees to radians. To do so, we use a simple formula: radians = degrees * Pi / 180.
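
If your POIs are stored in degrees, a small script can do the conversion and print the xmlpipe2 document for you. This is only a sketch: the sample POIs and the method names to_radians and xmlpipe2 are made up for illustration, and it assumes that names never contain the sequence "]]>", which would break the CDATA section.

```ruby
# Convert degrees to radians: radians = degrees * Pi / 180
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Build an xmlpipe2 document matching the schema used in pois.xml.
# Expects an array of hashes with :id, :name, :lat and :lng (in degrees).
def xmlpipe2(pois)
  documents = pois.map do |poi|
    <<~XML
      <sphinx:document id="#{poi[:id]}">
      <name><![CDATA[#{poi[:name]}]]></name>
      <lat>#{to_radians(poi[:lat])}</lat>
      <lng>#{to_radians(poi[:lng])}</lng>
      </sphinx:document>
    XML
  end

  <<~XML
    <?xml version="1.0" encoding="utf-8"?>
    <sphinx:docset>
    <sphinx:schema>
    <sphinx:field name="name"/>
    <sphinx:attr name="lat" type="float"/>
    <sphinx:attr name="lng" type="float"/>
    </sphinx:schema>
    #{documents.join}</sphinx:docset>
  XML
end

# Hypothetical sample data, with coordinates in degrees.
pois = [
  { id: 1, name: "Example Cafe",   lat: 52.0, lng: 21.0 },
  { id: 2, name: "Example Museum", lat: 48.5, lng: 2.25 }
]

puts xmlpipe2(pois)
```

Redirecting the output of such a script to pois.xml gives the indexer a file in the same shape as the one shown above.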

We also need to provide a Sphinx configuration file:

# sphinx.conf
source dummy
{
 type = xmlpipe2
 xmlpipe_command = bash -c "cat pois.xml"
}

index pois
{
 source = dummy
 path = tmp/places
 docinfo = extern
 mlock = 0
 charset_type = utf-8
 html_strip = 0
}

indexer
{
 mem_limit = 32M
}

searchd
{
 listen = 127.0.0.1:5000
 log = tmp/searchd.log
 query_log = tmp/query.log

 read_timeout = 1
 client_timeout = 1
 max_children = 60
 pid_file = tmp/searchd.pid
 max_matches = 10000000
 seamless_rotate = 1
 preopen_indexes = 0
 unlink_old = 1
 mva_updates_pool = 1M
 max_packet_size = 8M
 max_filters = 256
 max_filter_values = 4096
} 

In this file, we specify a “dummy” data source, which reads POI data from the pois.xml file we created in the previous step.

Let’s run the indexer:

$ indexer -c sphinx.conf --rotate --all
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'sphinx.conf'...
indexing index 'pois'...
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 38 bytes
total 0.002 sec, 17025 bytes/sec, 1344.08 docs/sec
total 2 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg


...and start the Sphinx search daemon:

$ searchd -c sphinx.conf 
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'sphinx.conf'...
listening on 127.0.0.1:5000 

As you can see, we have indexed a collection of 3 documents: the locations of the AmberBit office, Google HQ and HP HQ. We’re in good company ;). Now, let’s try a search.

Riddle is quite an easy library to use on its own, but in a real app you will probably want to write a wrapper class to manage conditions and filters and generate the final query. We will just use the simplest code possible to find IT companies near your location. Our script will ask for the user’s latitude and longitude and output a list of companies ordered by distance from that location.

# We require riddle library:

require 'riddle'
require 'riddle/0.9.9' 

# Define company names, because Sphinx will return only document IDs:

companies = %w(AmberBit Google HP) 

# Let's connect to Sphinx and use the most robust search algorithms:

client = Riddle::Client.new "localhost", 5000
client.match_mode = :extended
client.sort_mode = :extended 

# To return records in order of distance, we use this setting:

client.sort_by = "@geodist ASC" 

# We need to ask the user for their location and convert the latitude and longitude from degrees to radians:

puts "What's your latitude: "
lat = gets.to_f * Math::PI / 180.0
puts "What's your longitude: "
lng = gets.to_f * Math::PI / 180.0 

# And we perform search and print companies in desired order to console:

puts "Top IT companies near you, ordered by distance: "

client.set_anchor "lat", lat, "lng", lng
client.query("", "pois")[:matches].each do |record|
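  # Each match carries the document ID under :doc; :index is only the match's position in the result set
  puts companies[record[:doc].to_i - 1]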
end 
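
If you want to sanity-check the ordering, you can compute great-circle distances yourself. Below is a minimal sketch using the haversine formula; the method name haversine_m and the mean Earth radius of 6,371,000 m are our own assumptions, and Sphinx may use a slightly different radius internally, so expect small differences from its @geodist values. The coordinates are the ones from pois.xml, already in radians.

```ruby
# Great-circle distance in metres between two points given in radians.
# earth_radius_m is an assumed mean Earth radius.
def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m = 6_371_000.0)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  # Clamp to 1.0 to guard against rounding errors for near-antipodal points.
  2 * earth_radius_m * Math.asin(Math.sqrt([a, 1.0].min))
end

amberbit = [0.927042715037538, 0.403538937710426]
google   = [0.656188367092825, -2.13395902872886]

distance_km = haversine_m(*amberbit, *google) / 1000.0
puts "AmberBit HQ to Google HQ: #{distance_km.round} km"
```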

Where to go from here?

The Riddle documentation is a great place to look for help. You can find out how to perform complex searches, and how to re-order or filter returned records by their attributes. You can also limit results to a radius in metres, or retrieve the distance of each POI from the given location. Also, check out this gist for the source of this example.
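
As a rough sketch of the last two ideas, the snippet below adds a radius filter and formats the returned distances. It rests on a few assumptions, so treat it as a starting point rather than a reference: that Riddle::Client::Filter accepts an attribute name and a range of floats, that the distance is exposed as the "@geodist" attribute in metres, and that each match is a hash with :doc and :attributes keys. The helper names limit_to_radius and describe_matches are ours.

```ruby
# Restrict results to POIs within radius_m metres of the anchor.
# Assumes `require 'riddle'` has already run and set_anchor has been called.
def limit_to_radius(client, radius_m)
  client.filters << Riddle::Client::Filter.new("@geodist", 0.0..radius_m.to_f)
  client
end

# Turn matches into "name (x.x km)" lines.
# names is an array indexed by document ID minus one, like `companies` above.
def describe_matches(matches, names)
  matches.map do |match|
    name = names[match[:doc] - 1]
    distance_km = match[:attributes]["@geodist"] / 1000.0
    format("%s (%.1f km)", name, distance_km)
  end
end
```

With the script above, this could be used as limit_to_radius(client, 50_000) before the query, and puts describe_matches(client.query("", "pois")[:matches], companies) after it.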

by Hubert Łępicki, twitter: @hubertlepicki
