Pivotal Labs

Main menu

Skip to primary content
Skip to secondary content
  • About
  • Case Studies
  • Team
    • Executives
    • Locations
      • San Francisco (HQ)
      • Boston
      • Boulder
      • Denver
      • London
      • Los Angeles
      • New York
  • Community
    • Blogs
    • Tech Talks
    • Events
  • Careers
    • Lifestyle
    • Principles & Practices
    • Benefits
    • FAQ
    • Apply
  • Contact
    • Press Room
    • Press Releases
    • In The News
    • Press Kit
  • All
  • Labs
  • Standup
  • Tracker
Adam Milligan

Solr demystified

Adam Milligan
Sunday, September 28, 2008

As I mentioned in this post, we’ve decided to set aside some of our weekly brown bags to spread around some knowledge on different technologies via a relatively informal presentation/discussion format. This past week we talked a bit about Solr.

This post covers much of what we discussed, ranging from the introductory to the somewhat arcane. If you’re a seasoned Solr user, this may not have much for you. But, you never know.

Solr: wtf?

For people who have never used Solr (me, for instance), I’ll start with the obvious question: what is it? At its most basic, Solr simply provides a web interface to the Lucene search engine. It’s written in Java and runs as a servlet inside a servlet container such as Tomcat or Jetty. The example application included in the distribution package includes Jetty, so you can get up and running relatively easily. You use Solr by sending your requests in the form of XML over HTTP; the responses also contain XML.

For those of you looking for sense in the world, I’m sorry: Solr isn’t an acronym, and to our knowledge doesn’t stand for anything in particular. It’s just a name with a vowel shortage.

You can find the home page for Solr here, a wiki for discussion of all things Solr here, and tutorial to get you started here. Finally, you can download the distribution (the current release version is 1.3.0) here.

But, why?

Let’s say you’re working on a site to help people find a physician. Users of this site might care about location, age, or gender of each physician. Your site might include how many pending malpractice suits each physician has, how patients have rated their bedside manner, or what magazines they stock in their waiting rooms. As a good citizen of the web community, you want to provide your users the ability to search for any combination of these criteria. You have all the information sitting in your database, so you should be able to search it, right?

Sure, no problem, but in order to ensure quick response times you’ll want to add indices on the columns in your physicians table. But, which indices to add? If your table has columns for age, gender, and rating, and you want to allow users to search on any combination of fields, then you need three indices to match all searches:

  1. age, gender, rating
  2. rating, age, gender
  3. gender, rating, age

Keep in mind that indices match from left to right, and will only match on columns included in the query. Thus, if you allow searching on another column you’ll need to have eight indices:

  1. age, gender, rating, mortality rate
  2. age, gender, mortality rate, rating
  3. age, rating, mortality rate, gender
  4. age, mortality rate, gender, rating
  5. gender, rating, mortality rate, age
  6. gender, mortality rate, age, rating
  7. rating, mortality rate, age, gender
  8. mortality rate, age, gender rating

So, we quickly discover that we need n! / (n – 1) indices to search n columns, and this doesn’t take into account range queries. This could quickly get out of hand; Solr to the rescue.

Solr will build your indices for you based on the columns you tell it you want to search, it will keep these indices up to date as you add or change records, and it will do it fast.

More accurately, Lucene will do these things for you. However, Solr allows you to put Lucene on its own server that your application talks to via HTTP. This way all of your production servers can share the same Solr server, keeping searches consistent for all instances of your application.

Next up, performance and such.

  • 0 Shares
  • Share on Facebook
  • Share on Twitter
Adam Milligan

The resurrection of the free lunch

Adam Milligan
Sunday, September 28, 2008

At Pivotal we generally have a talk each Wednesday that regards some topic of professional interest. These “brown bag” talks generally involve something to do with interesting projects or new technologies that we might want to use; sometimes we invite speakers from outside Pivotal, sometimes Pivots do all the talking. We nearly always invite folks from outside Pivotal, and we always feed everyone lunch.

Unfortunately, keeping up a schedule of interesting talks takes a not insignificant effort, and most of us here have one or two other things taking up our time. So, sometimes a week rolls around without a scheduled speaker. Or, on rare occasions, a speaker bails or has to reschedule. When this happens we have to cancel our talk for the week, everyone misses a learning opportunity, and no one gets a free lunch.

So, we decided to fill in the gaps with less formal, but still informative, discussions of various topics. We plan to focus on technologies that we use on some of our projects, that some Pivots may know a lot about while others may know less. In short, a chance to spread around some knowledge.

This past week we had our first of these discussions on Solr. I took some notes and will publish some of what we discussed in subsequent posts. Lunch was buffet-style Burmese. Delicious.

  • 0 Shares
  • Share on Facebook
  • Share on Twitter

Topics

  • agile (781)
  • rails (113)
  • testing (88)
  • ruby (83)
  • ruby on rails (70)
  • jobs (62)
  • javascript (55)
  • techtalk (44)
  • rspec (38)
  • ironblogger (32)
  • productivity (30)
  • activerecord (29)
  • gogaruco (29)
  • git (28)
  • nyc (27)
  • rubymine (26)
  • bloggerdome (23)
  • mobile (22)
  • process (21)
  • pivotal tracker (21)
  • cucumber (20)
  • design (19)
  • jasmine (19)
  • ios (18)
  • webos (17)
  • objective-c (17)
  • android (16)
  • tracker ecosystem (16)
  • palm (16)
  • "soft" ware (16)
  • fun (15)
  • ci (15)
  • cedar (15)
  • rails3 (14)
  • performance (14)
  • bdd (14)
  • gem (13)
  • css (13)
  • tdd (13)
  • selenium (12)
  • goruco (12)
  • bundler (12)
  • meetup (11)
  • railsconf (11)
  • nyc-standup (11)
  • capybara (10)
  • mac (10)
  • mojo (10)
  • chef (10)
  • api (10)
Subscribe to brownbags Feed
  • About
  • Case Studies
  • Team
  • Community
  • Careers
  • Contact
  • Labs
  • Events

Contact Us

contact@pivotallabs.com
+1 415-77-PIVOT
TwitterLinkedInFacebook

Pivotal Tracker

Tracker is the award-winning agile project management tool that enables real-time collaboration around a shared, prioritized backlog.
Visit pivotaltracker.com >