Rob Olson's blog
Ask for Help
Passenger Memory Bloat
"We found that one of our Passenger workers is using around 900MB of memory. Has anyone had problems with Passenger memory usage? We are using REE 1.8.7-2009.10."
Solr Master-Slave Replication
"We are interested in adding automatic failover to our Solr slave when the master fails. What are some strategies for doing this?"
Interesting Things
Git Push --force Blocked
If your git push is rejected even when you use git push -f, the git server is probably configured to refuse non-fast-forward pushes (for example, with receive.denyNonFastForwards set to true). You will need to change the server configuration to allow them.
spec --timeout
Be careful when running rspec with the --timeout option. When the timeout occurs the test process will be interrupted and it will print out a stack trace for wherever it was executing when it was interrupted. This can lead to a lot of confusion if you do not immediately realize it was the result of timing out and instead think that an exception actually occurred at that point.
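To see why the trace is misleading, here is a minimal sketch that uses Ruby's standard Timeout module rather than rspec itself. The error class and method names are hypothetical, and I am only assuming that the interruption inside rspec behaves in a similar way. The backtrace of the timeout error points into whatever code happened to be running, even though that code did nothing wrong.

```ruby
require "timeout"

# Hypothetical error class standing in for "the run was interrupted".
class SpecTimedOut < StandardError; end

# Hypothetical code that is healthy but simply takes too long.
def slow_but_healthy_step
  sleep 2
end

# Runs the slow step under a time limit and returns the error, if any.
def run_with_timeout(seconds)
  Timeout.timeout(seconds, SpecTimedOut) { slow_but_healthy_step }
  nil
rescue SpecTimedOut => e
  e
end

error = run_with_timeout(0.1)
puts error.class
# The top frames name slow_but_healthy_step, not the timeout itself.
puts error.backtrace.first(3)
```

If a trace like this shows up in a spec run, it is worth checking the elapsed time before hunting for a bug at the reported line.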
Ask for Help
Paperclip Slowness
"In one web request we are collecting the file paths of about 250 objects that have attachments via Paperclip. Unfortunately this is really slow and takes a couple seconds to finish. Does anyone have thoughts on how we could speed this up? Is de-normalizing the file path a reasonable solution?"
Moderation of Solr Search Results
"One of our projects uses Solr and acts_as_solr to provide search results to users. One particular result is showing up far higher than we want. What is the best way to use boosting to downgrade the score of an individual result in Solr?"
Interesting Things
Bike to Work Day
May 13th is Bike to Work Day in San Francisco. We are hoping more people will take advantage of this to try biking to work for the first time. To mix things up for those who normally bike to work, we are planning a Bike to Lunch.
Probably my favorite feature of the Solr full text search engine and the acts_as_solr plugin is boosting. Boosting gives you some influence over the order in which results are returned. When boosting is applied properly, the quality of the search results appears to improve dramatically, even though the same results are being returned, just in a different order. There are two kinds of boosting that you need to be aware of: field (or column) boosting and document boosting.
Field Boosting
Field (or column) boosting lets you specify that a match on a boosted field should carry more weight than a match on any other field. In the app I am working on, I added a field boost to the name attribute because I want results that have the query string in the name to appear before results that only have it somewhere in their description or as a tag. Here is an example of how to do a field boost when using acts_as_solr.
acts_as_solr :fields => [{:name => {:boost => 3.0}}, :description, :tags]
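To get a feel for what that boost does to the ordering, here is a toy sketch in plain Ruby. It is not Solr's real relevance formula, which is more involved. The Doc struct, the toy_score method, and the sample records are all made up for illustration. Only the 3.0 weight on name comes from the call above.

```ruby
# Toy model only: Solr's real scoring is more involved than this.
Doc = Struct.new(:name, :description, :tags)

# Mirrors the boosts in the acts_as_solr call; unboosted fields count as 1.0.
FIELD_BOOSTS = { name: 3.0, description: 1.0, tags: 1.0 }

# Add up the boost of every field whose text contains the query.
def toy_score(doc, query)
  FIELD_BOOSTS.sum do |field, boost|
    doc[field].to_s.downcase.include?(query.downcase) ? boost : 0.0
  end
end

# Invented sample records.
docs = [
  Doc.new("Photo Editor", "Share pictures on twitter", "photos"),
  Doc.new("Twitter Client", "Post short updates", "social")
]

# Highest score first.
ranked = docs.sort_by { |doc| -toy_score(doc, "twitter") }
puts ranked.map(&:name)
```

Both records match the query, but the one that matches in the name sorts first because that match is worth three times as much.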
Document Boosting
A document boost should be used when there is a way of quantifying one result as being better than another, regardless of the query. For example, there are two entries in my database that both have a tag of "twitter client": Twitterific and Twitterfon. In the iPhone App Store, Twitterfon has a higher popularity rating than Twitterific, so I want Twitterfon to appear above Twitterific if someone searches for "twitter client" within the app. To specify document boosting based on the app store popularity field, I can pass a Proc object to acts_as_solr (see the rdoc) and return the member field that holds the popularity rating. A great thing about the Proc object is that I can execute any Ruby code I want inside of it. This is useful if the popularity score is not stored directly in the database and must be calculated on the fly.
acts_as_solr :fields => [:name, :description, :tags],
:boost => Proc.new { |item| item.popularity_score.to_f }
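Here is a rough sketch of the "calculated on the fly" case in plain Ruby, without acts_as_solr loaded. The App struct, the ratings arrays, and the averaging rule are hypothetical stand-ins for a real model, and the numbers are invented sample data. I am also assuming, purely for illustration, that the document boost acts as a multiplier on an otherwise equal match score.

```ruby
# Hypothetical model: popularity is derived from ratings, not stored.
App = Struct.new(:name, :ratings) do
  # Average of the ratings, or 0.0 when there are none.
  def popularity_score
    return 0.0 if ratings.empty?
    ratings.sum.to_f / ratings.size
  end
end

# Same shape as the Proc passed to acts_as_solr above.
boost = Proc.new { |item| item.popularity_score.to_f }

# Invented sample data.
apps = [
  App.new("Twitterific", [3, 4, 3]),
  App.new("Twitterfon", [5, 4, 5])
]

# Both apps match "twitter client" equally well in this toy model,
# so the document boost alone decides the order.
base_match_score = 1.0
ranked = apps.sort_by { |app| -(base_match_score * boost.call(app)) }
puts ranked.map(&:name)
```

Because the boost does not depend on the query, the more popular app sorts first for any search that both entries match equally.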
Closing
If you are using Solr at all, it is important to be aware of what boosting can accomplish. When using multiple boosts, finding the boost values that produce the best search results is a bit of black magic. I have found that after achieving "pretty good" results, the law of diminishing returns comes into play and progress slows down. With a single boost it is much easier because there is only one variable in play.
