Rob Olson's blog
I occasionally run into a situation with Git where I have modified a file but have no intention of committing the change to the repository. This most often happens with machine-specific configuration files. My config/database.yml in Rails projects can spend a lot of time in a dirty state if one of my dev machines has a root MySQL password and the other does not.
Git will ignore untracked files that are listed in .gitignore files or in the .git/info/exclude file. For files that Git already knows about and is tracking, there is an obscure way to tell it to ignore changes to those files.
git update-index --assume-unchanged config/database.yml
When you have made changes to the file that you want to commit you'll need to execute the inverse (--no-assume-unchanged) for git to acknowledge that the file has changed.
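If you flip this flag often, a small wrapper can help. The following is only a sketch: the helper names assume_unchanged_command and set_assume_unchanged are made up for illustration, and the only git flags used are the two mentioned above.

```ruby
# Build the argument list for toggling git's assume-unchanged flag on a
# tracked file. The helper names here are hypothetical.
def assume_unchanged_command(path, ignore = true)
  flag = ignore ? "--assume-unchanged" : "--no-assume-unchanged"
  ["git", "update-index", flag, path]
end

# Run the command inside the current repository; returns true on success.
def set_assume_unchanged(path, ignore = true)
  system(*assume_unchanged_command(path, ignore))
end

# Print the two commands without running them.
puts assume_unchanged_command("config/database.yml").join(" ")
puts assume_unchanged_command("config/database.yml", false).join(" ")
```

To see which files are currently flagged, git ls-files -v should list them with a lowercase status letter.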
Many people use the ultra-popular Paperclip library to handle file attachments in Rails. Unfortunately the Paperclip documentation does not cover how to stub out calls to ImageMagick in your test suite. Without the proper stubs in place a test suite that uses Paperclip will take much, much longer to run.
The grease your suite presentation by Nick Gauthier has a slide titled Quickerclip that describes what needs to be done to speed up Paperclip in tests: basically, you need to keep it from shelling out to ImageMagick. Alas, the presentation does not include code for how to achieve Quickerclip.
As the presentation shows, Paperclip.run is the method that needs to be changed. The first parameter passed to Paperclip.run is the ImageMagick command to be executed. Paperclip uses the identify and convert commands. The identify command is used to determine the dimensions of an image. The convert command is the really heavy one that does image manipulation and thumbnail generation. Here is a redefinition of Paperclip.run with sensible behavior for tests.
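module Paperclip
  class << self
    # Keep a handle on the real implementation; calling super here would
    # fail because this definition replaces the original method.
    alias_method :run_without_stub, :run

    def run(cmd, params = "", expected_outcodes = 0)
      case cmd
      when "identify"
        "100x100" # fake dimensions so no image is inspected
      when "convert"
        nil # skip image manipulation entirely
      else
        run_without_stub(cmd, params, expected_outcodes)
      end
    end
  end
end

class Paperclip::Attachment
  # Optional: skip post-processing before it ever reaches Paperclip.run.
  def post_process
  end
end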
Redefining post_process in Paperclip::Attachment is an optional additional optimization. In Paperclip, post_process eventually calls Paperclip.run("convert") and by short-circuiting the method earlier in the chain we save a few cycles.
Ask for Help
Passenger Memory Bloat
"We found one of our passenger workers is using around 900MB of memory. Has anyone has problem with Passenger memory usage? We are using REE 1.8.7-2009.10."
Solr Master-Slave Replication
"We are interested in adding automatic failover to our Solr slave when the master fails. What are some strategies for doing this?"
Interesting Things
Git Push --force Blocked
If you find your git push being rejected, even when you use git push -f, it's probably because your git server is configured to reject non-fast-forward pushes (the receive.denyNonFastForwards setting). You'll need to change the server configuration to allow them.
spec --timeout
Be careful when running rspec with the --timeout option. When the timeout occurs the test process will be interrupted and it will print out a stack trace for wherever it was executing when it was interrupted. This can lead to a lot of confusion if you do not immediately realize it was the result of timing out and instead think that an exception actually occurred at that point.
Ask for Help
Paperclip Slowness
"In one web request we are collecting the file paths of about 250 objects that have attachments via Paperclip. Unfortunately this is really slow and takes a couple seconds to finish. Does anyone have thoughts on how we could speed this up? Is de-normalizing the file path a reasonable solution?"
Moderation of Solr Search Results
"One of our projects uses Solr and acts_as_solr to provide search results to users. One particular result is showing up far higher than we want. What is the best way to use boosting to downgrade the score of an individual result in Solr?"
Interesting Things
Bike to Work Day
May 13th is Bike to Work Day in San Francisco. We are hoping more people take advantage of this to try biking to work for the first time. To mix things up for those that normally bike to work we are planning a Bike to Lunch.
Happy Star Wars Day!
Ask for Help
"While Apache is serving a large static file it becomes slow to serve other requests. We think this may be an Apache configuration issue. Any suggestions?"
Interesting Things
Enable-pthreads Headache
In Ruby 1.8 the --enable-pthreads build option will dramatically slow down your program, as documented here and here. Do not enable it unless you need it, which is unlikely.
Don't need a primary key? Think again
You might think that you can get away with not having a primary key on a table and just rely on a database index for lookups. This is very dangerous because MySQL will no longer be able to store the records on disk sorted by primary key. Not having this ordering becomes an issue if you want to operate on the records in batches. For instance, normally you ask for all the records having an id between 0 and 1000. Because they are stored by primary key, these 1000 records will be grouped together on the disk and the lookup will be quick. When doing the same thing with an index and no primary key, the records will be in scattered locations on the disk and the hard drive will have to do many seeks to access them all. The time to run the query will be orders of magnitude greater.
This morning Yehuda Katz posted a response to my previous post, Technique for extending a method from a module, showing how a more modular organization of the Person class would allow for a solution that does not require a crazy metaprogramming hack. The idea is that by extracting the method we want to decorate into an ancestor class, Ruby makes it a lot easier to do what we want.
Previously I was aware that there were other ways I could structure the host class to make the module's job easier, but I did not try them because I was writing the code with the knowledge that I would only be in control of one side of the equation, the module. The host class was going to be written by the end-user of the Rubygem the module was to be packaged in. Since I did not want to dictate how the end-user structured the host class, I ended up adding a lot of complexity to the module. The goal became how to write the module in such a way that the class would "just work" upon including Teacher without requiring any additional steps to be taken. Asking the user to create an AbstractPerson class that contained their initialize method and then create a subclass felt like an obtrusive request to make through a README, one that would ultimately hurt the user's experience with the library.
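For readers who have not seen that response, here is a minimal sketch of the ancestor-class idea as I understand it. It is not Yehuda's exact code. It assumes the user is willing to move initialize into an AbstractPerson class and subclass it, which is exactly the request I was reluctant to make.

```ruby
# The user's own initialize lives in an ancestor class.
class AbstractPerson
  def initialize
    puts "initializing person"
  end
end

module Teacher
  # Because Teacher sits between Person and AbstractPerson in the lookup
  # path, a plain super reaches the user's initialize.
  def initialize
    puts "initializing teacher"
    super
  end
end

class Person < AbstractPerson
  include Teacher
end

Person.new
```

The lookup path is Person, then Teacher, then AbstractPerson, so no aliasing or hooks are needed.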
Shortly after I put that blog post up I got this tweet from Josh Susser:
Update: Read the follow-up post Second thoughts on initializing modules
I was recently presented with the problem of appending to the initialize method from a module that was being included. To do this I would need to override the class's initialize method with my own but keep the functionality of the original initialize method.
Whenever I need to do something in Ruby that I know will require some experimentation I like to move outside of my application and reproduce the problem in a simple way. For this problem I created a Person class that mixes in a Teacher module.
module Teacher
  def initialize
    puts "initializing teacher"
  end
end

class Person
  include Teacher

  def initialize
    puts "initializing person"
  end
end
The goal is to get the following output when a Person object is created:
> Person.new
initializing teacher
initializing person
The basic program fails as expected: Person.new prints only "initializing person" because Person's initialize trumps Teacher's. Our immediate goal is to replace Person's initialize with Teacher's, but in a way that preserves the original initialize method. By using alias_method we can create a copy of the original initialize method that we can call later.
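A rough sketch of the alias_method idea follows. It makes one simplifying assumption: Teacher is included after Person has defined initialize, so the alias captures Person's version. The listing above includes the module first, which would need extra handling such as a method_added hook. The name initialize_without_teacher is made up for illustration.

```ruby
module Teacher
  def self.included(base)
    base.class_eval do
      # Keep a copy of the host class's initialize under a new name.
      alias_method :initialize_without_teacher, :initialize

      # Replace initialize with a version that runs the teacher setup
      # first and then calls the preserved original.
      define_method(:initialize) do |*args, &block|
        puts "initializing teacher"
        initialize_without_teacher(*args, &block)
      end
    end
  end
end

class Person
  def initialize
    puts "initializing person"
  end

  # Assumption: the include comes after initialize is defined.
  include Teacher
end

Person.new
```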
Probably my favorite feature of the Solr full text search engine and the acts_as_solr plugin is boosting. Boosting is a great tool that lets you influence how the returned results are ordered. When boosting is applied properly the quality of the search results appears to improve dramatically even though the same results are being returned, just in a different order. There are two different kinds of boosting that you need to be aware of: field (or column) boosting and document boosting.
Field Boosting
Field or column boosting allows you to specify that if a query matches on a boosted field, give that more weight than usual. In the app I am working on, I added a field boost to the name attribute because I want results that have the query string in the name to appear before those results that have it somewhere in their description or as a tag. Here is an example of how to do a field boost when using acts_as_solr.
acts_as_solr :fields => [{:name => {:boost => 3.0}}, :description, :tags]
Document Boosting
A document boost should be used when there is a way of quantifying one result as being better than another result, regardless of the query. For example, there are two entries in my database that both have a tag of "twitter client": Twitterific and Twitterfon. In the iPhone App Store, Twitterfon has a higher popularity rating than Twitterific so I want Twitterfon to appear above Twitterific if someone searches for "twitter client" within the app. To specify document boosting based on the app store popularity field I can pass a Proc object to acts_as_solr (rdoc) and return the member field that holds the popularity rating. A great thing about the Proc object is that I can execute any Ruby code I want inside of it. This is useful if the popularity score is not directly stored in the database and must be calculated on the fly.
acts_as_solr :fields => [:name, :description, :tags],
:boost => Proc.new { |item| item.popularity_score.to_f }
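To make the difference between the two kinds of boost concrete, here is a toy model in plain Ruby. It is only an illustration: Solr's real relevance formula is considerably more involved, and the toy_score method, the sample records and the boost values are all invented for this sketch.

```ruby
# Toy model only; this is not how Solr actually computes relevance.
FIELD_BOOSTS = { :name => 3.0, :description => 1.0, :tags => 1.0 }

# Count query matches per field, weight each by its field boost, then
# scale the whole document by its document boost.
def toy_score(doc, query, doc_boost = 1.0)
  field_score = FIELD_BOOSTS.inject(0.0) do |sum, (field, boost)|
    hits = doc[field].to_s.downcase.scan(query.downcase).size
    sum + boost * hits
  end
  field_score * doc_boost
end

# Invented sample records.
name_match = { :name => "Twitter Client Pro", :description => "A client", :tags => "social" }
tag_match  = { :name => "Birdsong", :description => "A twitter client for busy people", :tags => "twitter client" }

puts toy_score(name_match, "twitter client")      # one match, in the boosted name field
puts toy_score(tag_match, "twitter client")       # two matches, in unboosted fields
puts toy_score(tag_match, "twitter client", 2.0)  # same record with a document boost
```

In this model the field boost lets a single match in the name outrank two matches elsewhere, while a large enough document boost can reverse that order regardless of where the query matched.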
Closing
If you are using Solr at all it is important to be aware of what boosting can accomplish. When using multiple boosts, finding the right boost values to produce the best search results is a bit of black magic. I have found that after achieving "pretty good" results the law of diminishing returns comes into play and slows down progress. With a single boost it is much easier because there is only one variable in play.
During the process of upgrading a project from Rails 2.2.2 to Rails 2.3.2 several of our tests were breaking with the error:
Missing host to link to! Please provide :host parameter or set default_url_options[:host]
This error was most commonly occurring in model specs where we had mixed in ActionController::UrlWriter in order to get access to the named routes (e.g. invitation_path) inside of the model class. I believe this change in behavior is the result of this patch to Rails but I am not certain. Interestingly the code falls apart in the tests but it still works fine within the browser.
With the assistance of Adam Milligan we were able to find an acceptable way to handle setting the default_url_options in the test environment.
# app/models/invitation.rb
class Invitation < ActiveRecord::Base
  include ActionController::UrlWriter
  ...
end

# spec/models/invitation_spec.rb
describe "Invitation" do
  before(:all) do
    Invitation.default_url_options[:host] = 'localhost'
  end

  after(:all) do
    Invitation.default_url_options[:host] = nil
  end
  ...
end
As I wrap up I want to take a moment to properly shame myself for generating URLs in the model. There is definitely a good argument that you should not be using named routes in your models and I am eager to agree. Rails makes it hard to do for a reason, and if you ever find yourself explicitly including UrlWriter, take a step back and think the problem over. You may find that you are needlessly going down the wrong path and that a different approach is in order.
One property of the Ruby object model, and of object-oriented programming in general, is that a subclass automatically inherits all of the methods of its superclass. Classes can further expand the number of methods available by mixing in a Module, or several.
Because of mixins and subclassing even a class that has declared just a few methods can actually have hundreds of methods on it. In Ruby, all classes subclass Object by default which declares a hefty 45 methods, guaranteeing you to have at least that many. Out of the box in 1.8.7, a Ruby String object has 176 instance methods. If you are programming on top of the Rails framework, ActiveSupport adds 98 methods bringing the total to 274!
On numerous occasions I have needed to see what methods are available on an object I am working with. When that happens I type the following in irb.
myobject.methods - Object.instance_methods
This prints out a large array of instance methods with the methods inherited from Object removed from the list. This is useful but what if the object I am working with mixed in several modules and I am left with a list of over a hundred methods? It would be great to view which Class or Module each method came from. Well, actually there's a gem for that.™
Looksee
Looksee is a new gem by George Ogata that examines the method lookup path of any object. To use it add require 'looksee/shortcuts' to your ~/.irbrc. This will add an lp ("lookup path") method to your irb environment. When passed an object, lp prints out a colored display showing where each of the object's methods lives.
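If you cannot install the gem, a rough approximation is possible in plain Ruby by asking each method for its owner. This is only a sketch and not how Looksee works internally. The methods_by_owner helper and the Greeter and Visitor examples are invented for illustration.

```ruby
# Group an object's non-Object methods by the class or module that
# defines them, using Method#owner.
def methods_by_owner(object)
  names = object.methods - Object.instance_methods
  grouped = Hash.new { |hash, owner| hash[owner] = [] }
  names.each { |name| grouped[object.method(name).owner] << name }
  grouped
end

# Invented example module and class.
module Greeter
  def greet
    "hello"
  end
end

class Visitor
  include Greeter

  def name
    "visitor"
  end
end

methods_by_owner(Visitor.new).each do |owner, names|
  puts "#{owner}: #{names.map { |n| n.to_s }.sort.join(', ')}"
end
```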
