Joseph Palermo's blog



RabbitMQ, AMQP gem, and EventMachine
Posted by Joseph Palermo on Friday July 31, 2009 at 10:42AM

I recently had a chance to work with RabbitMQ and the AMQP gem.

The first problem we ran into when subscribing to a queue was that if we forcefully killed a client, a large number of messages would disappear from the queue (far more than we had processed on the client). I'll get to the "why" of this in a minute, but the solution was simply to turn on acknowledgments (or acks) for our messages, which was something we knew we wanted to do anyway.

So after we turned on acks and started processing items from the queue, we noticed that the number of items in the queue was not actually going down, even though we were correctly sending the ack when we were done processing each one.

The AMQP gem uses EventMachine under the covers to manage the connections to the RabbitMQ server. It turned out that subscribing to a queue is a one-time thing. You establish a connection and that is it. The server then sends you messages from that queue over the socket. RabbitMQ pre-fetches messages for you, meaning it crams a bunch of data over the socket and doesn't wait for you to ask for more. It notices that you've read data off the socket and pushes more to you.

The repercussion of this in the EventMachine world is a major blockage of data. EventMachine has an internal loop where it goes through registered sockets and processes all the data off any sockets that are ready to be read from before it continues its loop. The server was basically keeping the socket full, so EventMachine would only complete an internal loop after we processed a full socket of data, and then it would get blocked on the next loop since the server had already filled the socket up again.

This means that all of our acks were sitting inside of EventMachine waiting for the loop to continue so they could be sent out. It also explains why we were losing messages when we weren't using acks. The server had sent them to our socket, where they were waiting to be processed, and by killing the process we lost that data.

My first reaction was that the AMQP gem should be pulling all the data off the socket and caching it locally, then processing a single record off of that cache every time the EventMachine loop ran. This of course won't work because as soon as we empty the socket, RabbitMQ is just going to fill it up again (until we have all the messages from the queue in our local cache).

So the solution? RabbitMQ 1.6 has an option to set a pre-fetch limit. So we simply set the pre-fetch limit to 1, and our EventMachine loop runs nice and fast now. You'll want to tweak your pre-fetch limit depending on how long it takes to process each message. If you can churn through a hundred messages a second, you probably won't even notice this problem and the prefetching will help you, but if it takes you a few seconds (or minutes) per message, you'll wonder why things aren't popped off the queue for several minutes (or hours).
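To make the effect of the pre-fetch limit concrete, here is a toy model in plain Ruby. It is only a sketch: ToyBroker and its methods are made-up names for illustration, not the AMQP gem's or RabbitMQ's API, and it assumes that with acks turned on, unacknowledged messages go back on the queue when the client dies.

```ruby
# Toy model of broker-side pre-fetch. Hypothetical names, not the AMQP gem API.
class ToyBroker
  attr_reader :queue, :unacked

  def initialize(messages, prefetch_limit = nil)
    @queue = messages.dup             # messages still sitting on the broker
    @unacked = []                     # pushed to the client, not yet acked
    @prefetch_limit = prefetch_limit  # nil means "no limit"
  end

  # Push messages to the client until the pre-fetch window is full.
  def deliver
    while !@queue.empty? && (@prefetch_limit.nil? || @unacked.size < @prefetch_limit)
      @unacked << @queue.shift
    end
  end

  # The client finished a message, which frees a slot in the window.
  def ack(message)
    @unacked.delete(message)
  end

  # Assumption: with acks on, everything unacked is requeued when the client dies.
  def client_died
    @queue = @unacked + @queue
    @unacked = []
  end
end

# How many messages are in flight right after the client subscribes?
def messages_in_flight(prefetch_limit)
  broker = ToyBroker.new((1..100).to_a, prefetch_limit)
  broker.deliver
  broker.unacked.size
end

IN_FLIGHT_UNLIMITED = messages_in_flight(nil)
IN_FLIGHT_LIMIT_ONE = messages_in_flight(1)

# Process one message with a limit of 1, then kill the client.
broker = ToyBroker.new((1..100).to_a, 1)
broker.deliver
broker.ack(broker.unacked.first)
broker.deliver
broker.client_died
REMAINING_AFTER_CRASH = broker.queue.size
```

Without a limit, all 100 messages are in flight as soon as the client subscribes. With a limit of 1, only one message is ever at risk, and after processing one message and crashing, 99 are still on the queue.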

How do you use named_scopes?
Posted by Joseph Palermo on Wednesday July 29, 2009 at 09:51AM

You may have heard of some problems we've had with changes to named_scope in Rails 2.3.

The basic change is that when chaining named scopes together, their scoping does not apply only to the finder class, but also to any lambdas evaluated farther along the named scope chain.

So given a User class with a friends association (pointing at other Users) with the following named_scopes:

# Matches users whose name is 'bob'.
named_scope :named_bob, {
  :conditions => {:name => 'bob'}
}

# Matches the friends of the given user's friends.
named_scope :second_degree_friends, lambda { |user|
  user_friends = user.friends
  second_degree_friend_ids = user_friends.collect { |u| u.friend_ids }
  {
    :conditions => {:id => second_degree_friend_ids.flatten}
  }
}

These two calls are no longer the same.

User.second_degree_friends(user_sam).named_bob

User.named_bob.second_degree_friends(user_sam)

The first call does what we expect, giving us all of user_sam's second degree friends who are named bob. But the second call actually gives us something different. Because the named_bob scope comes first in the chain, the lambda for second_degree_friends is evaluated in the scope of all previous named scopes. So our call to the user.friends association is actually scoped with the additional condition of :name => 'bob', which is probably not what we want in this case.
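The difference between the two orderings can be reproduced with a toy model in plain Ruby. This is not how Rails implements named_scope. ToyUsers, with_scope and the sample data are all made up for illustration, on the assumption that every lookup (including the friends association) honours whatever scope is currently in effect.

```ruby
# Toy model only: plain Ruby objects stand in for ActiveRecord and its scoping.
ToyUser = Struct.new(:id, :name, :friend_ids)

class ToyUsers
  ALL = [
    ToyUser.new(1, 'sam',  [2, 3]),
    ToyUser.new(2, 'al',   [4]),
    ToyUser.new(3, 'bob',  [5]),
    ToyUser.new(4, 'bob',  []),
    ToyUser.new(5, 'carl', [])
  ]

  @scope = [] # filters accumulated by earlier links in the chain

  class << self
    # Run the block with one more filter in effect, then restore the old scope.
    def with_scope(filter)
      previous = @scope
      @scope = previous + [filter]
      yield
    ensure
      @scope = previous
    end

    # Every lookup, including the friends association, honours the current scope.
    def find_all
      ALL.select { |user| @scope.all? { |filter| filter.call(user) } }
    end

    def friends_of(user)
      with_scope(lambda { |u| user.friend_ids.include?(u.id) }) { find_all }
    end
  end
end

NAMED_BOB = lambda { |u| u.name == 'bob' }

# Mirrors the second_degree_friends lambda: it calls the friends association.
def second_degree_filter(user)
  ids = ToyUsers.friends_of(user).collect { |u| u.friend_ids }.flatten
  lambda { |u| ids.include?(u.id) }
end

sam = ToyUsers::ALL.first

# second_degree_friends first: its lambda runs with no scope in effect.
SECOND_DEGREE_THEN_BOB = ToyUsers.with_scope(second_degree_filter(sam)) do
  ToyUsers.with_scope(NAMED_BOB) { ToyUsers.find_all.collect { |u| u.id } }
end

# named_bob first: the lambda runs while named_bob is already in effect.
BOB_THEN_SECOND_DEGREE = ToyUsers.with_scope(NAMED_BOB) do
  ToyUsers.with_scope(second_degree_filter(sam)) do
    ToyUsers.find_all.collect { |u| u.id }
  end
end
```

In this data set the first ordering finds user 4 (a bob who is a friend of sam's friend al), while the second finds nobody, because sam's friends were already narrowed down to bobs before their friend ids were collected.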

You can see the lighthouse ticket where I claim this should not be the default behavior of named scopes. But my question right now is, "How do you use named scopes?"

I tend to use them in a composable manner, especially in search objects. I take a base finder such as User or User.friends and then I pass it down to an add_conditions or add_sort method. Those methods add on any other named scopes they need and return the new finder object. So inside of this chain, you never really know what finders have been applied already, but in the past, you didn't need to know because the same named_scope with the same parameters always gave you the same conditions.

Often there will be one search object that inherits from another, say, for instance, LocationUserSearch < UserSearch, which adds geo-targeted searching on top of UserSearch. In these cases, we can just create our own add_conditions method, call super, and tack on any new conditions that we need. Since conditions and joins are merged in scopes, this normally works out great.
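Here is a rough sketch of that search object pattern. So that it runs without Rails, a hypothetical FakeFinder stands in for the finder (in a real app you would pass in User or User.friends), and the scope names :named, :near and :ordered_by are invented for the example.

```ruby
# Stand-in for a chainable finder. Hypothetical, for illustration only.
class FakeFinder
  attr_reader :scopes

  def initialize(scopes = [])
    @scopes = scopes
  end

  # Like chaining a named scope: returns a new finder with one more scope applied.
  def scoped(name, *args)
    FakeFinder.new(@scopes + [[name, *args]])
  end
end

class UserSearch
  def initialize(params)
    @params = params
  end

  # Takes any base finder and returns it with conditions and sorting added.
  def results(finder)
    add_sort(add_conditions(finder))
  end

  def add_conditions(finder)
    finder = finder.scoped(:named, @params[:name]) if @params[:name]
    finder
  end

  def add_sort(finder)
    finder.scoped(:ordered_by, @params[:sort] || :name)
  end
end

class LocationUserSearch < UserSearch
  # Call super for the base conditions, then tack on the location scope.
  def add_conditions(finder)
    finder = super
    finder = finder.scoped(:near, @params[:zip]) if @params[:zip]
    finder
  end
end

SEARCH_SCOPES = LocationUserSearch.new(:name => 'bob', :zip => '94105').
  results(FakeFinder.new).scopes
```

Neither add_conditions method knows what has already been applied to the finder it receives, which is exactly why it matters that a named scope gives the same conditions no matter where it lands in the chain.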

Do you use named scopes in a composable way such as this? Or do you only combine them in a known way and might benefit from having the accumulated scope applied to the lambda?

Feel free to add your comments to the lighthouse ticket too.

Standup 03/06/2009
Posted by Joseph Palermo on Saturday March 07, 2009 at 07:42AM

Interesting Things

  • If you're using Ack in Project for TextMate, be sure to edit your .ackrc file to include any non-standard file types you're using.

  • A project saw a dramatic speedup in its test suite by mocking out ActionMailer in tests. Something to consider if your tests cause a lot of email side effects.

  • field_named wasn't working for us when using Webrat to drive Selenium. Our fork with the simple fix can be found here.

Standup 03/05/2009
Posted by Joseph Palermo on Friday March 06, 2009 at 12:09AM

Interesting Things

  • Giving your fake acts_as_fu model the same name as an actual model you have can lead to very obscure test failures. For those not in the know, acts_as_fu gives you the ability to test your model extensions directly by creating a fake model in your tests and mixing your extensions into it.

  • A few people have been using Paperclip to manage their attachments and have found it easier to integrate than Attachment_fu.

Standup 03/04/2009
Posted by Joseph Palermo on Wednesday March 04, 2009 at 10:20PM

Interesting Things

  • Integer("008") != "008".to_i. Integer treats the leading zero as an octal prefix and raises an exception because 8 is not an octal digit, while to_i returns 8. The to_i method is what you want, unless you want exceptions or octal numbers.

  • Somebody needed help constructing a named_scope that could reference the count of a has_many association. There was some grumbling about using :joins and :group (and if you do this, be sure not to call count on the scope itself without also doing a :select => 'DISTINCT primary_key'). The winning solution was to just put a counter_cache on the association and use the denormalized column instead.
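The Integer versus to_i difference from the first item above is easy to check in plain Ruby. In this small sketch, strict_parse is just a helper name made up for the example. It wraps Kernel#Integer so the exception shows up as :invalid.

```ruby
# Kernel#Integer is strict about its input; String#to_i is forgiving.
def strict_parse(text)
  Integer(text)
rescue ArgumentError
  :invalid
end

LENIENT_008  = "008".to_i            # leading zeros are ignored, parsed as decimal
STRICT_008   = strict_parse("008")   # leading 0 means octal, and 8 is not an octal digit
LENIENT_010  = "010".to_i            # still decimal
STRICT_010   = strict_parse("010")   # parsed as octal
LENIENT_JUNK = "12abc".to_i          # stops at the first non-digit
STRICT_JUNK  = strict_parse("12abc") # trailing junk is an error
```

Here "008".to_i is 8 while Integer("008") raises, "010".to_i is 10 while Integer("010") is 8, and "12abc".to_i is 12 while Integer("12abc") raises.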

Standup 03/03/2009
Posted by Joseph Palermo on Wednesday March 04, 2009 at 12:20AM

Interesting Things

  • Somebody was seeing Mongrels hang when using an older copy of the S3 gem. It turned out the older version had the option for persistent connections defaulting to true. Setting :persistent => false, or using a newer version that has false as the default, fixed their problem.

  • One of our sites was seeing an unbalanced distribution of requests despite the fact that the load balancer was evenly distributing connections. One host typically had 2x the traffic of the others, and it would switch every few hours to be a different host. It turned out to be the Google crawler, which uses a keepalive, getting stuck on a single host and making a lot of requests. The load balancer is only able to balance TCP connections, and Google was using only one. The likely solution will be haproxy or something similar in front of the hosts to better distribute traffic.

Standup 03/02/2009
Posted by Joseph Palermo on Monday March 02, 2009 at 07:39PM

Interesting Things

  • Tired of refreshing your page to view changes in your CSS? Erik Hanson has a bookmarklet you can use without refreshing your page. See it on his blog.

  • There is a beta version of the Selenium 1.1.15 gem that includes the latest selenium-server.jar (1.0 beta-2). This fixes some problems with using Firefox 3. You can get the gem here, and you can read the details here.

We had an odd bug last week where we ended up with different results when we eager loaded an association versus loading it directly.

There are apparently two issues with :has_one :through, one of which also applies to :has_many :through.

So given:

class Person < ActiveRecord::Base
  has_many :friendships
  has_one :best_friend, :through => :friendships, :conditions => "friendships.best = 1"
end

If you do a Person.find(:all, :include => :best_friend), the best_friend that gets preloaded is not necessarily one that has a "friendships.best = 1".

This is due to a bug in the association preloading code that doesn't pass down the finder options, so any :conditions or :order are completely ignored. This problem is easy to fix, just a one line change, but it then exposes another problem.

This problem applies to both :has_many :through and :has_one :through associations. The problem is that the :through association is loaded separately from the :has_one or :has_many association. So it first loads :friendships, and then when it tries to load :best_friend, it doesn't have the table it needs for the :conditions and explodes.

Our current workaround is basically putting the conditions on the :through association, although sometimes you need to create a new association just for that, which is certainly not ideal, especially if you plan on accessing the :through model after it has been loaded.

The way to fix it in Rails is unfortunately a rewrite of how the :through associations are eager loaded.

You can see the lighthouse ticket here.

There are also a couple of messages on the Rails Core group.

Standup 08/11/2008
Posted by Joseph Palermo on Monday August 11, 2008 at 04:33PM

Interesting Things

  • If you have a "target" method on your model, things will get a bit weird when you try to access this method through an association. Since associations have their own "target" method, you actually need to call association.target.target, or probably better, don't create methods called "target".
  • Since Time.now always returns the time for the local timezone, if you use it in your fixtures but then have your app running under a different time zone, the times in your fixtures will be incorrect. Use the ActiveSupport helpers such as 0.days.ago instead, or if you have a timezone configured in your environment, you can use Time.zone.now.

Ask for Help

"How can I test the route helpers in RSpec? If I'm passing a complex set of options to a helper I'd like to test that it's giving me what I expect."

Nobody had any serious suggestions, although many humorous testing scenarios were mentioned.