Joseph Palermo's blog



Joseph PalermoJoseph Palermo
RabbitMQ, AMQP gem, and EventMachine
edit Posted by Joseph Palermo on Friday July 31, 2009 at 10:42AM

I recently had a chance to work with RabbitMQ and the AMQP gem.

The first problem we ran into with subscribing to a queue was if we forcefully kill a client, a large number of messages would disappear from the queue (far more than we had processed on the client). I'll get to the "why" of this in a minute, but the solution was simply to turn on acknowledgments (or acks) for our messages which was something we knew we wanted to do anyway.

So after we turned on acks and started processing items from the queue we noticed that the number of items in the queue was not actually going down even though we were correctly sending the ack when we were done processing.

The AMQP gem uses EventMachine under the covers to manage the connections to the RabbitMQ server. It turned out that when you subscribe to a queue, it is a one time thing. You establish a connection and that is it. The server then sends you messages from that queue over the socket. RabbitMQ pre-fetches messages for you, meaning it crams a bunch of data over the socket and doesn't wait for you to ask for more, it notices that you've read data off the socket, and pushes more to you.

The repercussions of this in the EventMachine world is a major blockage of data. EventMachine has an internal loop where it goes through registered sockets, and processes all the data off any sockets that are ready to be read from before it continues its loop. The server was basically keeping the socket full, so EventMachine would only complete a internal loop after we processed a full socket of data, and then it would get blocked on the next loop since the server has already filled the socket up again.

This means that all of our acks were sitting inside of EventMachine waiting for the loop to continue so they could be sent out. It also explains why when we weren't using acks we were losing messages. The server had sent them to our socket and they were waiting to be processed and by killing the process we lost that data.

My first reaction was that the AMQP gem should be pulling all the data off the socket and caching it locally, then processing a single record off of that cache every time the EventMachine loop ran. This of course won't work because as soon as we empty the socket, RabbitMQ is just going to fill it up again (until we have all the messages from the queue in our local cache).

So the solution? RabbitMQ 1.6 has an option to set a pre-fetch limit. So we simply set the pre-fetch limit to 1, and our EventMachine loop runs nice and fast now. You'll want to tweak your pre-fetch limit depending on how long it takes to process each message. If you can churn through a hundred messages a second, you probably won't even notice this problem and the prefetching will help you, but if it takes you a few seconds (or minutes) per message, you'll wonder why things aren't popped off the queue for several minutes (or hours).

Joseph PalermoJoseph Palermo
How do you use named_scopes?
edit Posted by Joseph Palermo on Wednesday July 29, 2009 at 09:51AM

You may have heard of some problems we've had with changes to named_scope in Rails 2.3.

The basic change is that when chaining named scopes together, their scoping does not apply only to the finder class, but also to any lambdas evaluated farther along the named scope chain.

So given a User class with a friends association (pointing at other Users) with the following named_scopes:

named_scope :named_bob, {
  :conditions => {:name => 'bob'}
}

named_scope :second_degree_friends, lambda{|user|
  user_friends = user.friends
  second_degree_friend_ids = user_friends.collect{|u| u.friend_ids}
  {
    :conditions => {:id => second_degree_friend_ids.flatten}
  }
}

These two calls are no longer the same.

User.second_degree_friends(user_sam).named_bob

User.named_bob.second_degree_friends(user_sam)

The first call does what we expect (giving us all of user_sam's second degree friends who are named bob. But the second call actually gives us something different. Because the named_bob scope comes first in the chain, when it evaluates the lambda for second_degree_friends, it applies it in the scope of all previous named scopes. So our call to the user.friends association is actually scoped with the additional condition of :name => 'bob', which is probably not what we want in this case.

You can see the lighthouse ticket where I claim this should not be the default behavior of named scopes. But my question right now is, "How do you use named scopes?"

I tend to use them in a composable manner, especially in search objects. I take a base finder such as User or User.friends and then I pass it down to a add_conditions or add_sort method. Inside those methods, they add on any other named scopes they need to and return the new finder object. So inside of this chain, you never really know what finders have been applied already, but in the past, you didn't need to know because the same named_scope with the same parameters always gave you the same conditions.

Often there will be one search object that inherits from another, say for instance LocationUserSearch < UserSearch that adds geo targeted searching on top of UserSearch. In these cases, we can just create our own add_conditions method, call super and tack on any new conditions that we need. Since conditions and joins are merged in scopes, this normally works out great.

Do you use named scopes in a composable way such as this? Or do you only combine them in a known way and might benefit from having the accumulated scope applied to the lambda?

Feel free to add your comments to the lighthouse ticket too.