Rajan Agaskar
Standup 12/30/08
Posted by Rajan Agaskar on Tuesday December 30, 2008 at 05:23PM

Interesting Things

  • Ruby Hash is really, really, really fast

If you're building a data structure and you need it to be performant, Ruby Hash comes highly recommended by Steve Conover. If you're doing a dance and you need it to be awesome, I highly recommend the Robot. Or maybe the Cabbage Patch.
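As a rough illustration of why a Hash is a good default for lookups, here is a minimal sketch that finds the same keys by scanning an Array and by indexing a Hash. The data set and variable names are made up for this example, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical data set: 5,000 string keys.
keys = (1..5_000).map { |i| "key#{i}" }

# Index the keys in a Hash, mapping each key to its position.
index = {}
keys.each_with_index { |key, position| index[key] = position }

# Look up the same 500 keys both ways.
wanted = keys.last(500)

array_hits = nil
array_time = Benchmark.realtime do
  array_hits = wanted.map { |key| keys.index(key) }  # linear scan per lookup
end

hash_hits = nil
hash_time = Benchmark.realtime do
  hash_hits = wanted.map { |key| index[key] }        # hashed lookup per key
end

puts "Array#index: #{array_time}s"
puts "Hash#[]:     #{hash_time}s"
```

Both approaches return the same positions; only the cost per lookup differs.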

  • Counter cache, fixtures, and invalid data

Invalid counter cache data can cause unexpected behavior, for example size() returning a wrong count or associations claiming to be empty when they aren't. In this case, the invalid counter cache data was caused by bad or missing fixture values, a situation that the debugger did not catch. With this in mind, it may be useful to fall back on puts/p statements if you suspect the counter cache is the source of the problem.
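To make the failure mode concrete, here is a plain-Ruby sketch of an association whose size trusts a cached counter while length counts the real rows. FakeAssociation is a hypothetical stand-in written for this example, not ActiveRecord code, and the fixture scenario is assumed.

```ruby
# Plain-Ruby stand-in for a has_many association with a counter cache.
# FakeAssociation is hypothetical; it is not ActiveRecord code.
class FakeAssociation
  def initialize(cached_count, rows)
    @cached_count = cached_count  # value of the counter cache column
    @rows = rows                  # what is actually in the table
  end

  # With a counter cache, size trusts the cached column.
  def size
    @cached_count
  end

  def empty?
    size.zero?
  end

  # length loads the rows and counts them.
  def length
    @rows.length
  end
end

# Assumed scenario: a fixture defines two comments but omits the counter
# column on the parent, so the cached count stays at 0.
comments = FakeAssociation.new(0, ["first comment", "second comment"])

p comments.size    # 0
p comments.empty?  # true
p comments.length  # 2
```

Printing size next to length like this is a quick way to spot a counter that has drifted from the real row count.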

  • Count or Size methods may return incorrect values from associations or named scopes using GROUP BYs

When calling 'count' or 'size' on an association, Rails replaces the select of the actual query with a COUNT(*) and strips GROUP BY clauses. This can cause the returned count to differ from the actual number of records. A simple (and expensive) workaround is to use .length, which forces the association to load and then returns the number of loaded records. A better method is to pass count a :select value that selects COUNT(DISTINCT(foo)), where foo is the column you are grouping by. It is worth noting that COUNTing DISTINCT records is much less of a performance hit than actually returning their values, so the resulting query is faster than you might expect.
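The mismatch can be seen without a database. The sketch below uses plain Ruby arrays to mimic the three queries; the rows and the foo_id column are invented for illustration, and it assumes the grouped query returns one row per distinct value.

```ruby
# Hypothetical rows; foo_id is the column the scope groups by.
rows = [
  { :id => 1, :foo_id => 10 },
  { :id => 2, :foo_id => 10 },
  { :id => 3, :foo_id => 20 },
  { :id => 4, :foo_id => 20 },
  { :id => 5, :foo_id => 30 }
]

# The grouped query: one row per distinct foo_id.
grouped_rows = rows.group_by { |row| row[:foo_id] }.values.map { |group| group.first }

# COUNT(*) with the GROUP BY stripped: every row is counted.
count_star = rows.length

# COUNT(DISTINCT foo_id): one per group, without loading the rows.
count_distinct = rows.map { |row| row[:foo_id] }.uniq.length

p grouped_rows.length  # 3
p count_star           # 5
p count_distinct       # 3
```

The distinct count matches the number of grouped rows, which is why it works as a replacement for the stripped GROUP BY.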

  • first and last on has_many associations

This has been previously mentioned in this space, but as we're on the topic of unexpected ActiveRecord behaviors, it's worth reiterating. If you have a model Foo, which has many Bars, calling foo.bars.first will always go to the database. This means, for example, that the following statements will not have the expected result:

foo.bars.first.some_value = 'baz'
foo.bars.first.save

You would normally expect this to set some_value on foo.bars.first to 'baz' and then save it, but the object that received some_value is thrown away: the foo.bars.first.save statement retrieves the first object from the database again (and then saves that unmodified copy). last behaves in a similar manner. A workaround is to always load the result of first or last into a variable and then work with it. In other words:

first_bar = foo.bars.first
first_bar.some_value = 'baz'
first_bar.save
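
The same behavior can be reproduced in plain Ruby. In this sketch, FakeBars is a hypothetical stand-in for an unloaded association whose first builds a fresh object on every call; it is not ActiveRecord code, and its save method takes the object as an argument only to keep the example small.

```ruby
# Plain-Ruby stand-in for an unloaded has_many association.
# Bar and FakeBars are hypothetical; they are not ActiveRecord classes.
Bar = Struct.new(:id, :some_value)

class FakeBars
  def initialize
    @table = { 1 => "original" }  # id => persisted some_value
  end

  # Every call builds a fresh object from the "database".
  def first
    Bar.new(1, @table[1])
  end

  def save(bar)
    @table[bar.id] = bar.some_value
  end
end

bars = FakeBars.new

# Broken pattern: the assignment and the save touch two different objects.
bars.first.some_value = "baz"
bars.save(bars.first)
p bars.first.some_value  # "original"

# Workaround: hold on to one object and save that.
first_bar = bars.first
first_bar.some_value = "baz"
bars.save(first_bar)
p bars.first.some_value  # "baz"
```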

For a much more thorough treatment of this subject, please see Frederick Cheung's post First, foremost, and [0].

ActiveRecord::BaseWithoutTable is very handy when you want ActiveRecord validations on a model that does not have a corresponding table (for example, a feedback form).

Comments

  1. Kyle on December 30, 2008 at 06:58PM

    I find Ruby hashes painfully slow. For example:

    kyle-maxwells-macbook:~ kyle$ time ruby -e "h={}; 1.upto(1000000) {|i| h[i.to_s] = i }"
    
    real    0m2.276s
    user    0m2.190s
    sys     0m0.077s
    kyle-maxwells-macbook:~ kyle$ time perl -e "\$h={}; for(\$i=0; \$i<1000000; \$i++){ \$h[\$i + ''] = \$i;}"
    
    real    0m0.385s
    user    0m0.356s
    sys     0m0.025s
    
  2. Joseph Palermo on December 30, 2008 at 07:26PM

    Kyle -

    MRI Ruby itself is painfully slow.

    I think the statement that Ruby hashes are fast meant fast relative to the rest of Ruby.

  3. Steve Conover on January 06, 2009 at 06:08AM

    I'll pay 6x for Ruby

  4. Adam Milligan on January 10, 2009 at 06:32PM

    @Kyle,

    I take your point; however, your example doesn't tell the whole story. Hash table performance depends primarily on the speed of the hash function and collision management. I notice you convert the integer keys to strings before insertion; I'm guessing this is because the trivial hash function for integer keys didn't provide as significant a performance disparity. However, how do you know that the slowdown for Ruby isn't in the #to_s function, rather than the hash function for strings?

    Also, hash table insertion, retrieval, and deletion all run in average-case O(1) time, but the constant factor for each may be significantly different. I notice you benchmarked only insertion.

    In any case, Joseph made the important point; Ruby's strengths don't lie in performance.