Pivotal Labs

Main menu

Skip to primary content
Skip to secondary content
  • About
  • Case Studies
  • Team
    • Executives
    • Locations
      • San Francisco (HQ)
      • Boston
      • Boulder
      • Denver
      • London
      • Los Angeles
      • New York
  • Community
    • Blogs
    • Tech Talks
    • Events
  • Careers
    • Lifestyle
    • Principles & Practices
    • Benefits
    • FAQ
    • Apply
  • Contact
    • Press Room
    • Press Releases
    • In The News
    • Press Kit
  • All
  • Labs
  • Standup
  • Tracker

Hacking a subselect in ActiveRecord

Pivotal Labs
Wednesday, October 29, 2008

This week, Damon and I were doing a performance optimization for some slow queries. The most performant solution involved denormalizing some data into a join table and doing a subselect to get the ids of the records we wanted. Not rocket science, but also a bit ugly to construct the SQL by hand. Our solution was to cheat a tiny bit and use an ActiveRecord internal method to generate the SQL for us.

def favorite_posts(options={})
  subselect = Favorite.send(
                :construct_finder_sql,
                  :select => "post_id",
                  :conditions => {:blog_id => self.id},
                  :order => "published_at DESC",
                  :limit => options[:limit] || 10, :offset => options[:offset])
  Post.find(:all, :conditions => "posts.id IN (#{subselect})", :order => "published_at DESC")
end

That code uses the private method Favorite.construct_finder_sql to generate the following SQL:

SELECT * FROM posts WHERE posts.id IN (
    SELECT post_id FROM favorites WHERE blog_id = 42 ORDER BY published_at DESC LIMIT 10 OFFSET 10
  ) ORDER BY published_at DESC

The Ruby may look like more code than the SQL, and in that form it is… but if you go the hack up a string route, once you start using string operations or interpolation to deal with the variable parts of the query it gets ugly pretty fast. Using the ActiveRecord code to put it all together keeps it nice and clean, and even makes sure things are sanitized and quoted properly too.

  • 0 Shares
  • Share on Facebook
  • Share on Twitter

9 Comments

  1. Ben Hughes says:

    Nice technique! In general I would like to see more either in Rails core or in plugins that aide with SQL subqueries. There are many advanced situations (and some not so advanced) that really benefit from the use of subqueries. Even a lot of situations where a traditional join *could* be used are much more elegantly expressed using subqueries and since MySQL optimizes on-the-fly where necessary it usually doesn’t matter which way you specify it.

    Of course if the ActiveRecord ORM had constructs that produced subqueries it wouldn’t work with all databases.

    October 30, 2008 at 3:26 pm

  2. Mark Wilden says:

    If I understand correctly, this query is selecting rows from favorites according to some criteria, then selecting the matching rows from posts. I would’ve thought this would be solved by a simple join, not a subquery.

    Also, why sort the subquery?

    October 30, 2008 at 6:17 pm

  3. Josh Susser says:

    Mark: Using the excellent postgresql query plan explainer, we experimented with several different approaches. This is the one that worked best given the constraints of our schema and size of data (< 50K favorites, >1M posts). To be brief, doing a join has to look at a lot of rows in the posts table. The subselect means you can use an efficient index in one table to determine the posts you need, then selecting by id is very fast. Often a join is what you want, but one size doesn’t fit all.

    October 30, 2008 at 6:26 pm

  4. Mark Wilden says:

    Hmm. A query of the form

    SELECT * FROM favorites JOIN posts — etc.

    should definitely not look at a lot of posts. It should find the favorites that meet the criteria (using whatever indexes it can), then match those few rows against the posts. Perhaps PostgreSQL was confused – maybe its statistics needed updating, I dunno. But a subquery should not be necessary for this. Of course, there’s often a difference between “should” and “is.” :)

    October 30, 2008 at 8:03 pm

  5. Josh Susser says:

    Mark:

    The difference between theory and practice: In theory there’s no difference, while in practice there is.

    October 30, 2008 at 8:13 pm

  6. Reid MacDonald says:

    Ben, I had the same idea and went ahead and began writing a plugin to add subselects to ActiveRecord:

    http://github.com/latimes/subselector/tree/master

    I posted about it this morning here:

    http://geminstallthat.wordpress.com/2008/11/01/subselector-moneypenny/

    November 1, 2008 at 5:05 pm

  7. Brendan Baldwin says:

    I wrote some code that allows you to use association names as hash keys and say things like:

    Order.find(:all, :conditions => {
    :line_items => {
    :product => {
    :name => “Sham-Wow”
    }
    }
    })

    Or using dynamic finders…

    Order.find_all_having_line_items_having_product_having_name(“Sham-Wow”)

    etc–

    Haven’t updated it in a while, but its probably still working with Rails 2.3.x

    http://github.com/brendan/entrails/tree/master

    August 18, 2009 at 1:04 pm

  8. Josh Susser says:

    @brendan: I did something similar to allow using belongs_to associations in dynamic finders, but punted when I ran into dealing with polymorphic associations. Did you manage to work that out? I couldn’t tell from a rather quick inspection of your code.

    August 18, 2009 at 1:46 pm

  9. ravi says:

    What does the construct_finder_sql() function look like? Does it simply output a SQL string?

    May 10, 2013 at 6:06 pm

Add New Comment Cancel reply

Your email address will not be published.

Pivotal Labs

Pivotal Labs

Recent Posts

  • Does the set of all sets contain itself?
  • Standup 3/8/2012
  • Standup 3/7/2012
Subscribe to Pivotal's Feed

Author Topics

riddles (1)
agile (167)
capistrano (2)
rails (26)
movember (1)
git (10)
railsdoc (1)
object-design (1)
bdd (3)
cucumber (3)
linkedin (1)
oauth (1)
ruby (17)
tdd (2)
lvh.me (1)
rails 3.1.1 (1)
selenium (6)
homebrew (1)
mysql (5)
rvm (1)
sproutcore (1)
paperclip (2)
pry (1)
amazon (1)
heroku (1)
rails3 (2)
jasmine (3)
design (3)
process (12)
productivity (8)
learning (1)
olin (1)
migrations (2)
mongodb (2)
devise (2)
javascript (13)
rubymine (4)
ipad (1)
whurl (1)
head.js (1)
pairing (2)
tools (4)
pair programming (1)
rspec (10)
rspec2 (1)
ruby19 (1)
incubation (3)
startup (5)
api (1)
presenter (1)
vanna (1)
pivotal tracker (5)
capybara (1)
fakeweb (1)
webmock (1)
intern (1)
ruby on rails (25)
meetup (1)
textmate (1)
testing (20)
solr (4)
nyc-standup (11)
community (1)
opensource (3)
activerecord (4)
chrome (1)
mp4 (1)
activeresource (1)
flash (3)
neo4j (1)
nginx (1)
rsoc (1)
meta programming (1)
agile standup (7)
government (3)
webos (4)
xss (1)
jquery (1)
bundler (2)
ci (3)
gems (5)
postgresql (1)
geminstaller (1)
gemcutter (1)
cloud (2)
rack (2)
refraction (1)
gem (5)
refactoring (1)
validations (1)
webrat (1)
engine-yard (1)
firefox (2)
jsunit (1)
mongrel (2)
thin (1)
unicorn (1)
facebook (1)
rubygems (5)
jruby (1)
actioncontroller (1)
rails 2.3 (1)
palmpre (1)
autotest (1)
mac (2)
hosting (1)
goruco (11)
database (3)
railsconf (11)
gogaruco (4)
deployment (4)
github (1)
ie (1)
ajax (1)
intellij (1)
json (1)
asset packaging (1)
polonium (1)
character encoding (1)
utf-8 (1)
test (3)
civics (1)
hpricot (1)
rake (3)
sms (1)
unicode (1)
iphone (1)
java (1)
safari (1)
memory leaks (1)
rr (3)
editor (1)
css (1)
nyc (3)
performance (5)
fun (5)
enterprise rails (1)
health (1)
new and cool (1)
general (2)
treetop (1)
errors (1)
stack (1)
trace (1)
cache (1)
cookies (1)
freesoftware (1)
conferences (1)
development (1)
driven (1)
proxy (1)
caching (1)
peertopatent (1)
languages (1)
rest (2)
rubyforge (1)
sake (1)
file (1)
upload (1)
constants (1)
osx (1)
terminal (1)
pairprogramming (2)
  • About
  • Case Studies
  • Team
  • Community
  • Careers
  • Contact
  • Labs
  • Events

Contact Us

contact@pivotallabs.com
+1 415-77-PIVOT
TwitterLinkedInFacebook

Pivotal Tracker

Tracker is the award-winning agile project management tool that enables real-time collaboration around a shared, prioritized backlog.
Visit pivotaltracker.com >