Pivotal Labs

Monthly Archives: September 2008

New York Standup 9/30/2008

Pivotal Labs
Tuesday, September 30, 2008
  • What’s the best way to import a million records into a Postgres database via ActiveRecord (which we need in order to run some application-specific logic)? We anticipate waiting a second or so between inserts to avoid slowing down the production database (which is under load, almost entirely reads). If ActiveRecord has a feature that batches inserts together, no one knew about it. As for how long this will take (estimates range from 9 to 27 hours) and how much load it will put on the production database, we plan to answer both with a trial run on a small number of these records.

  • We’re thinking of having capistrano deploy to two demo servers, one particularly aimed at showing to prospective users of our application, and the other mostly for story acceptance. The former would be hosted at a hosting company; the latter an internally run machine. Several people reported they have done this on their projects, and the problems were minor, mostly having to do with whether the deployed location (/u/apps/whatever or some such) is different on the two machines (the solution would be to use the capistrano variables, but tracking down all the places that need to do that could be an issue).

  • Erector tip of the day: in a Rails project, you can put a file (named edit.rb or edit.html.rb) in your view directory, and Rails/Erector will find the template implicitly (as it would for ERB, HAML, etc). It is not necessary to explicitly call render from your controller method.
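
Back on the import question from the first item: here is a minimal sketch of a throttled import loop in plain Ruby. The helper name import_in_batches, the batch size, and the pause length are all made up for illustration; nothing here is an ActiveRecord feature, and the right numbers would come out of the trial run.

```ruby
# Hypothetical helper: hands records to the caller's block one slice at a
# time and sleeps between slices so the production database gets a breather.
def import_in_batches(records, batch_size: 100, pause: 1.0)
  imported = 0
  records.each_slice(batch_size) do |batch|
    yield batch                # the caller decides how to write the batch
    imported += batch.size
    sleep(pause) if pause > 0  # throttle between batches
  end
  imported
end

# Possible usage, assuming a hypothetical Record model and an array of
# attribute hashes called rows:
#
#   import_in_batches(rows, batch_size: 500) do |batch|
#     Record.transaction { batch.each { |attrs| Record.create!(attrs) } }
#   end
```

Wrapping each batch in one transaction still runs the model logic for every record, but it avoids paying for a commit per row.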

Standup 09/30/2008: svn log slowness; tech talk videos

Pivotal Labs
Tuesday, September 30, 2008

Interesting Things

  • svn log takes a long time when you have recently committed a large number of files (5,000, say).
    Specifically, this happens when you have path-based security. The Subversion documentation mentions this:

    All of this path-checking can sometimes be quite expensive, especially in the case of svn log. When retrieving a list of revisions, the server looks at every changed path in each revision and checks it for readability. If an unreadable path is discovered, then it’s omitted from the list of the revision’s changed paths (normally seen with the --verbose option), and the whole log message is suppressed. Needless to say, this can be time-consuming on revisions that affect a large number of files.

  • Check out the new links at pivotallabs.com

    • Talks lists all the tech talks that happen here at Pivotal such as Blaine Cook’s Fire Eagle talk. If you have some cool technology and would like to give a talk, contact us at techtalks@pivotallabs.com
    • Who lists all the great people who work here

Pivots patch rails: named_scope with the :joins can cause table aliasing issues

David Stevenson
Monday, September 29, 2008

In order to accomplish some advanced search functionality, we’ve added a lot of named_scopes to our User model. This seems like a good idea, and well within the intended use for named_scopes. Unfortunately, we ran into issues with our :joins. We have a separate User and Profile model, but our advanced search scopes often needed both to make decisions. So we had some scopes that look like this:

class User < ActiveRecord::Base
  # Users who have confirmed their email address
  named_scope :verified,
    :conditions => {:email_verified => true}

  # Users whose profile has at least one answer
  named_scope :answered_questions,
    :joins => "INNER JOIN profiles ON profiles.user_id = users.id " +
              "INNER JOIN answers ON answers.profile_id = profiles.id"

  # Users whose profile name contains the given string
  named_scope :with_name, lambda { |name|
    { :joins => "INNER JOIN profiles ON profiles.user_id = users.id",
      :conditions => ["profiles.name LIKE ?", "%#{name}%"] }
  }
end

Using these named_scopes, we wanted to dynamically construct a finder that would return the results the user was interested in, such as: User.verified or User.answered_questions or even User.verified.answered_questions.with_name('Joseph'). The last chain, unfortunately, caused issues with table aliasing. The query ended up joining in the profiles table twice, in exactly the same way and without renaming the table, so MySQL rejected the query.

The easiest solution to this problem was to use only the hash form for :joins clauses, such as :joins => :profile. Rails correctly merges multiple consecutive join scopes that use hashes. If you need string joins (for a LEFT JOIN rather than an INNER JOIN, say) or a condition directly on the join, then merging goes out the window: the hash form is immediately converted to a string, and all consecutive joins are “merged” by appending them together.

We started by manually aliasing our scopes, but in some cases we were concerned about the amount of duplicate data this was causing in our queries.

We thought about creating a dependency framework for named_scopes, such that you could have a single :profile scope that other scopes depended on and that would only ever get added once. This seemed really difficult: because of the way named_scopes construct their with_scopes, there was no good place to keep track of these dependencies, and it would still cause problems if you had a manual with_scope or a :joins in your find.

Finally we decided that Rails fundamentally lacked the capability to deal with duplicate joins, and that we should solve this problem. A good solution seemed to be to allow the :joins option to take an array of strings, as follows:

  named_scope :answered_questions,
    :joins => ["INNER JOIN profiles ON profiles.user_id = users.id",
               "INNER JOIN answers ON answers.profile_id = profiles.id"]

Now calling User.answered_questions.with_name('Joseph') will create three values in a :joins array, two of which are identical and will be uniq’d out. The downside to this approach is that the values in the :joins array have to be string-identical, or they will not be properly uniq’d.

So if you are mixing hash-style :profile joins with string joins of the same table, you need to be careful to match the Rails-generated syntax. We mostly use string-style joins to avoid this issue.
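
To make the string-identical caveat concrete, here is a small sketch in plain Ruby. merge_joins is a made-up stand-in for the merging step, not the code from our patch; it only illustrates why two fragments that differ by a single space both survive the uniq.

```ruby
# Hypothetical stand-in for merging the join fragments of chained scopes:
# collect every fragment, then drop exact duplicates.
def merge_joins(*scopes)
  scopes.flat_map { |joins| Array(joins) }.uniq
end

PROFILES = "INNER JOIN profiles ON profiles.user_id = users.id"
ANSWERS  = "INNER JOIN answers ON answers.profile_id = profiles.id"

answered_questions = [PROFILES, ANSWERS]
with_name          = [PROFILES]

# Identical strings collapse: profiles is joined only once.
merged = merge_joins(answered_questions, with_name)

# An extra space before ON makes the fragment a different string, so the
# profiles table would be joined twice again.
sloppy = merge_joins(answered_questions,
                     ["INNER JOIN profiles  ON profiles.user_id = users.id"])

puts merged.length
puts sloppy.length
```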

Here’s the ticket that we filed and patched:
1077-chaining-scopes-with-duplicate-joins-causes-alias-problem

It has been committed and will roll out with Rails 2.2. Since then we have filed two more issues related to :joins and :include:

  • 1078-using-include-assoc-and-join-assoc-leads-to-alias-issue
  • 1104-references_eager_loaded_tables-should-search-tables-in-join-clauses

We hope to patch these two as well!

Joseph & David

Solr demystified

Adam Milligan
Sunday, September 28, 2008

As I mentioned in this post, we’ve decided to set aside some of our weekly brown bags to spread around some knowledge on different technologies via a relatively informal presentation/discussion format. This past week we talked a bit about Solr.

This post covers much of what we discussed, ranging from the introductory to the somewhat arcane. If you’re a seasoned Solr user, this may not have much for you. But, you never know.

Solr: wtf?

For people who have never used Solr (me, for instance), I’ll start with the obvious question: what is it? At its most basic, Solr simply provides a web interface to the Lucene search engine. It’s written in Java and runs as a servlet inside a servlet container such as Tomcat or Jetty. The example application included in the distribution package includes Jetty, so you can get up and running relatively easily. You use Solr by sending your requests in the form of XML over HTTP; the responses also contain XML.

For those of you looking for sense in the world, I’m sorry: Solr isn’t an acronym, and to our knowledge doesn’t stand for anything in particular. It’s just a name with a vowel shortage.

You can find the home page for Solr here, a wiki for discussion of all things Solr here, and a tutorial to get you started here. Finally, you can download the distribution (the current release version is 1.3.0) here.

But, why?

Let’s say you’re working on a site to help people find a physician. Users of this site might care about location, age, or gender of each physician. Your site might include how many pending malpractice suits each physician has, how patients have rated their bedside manner, or what magazines they stock in their waiting rooms. As a good citizen of the web community, you want to provide your users the ability to search for any combination of these criteria. You have all the information sitting in your database, so you should be able to search it, right?

Sure, no problem, but in order to ensure quick response times you’ll want to add indices on the columns in your physicians table. But, which indices to add? If your table has columns for age, gender, and rating, and you want to allow users to search on any combination of fields, then you need three indices to match all searches:

  1. age, gender, rating
  2. rating, age, gender
  3. gender, rating, age

Keep in mind that indices match from left to right, and will only match on columns included in the query. Thus, if you allow searching on another column you’ll need to have eight indices:

  1. age, gender, rating, mortality rate
  2. age, gender, mortality rate, rating
  3. age, rating, mortality rate, gender
  4. age, mortality rate, gender, rating
  5. gender, rating, mortality rate, age
  6. gender, mortality rate, age, rating
  7. rating, mortality rate, age, gender
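  8. mortality rate, age, gender, rating

So, by this scheme we quickly discover that we need n! / (n - 1) indices to search n columns (a cleverer choice of column orderings can get by with somewhat fewer, but the count still grows quickly), and this doesn’t take into account range queries. This could quickly get out of hand; Solr to the rescue.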
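
If you want to convince yourself that the index lists above really do cover every combination of columns, here is a quick sketch in plain Ruby. The method names and the symbol names for the columns are made up for illustration, and the check only models the left-to-right prefix matching described above, not what any particular database’s query planner does.

```ruby
# Column sets that one index can serve through leftmost-prefix matching.
# Each set is sorted so that column order within a query does not matter.
def prefixes(index)
  (1..index.length).map { |n| index.first(n).sort }
end

# Column combinations that none of the given indices can serve.
def uncovered(columns, indices)
  served = indices.flat_map { |index| prefixes(index) }.uniq
  (1..columns.length)
    .flat_map { |n| columns.combination(n).map(&:sort) }
    .reject { |combo| served.include?(combo) }
end

THREE_COLUMNS = [:age, :gender, :rating]
THREE_INDICES = [
  [:age, :gender, :rating],
  [:rating, :age, :gender],
  [:gender, :rating, :age]
]

FOUR_COLUMNS = [:age, :gender, :rating, :mortality_rate]
EIGHT_INDICES = [
  [:age, :gender, :rating, :mortality_rate],
  [:age, :gender, :mortality_rate, :rating],
  [:age, :rating, :mortality_rate, :gender],
  [:age, :mortality_rate, :gender, :rating],
  [:gender, :rating, :mortality_rate, :age],
  [:gender, :mortality_rate, :age, :rating],
  [:rating, :mortality_rate, :age, :gender],
  [:mortality_rate, :age, :gender, :rating]
]

p uncovered(THREE_COLUMNS, THREE_INDICES)
p uncovered(FOUR_COLUMNS, EIGHT_INDICES)
```

Removing an index from either list and re-running the check shows which searches would fall back to a slower plan.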

Solr will build your indices for you based on the columns you tell it you want to search, it will keep these indices up to date as you add or change records, and it will do it fast.

More accurately, Lucene will do these things for you. However, Solr allows you to put Lucene on its own server that your application talks to via HTTP. This way all of your production servers can share the same Solr server, keeping searches consistent for all instances of your application.

Next up, performance and such.

The resurrection of the free lunch

Adam Milligan
Sunday, September 28, 2008

At Pivotal we generally have a talk each Wednesday that regards some topic of professional interest. These “brown bag” talks generally involve something to do with interesting projects or new technologies that we might want to use; sometimes we invite speakers from outside Pivotal, sometimes Pivots do all the talking. We nearly always invite folks from outside Pivotal, and we always feed everyone lunch.

Unfortunately, keeping up a schedule of interesting talks takes a not insignificant effort, and most of us here have one or two other things taking up our time. So, sometimes a week rolls around without a scheduled speaker. Or, on rare occasions, a speaker bails or has to reschedule. When this happens we have to cancel our talk for the week, everyone misses a learning opportunity, and no one gets a free lunch.

So, we decided to fill in the gaps with less formal, but still informative, discussions of various topics. We plan to focus on technologies that we use on some of our projects, that some Pivots may know a lot about while others may know less. In short, a chance to spread around some knowledge.

This past week we had our first of these discussions on Solr. I took some notes and will publish some of what we discussed in subsequent posts. Lunch was buffet-style Burmese. Delicious.

October’s NYC Ruby Happy Hour is next Wednesday

Dan Podsedly
Friday, September 26, 2008

Next week is the first Wednesday of October, and that means another New York City Ruby Happy Hour, sponsored by Outside.in and Pivotal.

Where: Outside.in, 20 Jay St Suite 1019 (10th Fl), Brooklyn, NY
When: 7-9PM, Wednesday, October 1st
Who: If you’re a developer who uses Ruby and would like to meet some other Ruby folks, toss around ideas, or just have a few beers, we welcome you with open arms!

There will be pizza, beer and wii-based entertainment for everyone. Click here for more details, and to RSVP.

Standup 09/26/2008: proxy_target vs load_target

David Stevenson
Friday, September 26, 2008

Interesting Things

  • When defining an extension to an association, you can access the loaded association data through proxy_target. If the data hasn’t been preloaded/loaded when you call this method, it will return []. If you’d like to manually load the target, you can call load_target, and you can call loaded? to determine whether the proxy data has been loaded. For most situations, however, you can rely on the association to load itself when necessary by calling methods on self as follows:
has_many :people do
  def bad_people
    self.select {|person| person.bad? }
  end

  # exact same situation as 'bad_people', but 2x worse code
  def good_people
    load_target unless loaded?
    proxy_target.select {|person| !person.bad? }
  end
end
  • There’s no good way to use CSV fixtures and has_and_belongs_to_many associations, in such a way that they are easily understandable and editable by non-technical people. Foxy fixtures solved a lot of issues with fixtures, but those advantages only work with YAML fixtures. Hence, if you have a HABTM situation, you’re stuck building a lot of rows of CSV referencing arbitrarily chosen IDs across several different files.

Sunday Sunday (3 would be cliché)

Pivotal Labs
Thursday, September 25, 2008

This Sunday I am co-presenting at toorcon. The talk that I am a part of is titled Owning telephone entry systems (aka why you shouldn’t sleep so well), and will be presented by Josh Brashars and myself. If you’re in the greater San Diego area and have time to kill at a security conference, stop by.

I’m releasing a ruby web app as a part of the talk. It’s built in sinatra, which has been really fun to play with so far; shout out to nakajima for the quick ramp-up to the frank of ruby’s rat-pack at nycrb! The code is on github, and I plan to write about my impressions of sinatra here in the future. No promises, though.

Standup 09/24/2008: Why does my JVM crash running SOLR?

David Stevenson
Thursday, September 25, 2008

Interesting from yesterday

The difference between new-style-includes (rails 2.1+) and old-style-includes in rails is the size of the query. In the old style, rails selects all the data from all the tables in a single query, using some crazy renames that look like this:

SELECT users.id AS t1_r1, users.name AS t1_r2,
profiles.id AS t2_r1, ...
FROM users
LEFT OUTER JOIN profiles ON profiles.user_id = users.id

This can get really bad if you :include multiple has_many associations, because the number of rows multiplies rapidly! In the new-style-includes, ActiveRecord does one SELECT per table like so:

SELECT * FROM users
SELECT * FROM profiles WHERE user_id IN (1,2,3,4,5,6)

More queries, but each one returns a small number of rows, and overall is a big performance improvement. The problem comes when you add :conditions that reference tables you :include. The new style will attempt to write this query:

SELECT * FROM users WHERE profiles.gender = 'M'
# ERROR - no table profiles!

So, you can make all your includes faster as long as you don’t have any :conditions, :order, or :select clauses that select from tables other than the base finder table. In our case, we hardcoded this check to always use the new-style-includes, manually ensuring that we don’t fall into these failing situations.
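
To put rough numbers on the row multiplication mentioned above, here is a back-of-the-envelope sketch in plain Ruby, with no database involved. The method names and the sample counts are made up for illustration; each inner array holds the number of child records a single parent has in each of its has_many associations.

```ruby
# Rows returned by a single LEFT OUTER JOIN across several has_many
# associations: each parent contributes the product of its child counts
# (an empty association still contributes one NULL-filled row).
def joined_rows(child_counts_per_parent)
  child_counts_per_parent.sum do |counts|
    counts.map { |count| [count, 1].max }.inject(1, :*)
  end
end

# Rows returned by one SELECT per table: every parent once, plus every
# child record once.
def separate_rows(child_counts_per_parent)
  child_counts_per_parent.length +
    child_counts_per_parent.sum { |counts| counts.sum }
end

# Three users, each with 10 records in one association and 20 in another.
users = [[10, 20], [10, 20], [10, 20]]

puts joined_rows(users)
puts separate_rows(users)
```

The single joined query grows with the product of the association sizes, while the per-table queries grow with their sum.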

Ask for Help

“Why does my JVM seg. fault when running SOLR?”

Virtual machines should never segmentation fault! It’s probably a JVM/OS/library issue, so try a different version of the JVM and check that it has all its proper dependencies. Alternatively, try a different VM entirely.

“Is there a way in Excel to ‘reshape’ 2D data?”

If you have an NxM matrix in Excel, you can transpose it to an MxN matrix easily. But if you want to reshape it into an (M/2)x(N*2) matrix, you’re probably on your own. You could open it in ruby and reshape the arrays that way…
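
For the ruby route, a minimal sketch might look like this. It assumes the sheet has already been read into an array of rows (for example from a CSV export); reshape is a made-up helper name, and it fills the new rows in row-major order, which may or may not be the ordering you want.

```ruby
# Hypothetical helper: reflow a 2D array into rows of new_cols cells each,
# reading the original cells left to right, top to bottom.
def reshape(matrix, new_cols)
  cells = matrix.flatten(1)
  unless (cells.length % new_cols).zero?
    raise ArgumentError, "#{cells.length} cells do not divide into rows of #{new_cols}"
  end
  cells.each_slice(new_cols).to_a
end

wide = [[1, 2, 3, 4],
        [5, 6, 7, 8]]

tall = reshape(wide, 2)
p tall
```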

Standup 09/23/2008: Disabling pre-rails-2.1 style :include

David Stevenson
Tuesday, September 23, 2008

Interesting Things

  • If your HTTP header’s HTTP_CLIENT_IP does not match HTTP_X_FORWARDED_FOR, then rails 2.1 and above will consider it an IP spoofing attack and throw an exception! This is bad news for some traditional Apache->Mongrel setups. The solution is probably to change the apache HTTP headers, but we’re wondering exactly why this is a security problem for rails (and why they would break compatibility with the default apache setup from way back when)?
  • Be careful when using validates_uniqueness_of with :case_sensitive => true AND a unique index at the database level. If your database is case insensitive, then rails will approve the uniqueness, but the database will fail the insert. Solution: be sure to use a collation type for the unique column that is case sensitive (such as binary in mysql).
  • Rails 2.1+ :includes are way better than pre-2.1, but they are less compatible with conditions. Hence, rails falls back on the old style. Here’s when it might legitimately fall back:
User.find(:all, :include => :profile, :conditions => "profiles.gender = 'M'")

Because we reference the included table profiles in the :conditions, rails has no choice but to construct one giant query to fetch Users and their profiles, rather than a separate query. Here’s a case when it guesses wrong:

User.find(:all, :include => :profile,
  :joins => "INNER JOIN comments ON comments.user_id = users.id",
  :conditions => "comments.approved = 1")

Because the conditions reference a table that is not users, rails thinks it has to fall back to the old include style… but it’s wrong! Here’s how we tricked ActiveRecord into always using rails 2.1+ includes (note that we had to fix up a few queries that were referencing :included tables in :conditions to make this work):

module ActiveRecord::Associations::ClassMethods
  private
  def references_eager_loaded_tables?(options)
    false
  end
end