Pivotal Labs

New York Standup 9/30/2008

Pivotal Labs
Tuesday, September 30, 2008
  • What’s the best way to import a million records into a Postgres database via ActiveRecord (which is needed to implement some application-specific logic)? We anticipate waiting a second or so between inserts to avoid slowing down the production database, which is under load (almost entirely reads). If ActiveRecord has a feature that batches inserts together, no one knew about it. We plan to answer how long the import will take (estimates range from 9 to 27 hours) and how much load it will put on the production database with a trial run on a small number of these records.

  • We’re thinking of having Capistrano deploy to two demo servers: one aimed at showing the application to prospective users, the other mostly for story acceptance. The former would be hosted at a hosting company; the latter would be an internally run machine. Several people reported having done this on their projects with only minor problems, mostly to do with whether the deploy location (/u/apps/whatever or some such) differs between the two machines. The solution would be to use the Capistrano variables, though tracking down all the places that need them could be an issue.

  • Erector tip of the day: in a Rails project, you can put a file (named edit.rb or edit.html.rb) in your view directory, and Rails/Erector will find the template implicitly (as it would for ERB, HAML, etc). It is not necessary to explicitly call render from your controller method.
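
For the import question in the first item, the "wait between inserts" idea can be sketched in plain Ruby. This is only a sketch under stated assumptions: `each_in_throttled_batches` and `estimated_hours` are hypothetical helper names, not ActiveRecord features, and the batch size and pause are placeholders to tune during the trial run. As a sanity check on the numbers, a one-second pause after every single record would add up to roughly 278 hours for a million records, so estimates of 9 to 27 hours presumably assume a shorter pause or some grouping of records.

```ruby
# Sketch only: throttled batch import. Both helpers are hypothetical names
# introduced for illustration; neither is part of ActiveRecord.

# Yields records in groups of `batch_size`, sleeping `pause` seconds between
# groups so the production database gets some breathing room.
# Returns the number of batches processed.
def each_in_throttled_batches(records, batch_size: 100, pause: 1.0)
  batches = 0
  records.each_slice(batch_size) do |batch|
    # Pause before every batch except the first.
    sleep(pause) if batches > 0 && pause > 0
    yield batch
    batches += 1
  end
  batches
end

# Rough wall-clock estimate in hours: every batch costs its own insert time,
# and every batch except the first is preceded by one pause.
def estimated_hours(total, batch_size:, pause:, seconds_per_batch:)
  batches = (total.to_f / batch_size).ceil
  (batches * seconds_per_batch + [batches - 1, 0].max * pause) / 3600.0
end
```

In a real import, the block passed to `each_in_throttled_batches` is where the application-specific ActiveRecord logic would run for each record in the batch. With an assumed half second of insert time per batch of 100 and a one-second pause, `estimated_hours(1_000_000, batch_size: 100, pause: 1.0, seconds_per_batch: 0.5)` comes out at roughly 4.2 hours; the trial run is what would supply a measured value for `seconds_per_batch`.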

10 Comments

  1. Strass says:

    Regarding your 2nd point, I usually see these kinds of instances as new steps in the production chain. That’s why I use the capistrano multistage extension (gem install capistrano-ext) to define those new steps (possibly with their own environment files).

    September 30, 2008 at 7:35 pm

  2. Steve C says:

    Is there some non-AR way of loading records into postgres that would meet your needs? I’m thinking of some equivalent of the mysql “load data infile”, that loads mass amounts of data 20x faster than any alternative.

    September 30, 2008 at 8:06 pm

  3. Steve C says:

    re: erector, I’d say “it’s not necessary to use implicit templates, you can just call render directly”. ha ha.

    September 30, 2008 at 8:08 pm

  4. Dan Kubb says:

    Have you thought about using DataMapper to handle the inserts instead of ActiveRecord? As of the most recent [DataMapper benchmarks](http://gist.github.com/10735) DM is about 2x faster than AR when inserting records and performing most other operations.

    DataObjects would likely be faster still, since it is what DM uses under the hood to communicate with Postgres. It should be the fastest Ruby RDBMS driver available at the moment (faster than what AR uses, including the recently released Neverblock drivers), and it works with Ruby 1.8.

    September 30, 2008 at 9:02 pm

  5. Karl says:

    RE: #1–
    Could you load the records on a copy of your production db on a local machine, and after all is done then do an export/import into the production machine? At least this way, if something goes wrong, there is much less of a chance of it munging up your production data. Not that that has ever happened to me.

    September 30, 2008 at 9:13 pm

  6. Chad Woolley says:

    re #2 – yep, Strass is right. That’s what multistage was made for. Put all the differences in config/deploy/.rb

    Also, the story acceptance environment should be deployed after every CI build. On our projects, we already do that for a “local” localhost environment (check out Sandbox), so it should be straightforward to do the same thing for a “demo” (vs staging?) environment.

    September 30, 2008 at 9:34 pm

  7. Jonathan says:

    You guys seen http://www.jobwd.com/article/show/31 ? If possible, batch the inserts into groups of 10 or 100. In my simple tests, ar-extensions is at least 3 times faster.

    September 30, 2008 at 11:37 pm

  8. Chris Kilmer says:

    In response to your large dataset import question, we used the acts_as_importable plugin with great results. The plugin allows you to do pretty much everything as usual (validations, column discovery, SQL-escaping, etc…) except that instead of saving to the db, the plugin creates a SQL bulk import file which you can load into the db at your leisure.

    Sooooo much faster.

    Now, we were using MySQL. Not sure about the Postgres support, but it might be worth looking into.

    October 1, 2008 at 12:47 am

  9. David Stevenson says:

    We use ar-extensions extensively to handle our large data imports. No problems to speak of. :validate => false is a useful option that speeds things up if you are okay skipping validations.

    ar-extensions also adds some really neat finder support.

    October 1, 2008 at 12:50 am

  10. Toby Matejovsky says:

    http://rubypond.com/articles/2008/06/18/bulk-insertion-of-data-with-activerecord/ discusses ar-extensions for multiple inserts per query. (http://www.continuousthinking.com/tags/arext)

    October 1, 2008 at 2:36 am
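
Several of the comments above recommend ar-extensions for putting multiple rows into a single INSERT. The underlying idea, independent of that plugin's actual API, can be sketched in plain Ruby. This is an illustration under assumptions: `quote_literal` and `multi_row_insert_sql` are hypothetical helpers, the table and column names in the usage note are made up, and the naive quoting here is not safe for untrusted input. A real import should use the database adapter's own quoting or a bulk-import library.

```ruby
# Sketch only: build one multi-row INSERT statement for a batch of rows.
# Both helpers are hypothetical names introduced for illustration.

# Very naive literal quoting: nil becomes NULL, numbers pass through,
# everything else is single-quoted with embedded quotes doubled.
def quote_literal(value)
  case value
  when nil then "NULL"
  when Numeric then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

# Joins every row into a single VALUES list so that one statement (and one
# round trip) inserts the whole batch.
def multi_row_insert_sql(table, columns, rows)
  tuples = rows.map do |row|
    "(" + row.map { |value| quote_literal(value) }.join(", ") + ")"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
end
```

For example, `multi_row_insert_sql("people", %w[name age], [["Ann", 30], ["Bob", nil]])` builds a single statement carrying both rows. Grouping rows this way trades per-row validations and callbacks for fewer round trips, which is the same trade-off the :validate => false comment points at.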
