Pivotal Labs

Standup 05/20/2009: "Merging Large Data Sets – after_commit"

Jonathan Barnes
Wednesday, May 20, 2009

Ask for Help

“What are some good ways to merge large business location data sets?”

Suggestions included:

  • Score how closely candidate records match.
  • Build good admin merge tools; they are worth the effort.
  • Normalize the data before merging (e.g. pass addresses through the USPS API so that Av, Ave, and Avenue all become Ave).
  • Humans do this best, so outsource it or use Mechanical Turk.
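
The first and third suggestions can be sketched in a few lines of Ruby. This is only an illustration: the suffix table, the 0.4/0.6 weights, the thresholds, and the method names are all made up for the example, and the hand-written table stands in for a real address standardization service such as the USPS API.

```ruby
# Hypothetical suffix table; a real merge would rely on an address
# standardization service rather than a hand-written list.
SUFFIXES = {
  "av" => "ave", "ave" => "ave", "avenue" => "ave",
  "st" => "st", "str" => "st", "street" => "st",
  "blvd" => "blvd", "boulevard" => "blvd"
}.freeze

# Lowercase, strip punctuation, and collapse street suffix variants.
def normalize_address(raw)
  words = raw.downcase.gsub(/[^a-z0-9\s]/, "").split
  words.map { |word| SUFFIXES.fetch(word, word) }.join(" ")
end

# Jaccard similarity over word tokens: 1.0 means identical token sets.
def token_score(a, b)
  left = a.split.uniq
  right = b.split.uniq
  return 0.0 if left.empty? || right.empty?
  (left & right).size.to_f / (left | right).size
end

# Weighted score for two location records (hashes with :name and :address).
# The weights are arbitrary and would need tuning against real data.
def match_score(rec_a, rec_b)
  name = token_score(rec_a[:name].downcase, rec_b[:name].downcase)
  addr = token_score(normalize_address(rec_a[:address]),
                     normalize_address(rec_b[:address]))
  (0.4 * name + 0.6 * addr).round(2)
end

# Decide what to do with a candidate pair: merge automatically,
# queue it for a human in the admin tool, or treat it as distinct.
def triage(score)
  if score >= 0.9
    :merge
  elsif score >= 0.5
    :review
  else
    :distinct
  end
end
```

Pairs that land in the :review bucket are the ones worth sending to an admin merge tool or to human reviewers.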

Interesting Things

  • The after_commit plugin lets you hook callbacks that run after the transaction commits. This is really useful when kicking off threads that expect the data to already be in the database. Note: using after_save can cause a race condition if the other thread tries to read the data before the original thread has committed the transaction.

  • If you are storing a marshaled object in the database, make that field a blob type. It is smaller to store, and leaving it as text or varchar can corrupt the binary data. If you don’t have a choice about field types, at least Base64-encode the marshaled data before storing it.
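
For the fallback case in the last point, here is a minimal sketch of Base64-wrapping marshaled data so it survives a text or varchar column. The helper names are hypothetical; Marshal and the "m0" pack directive (Base64 without line breaks) are core Ruby.

```ruby
# Serialize any marshalable object into ASCII-safe text.
def dump_for_text_column(object)
  [Marshal.dump(object)].pack("m0") # "m0" = Base64, no line breaks
end

# Reverse the encoding. Only call this on data you wrote yourself:
# Marshal.load should never be fed untrusted input.
def load_from_text_column(text)
  Marshal.load(text.unpack1("m0"))
end

settings = { "retries" => 3, "tags" => [:a, :b] }
encoded  = dump_for_text_column(settings)
restored = load_from_text_column(encoded)
```

Base64 makes the stored value roughly a third larger than the raw bytes, which is one more reason to prefer a blob column when you can choose.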

ctrl+z

RE: “NewRelic Side Effects?” from 05/19/2009 Standup

  • It seems that NewRelic was not the cause of the problem but exacerbated it by holding the transaction open long enough to expose a race condition, one that still shows up when the system is put under enough load. To fix our problem we moved the trigger that launches the background process from an after_save to an after_commit callback (see the after_commit plugin above). We also re-added NewRelic.
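
The difference between firing the trigger from after_save and from after_commit can be modeled with a toy in-memory store. This is not the plugin's API or ActiveRecord: ToyDatabase and ToyTransaction are invented for the illustration, and the only thing they model is that writes stay invisible to other readers until the transaction commits.

```ruby
# Toy store: writes inside a transaction sit in a private buffer and
# become visible to other readers only when the transaction commits.
class ToyDatabase
  def initialize
    @committed = {}
  end

  # What any other thread or process would see.
  def read(key)
    @committed[key]
  end

  def transaction
    tx = ToyTransaction.new
    yield tx
    @committed.merge!(tx.pending) # the commit
    tx.run_after_commit           # hooks fire only once data is visible
  end
end

class ToyTransaction
  attr_reader :pending

  def initialize
    @pending = {}
    @after_commit = []
  end

  # Save a row. The optional block plays the role of an after_save
  # hook: it runs immediately, while the transaction is still open.
  def save(key, value)
    @pending[key] = value
    yield if block_given?
  end

  # Register a hook to run after the commit.
  def after_commit(&hook)
    @after_commit << hook
  end

  def run_after_commit
    @after_commit.each(&:call)
  end
end

db = ToyDatabase.new
seen = {}

db.transaction do |tx|
  # Stand-in for a background job launched from after_save.
  tx.save(:order_1, "paid") { seen[:after_save] = db.read(:order_1) }
  # Stand-in for the same job launched from after_commit.
  tx.after_commit { seen[:after_commit] = db.read(:order_1) }
end
```

In this model the after_save reader finds nothing because the row has not been committed yet, while the after_commit reader finds the saved value. In a real system the after_save reader only loses sometimes, which is why anything that keeps the transaction open longer makes the failure more frequent.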
