Pivotal Labs

Main menu

Skip to primary content
Skip to secondary content
  • About
  • Case Studies
  • Team
    • Executives
    • Locations
      • San Francisco (HQ)
      • Boston
      • Boulder
      • Denver
      • London
      • Los Angeles
      • New York
  • Community
    • Blogs
    • Tech Talks
    • Events
  • Careers
    • Lifestyle
    • Principles & Practices
    • Benefits
    • FAQ
    • Apply
  • Contact
    • Press Room
    • Press Releases
    • In The News
    • Press Kit
  • All
  • Labs
  • Standup
  • Tracker

Standup 11/10/2009: Facebook ids, Rails association bug, Carnegie Mellon lecture

Tyler Schultz
Tuesday, November 10, 2009

Ask for Help

“We’re getting id’s from facebook that are overflowing INT columns in mysql tables. Should I use BIGINT to accommodate these ids? Is this wasteful?”

Use string columns instead. VARCHAR columns will only use as much space as the id needs.

“Are VARCHAR columns slower to join than INT columns?”

Not in mysql. Strings do not have join performance penalty compared to integers.

Interesting Things

  • Edward was a guest lecturer for a Carnegie Mellon University course where students are learning to do Agile development. Course work involves pair programming, TDD, and using Ruby on Rails. The talk was well received and Edward is invited to come back next year!

  • There is an apparent rails bug (2.3.2) when building associated objects and then saving them.

class User < ActiveRecord::Base
 attr_accessor :bool
 before_validation_on_create :set_bool

 def set_bool
   self.bool = true
 end
end

class Child < User
 belongs_to :adult
 before_validation_on_create :create_adult

 def create_adult
   if adult.nil?
     self.build_adult()
   end
 end
end

class Adult < User
end

specify "should work" do
 child = Child.create!()
 child.bool.should be_true
 child.parent.bool.should be_true #fails
end
  • 0 Shares
  • Share on Facebook
  • Share on Twitter

10 Comments

  1. Simon Russell says:

    I’m obviously missing something here, but isn’t storing ids in VARCHARs going to be wasting space for all ids > approx 2^27? (BIGINTs are 8 bytes.) Also, doesn’t mysql store VARCHARs quite inefficiently if you are using multibyte encodings (meaning that nearly all ids will be inefficiently stored)?

    (unless of course you’re encoding the numbers differently into the strings.)

    November 10, 2009 at 2:44 pm

  2. Coda Hale says:

    > VARCHAR columns will only use as much space as the id needs.

    A BIGINT is a 64-bit integer. It will always and only take 64 bits. The smallest number which isn’t representable by an INTEGER is 2147483648, which stored as a VARCHAR will take 80 bits.

    So in a best-case scenario a VARCHAR is 25% bigger. In a worst-case scenario (unsigned BIGINT) a VARCHAR is 150% bigger.

    Use a BIGINT.

    November 10, 2009 at 2:49 pm

  3. TK says:

    > Strings do not have join performance penalty compared to integers.

    Is this really true? I haven’t tested recently but in the past I have found that ints join faster than varchars.

    November 10, 2009 at 3:16 pm

  4. Steve Conover says:

    So the strings-aren’t-slow-in-joins comment was mine, based on a conversation I had with a MySQL DBA (who I consider an authority) from a couple years ago.

    Here’s a recent post:

    [http://www.andrewrollins.com/2009/06/21/mysql-join-performance/](http://www.andrewrollins.com/2009/06/21/mysql-join-performance/)

    that would suggest that TK is right, Strings indeed result in a mild performance hit (I don’t think those numbers would scare me off. I’d still be interested in a test that went into the millions of rows).

    But I’m going to put this question to Peter and Vadim over at the MySQL Performance Blog and see what they have to say. Maybe we can settle this once and for all (for 5.x at least…).

    -Steve

    November 10, 2009 at 7:06 pm

  5. jacob swanner says:

    I’m guessing it’s just a typo in the example, unless I’m missing something…

    Child belongs Adult, but then you call child.parent

    November 10, 2009 at 8:29 pm

  6. Kevin Tsai says:

    I was one of the students at CMU during Edward’s lecture. It was a very inspirational talk and made me appreciate Extreme Programming.

    November 10, 2009 at 10:05 pm

  7. Coda Hale says:

    Steve, there are no wins for representing big numbers in MySQL using VARCHAR. It takes up more space, even if it has comparable performance in a JOIN.

    On a 64-bit server (which all MySQL servers should be), comparing two 64-bit integers is a single CPU opcode. Even if MySQL did a bit-wise comparison (which it doesn’t — strings are compared using MySQL’s collation settings) it would still require two 64-bit comparisons.

    It takes up more space and is at best a statistical draw in terms of performance.

    November 11, 2009 at 12:40 am

  8. Erin Staniland says:

    If you use strings you’re safe against any future changes in ID format. Using integers is presuming that the ID will always be an integer of some sort.

    http://simonwillison.net/2009/Jun/9/twitpocalypse/#c46139

    November 11, 2009 at 1:27 am

  9. Steve Conover says:

    Coda – agreed

    November 11, 2009 at 7:36 am

  10. Steve Conover says:

    Baron Schwartz from Percona says:

    Steve,

    Maybe this will answer your question.

    [http://www.mysqlperformanceblog.com/2007/06/18/using-char-keys-for-joins-how-much-is-the-overhead/](http://www.mysqlperformanceblog.com/2007/06/18/using-char-keys-for-joins-how-much-is-the-overhead/)

    If you’re unsure it matches your situation, you can write a benchmark to test.

    In general, I would say this is a minor issue unless you KNOW that your
    application has performance problems and you KNOW that it’s due to slow
    joins. The key to getting real performance improvement without wasting time
    is to know these things, so you focus your effort where you can get real
    benefit from it.

    Regards,
    Baron

    …which is just good general advice.

    November 11, 2009 at 8:19 am

Add New Comment Cancel reply

Your email address will not be published.

Tyler Schultz

Tyler Schultz
San Francisco

Recent Posts

  • [Standup][SF] 09/21/12: Get older faster
  • Ah, ah, ah. Now, that was silly. Wouldn’t you agree, my bats? Ah, ah, ah.
  • [SF][Standup] 09/19/12: ActiveAdmin vs. RailsAdmin… Arrrggg!
Subscribe to Tyler's Feed

Author Topics

rails (5)
time (2)
css (1)
heroku (2)
javascript (1)
activerecord (1)
android (3)
robolectric (1)
tdd (1)
agile (7)
amazon (1)
default scope (1)
hudson (1)
rspec (1)
  • About
  • Case Studies
  • Team
  • Community
  • Careers
  • Contact
  • Labs
  • Events

Contact Us

contact@pivotallabs.com
+1 415-77-PIVOT
TwitterLinkedInFacebook

Pivotal Tracker

Tracker is the award-winning agile project management tool that enables real-time collaboration around a shared, prioritized backlog.
Visit pivotaltracker.com >