Ask for Help
“We’re getting id’s from facebook that are overflowing INT columns in mysql tables. Should I use BIGINT to accommodate these ids? Is this wasteful?”
Use string columns instead. VARCHAR columns will only use as much space as the id needs.
“Are VARCHAR columns slower to join than INT columns?”
Not in mysql. Strings do not have join performance penalty compared to integers.
Interesting Things
Edward was a guest lecturer for a Carnegie Mellon University course where students are learning to do Agile development. Course work involves pair programming, TDD, and using Ruby on Rails. The talk was well received and Edward is invited to come back next year!
There is an apparent rails bug (2.3.2) when building associated objects and then saving them.
class User < ActiveRecord::Base
attr_accessor :bool
before_validation_on_create :set_bool
def set_bool
self.bool = true
end
end
class Child < User
belongs_to :adult
before_validation_on_create :create_adult
def create_adult
if adult.nil?
self.build_adult()
end
end
end
class Adult < User
end
specify "should work" do
child = Child.create!()
child.bool.should be_true
child.parent.bool.should be_true #fails
end
I’m obviously missing something here, but isn’t storing ids in VARCHARs going to be wasting space for all ids > approx 2^27? (BIGINTs are 8 bytes.) Also, doesn’t mysql store VARCHARs quite inefficiently if you are using multibyte encodings (meaning that nearly all ids will be inefficiently stored)?
(unless of course you’re encoding the numbers differently into the strings.)
November 10, 2009 at 2:44 pm
> VARCHAR columns will only use as much space as the id needs.
A BIGINT is a 64-bit integer. It will always and only take 64 bits. The smallest number which isn’t representable by an INTEGER is 2147483648, which stored as a VARCHAR will take 80 bits.
So in a best-case scenario a VARCHAR is 25% bigger. In a worst-case scenario (unsigned BIGINT) a VARCHAR is 150% bigger.
Use a BIGINT.
November 10, 2009 at 2:49 pm
> Strings do not have join performance penalty compared to integers.
Is this really true? I haven’t tested recently but in the past I have found that ints join faster than varchars.
November 10, 2009 at 3:16 pm
So the strings-aren’t-slow-in-joins comment was mine, based on a conversation I had with a MySQL DBA (who I consider an authority) from a couple years ago.
Here’s a recent post:
[http://www.andrewrollins.com/2009/06/21/mysql-join-performance/](http://www.andrewrollins.com/2009/06/21/mysql-join-performance/)
that would suggest that TK is right, Strings indeed result in a mild performance hit (I don’t think those numbers would scare me off. I’d still be interested in a test that went into the millions of rows).
But I’m going to put this question to Peter and Vadim over at the MySQL Performance Blog and see what they have to say. Maybe we can settle this once and for all (for 5.x at least…).
-Steve
November 10, 2009 at 7:06 pm
I’m guessing it’s just a typo in the example, unless I’m missing something…
Child belongs Adult, but then you call child.parent
November 10, 2009 at 8:29 pm
I was one of the students at CMU during Edward’s lecture. It was a very inspirational talk and made me appreciate Extreme Programming.
November 10, 2009 at 10:05 pm
Steve, there are no wins for representing big numbers in MySQL using VARCHAR. It takes up more space, even if it has comparable performance in a JOIN.
On a 64-bit server (which all MySQL servers should be), comparing two 64-bit integers is a single CPU opcode. Even if MySQL did a bit-wise comparison (which it doesn’t — strings are compared using MySQL’s collation settings) it would still require two 64-bit comparisons.
It takes up more space and is at best a statistical draw in terms of performance.
November 11, 2009 at 12:40 am
If you use strings you’re safe against any future changes in ID format. Using integers is presuming that the ID will always be an integer of some sort.
http://simonwillison.net/2009/Jun/9/twitpocalypse/#c46139
November 11, 2009 at 1:27 am
Coda – agreed
November 11, 2009 at 7:36 am
Baron Schwartz from Percona says:
Steve,
Maybe this will answer your question.
[http://www.mysqlperformanceblog.com/2007/06/18/using-char-keys-for-joins-how-much-is-the-overhead/](http://www.mysqlperformanceblog.com/2007/06/18/using-char-keys-for-joins-how-much-is-the-overhead/)
If you’re unsure it matches your situation, you can write a benchmark to test.
In general, I would say this is a minor issue unless you KNOW that your
application has performance problems and you KNOW that it’s due to slow
joins. The key to getting real performance improvement without wasting time
is to know these things, so you focus your effort where you can get real
benefit from it.
Regards,
Baron
…which is just good general advice.
November 11, 2009 at 8:19 am