Close
Glad You're Ready. Let's Get Started!

Let us know how we can contact you.

Thank you!

We'll respond shortly.

PIVOTAL LABS
Standup 11/10/2009: Facebook ids, Rails association bug, Carnegie Mellon lecture

Ask for Help

“We’re getting id’s from facebook that are overflowing INT columns in mysql tables. Should I use BIGINT to accommodate these ids? Is this wasteful?”

Use string columns instead. VARCHAR columns will only use as much space as the id needs.

“Are VARCHAR columns slower to join than INT columns?”

Not in mysql. Strings do not have join performance penalty compared to integers.

Interesting Things

  • Edward was a guest lecturer for a Carnegie Mellon University course where students are learning to do Agile development. Course work involves pair programming, TDD, and using Ruby on Rails. The talk was well received and Edward is invited to come back next year!

  • There is an apparent rails bug (2.3.2) when building associated objects and then saving them.

class User < ActiveRecord::Base
 attr_accessor :bool
 before_validation_on_create :set_bool

 def set_bool
   self.bool = true
 end
end

class Child < User
 belongs_to :adult
 before_validation_on_create :create_adult

 def create_adult
   if adult.nil?
     self.build_adult()
   end
 end
end

class Adult < User
end

specify "should work" do
 child = Child.create!()
 child.bool.should be_true
 child.parent.bool.should be_true #fails
end

Comments
  1. Simon Russell says:

    I’m obviously missing something here, but isn’t storing ids in VARCHARs going to be wasting space for all ids > approx 2^27? (BIGINTs are 8 bytes.) Also, doesn’t mysql store VARCHARs quite inefficiently if you are using multibyte encodings (meaning that nearly all ids will be inefficiently stored)?

    (unless of course you’re encoding the numbers differently into the strings.)

  2. Coda Hale says:

    > VARCHAR columns will only use as much space as the id needs.

    A BIGINT is a 64-bit integer. It will always and only take 64 bits. The smallest number which isn’t representable by an INTEGER is 2147483648, which stored as a VARCHAR will take 80 bits.

    So in a best-case scenario a VARCHAR is 25% bigger. In a worst-case scenario (unsigned BIGINT) a VARCHAR is 150% bigger.

    Use a BIGINT.

  3. TK says:

    > Strings do not have join performance penalty compared to integers.

    Is this really true? I haven’t tested recently but in the past I have found that ints join faster than varchars.

  4. Steve Conover says:

    So the strings-aren’t-slow-in-joins comment was mine, based on a conversation I had with a MySQL DBA (who I consider an authority) from a couple years ago.

    Here’s a recent post:

    [http://www.andrewrollins.com/2009/06/21/mysql-join-performance/](http://www.andrewrollins.com/2009/06/21/mysql-join-performance/)

    that would suggest that TK is right, Strings indeed result in a mild performance hit (I don’t think those numbers would scare me off. I’d still be interested in a test that went into the millions of rows).

    But I’m going to put this question to Peter and Vadim over at the MySQL Performance Blog and see what they have to say. Maybe we can settle this once and for all (for 5.x at least…).

    -Steve

  5. I’m guessing it’s just a typo in the example, unless I’m missing something…

    Child belongs Adult, but then you call child.parent

  6. Kevin Tsai says:

    I was one of the students at CMU during Edward’s lecture. It was a very inspirational talk and made me appreciate Extreme Programming.

  7. Coda Hale says:

    Steve, there are no wins for representing big numbers in MySQL using VARCHAR. It takes up more space, even if it has comparable performance in a JOIN.

    On a 64-bit server (which all MySQL servers should be), comparing two 64-bit integers is a single CPU opcode. Even if MySQL did a bit-wise comparison (which it doesn’t — strings are compared using MySQL’s collation settings) it would still require two 64-bit comparisons.

    It takes up more space and is at best a statistical draw in terms of performance.

  8. If you use strings you’re safe against any future changes in ID format. Using integers is presuming that the ID will always be an integer of some sort.

    http://simonwillison.net/2009/Jun/9/twitpocalypse/#c46139

  9. Steve Conover says:

    Coda – agreed

  10. Steve Conover says:

    Baron Schwartz from Percona says:

    Steve,

    Maybe this will answer your question.

    [http://www.mysqlperformanceblog.com/2007/06/18/using-char-keys-for-joins-how-much-is-the-overhead/](http://www.mysqlperformanceblog.com/2007/06/18/using-char-keys-for-joins-how-much-is-the-overhead/)

    If you’re unsure it matches your situation, you can write a benchmark to test.

    In general, I would say this is a minor issue unless you KNOW that your
    application has performance problems and you KNOW that it’s due to slow
    joins. The key to getting real performance improvement without wasting time
    is to know these things, so you focus your effort where you can get real
    benefit from it.

    Regards,
    Baron

    …which is just good general advice.

Post a Comment

Your Information (Name required. Email address will not be displayed with comment.)

* Copy This Password *

* Type Or Paste Password Here *