John Pignata
Converting Rails application data from MySQL to PostgreSQL
Posted by John Pignata on Monday November 23, 2009 at 06:07PM

One of our projects had a pending chore in Tracker to move its backend from MySQL to PostgreSQL. The project has about a quarter of a million rows of production data and around a hundred tables in its schema, all of which needed to be migrated to PostgreSQL intact.

Forklifting the data proved more complicated than expected because of incompatibilities between the two DBMSs' syntax, such as the way string escaping works, how booleans are represented, and a bunch of other small but painful differences. Although MySQL's mysqldump utility includes a command-line option to write statements in a PostgreSQL-compatible format, it became clear that it wasn't going to be simple to create a repeatable procedure to do this work across our environments.
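To make the boolean mismatch concrete: Rails on MySQL stores boolean columns as TINYINT(1) values of 0 and 1, while PostgreSQL has a native boolean type. A minimal sketch of normalizing such values in Ruby before inserting them might look like this. The helper name `coerce_booleans` and the column names are hypothetical, and this is an illustration rather than something we ended up needing.

```ruby
# Hypothetical helper: converts MySQL-style 0/1 flags into Ruby booleans
# for the columns the caller lists as boolean. NULLs are left untouched.
def coerce_booleans(row, boolean_columns)
  converted = {}
  row.each do |column, value|
    converted[column] =
      if boolean_columns.include?(column) && !value.nil?
        # Accept the integer, the string, or an already-converted boolean.
        [1, "1", true].include?(value)
      else
        value
      end
  end
  converted
end

# Illustrative row; the column names are made up.
row = { "id" => 7, "admin" => 1, "active" => "0", "deleted" => nil }
p coerce_booleans(row, ["admin", "active", "deleted"])
```

Which columns count as boolean has to come from the schema, since a TINYINT(1) value on its own is indistinguishable from a small integer.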

There's plenty of information out there about how to approach this problem, but none of it felt right. Most approaches are multi-step manual procedures that require altering a dump file using sed or perl; others require the data to be loaded into an intermediary database and massaged prior to import. After testing some of these approaches, Todd and I decided to timebox ourselves to an hour to test the viability of a Ruby script using the DBI gem to move the data. We came up with:

require 'dbi'
require 'dbd/mysql'
require 'dbd/pg'

begin
  mysql = DBI.connect("DBI:Mysql:source:localhost", "username", "password")
  postgres = DBI.connect("DBI:Pg:destination:localhost", "username", "password")

  # Copy every table except the ones we don't want to carry over.
  mysql.select_all("SHOW TABLES") do |row|
    # SHOW TABLES returns one column per row: the table name.
    table = row[0].to_s
    next if ['schema_migrations', 'sessions'].include?(table)

    select = mysql.execute("SELECT * FROM #{table}")
    # Quote column names so reserved words survive in PostgreSQL.
    columns = select.column_names.map { |key| "\"#{key}\"" }.join(', ')
    placeholders = (['?'] * select.column_names.size).join(', ')
    insert = postgres.prepare("INSERT INTO #{table} (#{columns}) VALUES(#{placeholders})")
    # Bind each source row to the prepared INSERT.
    select.each { |record| insert.execute(*record) }
    insert.finish
    select.finish
  end
rescue DBI::DatabaseError => e
  puts "Error #{e.err}: #{e.errstr}"
ensure
  mysql.disconnect if mysql
  postgres.disconnect if postgres
end

Our antiquely Perl-like script worked better than we expected — our application started right up with all of its data intact.
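The statement-building step of the script can be pulled out and checked without a database connection. The sketch below is one way to do that. `quote_identifier` and `build_insert_sql` are hypothetical helper names, and unlike the script above it also quotes the table name, which assumes the PostgreSQL tables were created with exactly the same names.

```ruby
# Hypothetical helper: wraps an identifier in double quotes for PostgreSQL.
# An embedded double quote is escaped by doubling it.
def quote_identifier(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# Hypothetical helper: builds the parameterized INSERT for one table,
# with one "?" placeholder per column.
def build_insert_sql(table, column_names)
  columns = column_names.map { |name| quote_identifier(name) }.join(', ')
  placeholders = (['?'] * column_names.size).join(', ')
  "INSERT INTO #{quote_identifier(table)} (#{columns}) VALUES (#{placeholders})"
end

# "order" is a reserved word, which is why the quoting matters.
puts build_insert_sql('users', ['id', 'order'])
```

Because the values are bound through placeholders rather than interpolated, the string-escaping differences between the two databases never come into play.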

Has anybody out there encountered this need before? What kinds of solutions did you come up with?

Comments

  1. Michael Siebert on November 23, 2009 at 11:02PM

    we had almost the same need for our app and after some poking around, i wrote ar_dbcopy gem which worked really well for a db even bigger than what you outlined. code is at github.com/siebertm/ar_dbcopy

  2. John Pignata on November 24, 2009 at 04:54AM

    Cool, thanks Michael. The link is actually: ar-dbcopy

    Looks like Rama McIntosh also came up with an ActiveRecord-ish solution.

  3. Joe Van Dyk on November 24, 2009 at 09:02AM

    Heroku uses the taps gem to extract data from a mysql database to import into postgresql.

    http://adam.blog.heroku.com/past/2009/2/11/taps_for_easy_database_transfers/

  4. John Pignata on November 24, 2009 at 09:15AM

    Whoa. That's awesome. Thanks, Joe. The known issue "Foreign Keys get lost in the schema transfer" seems like a doozy, though.

  5. Trevor Turk on November 27, 2009 at 04:32AM

    I used the taps gem, too. It worked pretty well, but I was also caught by some nasty encoding issues moving from latin1 to utf8, I believe. This was just for a hobby project, though, so that wasn't a big deal :)

  6. John Pignata on November 27, 2009 at 07:34AM

    Trevor -- Cool. In our little hack, we had similar encoding pain that we got around by explicitly setting the MySQL connection to output in UTF8:

    mysql.execute("SET NAMES 'UTF8'")
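For comparison, here is roughly what the same conversion looks like if done on the Ruby side instead of asking MySQL to do it. This is a different technique from the SET NAMES call above, shown only to illustrate the problem. The helper name `latin1_to_utf8` is hypothetical, and the sketch assumes the bytes are ISO-8859-1; MySQL's latin1 is closer to Windows-1252, so that encoding may be the better source choice for real data.

```ruby
# Hypothetical helper: reinterprets raw bytes as ISO-8859-1 and transcodes
# them to UTF-8. Works on a copy so the caller's string is not modified.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# "café" as a latin1 connection would hand it over: é is the single byte 0xE9.
raw = "caf\xE9".b

# Treated as UTF-8 without conversion, the lone 0xE9 byte is invalid.
puts raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# After transcoding, the string is valid UTF-8.
puts latin1_to_utf8(raw).valid_encoding?
```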

  7. Nathaniel Bibler on November 30, 2009 at 12:37PM

    You should be aware that changing from a MySQL-backed Rails app to a PostgreSQL-backed application may have some hidden gotchas depending on your table schemas.

    For example, and this is the most common that I've seen: if you have a string column that limits to 100 characters, the Rails MySQL adapter will invisibly truncate the data for you when saving. However, the PostgreSQL adapter will raise a data too long exception, instead of invisibly truncating the data, so you may have to add a local model validation or override the setter to auto-truncate to mimic the MySQL functionality.
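The setter override mentioned above could be sketched in plain Ruby as follows. The `Profile` class, the `nickname` attribute and the 100-character limit are hypothetical stand-ins; in a Rails model the truncated value would be written to the model's attribute rather than to an instance variable.

```ruby
# Hypothetical plain-Ruby stand-in for a model with a 100-character column.
class Profile
  NICKNAME_LIMIT = 100

  attr_reader :nickname

  # Truncate on assignment so an over-long value never reaches the database.
  # Slicing is by character, not by byte.
  def nickname=(value)
    @nickname = value.nil? ? nil : value.to_s[0, NICKNAME_LIMIT]
  end
end

profile = Profile.new
profile.nickname = "x" * 150
puts profile.nickname.length
```

Silent truncation hides data loss, so a length validation that rejects the value may be the safer of the two options.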

  8. Derek Neighbors on December 13, 2009 at 09:36PM

    Nathaniel -- Do you know if there is good information detailing all those little gotchas?
