Archive for Uncategorized

Database Character Set Latin1 to UTF8

Update: This post has inaccuracies and should no longer be used as a guide, but you may find latin1 to utf8 useful.

I was long struggling with the problem of migrating some data from a legacy DB to a Rails application. The original character set was in latin1 and the data were all messed up. Further it does not copy exactly when I try to copy via rails to the new model. I finally have a solution that is uglier than the problem.

Source latin1/non-utf database

The source database should be dumped into a file with this…

mysqldump --default-character-set=latin1 --skip-set-charset -u -p dbname > source.sql

Next, replace latin1 in the dumped sql with utf8 using…

sed 's/latin1/utf8/g' dumpfile.sql > source_utf8.sql

Perhaps you can re-create the database again in a different computer, that way you do not lose the original source database or create another database using…

  CREATE DATABASE dbname_utf8 CHARACTER SET utf8 COLLATE utf8_general_ci;

Now dump the data into the new temporary database that we use to migrate from

  mysql -uUSER dbname_utf8 < source_utf8.sql

Copying to the destination

Now the madness, I call it madness because it only works if I precisely did it this way. I kind of understand the reason but not 100% sure.

As an example, in the source database dbname_utf8 you have a table that we will map to a model say OldNews and we have a News model for the rails app and we want to copy the data across. The steps that worked for me are...

  • Create or alter the new table in the rails app, in this case it will be news with utf8 charset columns.
  • Set utf8 as encoding for source (dbname_utf8) and destination databases in your config/database.yml
  • Do not use any Iconv stuff, it made a bigger mess of my data
  • Make a method that will help you to migrate the data across from OldNews to the News
  • Now alter the database.yml to whatever the original default encoding was for the new database or remove the utf8 encoding.

Thats it, the above worked for me after a lots of trial and error. Hope it works for others in similar position.