I was long struggling with the problem of migrating some data from a legacy DB to a Rails application. The original character set was in latin1 and the data were all messed up. Further it does not copy exactly when I try to copy via rails to the new model. I finally have a solution that is uglier than the problem.
Source latin1/non-utf database
The source database should be dumped into a file with this…
mysqldump --default-character-set=latin1 --skip-set-charset -u -p dbname > source.sql
Next, replace latin1 in the dumped sql with utf8 using…
sed 's/latin1/utf8/g' dumpfile.sql > source_utf8.sql
Perhaps you can re-create the database again in a different computer, that way you do not lose the original source database or create another database using…
CREATE DATABASE dbname_utf8 CHARACTER SET utf8 COLLATE utf8_general_ci;
Now dump the data into the new temporary database that we use to migrate from
mysql -uUSER dbname_utf8 < source_utf8.sql
Copying to the destination
Now the madness, I call it madness because it only works if I precisely did it this way. I kind of understand the reason but not 100% sure.
As an example, in the source database dbname_utf8 you have a table that we will map to a model say OldNews and we have a News model for the rails app and we want to copy the data across. The steps that worked for me are...
- Create or alter the new table in the rails app, in this case it will be news with utf8 charset columns.
- Set utf8 as encoding for source (dbname_utf8) and destination databases in your config/database.yml
- Do not use any Iconv stuff, it made a bigger mess of my data
- Make a method that will help you to migrate the data across from OldNews to the News
- Now alter the database.yml to whatever the original default encoding was for the new database or remove the utf8 encoding.
Thats it, the above worked for me after a lots of trial and error. Hope it works for others in similar position.