Archive for August, 2008

Rails NULL Encoding of MySQL Results

If you are not careful, the encoding that you set in database.yml may bring some ill-effects. The encoding: utf8 causes ActiveRecord connection to execute set names = 'utf8' in the MySQL session. This tells MySQL server that the this particular session sends and want to receive everything in utf8 character set.

Even the column stored in ‘latin1’ is converted to ‘utf8’. This takes a considerable hit on the performance, I also have a suspicion that MySQL does not cache results when it is forced to do the conversion.

How to avoid

1. Store data in correct encoding as it needs.

2. By telling MySQL to not convert the results by initaliazing rails with…

ActiveRecord::Base.connection.execute('SET CHARACTER_SET_RESULTS = NULL')


MySQL: Encoding issues – latin1 to utf8 – easy

There are many reasons why you should get the encoding right in the mysql columns. Unfortunately when you started it, the databases default encoding was in latin1 and now you want to convert to utf8 as per your needs. You can easily do it within the mysql.

Lets do this using an example. For this purpose we have a table called news. This used to have default encoding as latin1, so the column title varchar(255) and content text were storing values in latin1.

To change the values from latin1 to utf8 for the two string columns:

Step 1: set the original default encoding to the columns:

alter table news 
       modify title varchar(255) character set 'latin1', 
       modify content text character set 'latin1';

Step 2: now change the column type to binary types:

alter table news 
       modify title varbinary(255),
       modify content varbinary(65536);

Step 3: now convert to utf8:

alter table news 
        modify title varchar(255) character set 'utf8',
        modify content text character set 'utf8';

This is it, you should be all converted from latin1 to utf8. Also remember to pass the original attributes of the those columns again when using modify column.

Comments (1)

Database Character Set Latin1 to UTF8

Update: This post has inaccuracies and should no longer be used as a guide, but you may find latin1 to utf8 useful.

I was long struggling with the problem of migrating some data from a legacy DB to a Rails application. The original character set was in latin1 and the data were all messed up. Further it does not copy exactly when I try to copy via rails to the new model. I finally have a solution that is uglier than the problem.

Source latin1/non-utf database

The source database should be dumped into a file with this…

mysqldump --default-character-set=latin1 --skip-set-charset -u -p dbname > source.sql

Next, replace latin1 in the dumped sql with utf8 using…

sed 's/latin1/utf8/g' dumpfile.sql > source_utf8.sql

Perhaps you can re-create the database again in a different computer, that way you do not lose the original source database or create another database using…

  CREATE DATABASE dbname_utf8 CHARACTER SET utf8 COLLATE utf8_general_ci;

Now dump the data into the new temporary database that we use to migrate from

  mysql -uUSER dbname_utf8 < source_utf8.sql

Copying to the destination

Now the madness, I call it madness because it only works if I precisely did it this way. I kind of understand the reason but not 100% sure.

As an example, in the source database dbname_utf8 you have a table that we will map to a model say OldNews and we have a News model for the rails app and we want to copy the data across. The steps that worked for me are...

  • Create or alter the new table in the rails app, in this case it will be news with utf8 charset columns.
  • Set utf8 as encoding for source (dbname_utf8) and destination databases in your config/database.yml
  • Do not use any Iconv stuff, it made a bigger mess of my data
  • Make a method that will help you to migrate the data across from OldNews to the News
  • Now alter the database.yml to whatever the original default encoding was for the new database or remove the utf8 encoding.

Thats it, the above worked for me after a lots of trial and error. Hope it works for others in similar position.


AppleScript Tools and the GUI

Some applications do not expose API (Dictionary?) and you have no choice than to script-mimic the GUI events. AppleScipt on its own reminds me of COBOL (exaggerated) and scripting GUI is not proving to be easy either. The UIElementInspector, an easy to use app proves to be handly. It gives the full hierarchy of an UI element. You would really appreciate it when your code seems fine but is unable to find the menu item that looks like “Export Image…”, people expect it to be an ellipsis but no they are three periods.

AppleScript tools

Anyway, I am a beginner of AppleScript and I found these tools quite useful:

  • Editor: AppleScript Editor sucks, TextMate wins.
  • Dictionary: To view the exposed APIs appscript tools’ ASDictionary comes handy, which does a decent job of listing all the Objects and Methods in HTML can also show them in Ruby, Python, AppleScript and ObjectiveC terminologies.
  • Language Translator If you appreciated Ruby instead of AppleScript or Python or Objective C for that matter, using ASTranslator may help. Very handy when your google search returned snippets of AppleScript and you are scripting in Ruby.
  • GUI Elements: Of course the above mentioned UIElementInspector when you have no idea what is the hierarchy of a particular element.

Is there any other tool that I should know?


Midnight publishing of objects & MySQL Query Cache

Usually we have to list objects that are published or expired on some day or time, for e.g. in a shop catalog, say you want to make a product available on a particular date and your conditions may look like the following in a Rails model:

  named_scope :available, lambda{
      :conditions=>["products.is_available = 1 and products.available_at <= ?",]

MySQL has a facility to cache the results of queries and respond quickly when asked again. To make use of it, the query should be constant. In the above, the changing every second, you end up asking MySQL with different query every second. In reality, many listings can work without being too sensitive to time. You can publish at midnight without doing much harm. A simple change to or even the immediate hour can result in great performance of your queries:

  named_scope :available, lambda{
      :conditions=>["products.is_available = 1 and products.available_at <= ?",]

Comments (2)

UTF8 BOM, Rails Views & Phantom Space

Some how my XHTML was not formatting properly. Firefox, added phantom spacing around an element, which I was not able to recover with the CSS. I found rails was sending a weird character where it merges the template with the view yield. That turned out to be the UTF8 BOM which was added by my previous text editors to the view files. Textmate does not add this and it wont remove if the file already has one. The worse part is, rails renderer send this to the browser as it is, even if it is in the middle of the XHTML.

The quick fix is to remove the BOM, if your editor does not allow you to do so, a quick google search on removing UTF8 BOM can get you some script to remove BOM.

Comments (1)