Ruby 1.9 and encodings, lament № 1

Yehuda Katz has written an excellent write-up on Ruby and encodings, but under the heading “Why this is, in practice, a rare problem”, I think he’s explained exactly why, in practice, there is a common problem:

In practice, most sources of data, without any further work, are already encoded as UTF-8. For instance, the default Rails MySQL connection specifies a UTF-8 client encoding, so even an ISO-8859-1 database will return UTF-8 data.

Many other data sources, such as MongoDB, only support UTF-8 data internally, so their Ruby 1.9-compatible drivers already return UTF-8 encoded data.

Your text editor (TextMate) likely defaults to saving your templates as UTF-8, so the characters in the templates are already encoded in UTF-8.

And then Ruby 1.9, unless you explicitly tell it otherwise, defaults to interpreting the file, and every string literal defined in it, as US-ASCII, and all of that is for nothing.
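The usual way to "explicitly tell it otherwise" is a magic comment on the first line of each source file. What follows is a minimal sketch, assuming the file itself is saved as UTF-8; the variable names are illustrative only. It prints the file's source encoding, then relabels the same bytes as US-ASCII with `force_encoding` to show that the bytes are unchanged and only the encoding tag is wrong. (Ruby 2.0 and later default to UTF-8 source files, so the comment matters mainly on 1.9.)

```ruby
# encoding: utf-8
# The magic comment above must be the first line of the file (or the second,
# after a shebang line). Without it, Ruby 1.9 reads this file as US-ASCII and
# refuses to parse the non-ASCII literal below.

literal = "café"
puts __ENCODING__            # prints UTF-8: the source encoding of this file
puts literal.encoding        # prints UTF-8: literals inherit the source encoding

# Relabel a copy of the same bytes as US-ASCII to mimic text tagged with the
# wrong encoding. force_encoding changes only the label, never the bytes.
mislabelled = literal.dup.force_encoding(Encoding::US_ASCII)
puts mislabelled.valid_encoding?   # prints false: "é" is two bytes above 0x7F

# Relabelling a copy back as UTF-8 recovers the original string.
relabelled = mislabelled.dup.force_encoding(Encoding::UTF_8)
puts relabelled.valid_encoding?    # prints true
puts relabelled == literal         # prints true
```

Because `force_encoding` never transcodes, it is only a repair for data whose bytes were already UTF-8 and merely mislabelled; data actually stored in another encoding needs `String#encode` instead.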
