Malkovich? Malkovich Malkovich!

15 07 2013

If you have ever logged in to a Windows computer that uses the Japanese display language, you have probably seen that the path separator is not the familiar backslash (\) but the Yen symbol (¥).

Well, at least that was what I thought. I recently got a defect saying that we hadn’t localized one of our dialogs correctly: it showed a backslash on Japanese OSs. My first thought was that somebody had hardcoded ‘\’ instead of using Path.DirectorySeparatorChar, but a quick look at the code showed that we were using the path as supplied by the OS. This forced me to learn something new, which I will now inflict on you.

Ever since reading Joel’s classic The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), my understanding was that the Unicode range [0-127] (aka ASCII) looked the same the world over. Quote:

Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up…

As is appropriate for a post describing the “absolute minimum”, this is not the whole story. Michael Kaplan’s post “when is a backslash not a backslash” taught me that on Japanese OSs a backslash (Unicode U+005C, which is 92 and so less than 128) is displayed as a Yen sign (¥), even though it is not the Unicode Yen character (U+00A5). This means that the path separator is still a backslash and is only displayed as a Yen; it also means that the actual Yen is not a path separator. Since the Yen is not a path separator, it can be used in file names, and the following path can mean several different files:

C:¥¥¥¥¥¥¥¥¥¥¥¥¥.¥

The first ¥ must actually be a backslash (and the second can’t be a backslash) which means that the file in question may be any of the following:

c:\¥¥¥¥¥¥¥¥\¥¥¥.¥
c:\¥¥¥\¥¥¥¥\¥¥¥.¥
c:\¥¥\¥¥\¥¥\¥¥¥.¥
c:\¥\¥\¥\¥\¥¥¥¥.¥
... and many more
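
To make the ambiguity concrete, here is a minimal Python sketch. The helpers `build` and `as_displayed` are made up for illustration, and `as_displayed` only simulates a font that draws U+005C with the Yen glyph; it is not what Windows actually does internally. It rebuilds the candidate paths listed above, shows that they all look the same once rendered, and uses `pathlib.PureWindowsPath` to show that only the real backslashes split the path.

```python
from pathlib import PureWindowsPath

BACKSLASH = "\\"  # U+005C, the real path separator
YEN = "\u00a5"    # U+00A5, an ordinary file-name character


def build(*lengths: int) -> str:
    """Hypothetical helper: one component of n Yen signs per length, ending in '.¥'."""
    return "c:" + "".join(BACKSLASH + YEN * n for n in lengths) + "." + YEN


def as_displayed(path: str) -> str:
    """Simulate a font that draws the backslash with the Yen glyph (assumption)."""
    return path.replace(BACKSLASH, YEN)


# The candidate paths from the list above.
candidates = [
    build(8, 3),
    build(3, 4, 3),
    build(2, 2, 2, 3),
    build(1, 1, 1, 1, 4),
]

# Every candidate collapses to the same string on screen.
displayed = {as_displayed(p) for p in candidates}

# Parsing splits on the real backslashes only; the Yen signs stay in the names.
parsed = [PureWindowsPath(p).parts for p in candidates]

for p, parts in zip(candidates, parsed):
    print(as_displayed(p), "->", parts)
```

All four lines print the same displayed path, but a different number of components for each.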

The same story applies to the Korean Won (₩).
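
If you want to check which character you are really holding, asking Unicode is more reliable than trusting your eyes. A small sketch using Python's standard `unicodedata` module (the `describe` helper is a name I made up for the example):

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The backslash, the real Yen sign and the real Won sign are three
# distinct characters, whatever glyph a font chooses to draw for them.
for ch in ("\\", "\u00a5", "\u20a9"):
    print(describe(ch))
```

Only the first of the three, the one in the ASCII range, is the path separator.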

tl;dr: how a backslash appears depends on the font you use; the path separator is not localized on Japanese OSs.


A more topical title for this post would probably be Hodor hodor hodor.
