December 18, 2003

Argh!

I swear, text will be the death of me. (If you thought it was objects, nope--they're obnoxious, overblown, and the OO Kool-Aid tastes of almonds, but hardly a full-blown nemesis or anything.)

As an example, a page that Kim found.

While on the one hand I do really like the shape and form of the alphabets and writing (no surprise, I'm a font magpie too), the implications of actually processing text in these languages are painful to think of. That's even ignoring the issues of rendering or OCRing these sorts of languages. (One big screaming example--you'll note on that page that the trailing sigma has a separate character in the Unicode set, but it should be treated as a plain sigma for text searching reasons. And imagine what should happen to the sigma character if you substr a string and the last character happens to be a sigma that was, up until a moment ago, in the middle of the word. Then you concat a space and another word for display....)
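To make the sigma problem concrete, here's a rough Python sketch. It is an illustration and nothing more: the helper names search_key and fix_final_sigma are made up for this example, and the "fix" is deliberately naive (it only looks at the last character of the string, knows nothing about word boundaries inside it, and would wrongly rewrite a lone abbreviation that really should end in a medial sigma). The only library behavior it leans on is str.casefold(), which folds the final sigma (U+03C2) to the plain sigma (U+03C3).

```python
MEDIAL_SIGMA = "\u03c3"  # plain (medial) lowercase sigma
FINAL_SIGMA = "\u03c2"   # word-final lowercase sigma


def search_key(text):
    """Fold case so final and medial sigma compare equal when searching."""
    return text.casefold()


def fix_final_sigma(word):
    """Naive display fix-up: a multi-letter word ending in a medial
    sigma gets the final form instead. Assumes `word` is one word."""
    if len(word) > 1 and word.endswith(MEDIAL_SIGMA):
        return word[:-1] + FINAL_SIGMA
    return word


# "kosmos" spelled out with escapes: kappa omicron sigma mu omicron final-sigma
word = "\u03ba\u03bf\u03c3\u03bc\u03bf\u03c2"

# Searching: the two sigma forms should match each other.
same_for_search = search_key(FINAL_SIGMA) == search_key(MEDIAL_SIGMA)

# substr: the first three letters now end in a sigma that used to be
# in the middle of the word, so it is still the medial form.
prefix = word[:3]
ends_with_medial = prefix.endswith(MEDIAL_SIGMA)

# Concat a space and another word for display: without the fix-up the
# first word ends in the wrong sigma form.
display = fix_final_sigma(prefix) + " " + word
```

Note that the fix-up has to run before the concat. Once the space and the next word are glued on, the sigma is no longer the last character of the string, and a check this simple won't find it.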

Posted by Dan at December 18, 2003 01:51 PM
Comments

That page is a good reality check for people who want *all* of Unicode implemented in Perl :-)

Once one has gotten an eyeful of rendering "fun", the next thing to do is to consider _editing_ such text.

Quick, what happens if your cursor is in the middle of a Hebrew word and you press Delete? Backspace? What happens visually versus in "backing store" (the stored text)?

What if you have _mixed_ left-to-right and right-to-left text? You'll end up having things like split cursors.

How about top-to-bottom layouts like traditional Chinese and Japanese?

How about selecting text? Pasting text?

Posted by: jhi at December 19, 2003 12:20 PM

At some point, people will have to give up and admit that some of these writing systems are not "text" -- there are too many parameters to encode, and too many nuances to enter with any kind of keyboard. For better or worse, speakers of these languages who want to use computers will either adopt new alphabets, or compute in English. This sounds culturally insensitive, but I have great faith in people's ability to take the easiest path, and at some point it's easier to change your language or your alphabet than to try to make machines understand it. Or at least, to make them understand it as "text".

Posted by: Sean O'Rourke at January 6, 2004 10:54 PM