It's time to add case mangling to Parrot. In part because I need it, and in part because, well, it's really well past time to be able to reasonably say "Gimme a (lower|upper|title)-case version of this string".
If you're thinking "Why is this not there yet?", well, I'm tempted to quote Written Language Barbie -- case-mangling is hard. I won't, though, because it's not hard, it's just tedious. And it requires a fair amount of thought to set up the frameworks so you can actually do it properly. (Case identification belongs in the character set, while case transformation is a language-specific operation -- if you split language and character set operations out, the functions then need to live in separate spots)
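To make that split a bit more concrete, here's a minimal sketch in Python (not Parrot code). Every name in it (SimpleCharacterSet, DefaultLanguage, TurkishLanguage, upcase, downcase) is made up for illustration, and Python's built-in string methods stand in for real character set tables. The character set only answers "what case is this character?", while the language object decides what the character turns into. Turkish is used as the example because its dotted and dotless i are a well-known case where the transform depends on the language.

```python
# Hypothetical sketch: these classes are illustrative and are not Parrot's API.

class SimpleCharacterSet:
    """Character set layer: identifies case, never transforms it."""

    def is_upper(self, ch: str) -> bool:
        # Python's Unicode tables stand in for a real character set table.
        return ch.isupper()

    def is_lower(self, ch: str) -> bool:
        return ch.islower()


class DefaultLanguage:
    """Language layer: owns the actual case transformation."""

    # Per-language exceptions to the generic mapping (none by default).
    upper_overrides: dict = {}
    lower_overrides: dict = {}

    def upcase(self, text: str, charset: SimpleCharacterSet) -> str:
        out = []
        for ch in text:
            if charset.is_lower(ch):
                # Language-specific override first, generic mapping otherwise.
                out.append(self.upper_overrides.get(ch, ch.upper()))
            else:
                out.append(ch)
        return "".join(out)

    def downcase(self, text: str, charset: SimpleCharacterSet) -> str:
        out = []
        for ch in text:
            if charset.is_upper(ch):
                out.append(self.lower_overrides.get(ch, ch.lower()))
            else:
                out.append(ch)
        return "".join(out)


class TurkishLanguage(DefaultLanguage):
    """Turkish pairs i with U+0130 (dotted capital I) and U+0131 (dotless i) with I."""

    upper_overrides = {"i": "\u0130", "\u0131": "I"}
    lower_overrides = {"I": "\u0131", "\u0130": "i"}


charset = SimpleCharacterSet()
print(DefaultLanguage().upcase("title", charset))    # TITLE
print(TurkishLanguage().upcase("title", charset))    # TİTLE (dotted capital I)
print(TurkishLanguage().downcase("TITLE", charset))  # tıtle (dotless i)
```

Same character set, same string, different answers depending on the language object you hand the work to, which is why the two sets of functions end up in separate spots.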
Luckily for me, one of the times I was up at O'Reilly's Cambridge office with my car I thought to grab a copy of Ken Lunde's CJKV Information Processing book. (And a car was almost a requirement -- at 1100+ pages you could hurt yourself hauling it around. Besides, it may well count as a deadly weapon so you couldn't take it on the T anyway) Not, mind, because it covers everything that we'd ever possibly want to do (it's only Chinese, Japanese, Korean, and Vietnamese) but because it's a good reference for a number of encodings and character sets that aren't Unicode, which is nice. (And that I have a personal interest in, which is also nice -- I have no doubt that Arabic, Hebrew, and Cyrillic are fascinating, but they're not for me) They also have the advantage of being relatively simple, something that Unicode definitely is not. Plus there's the added advantage of having several semi-related character sets handy. (Well, OK, not exactly related as such, but there are well-known transforms amongst at least some of them, and if we can't get the Big5 transforms right it's time to pack it in)
Yes, this means that Parrot will probably get loadable encoding, character set, and language library code Real Soon Now. With Unicode too, just to be complete, at least if I can get ICU building properly on Debian Linux. (There's something weird going on that makes it fail on a latest-rev Sarge system, though odds are I'll just punt and teach Configure to link against the system ICU if it's installed)
Soon, I'll be able to make a fool of myself in several languages, not just one! Woohoo! :)

Posted by Dan at January 19, 2004 05:17 PM