Squawks of the Parrot: June 2004 Archives

June 24, 2004

The internet is full.

And so's my /var partition. 6G just doesn't go as far as it used to.

I apparently made a big mistake back when I set this new server up -- I told Amavis to quarantine mail it decided was infected rather than flat-out deleting it. (Though I don't auto-reply. Now if we could just get other systems to stop doing that too...) You never know, right? Viruses used to infect real files, so there was a real possibility of someone sending along a useful file with a virus attached.

I suppose that could still happen, but if it does, well... too bad. I last cleaned out the quarantine directory (checking nothing, honestly) on May 10th. 45 damn days ago. In that time 6G of identified virus mail came in. The directory file itself was 9.8M, and there were who knows how many files in the thing. (I didn't count--there were far, far too many files for the shell to wildcard-expand, so I blew away the directory and recreated it.)
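For anyone in the same boat: the usual culprit with "too many files" is the kernel's limit on argument-list size, hit when the expanded wildcard gets handed to rm. A minimal Python sketch that sidesteps it by walking the directory one entry at a time, assuming quarantined messages are plain files in a single directory and that modification time is a good enough measure of age. The function name and parameters are hypothetical, not anything Amavis provides:

```python
import os
import time


def purge_quarantine(path, max_age_days=30, dry_run=False):
    """Remove plain files in `path` older than `max_age_days` (hypothetical helper).

    os.scandir() yields directory entries one at a time, so no giant
    argument list is ever built the way `rm quarantine/*` builds one.
    Returns the number of files removed (or that would be, if dry_run).
    """
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Leave subdirectories and symlinks alone; only purge regular files.
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                if not dry_run:
                    os.unlink(entry.path)
                removed += 1
    return removed
```

Run it from cron with something like `purge_quarantine("/var/amavis/quarantine", max_age_days=45)` (path assumed) and the directory never gets to 9.8M in the first place. Using `dry_run=True` first is cheap insurance.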

This is why I'm waiting for multi-megabit DSL to get rolled out in the US. Not so everyone can go snag warez, pr0n, or pirated AVIs. (Or even so my FTP, web, and bittorrent servers have reasonable upload speeds) No, it's so I've bandwidth left over after the spam and virus crap...

Posted by Dan at 02:34 PM | Comments (3) | TrackBack

June 22, 2004

Inching towards merengue...

So, I've been working on and off with Python::Bytecode, getting it up to snuff. The basic code itself looks to be a simple translation of the Python unfreezing code, though it's needed some thumping for Python 2.3. (Things have changed sizes since the code was originally written, and some conditional things were a bit dodgy)
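To make "things have changed sizes" concrete: unfreezing a .pyc means skipping a fixed-size header and unmarshalling the code object behind it, and that header has grown over the years (8 bytes in Python 2.x, 12 in 3.3 through 3.6, 16 since 3.7). A minimal sketch using a modern Python 3 interpreter's own .pyc files; this isn't Python::Bytecode's code or the 2.3 format, and the `load_pyc` name is hypothetical:

```python
import importlib.util
import marshal
import types

# Header layout since Python 3.7 (PEP 552): 4-byte magic number, 4-byte flags,
# then 8 bytes of either (mtime, source size) or a source hash.
HEADER_SIZE = 16


def load_pyc(path):
    """Unfreeze a .pyc written by *this* interpreter into a code object."""
    with open(path, "rb") as f:
        data = f.read()
    # The magic number changes whenever the bytecode format does, which is
    # exactly the sort of drift that breaks hand-written unmarshalling code.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was written by a different Python version")
    code = marshal.loads(data[HEADER_SIZE:])
    if not isinstance(code, types.CodeType):
        raise ValueError("payload is not a code object")
    return code
```

`py_compile.compile("mod.py")` produces a file to feed it; the returned code object can be handed straight to `exec()`.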

I've actually got it fully working, which I wrote about earlier (I think), with one caveat--the code assumes that there's only one code object around at any one time, which isn't generally true. When you disassemble a piece of bytecode you normally have multiple code objects, and a disassembler that only handles one code object is sub-optimal. (This, by the way, is why the tests I was doing didn't match what Python's dis module showed. It had the same problem, disassembling only one code object; it just chose a different one to disassemble. First rather than last, or vice versa, I don't remember.) Anyway, now I'm rejigging the relatively simple internals to be less simple, splitting out the bytecode management chunks from the code object chunks.
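The reason there's more than one: every nested function, class body, lambda, and (depending on the Python version) comprehension is compiled to its own code object and tucked into the enclosing object's `co_consts` tuple. A minimal sketch of visiting all of them, written against a modern Python 3's `dis` module rather than Python::Bytecode; the helper names are hypothetical:

```python
import dis
import types


def iter_code_objects(code):
    """Yield `code` and every code object nested inside it, depth first."""
    yield code
    for const in code.co_consts:
        # Nested code objects live in co_consts alongside ordinary constants.
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def disassemble_all(source, filename="<string>"):
    """Disassemble every code object in `source`, not just the top one."""
    top = compile(source, filename, "exec")
    for co in iter_code_objects(top):
        print(f"--- code object {co.co_name} (line {co.co_firstlineno}) ---")
        # dis.disassemble() handles only the object it's given, so each
        # nested code object is printed exactly once.
        dis.disassemble(co)
```

Splitting "walk the code objects" from "disassemble one code object" is the same separation as pulling the bytecode management chunks apart from the code object chunks.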

Hopefully by the end of the day (or, rather, the end of the night, since I shouldn't burn work time on this, though a working Python bytecode transcoder would make writing library code easier than doing it in PIR) I'll have a working disassembler. Well, easier other than the whole 'writing it in Python' thing, and that's more because I don't know the language well enough to use it. I'll save rants about the language proper until such a time as I actually do, since it's kinda stupid to rant about things I don't know anything about, and I prefer to save that for other things... :) Once the disassembler's done, we're in a position to write the transcoder, and from there we can fill in the blanks with the library code.

FWIW, if you're following along at home, the library code is by far the biggest potential issue. The bytecode disassembly and translation's not that big a deal, relatively speaking, though the library code is.

I've got to admit, at this point I am nervous about winning, not because I think parrot can't perform but because I'm not sure there'll be time to finish. This has been held up in part by machine problems (my laptop's in the shop again, dammit) but mostly by real-world concerns. Work and family are keeping me busy. Maybe you can tell your six-year-old daughter you can't go to the park with her because you need to throw a pie at Guido, but I can't. :) So I've had a lot less time to put into this than I thought. On the other hand, we've gotten Leo involved, so things may well progress... rapidly.

The translation code's a single-threaded bottleneck, but once the translator's done we can get more folks involved, since library work can be done multithreaded. And as a side-effect, parrot'll get a good chunk 'o runtime library implemented, and I can't complain about that. One way or another this will get done, even if we don't make OSCON. That'll give us a nice, working python to parrot translator, which isn't a bad thing at all.

Posted by Dan at 10:19 AM | Comments (3) | TrackBack

June 21, 2004

Lambda's moved

Part of the fallout of all the weblogs.com blogs going away. It's now at http://lambda-the-ultimate.org/ so update your bookmarks appropriately.

As a side-effect, it's much, much snappier now, which is cool.

Posted by Dan at 02:25 PM | Comments (0) | TrackBack

Too much talk

So, it turned out that I ran really long last Thursday, and didn't get to the talk I'd actually prepared for the night, which was a bit disappointing. Then my laptop died (again, dammit), so I haven't been able to put together an annotated version of the slides for the second presentation for everyone to look at (it's not like I'm going to give this presentation anywhere else anytime soon--not too many folks are that interested in the details of implementing a virtual machine). When I get the machine back, assuming it works this time, I'll get that done and post a PDF of the slides.

As side-effects of the talk, I found that I rather like curried goat. Oh, and the cats are completely unimpressed with the spiffy new laser pointer. (The dog, though, chases it with some glee. Go figure)

Posted by Dan at 09:45 AM | Comments (0) | TrackBack

June 16, 2004

Strings, revisited

So, I finally did the last draft of the bytecode/assembly level string design for Parrot. It was a mixed bag--the per-string language tag is gone (darn!) but national character sets stay (yay!) with a set of "It's all Unicode no matter what you say" string ops thrown into the mix. Like any other engineering task with multiple conflicting requirements and strong proponents of different schemes, it's safe to say that everyone's unhappy with the result, but I think everyone can make do with what we have.

What ultimately resulted, if you don't feel like going and looking up the post in the archives (I'm offline so I don't have access to a URL), is this.

A 'string', for parrot, is a combination of byte buffer and grapheme buffer. (Graphemes are the smallest unit of text representable. They're usually represented by a single integer, but accented characters and some scripts may represent them with more than one integer.) Yes, this is a bad idea, but it's how programs deal with them, so we cope. Anyway, programs may look at these strings byte by byte, integer by integer, or grapheme by grapheme. Each string has an encoding (which is responsible for turning the bytes in the underlying buffer into integer code points) and a character set (which is responsible for giving some meaning to those code points) attached to it. Programs can deal with strings either in their 'native' form or as purely Unicode data, and if a string isn't Unicode, treating it as Unicode will cause parrot to automatically convert it from whatever form it's in to Unicode. (Which makes the "All-Unicode all the time" folks reasonably content.)
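A toy model of the three views may help. This is a Python sketch, not Parrot's C API: the encoding and character-set names, the `PString` class, and the grapheme rule (a base code point plus any combining marks after it, far simpler than Unicode's real segmentation rules) are all assumptions for illustration.

```python
import unicodedata
from dataclasses import dataclass

# Encodings turn raw bytes into integer code points (hypothetical names).
ENCODINGS = {
    "fixed_8": lambda buf: list(buf),  # one byte per code point
    "utf8": lambda buf: [ord(c) for c in buf.decode("utf-8")],
}

# Character sets give code points meaning. This toy asks only one question:
# does this code point combine with the one before it?
CHARSETS = {
    "ascii": lambda cp: False,
    "iso-8859-1": lambda cp: False,  # Latin-1 has no combining marks
    "unicode": lambda cp: unicodedata.combining(chr(cp)) != 0,
}


@dataclass
class PString:
    buf: bytes       # the underlying byte buffer
    encoding: str    # key into ENCODINGS
    charset: str     # key into CHARSETS

    def as_bytes(self):
        return list(self.buf)

    def as_codepoints(self):
        return ENCODINGS[self.encoding](self.buf)

    def as_graphemes(self):
        # Simplified rule: a grapheme is a base code point followed by any
        # combining marks; each grapheme is a list of code points.
        combines = CHARSETS[self.charset]
        graphemes = []
        for cp in self.as_codepoints():
            if graphemes and combines(cp):
                graphemes[-1].append(cp)
            else:
                graphemes.append([cp])
        return graphemes
```

For example, `PString("e\u0301".encode("utf-8"), "utf8", "unicode")` (an e plus a combining acute accent) is three bytes, two code points, and one grapheme, which is the whole reason a string needs more than one view.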

This duality provides the benefits of delayed (possibly delayed to never) conversion saving CPU time, mmappability of the source text (hey, after all, if it's not Unicode on disk but you never convert it, and are only reading it, why not just map the file into memory and pretend you read it the old-fashioned way?), and the ability to natively manipulate non-Unicode text without having to pretend there are files involved. (Because sometimes you do need to use native character sets without files--if you're generating zip files in-memory, or talking to a database) Plus there's the bonus of not burning conversion time to hoist Latin-n text to Unicode if you really do want to treat it as Latin-n text.
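Here's roughly what the delayed conversion buys you, sketched in Python rather than parrot: text stays in its native bytes (possibly an mmap'd file), and conversion to Unicode happens at most once, the first time anything asks for it. `LazyText`, `open_mapped`, and the conversion counter are hypothetical names, and Python codec names stand in for parrot's encoding/charset pairs.

```python
import mmap


class LazyText:
    """Native bytes plus a codec name; Unicode is produced on demand and cached."""

    def __init__(self, raw, codec):
        self.raw = raw          # bytes, or any bytes-like object such as an mmap
        self.codec = codec      # a Python codec name, e.g. "latin-1"
        self._unicode = None
        self.conversions = 0    # how many real conversions have happened

    def native(self):
        # Native access costs nothing: no decoding, no copy.
        return self.raw

    def unicode(self):
        # The first Unicode request pays for the conversion; later ones reuse it.
        if self._unicode is None:
            self._unicode = str(self.raw, self.codec)
            self.conversions += 1
        return self._unicode


def open_mapped(path, codec):
    """Map a (non-empty) file read-only instead of reading it into memory."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return LazyText(mm, codec)
```

Code that only ever calls `native()` never pays for a conversion at all, which is the "delayed to never" case.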

The encoding and character set systems are all pluggable and dynamically loadable as well, so if you don't want to yank in ICU to process your ASCII text, you don't have to. Which is swell for the large number of people who don't want to.
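The "don't pay for what you don't use" part can be sketched as a registry of loaders, again in Python: each encoding is registered cheaply up front, and whatever heavy library backs it (ICU, in parrot's case) is only loaded the first time someone actually uses that encoding. The class and `load_utf8` are hypothetical; `importlib.import_module` and `codecs.lookup` are real, with the import standing in for loading a big library.

```python
import importlib


class EncodingRegistry:
    """Maps encoding names to loaders; each loader runs at most once, on first use."""

    def __init__(self):
        self._loaders = {}   # name -> zero-argument function returning a decoder
        self._loaded = {}    # name -> decoder, filled in lazily

    def register(self, name, loader):
        # Registering is cheap: nothing is imported or initialized yet.
        self._loaders[name] = loader

    def decoder(self, name):
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()  # load on first use
        return self._loaded[name]

    def is_loaded(self, name):
        return name in self._loaded


def load_utf8():
    # Stand-in for dynamically loading a large library such as ICU.
    codecs = importlib.import_module("codecs")
    info = codecs.lookup("utf-8")
    return lambda buf: info.decode(buf)[0]  # decode() returns (text, length)
```

Register an ASCII decoder and `load_utf8` side by side, and a program that only ever handles ASCII never calls `load_utf8` at all.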

The single most difficult part of this job, by the way, isn't the technical issues. It's the politics. But at least I knew that going in. (Though, honestly, knowing and understanding are two very different things)

Posted by Dan at 10:30 PM | Comments (0) | TrackBack

June 15, 2004

Welcome to the Apocalypse! Please enjoy your stay

Yesterday, I officially stopped failing to embrace Unicode. Today I'm defining basic features of Parrot that are required to be OO. Who knows, maybe tomorrow I'll enjoy programming in C.

If anyone sees a bunch of guys on horses, don't forget to offer 'em some carrots. (And watch out for guys all in black and white smoking cigarettes...)

Posted by Dan at 02:18 PM | Comments (5) | TrackBack

June 14, 2004

Zombie PCs.... not just for mail spam any more

Nope, looks like they're used for blog spam now too. At least it looks that way from my MT logs, as I look at all the stuff that MT-Blacklist has blocked. (Mmmmm, MT-Blacklist goodness. Regardless of whatever happens with MT 3.x's license, code, or fortune cookies, there's no way in hell I'm switching from the MT 2.661 setup I have now if there's no blacklist equivalent for it.)

Dunno whether they're generic web proxies or blog-spam-specific things. The wave hit more than a week ago, so it's tough to tell for sure, since dynamic IP addresses for dialup machines and whatnot will have long since changed. There are a variety of webservers and whatnot running on some of the IP addresses now. (I almost wonder if someone's got some sort of auto-posting web-spam bug or XSS exploit. Or something.)

Posted by Dan at 08:13 PM | Comments (1) | TrackBack

Starting the long slide to standardization

It's bound to happen, but it's something that almost nobody working on a new project wants to deal with -- standardization. Or productization, or some other -ization, of which there are far too many. But it's that point at which you need to look at things and decide that things have gotten large enough that it's time to say "This Will Not Change" and be done with it. It's got to be done, of course, if you ever want a project to move past the toy stage.

Parrot's been doing this in fits and starts as we go along, though up until now many of the "permanent" decisions (for some fairly variable definition of permanent) have been more design things than implementation things. Most of the opcodes have been pretty permanent, but that's about it. Most of the rest is firm but not really fixed, at least not officially. Today, though.... today we start making things official.

In this case, we're officially mapping out the basic variable types that parrot will ship with. (The guarantees here are for a normal version of parrot--stripped down versions may have fewer of these) Nothing fancy--basic undef/int/float/string/bool PMC types and their array variants, plus some of the types parrot uses internally (such as the environment PMC and ordered hash we use for namespaces and pads) but they need defining, so... they're defined. Up until now folks have been generally using the Perl* variants, but besides being distasteful to some, those classes do more than a basic type ought, so this'll be good there.

If you're following along with docs, these types are defined in PDD 17, Basic Types.

Posted by Dan at 11:29 AM | Comments (0) | TrackBack

Reminder for Thursday: Bring rotten tomatoes!

Yep, I'll be talking to the Boston ACM about parrot. It'll be a two-part talk, first on the basic structure of parrot and then on some of the tools and techniques we use to build the software. Should be interesting, or at least I think so. :)

Posted by Dan at 10:55 AM | Comments (1) | TrackBack

June 01, 2004

Renewing my basic faith in humanity

Though I'm not saying what I have faith in them to do. Still, Oingo Boingo does say it best, don't they? Nasty Habits and Clowns of Death (since, after all, boys will be boys...) Mmmm, clowns.

Posted by Dan at 03:12 PM | Comments (4) | TrackBack