Well, I spent the weekend digging into some of the pending Parrot work, of which there is far too much that I bottleneck. (Important project management tip for folks doing project management--identify those places that one person is a bottleneck for, and dive for those areas as quickly as you can. Places that anyone, or other people, can work on can't get done until you do. And Delegate, dammit!) Anyway, one of those places that's pending is the python bytecode stuff. Yeah, I know, it's not as high a priority as, say, objects or exceptions, but given that Larry's not specified either yet it makes sense to tackle bits that don't block on him. Besides, I really need a handle on Python's object system before I finish Parrot's, since I don't want to be building two different object systems if I don't have to.
The reason for this, of course, is the up and coming Great Python Pie-throwing Contest, coming soon to an OSCON 2004 near you! Or near Portland, Oregon, at least, since I have no idea where you are.
So, after an hour or so of digging through with google and the python.org search facilities, I located some reference docs for python's bytecode. (Link so I can find them again later) Oddly, in the 2.1 maint docs, but not in the 2.2 docs anywhere. (Dunno whether it was an oversight or on purpose) The docs are, as you'd generally expect, fairly terse, as this stuff tends to be, since it hangs on knowledge of a whole lot of other stuff--if you know the other stuff you only need a terse description in the docs, and if you don't then no amount of documentation is likely to help.
Reading through assembly documentation (which bytecode is, really) is always an interesting undertaking. All assemblies have quirks, oddities, and other assorted things that make you go "Hmmm...." built into them, as you find things that aren't done the way you might think. They're often a result of other limits (many older CPU assembly languages were downright odd because of the need to save transistors), the personality of the designer, or hasty decisions, or dead-ends accumulated over time, and finding those is fun.
Python, of course, is no different. The bytecode has a bunch of standard stuff--stack rotates, math, and binary operations--but it's got some other oddities. The four-item stack rotate, for example, is kind of interesting, and a bit unusual. (Generally you have top-two swap and top three rotate, and if more's needed there'll be a general-purpose "rotate N" op) PRINT_NEWLINE, which prints a newline, is also kind of interesting, as having an op dedicated to printing just a newline's not common. (This isn't normally a time-critical operation, but it does have the advantage of being able to emit a platform-native newline, without having to translate a linefeed character or \n escape sequence, so it's a niftyish idea)
Then you get to the places where it became obvious that a limit was hit. The python bytecode apparently mandates 16 bit parameters. That's OK for most things, but the EXTENDED_ARG op, which holds bits 16-31 of the argument for the following op (making it a 32 bit argument split into two pieces) shows those places where it wasn't quite enough. (I'm wondering whether I can safely merge the two args together when I generate parrot bytecode--I can see Really Tricky Code branching to the op following the EXTENDED_ARG op in some cases, either to have a variety of top-16 bits or to take it as either a 16 or 32 bit op, but I'd bet that's not going to happen, as the Python folks tend to violently disagree with such things, which works for me)
You can also see how deeply various data types are welded into the interpreter. The tuple (Basically a fixed-length list of items) and the hash (sorry, dictionary) is pretty fundamental to python, while objects are... well, objects are an afterthought. At least that's the way they look from under the hood. The interpreter does support them, but there's barely more than what Perl has under here. I'm not a Python Guy, so I can't speak for the language support for objects, which may or may not be more significant. (Whether it is depends on how much the person explaining them likes Python) You certainly don't need anything fancy under the hood to support a fancy model--heck, it's all bits fed to the CPU when you get to the end of it--so I don't much care. A simple model actually makes it easier on me.
And, of course, there's the hackish stuff. CMP_OP springs to mind--this is an opcode that takes the name of the comparison operation as a parameter, then looks that comparison name up in a table somewhere and does it. That's... interesting. On the face it seems rather excessively indirect, but as I don't have the python sources at hand I can't say for sure that it's silly. I suppose there could be an extendable comparison table facility to allow the addition of new operators, but that doesn't seem too Pythonish. There's also the LOAD_CLOSURE op which takes a single integer parameter that indexes into two name tables simultaneously--if the offset is within the first table (the
co_cellvars table) it's that name, and if it isn't it's an offset into the second (
co_freevars) name table (after subtracting the number of items in the first name table) and it looks like anything else that references the cell and free variable store does this too. I suppose if nothing else it'll keep you from adding in new cells to things at runtime...
Anyway, looking at this all as a whole, it should be pretty trivial to get parrot to run bython bytecode, at least once the bytecode's preprocessed to turn it into parrot bytecode. At the moment the big issue is how much work should be done on all of it. Parrot, of course, is a register machine, while Python is a stack machine. It'd be dead-simple to have a bunch of python-specific ops that work just on the stack, but that's not the way to speed, honestly. It's even possible (though not particularly likely) that the python interpreter would actually be faster in that case, though honestly I doubt it. Might do it that way anyway, just to see, though. Handling the quirky bits shouldn't be a problem at all.
The bigger question is one of integration. The big reason I'm doing this is to get a chance to hurl a pie at Guido (well, probably Guido) which I've found to be rather motivating, but I'd rather this not end up as a one-off hack. (Ignoring the stack analysis code that'll result from it, though that's hardly trivial) What I want is for this to develop into a real, integrated python bytecode translator and interpreter, so Parrot can execute Python bytecode alongside other Parrot languages. Building a Python compiler to target Parrot would be the best way, and I suppose someone'll do that, but there's a limit to the motivating power of pie. Besides, there's a fair amount of bytecode around, so it's certainly a sensible thing to do.
The timeline, FWIW, is that at the Python Lightning talks at OSCON 2003 Guido and I will formally announce the details of the contest and the rules. (Which still need to be worked out, though we've a rough idea of what we're doing) Sometime by late December 2003 the bytecode for the test program will be generated, and at OSCON 2004 we'll have the actual face-off, followed by pie at 10 paces. (Mmmmm, pie!) While I expect it won't be a hugely exciting thing, I'm sure the video of the pie will haunt someone for years to come. Which should be fun.
(Yeah, yeah, I know... "Them's fightin' words!" Well, duh! Fight's already started :)Posted by Dan at June 16, 2003 02:32 PM | TrackBack (0)