Well, the python lightning talks are over, and I've officially and publicly conceded the contest. In the end we only managed four of the seven tests (and I found that Leo's version didn't do import, so you actually need to do some cut-n-paste to get one of the tests running) and part of a fifth. Guido was quite gracious and didn't pitch a pie at me, and while he disappointed many in the crowd (Perl, Python, and the random spectacle seekers) I do appreciate it.
Anyway, here's what we do have for results.
How do you duplicate them?
First, you need an optimized build of Parrot. By default we don't build Parrot with optimizations. Why? Well, we're still deep in development. It's really a pain to debug an optimized build (or a core dump from an optimized build) because of some of the things the optimizer does, so we don't. Sensible for development, less sensible for performance checks. (On the other hand, if your code is snappy without optimizations, that's a cool thing too) So, when you configure, pass in the --optimize flag.
Second, you need to do the runs on an x86 linux system. This is especially true when running with the JIT--each architecture has a different set of ops that get fast JITting, with the x86 doing best.
Third, you need to run the tests at least twice for Parrot. The driver harness in CVS compiles the python to bytecode and translates the python bytecode to parrot bytecode the first time things are run. Second and subsequent times use the cached bytecode and skip that step, for a pretty significant speedup.
Finally you do need a mildly thumped version of test b6. One of the things the translator doesn't do right now is python's import, so the bits imported from b5 need to be cut-and-pasted into the source for b6. From there... run the tests a few times. I found there was a fair amount of flux, so the times above are the best of five runs for both parrot and python.
I'm pretty sure the bad times on b3 are from the hack job done to implement python classes. That's one of the things that's going to get fixed up. And things will get fixed up. Finishing the translator is high on the list 'o things to do, one I'm hoping to get done by the end of August.
Well, looks like I'm going to be eating pie. I didn't get my translator done and, while Leo did some damned impressive work, he didn't get finished either (in large part, I suspect, because I didn't get the translator finished back at the beginning of June as I'd originally planned), so I need to scare up an oven for a while. I figure if I'm going to get pasted with a pie, it's going to be a good pie, and I don't like the no-bake Key Lime pies as much as the baked ones.
I'm going to work up a post-mortem of the project, so we can work out what went wrong, what went right, and what needs addressing. That should be fun, but if you don't ask "Where the heck did you screw up" for projects that didn't succeed (hell, for projects that did succeed, even) then you'll just make the same damn mistakes again and again. That's dull--I'd much rather make new mistakes, thanks. :)
FWIW, this project is not going to die. We're too close to actually being finished, and it'd be too damned useful to have it working, to let it die now. Who knows, maybe we can manage to get things going for a run at YAPC::EU this year. That'd be cool.
Since I was tired of the fail messages from the CPAN testers and the generic failures on Win32, and needed some reasonable printing for debugging and code analysis, I uploaded Python::Bytecode 2.7 to CPAN. (Or snag it from that link there if you want it)
```python
def bar():
    a = 3
    print "Foo"
    print a

a = 2
b = 3
if a:
    print a, b
c = a + b
print c
a.foo = c
d = a.foo
print d
bar()
```

prints

```
2 3
5
5
Foo
3
```

as it ought to. (Well, OK, except for the whole "a.foo is an error" thing...)
I've uploaded the updated translator. After lunch, it's time for positional and keyword parameters. This may be slow, since I'm doing hash lookups for all the locals, where Python uses array lookups for the LOAD_FAST/STORE_FAST ops. I ought to translate over to using an OrderedHash for the local store and using array access there too.
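For the curious, the LOAD_FAST/STORE_FAST behavior is easy to see with the standard dis module (Python 3 syntax below, though the pie-thon work targets 2.3 bytecode, and the exact opcode names shift a bit between versions):

```python
import dis

def f(x):
    y = x + 1
    return y

# Every local gets a slot in co_varnames, and the *_FAST ops address
# those slots by array index rather than by name lookup.
print(f.__code__.co_varnames)  # ('x', 'y')
print(any(i.opname.startswith(('LOAD_FAST', 'STORE_FAST'))
          for i in dis.get_instructions(f)))  # True
```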
Update 1: Well, turns out positionals are pretty easy too. That's up and done, so you can pass parameters to functions. Cool.
So here I am, trying to puzzle out how Python handles parameter passing, since passing parameters into functions is, after all, really useful. For a while this afternoon I was really nervous that they all went in on the stack. That'd be bad, since it'd really have messed up the translator program, which works with the assumptions that the stack starts out clean, can be statically analyzed, and is effectively empty at branch destinations. Needless to say, a variably-sized starting stack kinda kills static analysis.
Luckily... that turned out not to be the case. Plus it turns out that the pie-thon benchmark code only uses two of the four call types, which makes life a bit easier, as I don't have to get CALL_FUNCTION_KW and CALL_FUNCTION_VAR_KW implemented. (Though they'll probably be reasonably simple to do, and I might anyway. Probably not right away though) There's also no mix--it's either all positional or all keyword. (Dunno if it's even legal to mix them, but bluntly I think that's a bad idea anyway, so I'm just as glad to not have to bother with that case)
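As a side note, the "statically analyzable stack" assumption is one CPython itself leans on: its compiler precomputes each code object's worst-case operand stack depth, which is only possible if the depth at every offset (branch targets included) is fixed at compile time. A quick Python 3 illustration:

```python
def f(a, b):
    # Both branches leave the operand stack in the same, statically
    # known state, so the compiler can bound the depth up front.
    if a:
        return a + b
    return a - b

# co_stacksize is the compiler's precomputed maximum stack depth.
print(f.__code__.co_stacksize)
```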
I also found an interesting implementation feature of Python that ties into one of the bits of the language.
Now Python, as you probably know, doesn't really do lexical variables. You've really got two types--function-local variables and global variables. The locals could be considered lexical if you squinted really hard, but... nah, not really. It does do named parameters, though, unlike Perl's "slam everything into @_" hack, and the named parameters can be passed in by name. They're also considered locals to the function. Now, as I was digging in, I was thinking "how the heck do the parameters get passed in?" Well, it turns out that they don't. What Python does instead is build up a frame (sorry, a dict) that contains all the function's locals, fills them in, and passes that frame into the called function. The caller does this. Which, I've gotta admit, is clever. Granted, for a fully general solution for me it's a major pain in the ass, but I don't need a fully general solution. Not yet, at least. Maybe in a few weeks.
Anyway, this is relatively easy to do. (Relatively...) Each code object is tagged with the number of locals it takes as well as the names of the variables it has, so allocating the hash is simple. For the vararg case the overflow goes into a list which is the last element of the parameter list. Or so I think. I haven't yet found where in the code it handles those overflow things. Gotta be in there somewhere, but I'll get to that when I get to it.
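For illustration (Python 3 shown, but 2.3's code objects carry the same information): the parameter count, the local names, and the varargs flag all sit right on the code object. One small nit on the above: in Python the overflow actually arrives as a tuple, not a list.

```python
import inspect

def f(a, b, *rest):
    return a + b + sum(rest)

code = f.__code__
print(code.co_argcount)      # 2 -- declared positional parameters
print(code.co_varnames[:3])  # ('a', 'b', 'rest')
# CO_VARARGS marks the overflow parameter; surplus positional
# arguments get collected into the 'rest' tuple.
print(bool(code.co_flags & inspect.CO_VARARGS))  # True
print(f(1, 2, 3, 4))  # 10
```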
It's odd. I've been digging through ceval.c to figure out what this stuff all does, and I'm finding that, in general, it's almost not worth digging too deeply. Yeah, all of the python engine's grubby secrets are in there but, honestly, I don't care how it does what it does, just what it actually does. Which is to say that I care about what the stack looks like and how the environment behaves, not how the engine flips its bits and does its tests to get there.
Hopefully when I'm done with this functions will all work. Woohoo for that, but I'm waiting until it actually works.
Probably the single biggest annoyance of all this is the lambda functions, mainly because they're anonymous.
Another hour, another 20 opcodes done. (Though I'm running out of the easy ones) The slice ops may be a bit dodgy, though, as the compiler's relying on parrot ops that aren't checked in yet.
FWIW, this is up to version 2.5. (I see something fetching the 2.2 archive, but I've not looked at what's doing it) If you want to play, install that from CPAN, though it seems to have test issues on some systems that I need to track down at some point, to duck the smoke test messages if nothing else.
This version added support for reading complex numbers, unicode strings, and proper handling (I think) of bignums. With this release all the piethon bytecode can be read without error, though of course that's a big step away from actually doing anything with it.
Okay, trusting that Leo's right and the python engine's good about not leaving things on the stack at label boundaries, I've reworked the architecture of the translator program. A new version's up, and I'll explain later. (Though it's probably reasonably obvious what it does) It won't run, but adding in new python bytecode processing ops is trivial and, happily, diff/patch able. (nudge, nudge :)
I fully expect to be poking at this as I go along, but feel free to do so too.
Well, I've got the laptop back from repair (for the fourth time) and after two days of getting it in order (It went in with a bad 20G drive with 10.3.4 on it, came back with a 30G drive with 10.2 on it, and my full-system backups turned out to be useless. Important safety tip--rsync is useless for any file with a resource fork, including all OS 9 apps, many (if not all) Carbon apps, and a goodly number of data and system files) I'm finally up and running. It's kind of amazing how many update passes the damned thing needed to go from a stock 10.3 install to latest up-to-date. But that's a rant for another time.
Time's really short here for the piethon challenge. Leo's been working on the back end, and Python::Bytecode is mostly working (I'm running into some issues now, dammit), but I need to leave for Oregon next Thursday, so I've about a week to go with full 'Net access. (And I'll probably be painting for a goodly part of the time between me leaving and OSCON) At least I've got evenings (sort of--the work project's behind) and weekends as my wife and kids are heading out early.
Anyway, enough lame excuses. On to the actual info.
Since people are going to play along at home, and I'm not going to turn down any help, I'm putting the translator program up for fetching. If you want to see what it looks like, the thing's at http://www.sidhe.org/~dan/piethon/translator.pl. No, no rsync or CVS access (no time), and it's not going into the parrot repository since that's a bit much (I may change my mind on this) but you can snag it and thump it to see what, if anything, it does.
The translator itself requires just perl with the Python::Bytecode module installed, so it's pretty low-overhead. It only works (for loose values of works) on Python 2.3 bytecode, but that's fine. We may or may not make it work for earlier versions, but we'll deal with that later.
The current scheme for going from python's stack-based system to parrot's register-based one is pretty simple, since I'm too pressed for time to make it fancier and faster. (Losing more than a month to @!#$!$ machine problems has been a major pain) We turn Parrot PMC registers 18-29 into a temporary stack (with an array in P31 as overflow), and add in support to parrot for that being the case. TOS is P18, with the stack tail held in I31, and we've a few new ops to manipulate the stack, basically a fake push and a fake pop op. From then on, all the ops that act on the TOS just act on P18, TOS1 is P19, and standard register manipulation ops go from there. Stack shifts require a memmove, which isn't free, so this isn't without its costs, but it's pretty simple to handle.
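Here's a little Python model of that scheme, just to make the shifting concrete -- the real thing is a handful of new Parrot ops, and the class and attribute names below are only stand-ins for the P18-P29 registers, the P31 overflow array, and the I31 counter:

```python
class RegisterStack:
    """Toy model of the pie-thon register-stack scheme."""
    NREGS = 12  # stands in for P18 through P29

    def __init__(self):
        self.regs = [None] * self.NREGS  # regs[0] plays the role of P18 (TOS)
        self.overflow = []               # stands in for the P31 overflow array
        self.depth = 0                   # stands in for the I31 stack counter

    def push(self, value):
        # A push shifts every register down a slot (the memmove mentioned
        # above); the deepest register spills into the overflow array.
        if self.depth >= self.NREGS:
            self.overflow.append(self.regs[-1])
        self.regs[1:] = self.regs[:-1]
        self.regs[0] = value
        self.depth += 1

    def pop(self):
        value = self.regs[0]
        self.regs[:-1] = self.regs[1:]
        # Refill the deepest register from overflow if anything spilled.
        if self.depth > self.NREGS:
            self.regs[-1] = self.overflow.pop()
        else:
            self.regs[-1] = None
        self.depth -= 1
        return value

s = RegisterStack()
for n in range(15):  # more pushes than registers, to exercise overflow
    s.push(n)
print([s.pop() for _ in range(15)])  # LIFO order: 14 down to 0
```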
I'll start putting together (or, rather, reputting together, since I had one which got lost, but it was pretty wrong anyway) a table of python to parrot op translations for folks to look at and comment on, though it'll be going to the (soon to be renamed) perl6-internals list as well.
This version should, theoretically, be complete and ready to use. It properly disassembles (or, rather, makes disassemblable) bytecode files with multiple code objects, patches up the dopey constants and variable name errors it had, and even has tests for python 2.2 and 2.3 format bytecode. It won't generate python bytecode (and the answer to "Will it?" is "Sure, as soon as you write that bit" :), but it should now be sufficient to use as a disassembler for the piethon contest.
That's step one. Leo's working on step 3, the back end parts. Now all I need to do is get step 2, the cross-assembler, written...
Since my machine is making what can reasonably be described as Bad Noises, I figure I'd best get this put up while I still can. This, then, is Python::Bytecode v2.1, a variant of Simon Cozens' version on CPAN. The difference here is that it knows about Python 2.3 bytecode and doesn't know about version 2.1 and 2.2 bytecode. (Though it claims it does)
This version has several issues. First, the tests fail, because I switched out the test bytecode files. (The tests actually work, just the data they're looking at is different than what they should be looking at) Second, it's convinced that there is a single code object in each bytecode file, which is definitely Not True.
The first issue's mine, the second's the original module's issue. (Basically the bytecode object and the bytecode file object are the same thing, which isn't right--each file can have multiple bytecode objects in it) Anyone wanting to take a shot at fixing it while my machine gently weeps would be much appreciated.
So, I've been working on and off with Python::Bytecode, getting it up to snuff. The basic code itself looks to be a simple translation of the Python unfreezing code, though it's needed some thumping for Python 2.3. (Things have changed sizes since the code was originally written, and some conditional things were a bit dodgy)
I've actually got it fully working, which I wrote about earlier (I think) with one caveat--the code assumes that there's only one code object around at any one time, which isn't generally true. That is, when you disassemble a piece of bytecode you normally have multiple code objects. And a disassembler that only does one code object is sub-optimal. (This, by the way, is why the tests I was doing didn't match what python's dis module showed. It had the same problem, disassembling only one code object, it just chose a different one to disassemble. (First rather than last. Or vice versa, I don't remember)) Anyway, now I'm rejigging the relatively simple internals to be less simple, splitting out the bytecode management chunks from the code object chunks.
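The nesting is easy to see from Python itself: a function's code object rides along in its parent's co_consts, so a complete disassembler has to walk the whole tree rather than grab one object. A Python 3 sketch of that walk (the helper name is mine):

```python
import types

def all_code_objects(code):
    """Yield a code object and, recursively, every code object
    nested in its co_consts."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from all_code_objects(const)

src = """
def outer():
    def inner():
        pass
    return inner
"""
module = compile(src, '<example>', 'exec')
print([c.co_name for c in all_code_objects(module)])  # ['<module>', 'outer', 'inner']
```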
Hopefully by the end of the day (or, rather, end of the night, since I shouldn't burn work time on this, though a working python bytecode transcoder would make writing library code easier than doing it in PIR (well, other than the whole 'writing it in Python' thing (more because I don't know the language well enough to use it. I'll save rants about the language proper until such a time as I actually do, since it's kinda stupid to rant about things I don't know anything about, and I prefer to save that for other things... :))) I'll have a working disassembler. At which point we're in a position to write the transcoder, and from there we can fill in the blanks with the library code.
FWIW, if you're following along at home, the library code is by far the biggest potential issue. The bytecode disassembly and translation's not that big a deal, relatively speaking, though the library code is.
I've got to admit, at this point I am nervous about winning, not because I think parrot can't perform but because I'm not sure there'll be time to finish. This has been held up in part by machine problems (my laptop's in the shop again, dammit) but mostly by real world concerns. Work and family's keeping me busy (maybe you can tell your six year old daughter you can't go to the park with her because you need to throw a pie at Guido, but I can't :) so I've had a lot less time to put into this than I thought. On the other hand, we've gotten Leo involved, so things may well progress... rapidly.
The translation code's a single-threaded bottleneck, but once the translator's done we can get more folks involved, since library work can be done multithreaded. And as a side-effect, parrot'll get a good chunk 'o runtime library implemented, and I can't complain about that. One way or another this will get done, even if we don't make OSCON. That'll give us a nice, working python to parrot translator, which isn't a bad thing at all.
As I've been digging through the python bytecode decoder I found out where things were going "wrong" with it. As it turns out they weren't; there were just a few sensible but incorrect assumptions made in the decoding code. (Mainly that there's only one code object, which turns out not to be the case) It also explains why dis.py and Python::Bytecode were giving different answers--dis.py decodes the first code object in the file while Python::Bytecode got the last one. The fix is a touch convoluted, but not too bad to do.
Oh, and I've got about half the python bytecode instructions having direct parrot equivalents after doing a bit of stack-to-register decoding. The easy ones, of course, but you've gotta start somewhere. (I think I'm going to have to implement a good chunk of extended-precision math for this, since that's part of the python core. Parrot's got some that needs some swamping out, so I'll probably start there)
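The bignum requirement comes straight from the language: integer arithmetic in Python quietly promotes past the native word size (int to long in the 2.x line, transparently in modern Python), so the runtime really does have to supply extended-precision math:

```python
# Integer arithmetic silently promotes to arbitrary precision,
# so overflow simply doesn't happen at the language level.
print(2 ** 64)          # 18446744073709551616 -- past any native word size
print((2 ** 100) % 97)  # modular arithmetic on a 31-digit bignum
```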
Or at least some of it.
The hacked-up version of Python::Bytecode will now load up the bytecode files that python 2.3.3 generated for the parrot benchmark tests. Of course, they're all wrong, but... first things first, I expect.
I spent a chunk of this weekend working on the Pie-thon conversion setup. Since folks seem interested in this sort of thing (I'm not sure whether it's as a record of success or abject failure, so we just won't go there right now :) I figure I ought to note down what's going on--it's not like it'll make much difference one way or the other with the contest, so there's no harm as I see it.
There's also a new category for this if you're into that sort of thing. (Official trackback ping URL is http://www.sidhe.org/~dan/blog/mt-tb.cgi/253 according to MT) No RSS feed just for it. (And yes, I realize the "WTHI" feed is actually a glommed-together category feed, and I need to fix that too)
Anyway, here's where we stand so far.
I've been working with Simon Cozens' Python::Bytecode module as my starting point--it disassembles python bytecode to a readable text stream, pretty much the same way that the dis module in the python distribution does. Seemed to be a good place to start, given I know far more perl than python.
The module, unfortunately, only comes knowing about the v2.1 and 2.2 bytecode formats. Things have changed a bit for v2.3 and, while I've got a version I hacked up to try and load v2.3 bytecode, there's been no joy there so far--it gets through the magic number identification OK, but it doesn't recognize the bytecode past that, and a hexdump shows it being a bit different from 2.2 bytecode. Some poking around in the Python core has been instructive, but not enough that I've been able to get it going. (I expect that once I can get it to get to the actual code it'll work, but...) There may be a few extra constants in there now that I look at the hex dumps again. Hrm. Going to have to poke in the python source some more. (I now see the value of editors with function name completion, FWIW)
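For reference, that magic number is just the first four bytes of the compiled file, and it gets bumped every time the bytecode format changes -- which is exactly why a reader built for 2.1/2.2 bails right after the header on 2.3 files. A Python 3 demonstration against the running interpreter's own format:

```python
import importlib.util
import os
import py_compile
import tempfile

# Compile a trivial module and compare its first four bytes against
# the interpreter's own magic number.
src = os.path.join(tempfile.mkdtemp(), 'demo.py')
with open(src, 'w') as fh:
    fh.write('x = 1\n')
pyc = py_compile.compile(src)
with open(pyc, 'rb') as fh:
    header = fh.read(4)
print(header == importlib.util.MAGIC_NUMBER)  # True
```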
In the mean time, I've been taking a look at the dis module instead. This ships with python and provides disassembly services along with a command-line interface that'll take python source, compile it, and provide the disassembled bytecode. I've gotten a bit further with this, though it's got a serious limitation that's getting in the way--it only disassembles the first function in the source, and doesn't dump the constants out in a useful way. (Not that this is a bad thing as such. It's still quite useful) Still, a place to start.
I've also got the start of a conversion scheme from python bytecode ops to parrot ops. This has the additional issue of having to do stack to register scheme translation, which could be a big deal in hand-rolled code but I expect it won't be a problem here. I think I may well do at least one intermediate representation, though, rather than a direct conversion. We'll see.
I'll post the code for this stuff as I get it going.