Well, looks like I'm going to be eating pie. I didn't get my translator done and, while Leo did some damned impressive work, he didn't get finished either (in large part, I suspect, because I didn't get the translator finished back at the beginning of June as I'd originally planned) so I need to scare up an oven for a while. I figure if I'm going to get pasted with a pie, it's going to be a good pie, and I don't like the no-bake Key Lime pies as much as the baked one.
I'm going to work up a post-mortem of the project, so we can work out what went wrong, what went right, and what needs addressing. That should be fun, but if you don't ask "Where the heck did you screw up" for projects that didn't succeed (hell, for projects that did succeed, even) then you'll just make the same damn mistakes again and again. That's dull--I'd much rather make new mistakes, thanks. :)
FWIW, this project is not going to die. We're too close to actually being finished, and it'd be too damned useful to have it working, to let it die now. Who knows, maybe we can manage to get things going for a run at YAPC::EU this year. That'd be cool.
Since I was tired of the fail messages from the CPAN testers, the generic failures on Win32, and needed some reasonable printing for debugging and code analysis, I uploaded Python::Bytecode 2.7 to CPAN. (Or snag it from that link there if you want it)
    def bar():
        a = 3
        print "Foo"
        print a

    a = 2
    b = 3
    if a:
        print a, b
    c = a + b
    print c
    a.foo = c
    d = a.foo
    print d
    bar()

prints

    2 3
    5
    5
    Foo
    3

as it ought to. (Well, OK, except for the whole "a.foo is an error" thing...)
I've uploaded the updated translator. After lunch, it's time for positional and keyword parameters. This may be slow, since I'm doing hash lookups for all the locals, where Python uses array lookups for the LOAD_FAST/STORE_FAST ops. I ought to translate over to using an OrderedHash for the local store and using array access there too.
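For the curious, the hash-versus-array difference looks roughly like this. This is a sketch in modern Python 3, not the 2.3 bytecode the translator reads, and the function `bar` and both stores are made up for illustration. `co_varnames` and `co_nlocals` are real code-object attributes; the idea is that a name gets resolved to a slot number once, at translation time, instead of being hashed on every access.

```python
def bar(x, y):
    total = x + y
    return total

code = bar.__code__

# Name -> slot number, resolved once (parameters come first in co_varnames).
slot_of = {name: index for index, name in enumerate(code.co_varnames)}

# Hash-style local store: every access hashes a variable name.
locals_by_name = {"x": 2, "y": 3}
locals_by_name["total"] = locals_by_name["x"] + locals_by_name["y"]

# Array-style local store: every access is a plain index.
locals_by_slot = [None] * code.co_nlocals
locals_by_slot[slot_of["x"]] = 2
locals_by_slot[slot_of["y"]] = 3
locals_by_slot[slot_of["total"]] = (
    locals_by_slot[slot_of["x"]] + locals_by_slot[slot_of["y"]]
)

print(slot_of)
print(locals_by_slot)
```

Both stores end up holding the same values; only the lookup cost differs.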
Update 1: Well, turns out positionals are pretty easy too. That's up and done, so you can pass parameters to functions. Cool.
So here I am, trying to puzzle out how Python handles parameter passing, since passing parameters into functions is, after all, really useful. For a while this afternoon I was really nervous that they all went in on the stack. That'd be bad, since it'd really have messed up the translator program, which works with the assumptions that the stack starts out clean, can be statically analyzed, and is effectively empty at branch destinations. Needless to say, a variably-sized starting stack kinda kills static analysis.
Luckily... that turned out not to be the case. Plus it turns out that the pie-thon benchmark code only uses two of the four call types, which makes life a bit easier, as I don't have to get CALL_FUNCTION_KW and CALL_FUNCTION_VAR_KW implemented. (Though they'll probably be reasonably simple to do, and I might anyway. Probably not right away though) There's also no mix--it's either all positional or all keyword. (Dunno if it's even legal to mix them, but bluntly I think that's a bad idea anyway, so I'm just as glad to not have to bother with that case)
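For reference, here are the call shapes in question, written as a sketch in Python 3 syntax. The helper `show` and the sample values are made up. As far as I understand the 2.3 opcode set, the first two calls compile to plain CALL_FUNCTION, and the last three to CALL_FUNCTION_VAR, CALL_FUNCTION_KW, and CALL_FUNCTION_VAR_KW respectively; current CPython versions use different opcodes, so don't expect to see these names in a modern disassembly. The second call also shows that mixing positional and keyword arguments in one call is legal.

```python
def show(*args, **kwargs):
    """Return what the callee actually received."""
    return args, kwargs

extra = (3, 4)
options = {"sep": "-"}

plain = show(1, 2)                    # positional only
mixed = show(1, 2, sep="-")           # positional and keyword in one call
starred = show(1, *extra)             # extra positionals from a sequence
double_starred = show(1, **options)   # extra keywords from a mapping
both = show(1, *extra, **options)     # both at once

for result in (plain, mixed, starred, double_starred, both):
    print(result)
```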
I also found an interesting implementation feature of Python that ties into one of the bits of the language.
Now Python, as you probably know, doesn't really do lexical variables. You've really got two types--function-local variables and global variables. The locals could be considered lexical if you squinted really hard, but... nah, not really. It does do named parameters, though, unlike Perl's "slam everything into @_" hack, and the named parameters can be passed in by name. They're also considered locals to the function. Now, as I was digging in, I was thinking "how the heck do the parameters get passed in?" Well, it turns out that they don't. What Python does instead is build up a frame (sorry, a dict) that contains all the function's locals, fills them in, and passes that frame into the called function. The caller does this. Which, I've gotta admit, is clever. Granted, for a fully general solution for me it's a major pain in the ass, but I don't need a fully general solution. Not yet, at least. Maybe in a few weeks.
Anyway, this is relatively easy to do. (Relatively...) Each code object is tagged with the number of locals it takes as well as the names of the variables it has, so allocating the hash is simple. For the vararg case the overflow goes into a list which is the last element of the parameter list. Or so I think. I haven't yet found where in the code it handles those overflow things. Gotta be in there somewhere, but I'll get to that when I get to it.
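Here's a rough sketch of the caller-side idea, written against Python 3 code objects rather than the 2.3 ones the translator sees. The helper name `build_locals` is hypothetical, it ignores default values and keyword-only parameters, and it is not how ceval.c does it. It just shows that the argument count and variable names on the code object are enough to fill in a locals dict, with positional overflow collected under the `*args` name.

```python
import inspect

def build_locals(func, positional=(), keywords=None):
    """Build a name -> value dict for func's parameters (no defaults handled)."""
    code = func.__code__
    names = code.co_varnames[:code.co_argcount]   # declared positional parameters
    frame = {}

    # Positional arguments fill the declared parameters left to right.
    for name, value in zip(names, positional):
        frame[name] = value
    overflow = tuple(positional[len(names):])

    # Keyword arguments fill parameters by name.
    for name, value in (keywords or {}).items():
        if name not in names:
            raise TypeError("unexpected keyword argument %r" % name)
        if name in frame:
            raise TypeError("multiple values for argument %r" % name)
        frame[name] = value

    # Anything left over goes into the *args slot, if the function has one.
    if code.co_flags & inspect.CO_VARARGS:
        star_name = code.co_varnames[code.co_argcount + code.co_kwonlyargcount]
        frame[star_name] = overflow
    elif overflow:
        raise TypeError("too many positional arguments")

    missing = [name for name in names if name not in frame]
    if missing:
        raise TypeError("missing arguments: %s" % ", ".join(missing))
    return frame

def f(a, b, *rest):
    pass

def g(a, b):
    pass

print(build_locals(f, (1, 2, 3, 4)))
print(build_locals(g, (), {"a": 1, "b": 2}))
```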
It's odd. I've been digging through ceval.c to figure out what this stuff all does, and I'm finding that, in general, it's almost not worth digging too deeply. Yeah, all of the python engine's grubby secrets are in there but, honestly, I don't care how it does what it does, just what it actually does. Which is to say that I care about what the stack looks like and how the environment behaves, not how the engine flips its bits and does its tests to get there.
Hopefully when I'm done with this functions will all work. Woohoo for that, but I'm waiting until it actually works.
Probably the single biggest annoyance in all this is the lambda functions, mainly because they're anonymous.
Another hour, another 20 opcodes done. (Though I'm running out of the easy ones) The slice ops may be a bit dodgy, though, as the compiler's relying on parrot ops that aren't checked in yet.
Well, unless you're reading Phil Dick, at least.
Just sent Python::Bytecode 2.6 to CPAN (knowing, bluntly, that I broke the tests) because the translator now works, at least for simple programs. Woohoo!
My first victim program was simple--the test shipped with Python::Bytecode. So this:
    a = 2
    b = 3
    if a:
        print a, b
    c = a + b
    print c

does, in fact, print

    2 3
    5

Go figure. I'm not sure which is more bizarre--the fact that it works, or the fact that I actually wrote correct Python code. (By the time this is all done I'm going to have to crank out a few pages of INTERCAL code to compensate. :)
I'm by no means done--assuming I've gotten it right, only 41 of the 110 ops are implemented, so I'd darned well better code faster, but...
Update 1: Make that 44 ops. Attribute get/set/delete is in, though we succeed in spots that Python throws errors.
Update 2: Make that 49 ops. Unary stuff and iterator getting, since it's easy. May have to check on setting an immutable true and false value, rather than a Boolean of the right truth value. We'll see if it matters.
Update 3: Make that 51 ops. PRINT_x_TO ops. And now it's past time for bed.
FWIW, this is up to version 2.5. (I see something fetching the 2.2 archive, but I've not looked at what's doing it) If you want to play, install that from CPAN, though it seems to have test issues on some systems that I need to track down at some point, to duck the smoke test messages if nothing else.
This version added support for reading complex numbers, unicode strings, and proper handling (I think) of bignums. With this release all the piethon bytecode can be read without error, though of course that's a big step away from actually doing anything with it.
Okay, trusting that Leo's right and the python engine's good about not leaving things on the stack at label boundaries, I've reworked the architecture of the translator program. A new version's up, and I'll explain later. (Though it's probably reasonably obvious what it does) It won't run, but adding in new python bytecode processing ops is trivial and, happily, diff/patch able. (nudge, nudge :)
I fully expect to be poking at this as I go along, but feel free to do so too.
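The label-boundary assumption is easy enough to check mechanically. Below is a toy sketch in Python: the instruction names, their stack effects, and the `(op, effect, target)` tuple format are all invented for illustration and are not real Python opcodes or anything from the translator. It walks the program in a straight line, so it assumes the fall-through depth is the depth at every instruction, which a real checker couldn't take for granted.

```python
def depth_at_branch_targets(program):
    """program is a list of (op, stack_effect, jump_target_or_None) tuples.

    Returns {target_index: stack depth on arrival} for every jump target.
    """
    depth = 0
    depth_before = []
    for op, effect, target in program:
        depth_before.append(depth)      # depth just before this instruction
        depth += effect
        if depth < 0:
            raise ValueError("stack underflow at %s" % op)
    return {
        target: depth_before[target]
        for _, _, target in program
        if target is not None
    }

# Hypothetical program: "if <const>: print <const>" followed by another print.
toy_program = [
    ("PUSH_CONST", +1, None),
    ("BRANCH_IF_FALSE", -1, 4),   # pops the condition, may jump to index 4
    ("PUSH_CONST", +1, None),
    ("PRINT", -1, None),
    ("PUSH_CONST", +1, None),     # index 4: the branch destination
    ("PRINT", -1, None),
]

targets = depth_at_branch_targets(toy_program)
print(targets)
print(all(depth == 0 for depth in targets.values()))
```

If every depth in the result is zero, each label can be translated as if it started with a fresh stack.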
Well, I've got the laptop back from repair (for the fourth time) and after two days of getting it in order (It went in with a bad 20G drive with 10.3.4 on it, came back with a 30G drive with 10.2 on it, and my full-system backups turned out to be useless. Important safety tip--rsync is useless for any file with a resource fork, including all OS 9 apps, many (if not all) Carbon apps, and a goodly number of data and system files) I'm finally up and running. It's kind of amazing how many update passes the damned thing needed to go from a stock 10.3 install to latest up-to-date. But that's a rant for another time.
Time's really short here for the piethon challenge. Leo's been working on the back end, and Python::Bytecode is mostly working (I'm running into some issues now, dammit), but I need to leave for Oregon next Thursday, so I've about a week to go with full 'Net access. (And I'll probably be painting for a goodly part of the time between me leaving and OSCON) At least I've got evenings (sort of--the work project's behind) and weekends as my wife and kids are heading out early.
Anyway, enough lame excuses. On to the actual info.
Since people are going to play along at home, and I'm not going to turn down any help, I'm putting the translator program up for fetching. If you want to see what it looks like, the thing's at http://www.sidhe.org/~dan/piethon/translator.pl. No, no rsync or CVS access (no time), and it's not going into the parrot repository since that's a bit much (I may change my mind on this) but you can snag it and thump it to see what, if anything, it does.
The translator itself requires just perl with the Python::Bytecode module installed, so it's pretty low-overhead. It only works (for loose values of works) on Python 2.3 bytecode, but that's fine. We may or may not make it work for earlier versions, but we'll deal with that later.
The current scheme for going from python's stack-based system to parrot's register-based one is deliberately simple, since I'm too pressed for time to make it fancier and faster. (Losing more than a month to @!#$!$ machine problems has been a major pain) We turn Parrot PMC registers 18-29 into a temporary stack (with an array in P31 as overflow), and add in support to parrot for that being the case. TOS is P18, with the stack tail held in I31, and we've a few new ops to manipulate the stack, basically a fake push and fake pop op. From then on, all the ops that act on the TOS just act on P18, TOS1 is P19, and standard register manipulation ops go from there. Stack shifts require a memmove, which isn't free, so this isn't without its costs, but it's pretty simple to handle.
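To make the register-window idea concrete, here's a small simulation in Python. It's a model of the scheme, not Parrot code: the class name `RegisterStack` is made up, a 12-slot list stands in for P18-P29, a plain list stands in for the overflow array in P31, and I'm treating the I31 "stack tail" as a depth counter, which is my reading rather than anything the real ops guarantee. The slice assignments play the part of the memmove.

```python
class RegisterStack:
    WINDOW = 12  # stands in for P18..P29

    def __init__(self):
        self.regs = [None] * self.WINDOW  # regs[0] ~ P18 (TOS), regs[1] ~ P19 (TOS1)
        self.overflow = []                # ~ the array in P31
        self.depth = 0                    # ~ I31 (assumed to be a depth count)

    def push(self, value):
        # If the window is full, the deepest register spills to overflow.
        if self.depth >= self.WINDOW:
            self.overflow.append(self.regs[-1])
        self.regs[1:] = self.regs[:-1]    # shift everything one slot deeper
        self.regs[0] = value
        self.depth += 1

    def pop(self):
        if self.depth == 0:
            raise IndexError("pop from empty stack")
        value = self.regs[0]
        self.regs[:-1] = self.regs[1:]    # shift everything one slot shallower
        # Refill the deepest register from overflow, if anything spilled.
        self.regs[-1] = self.overflow.pop() if self.overflow else None
        self.depth -= 1
        return value

stack = RegisterStack()
for n in range(1, 15):       # 14 pushes: two values spill past the window
    stack.push(n)
spilled = list(stack.overflow)
popped = [stack.pop() for _ in range(14)]
print(spilled)
print(popped)
```

The payoff is that an op working on TOS and TOS1 always reads the same two registers, at the price of a shift on every push and pop.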
I'll start putting together (or, rather, reputting together, since I had one which got lost, but it was pretty wrong anyway) a table of python to parrot op translations for folks to look at and comment on, though it'll be going to the (soon to be renamed) perl6-internals list as well.
This version should, theoretically, be complete and ready to use. It'll properly disassemble (or, rather, make disassemblable) bytecode files with multiple code objects, patches up the dopey constants and variable name errors it had, and even has tests for python 2.2 and 2.3 format bytecode. It won't generate python bytecode (and the answer to "Will it?" is "Sure, as soon as you write that bit" :), but it should now be sufficient to use as a disassembler for the piethon contest.
That's step one. Leo's working on step 3, the back end parts. Now all I need to do is get step 2, the cross-assembler, written...
Of a sort, at least. I got Python::Bytecode working properly (Leo pointed out that my original hacked version messed up the constants) as well as getting it in a state to handle multiple code objects in a single bytecode file.
You can get the preliminary perl module at http://www.sidhe.org/~dan/Python-Bytecode-2.2.tar.gz for fooling with. Note that the docs haven't been updated, and it still doesn't do python 2.1 or 2.2 bytecode even though it claims to do so. Nor is it yet on CPAN, though I expect it will be when I finish making my changes presentable and get them off to Simon.
FWIW, the rule is "Check for $foo->can('disassemble')" to see if a thing (like, say, a constant in an object's constant list) can be disassembled. If so, it's a python code object. I expect there'll be a less bad option there at some point. :)
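The same "is this constant a code object?" walk can be sketched in Python itself, using a type check in place of the `can('disassemble')` test. This is only an analogue run against Python 3's own code objects; it is not Python::Bytecode's API, and the helper name `iter_code_objects` and the sample source are made up.

```python
import types

def iter_code_objects(code):
    """Yield a code object and, recursively, every code object in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

source = (
    "def outer():\n"
    "    def inner():\n"
    "        pass\n"
    "    return inner\n"
    "\n"
    "square = lambda n: n * n\n"
)

module_code = compile(source, "<example>", "exec")
names = [code.co_name for code in iter_code_objects(module_code)]
print(names)
```

One source file yields several code objects (the module body, each function, each lambda), which is why a one-code-object-per-file assumption breaks down.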
Swell.
So here I am, stuck with no non-work computer access again. I'm going to be down for another week, at least, and depending on how things go I may well not get it back in time for OSCON. (I leave on the 22nd for an off-line week doing house repair) And there's no way in hell I'm going to be able to do anything significant on the pie-thon challenge between now and then, and I've not gotten fuck all done the past month on it because of this. It's likely I'm not going to have my OSCON presentation in a presentable shape either because of this.
I am so looking forward to this. I get to catch pie (unless Leo comes through, which is more than I can honestly ask for) and stand in a room and do shadow puppets for 45 minutes in front of a screen with a big "Shadow puppets and lack of slides brought to you courtesy of Apple Computer!" Oh, and any attempt to convince the folks at work that the crapulent windows laptops that the sales folks have should be replaced with apple gear is shot to hell too--while the windows machines are utter pieces of crap, at least if they do go back for repair they get fixed the first time. (And no, nobody really cares that Apple only screwed up the repair twice, not three times)
Fun. Or... not. Not, bluntly, that there's a whole lot to be done about it at this point, other than save up my nickels to cover the round of drinks I'm going to owe Guido, and the eventual replacement of the iBook the next time it dies. I'm thinking something in an IBM thinkpad may well be the way to go here, unless Apple somehow makes really good on this. I don't think Steve's Reality Distortion Field's that good, though I could be surprised.
Since my machine is making what can reasonably be described as Bad Noises, I figure I'd best get this put up while I still can. This, then, is Python::Bytecode v2.1, a variant of Simon Cozens' version on CPAN. The difference here is that it knows about Python 2.3 bytecode and doesn't know about version 2.1 and 2.2 bytecode. (Though it claims it does)
This version has several issues. First, the tests fail, because I switched out the test bytecode files. (The tests actually work, just the data they're looking at is different than what they should be looking at) Second, it's convinced that there is a single code object in each bytecode file, which is definitely Not True.
The first issue's mine, the second's the original module's issue. (Basically the bytecode object and the bytecode file object are the same thing, which isn't right--each file can have multiple bytecode objects in it) Anyone wanting to take a shot at fixing it while my machine gently weeps would be much appreciated.
Hence, the call. Got people skills? Can you organize? Competently *design* code? Got free time? Work with people who are, well, Larry? Cool, this is for you. You don't have to be a star programmer, nor a parser or compiler whiz, though that certainly won't hurt. Enthusiasm's what you need, as there are people willing to help you out with the rest.
There are plenty of things you don't have to worry about. You don't have to worry about the language design--that's Larry's job. Nor do you have to worry about the interpreter engine--that's Dan's job. What you'll have to do is get the Perl 6 compiler module, and its standard library, designed and implemented.
It's a big job, but someone has to do it, and that someone could be you. (As a bonus, you get your own Secret Perl Cabal membership card and Decoder ring!) If you're interested, make your pitch to our esteemed, and mostly sane, Perl 6 manager-type person Allison Randal, at allison@perl.org.
Good luck!
Or, rather, my comp copy of Perl 6 Essentials, 2e. Well, now it's Perl 6 and Parrot Essentials 2e, but you get the idea. Maybe the 3e will just be Parrot Essentials. You never know...
And my laptop is back being re-re-repaired. Alas, they got it right this time it seems, so now I need to dig through all the stuff that's been pending because I've not had easy computer access out of work. (And yep, this means the annotated parrot talk slides, along with a bunch of other stuff)
It's amazing how much this stuff builds up when you're not looking. (My mail pile's near-scary here)