May 17, 2004

Let the thumping begin

I spent a chunk of this weekend working on the Pie-thon conversion setup. Since folks seem interested in this sort of thing (I'm not sure whether it's as a record of success or abject failure, so we just won't go there right now :) I figure I ought to note down what's going on--it's not like it'll make much difference one way or the other with the contest, so there's no harm as I see it.

There's also a new category for this if you're into that sort of thing. (Official trackback ping URL is http://www.sidhe.org/~dan/blog/mt-tb.cgi/253 according to MT) No RSS feed just for it. (And yes, I realize the "WTHI" feed is actually a glommed-together category feed, and I need to fix that too)

Anyway, here's where we stand so far.

I've been working with Simon Cozens' Python::Bytecode module as my starting point--it disassembles Python bytecode to a readable text stream, pretty much the same way that the dis module in the Python distribution does. Seemed to be a good place to start, given I know far more Perl than Python.

The module, unfortunately, only comes knowing about the v2.1 and 2.2 bytecode formats. Things have changed a bit for v2.3 and, while I've got a version I hacked up to try and load v2.3 bytecode, there's been no joy there so far--it gets to the magic number identification OK, but the bytecode past that it doesn't recognize, and a hexdump shows it being a bit different from 2.2 bytecode. Some poking around in the Python core has been instructive, but not enough that I've been able to get it going. (I expect that once I can get it to get to the actual code it'll work, but...) There may be a few extra constants in there now that I look at the hex dumps again. Hrm. Going to have to poke in the python source some more. (I now see the value of editors with function name completion, FWIW)
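For anyone following along, here is a rough sketch of the file layout being described: a magic number up front, a few more header bytes, then the marshalled code object. It is written in Python rather than Perl, it is not the Python::Bytecode code, and the helper name split_pyc is made up for illustration. The header size is a parameter because the number of bytes between the magic number and the marshalled data is not the same across Python versions; the example builds its own blob in memory, so the size used here is just an assumption.

```python
import importlib.util
import marshal
import struct
import types


def split_pyc(data, header_size):
    """Split raw .pyc-style bytes into (magic, rest_of_header, code_object).

    header_size is an assumption supplied by the caller: it differs
    between Python versions, which is exactly what trips up a reader
    written for an older format.
    """
    magic = data[:4]
    header_rest = data[4:header_size]
    # marshal.loads only understands data written by a compatible interpreter.
    code = marshal.loads(data[header_size:])
    return magic, header_rest, code


# Build a .pyc-like blob in memory with the running interpreter.
HEADER_SIZE = 16  # assumed size for this demo only
source = "x = 1 + 2\n"
compiled = compile(source, "<example>", "exec")
blob = (
    importlib.util.MAGIC_NUMBER
    + b"\x00" * (HEADER_SIZE - 4)
    + marshal.dumps(compiled)
)

magic, header_rest, loaded = split_pyc(blob, HEADER_SIZE)

# The magic is a little-endian 16-bit number followed by "\r\n".
(magic_value,) = struct.unpack("<H", magic[:2])
print("magic value:", magic_value, "terminator:", magic[2:])
print("loaded a code object:", isinstance(loaded, types.CodeType))
```

If the header size is wrong for the file at hand, the unmarshalling step starts in the wrong place and fails, which would look a lot like "recognizes the magic number but not what follows".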

In the meantime, I've been taking a look at the dis module instead. This ships with Python and provides disassembly services along with a command-line interface that'll take Python source, compile it, and provide the disassembled bytecode. I've gotten a bit further with this, though it's got a serious limitation that's getting in the way--it only disassembles the first function in the source, and doesn't dump the constants out in a useful way. (Not that this is a bad thing as such. It's still quite useful) Still, a place to start.
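One way around that limitation, sketched here under the assumption of a reasonably recent Python 3: nested functions live as code objects inside their parent's constants, so walking co_consts recursively reaches every function in the source. The helper names iter_code_objects and dump_all are invented for this example, and dis.disassemble is used because it handles exactly one code object at a time.

```python
import dis
import io
import types


def iter_code_objects(code):
    """Yield a code object and, depth-first, every code object nested in it."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def dump_all(source, filename="<example>"):
    """Compile source and return a text dump of every code object in it."""
    top = compile(source, filename, "exec")
    out = io.StringIO()
    for code in iter_code_objects(top):
        out.write("== %s ==\n" % code.co_name)
        # Constants are printed explicitly so they are easy to find.
        out.write("constants: %r\n" % (code.co_consts,))
        dis.disassemble(code, file=out)
        out.write("\n")
    return out.getvalue()


example = "def f(a):\n    return a + 1\n\ndef g(b):\n    return b * 2\n"
print(dump_all(example))
```

The dump comes out as one section per code object, module first, each with its constants listed ahead of the instructions.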

I've also got the start of a conversion scheme from python bytecode ops to parrot ops. This has the additional issue of having to do stack to register scheme translation, which could be a big deal in hand-rolled code but I expect it won't be a problem here. I think I may well do at least one intermediate representation, though, rather than a direct conversion. We'll see.
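To make the stack-to-register idea concrete, here is a toy sketch. It is not the conversion scheme itself, and neither the input tuples nor the output tuples are real Python or Parrot instruction formats; they are hypothetical stand-ins. The trick is to keep a compile-time model of the operand stack that holds register names instead of values, so every push allocates a fresh register and every pop names an operand.

```python
def stack_to_registers(stack_ops):
    """Translate (opname, arg) stack ops into three-address register ops.

    Only handles straight-line code; branches would need the modelled
    stack to agree at every join point, which this sketch ignores.
    """
    stack = []      # compile-time operand stack: holds register names
    out = []        # generated register-style instructions
    next_reg = 0

    def fresh():
        nonlocal next_reg
        name = "r%d" % next_reg
        next_reg += 1
        return name

    for op, arg in stack_ops:
        if op == "LOAD_CONST":
            reg = fresh()
            out.append(("set", reg, arg))
            stack.append(reg)
        elif op == "LOAD_NAME":
            reg = fresh()
            out.append(("load", reg, arg))
            stack.append(reg)
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            reg = fresh()
            out.append(("add", reg, left, right))
            stack.append(reg)
        elif op == "STORE_NAME":
            out.append(("store", arg, stack.pop()))
        else:
            raise ValueError("unhandled op: %s" % op)
    return out


# Roughly "b = a + 2" as a stack program.
program = [
    ("LOAD_NAME", "a"),
    ("LOAD_CONST", 2),
    ("BINARY_ADD", None),
    ("STORE_NAME", "b"),
]
for instruction in stack_to_registers(program):
    print(instruction)
```

This burns a new register for every push, so a real translator would want to reuse registers once their values are dead. An intermediate representation is a natural place to do that kind of cleanup.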

I'll post the code for this stuff as I get it going.

Posted by Dan at May 17, 2004 11:15 AM | TrackBack (0)
Comments

In case you don't know where to look, Python/marshal.c contains the C code for reading Python bytecode and creating objects from it or taking existing objects (imported .py files) and writing the proper .pyc file. Include/opcode.h has all the opcodes as C macros. Python/compile.c is obviously the compiler. There is also opcode.py and symtable.py in the Lib directory which are undocumented but might be of help.

Please feel free to ask for any info if you need it; I owe you since your "What the Heck..." entry on types helped inspire my masters thesis (in progress) on type inferencing built-in types of local variables in Python.

Posted by: Brett at May 17, 2004 10:56 PM

Aha, marshal.c is what I was looking for--thanks. That should make it easier to tease out the bits I need. I'll fire you off some mail if I hit a snag again.

(FWIW, you're the second person I know of that's doing a master's thesis on type inferencing with Python)

Posted by: Dan at May 18, 2004 12:34 PM

Is the other one Starkiller? Or is it someone else entirely?

Posted by: Brett at May 19, 2004 06:55 PM

Yep, it's Starkiller. Looks pretty interesting, though I've not yet finished reading it.

Posted by: Dan at May 20, 2004 09:06 AM

Yeah, it seems to do what it set out to do: type-infer an entire Python program for eventual conversion to C++, sans two or three uncommon cases. The Self paper it is based on is a good read (but then most of the papers on Self done at Stanford and Sun are, as I have discovered).

But my thesis is for type inferencing built-in types for local variables. That way it can be integrated directly into the compiler and not break any semantics of Python by requiring all code to be present at compile-time, etc. Also makes my life much easier. =)

Posted by: Brett at May 23, 2004 06:20 PM

See also Python::Serialise::Marshal on CPAN.

I use it for successfully migrating Mailman configs.


Posted by: muttley at June 15, 2004 02:15 AM