April 29, 2004

Multimethod dispatch for fun and profit

Sometimes the more things change the more they stay the same. Other times the more they stay the same the more they change.

Anyway, for those of you keeping track at home, yesterday we officially gave in and declared that all non-load/store operations on PMCs are always and unconditionally dispatched through Parrot's binary-op multimethod dispatch system. Before, you had to actually ask for it; now you get it whether you want it or not.

Multimethod dispatch, as you might remember, is a way of finding which function to run based not just on the function name but also on the types of the parameters to the function. So rather than having a single add function, there's a whole list of add functions, each with a separate signature, and when we need to do an add we find the one whose signature is closest to the arguments we actually have.
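
To make that concrete, here's a minimal C sketch of a binary-op dispatch table keyed on both operand types, with a default as the absolute fallback. The type IDs and function names are made up for illustration--this isn't Parrot's actual table layout or API.

    #include <stdio.h>

    typedef enum { T_INT, T_NUM, T_MAX } type_id;
    typedef double (*add_fn)(double, double);

    static double add_int_int(double a, double b) { return (long)a + (long)b; }
    static double add_default(double a, double b) { return a + b; }

    /* One slot per (left type, right type) pair; NULL means "no specific
       variant registered, use the default". */
    static add_fn add_table[T_MAX][T_MAX] = {
        [T_INT][T_INT] = add_int_int,
    };

    static double do_add(type_id lt, double l, type_id rt, double r) {
        add_fn f = add_table[lt][rt];
        return (f ? f : add_default)(l, r);
    }

    int main(void) {
        printf("%g\n", do_add(T_INT, 2, T_INT, 3));   /* hits the int+int variant */
        printf("%g\n", do_add(T_INT, 2, T_NUM, 3.5)); /* falls back to the default */
        return 0;
    }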

Why, then, go MMD?

Well, to start with, we were pretty close to it already. Perl 5 and Perl 6 both define their operator overloading functions as mostly MMD. (Perl 6 does it completely, while Perl 5 just does it mostly, kinda, sorta) The base object class in Parrot does MMD, and we want to make it so Python and Ruby objects descend from it. (Which is doable) So basically... most of the PMC types Parrot deals with do MMD.

The way we were doing it was to have the PMC's vtable function do the dispatch if it wanted it. That had some problems, though, the biggest of which is that it set up a continuation barrier. Because of the way Parrot's implemented, continuations can't cross a Parrot->C or C->Parrot boundary--that is, once we cross between bytecode and C in either direction, we set up a wall that a continuation can't pass. (That's because we can't save and restore the system stack) So... if you overloaded addition in bytecode, you could take a continuation just fine, but as soon as you left the vtable function that continuation became unusable. Hardly a huge deal, but still... annoying.

Also, this is a little slow. Not a lot slow, mind, but we dispatch to the vtable function and then to the MMD function, and if we're doing this 90% of the time anyway then that vtable function dispatch is just wasted overhead.

With MMD as the default we get to change some of that.

First, of course, we cut out the extra overhead in dispatch, since we skip the vtable dispatching altogether.

Second, since we dispatch directly from the op functions to the MMD function, we can check and see if it's a bytecode or C function, and if it's a bytecode function dispatch to it as if it were any other sub, so continuations taken will live longer and will be more usable. (Why you'd take a continuation from within an operator overloading function I don't know, but that's not my call)
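
A sketch of that check, with stand-in types and names rather than Parrot's real API: if the entry found in the table is a bytecode sub, it gets invoked like any other sub, so no C frame lands on the stack to wall off continuations.

    typedef struct Interp Interp;
    typedef struct PMC PMC;

    typedef struct {
        int  is_bytecode;                        /* nonzero: a sub in bytecode */
        PMC *(*c_func)(Interp *, PMC *, PMC *);  /* used when is_bytecode == 0 */
        PMC *sub;                                /* used when is_bytecode != 0 */
    } mmd_entry;

    PMC *invoke_sub(Interp *, PMC *sub, PMC *l, PMC *r); /* the normal sub-call path */

    PMC *mmd_invoke(Interp *interp, mmd_entry *e, PMC *l, PMC *r) {
        if (e->is_bytecode)                           /* dispatch like any other sub, */
            return invoke_sub(interp, e->sub, l, r);  /* so continuations stay usable */
        return e->c_func(interp, l, r);               /* C call: continuation barrier */
    }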

Third, we get to shrink down the PMC vtable. Now it's only got get & set functions and some informational and metainformational functions in it. That makes instantiating classes a bit faster, since each class gets its own vtable. (Though if class creation time's a limiting factor in your code then, well, you're gonna scare me)
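
Hypothetically, the slimmed-down vtable ends up being not much more than this (stand-in type and slot names, not the real structure):

    typedef long INTVAL;
    typedef struct Interp Interp;
    typedef struct PMC PMC;
    typedef struct STRING STRING;

    typedef struct vtable {
        /* load/store */
        INTVAL  (*get_integer)(Interp *, PMC *);
        void    (*set_integer)(Interp *, PMC *, INTVAL);
        STRING *(*get_string)(Interp *, PMC *);
        void    (*set_string)(Interp *, PMC *, STRING *);
        /* informational and metainformational */
        INTVAL  (*type)(Interp *, PMC *);
        INTVAL  (*does)(Interp *, PMC *, STRING *interface);
    } vtable;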

Fourth, we lift the artificial limit on the number of operators one can do MMD on. Before, you could only really do it on the operations that had functions in the vtable and, while there was a pretty comprehensive list, it was still pretty fixed. Now, well... there's nothing to stop you from adding another table to the list. As long as you've got a default function to call as an absolute fallback, you're fine. This lets us add in the hyper ops as overloadable without having to screw around with expanding the vtables, and that's cool.
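
In table terms, that might look something like this (continuing the toy names from the sketch above): a new overloadable op is just another table plus its mandatory default, with no vtable surgery required.

    typedef struct {
        const char *name;
        add_fn     *table;     /* T_MAX * T_MAX slots, NULL = unregistered */
        add_fn      fallback;  /* the absolute fallback; must never be NULL */
    } mmd_op;

    /* Adding "hyper_add" as overloadable costs one new table entry. */
    static add_fn hyper_add_table[T_MAX][T_MAX];
    static mmd_op ops[] = {
        { "add",       &add_table[0][0],       add_default },
        { "hyper_add", &hyper_add_table[0][0], add_default },
    };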

There's still a lot of work to be done to make this work out--the vtables have to be chopped down, the ops need to use MMD exclusively, and we need to rejig the PMC preprocessor to take what were vtable functions and turn them into default functions for the PMC type, but... well, this can be done mostly automatically, so it's doable.

This would probably be a good place to plug Eric Kidd's paper Efficient Compression of Generic Function Dispatch Tables, so I will. Useful stuff, that.

So now Parrot's got continuations as the basis for most of its control flow and does MMD for all variable ops... Makes you wonder what's next. (If it turns out we use Lisp as an intermediate language, with annotated S-exprs (encoded in XML) as our AST, you'll know we've crossed over into the Twilight Zone...)

Posted by Dan at 03:01 PM | Comments (1) | TrackBack

Beware clashing Evil

GCC on x86 doesn't, by default, align functions on any sort of boundary. This can make for interesting fun if one assumes that it does. (Especially since -O2 and above do align the functions)

-falign-functions=n would be the gcc switch you'd want if you had some reason to care. (where n means "on an n-byte boundary") You probably shouldn't, but...
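
If you do care--say, because something stashes flags in the low bits of function pointers--this little test shows the assumption breaking. Compile it with plain gcc and then with gcc -O2 (or -falign-functions=16) and compare:

    #include <stdio.h>
    #include <stdint.h>

    static void f(void) {}
    static void g(void) {}

    int main(void) {
        /* The low four bits of each function's address: zero if the function
           is 16-byte aligned, often nonzero in an unoptimized x86 gcc build. */
        printf("f: %lx\n", (unsigned long)((uintptr_t)f & 0xF));
        printf("g: %lx\n", (unsigned long)((uintptr_t)g & 0xF));
        return 0;
    }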

Posted by Dan at 12:54 PM | Comments (1) | TrackBack

April 27, 2004

Mmmm, torrents!

So, I've been fiddling around with BitTorrent and all the variant clients. And... I like it. A lot. The shared downloading and easily resumable downloads make it a keen thing indeed.

It turns out that BitTornado, one of the variant clients, does apply a global rate cap if you're serving multiple files up. This is good, as BitTorrent clients'll eat all the available bandwidth, and then some, if given the chance. Not a huge problem for client-only systems, or systems with big upstream pipes, but for me, with a smallish (192K) upstream pipe, well... running multiple torrents wouldn't be an option without this. With it I can throw all the torrents being served into a single directory and know that the 14K rate cap is applied to all the uploads in aggregate. Woo, and, I might add, hoo!

I've already set up a few torrents for some of the big files on the server (Jane Irwin's Vogelein.com is hosted here, and the MP3s on her resources page are available via BitTorrent) and I think I'm going to set some up for the prebuilt VMS perl distributions. While there may not be a VMS BitTorrent client, most everyone's got a Windows, Linux, or Mac box handy, and it'll mean less load compared to FTP or web downloads of the things. That's not a bad thing.

Pity the protocol's slightly too heavyweight to work for small files--it'd be cool if it could be integrated into browsers to distribute more stuff automatically.

Posted by Dan at 03:33 PM | Comments (3) | TrackBack

April 26, 2004

Bits keep falling from the sky

I've been fooling around with BitTorrent lately. Nothing big, just fiddling to see how it all works and playing around a bit. (Yeah, OK, I admit it--I've been snagging manga scanslations off the shub-internet :) Definitely a cool idea, though the reference implementation lacks some things that'd make it otherwise handy. Things like a global rate limit and client cap, handy for those of us with full-time internet connections who don't mind being seeds for a directory full o' stuff, but don't have that much bandwidth on hand. (I don't at all mind letting a torrent client run for a few days or weeks, as long as it doesn't get in the way) It's no good for the single biggest hit on the bandwidth here (the VMS perl distributions) for lack of VMS BitTorrent clients, but...

Still, I've been looking around--it's a good alternative to anonymous FTP, and I figure someone has to have dealt with this by now--not everyone with a wad of stuff to be master server for has a monster pipe and sufficient IP addresses to use the basic client. That search has demonstrated one of the biggest drawbacks to peer-to-peer systems. No, it's not really badly mangled English mistranslations of Japanese. Rather, it's the lack of good server software.

Yeah, I know, it's peer-to-peer! There are no servers. Or... not, since there are. But nearly all the development is towards end-user programs with gooey GUIs and many whizzes and bangs, leaving those of us who just want to run a faceless daemon for these things sort of out of luck.

So far, the only thing I've found is BitTornado, which looks to wrap a wxWindows GUI on top of an upgraded Python underpinning with better rate control and management. I'm going to give it a shot in a bit, as soon as SNET^WSBC un-breaks the state's DSL system, and see how it looks. Hopefully it'll do what I want it to.

Now if someone just writes a BitTorrent client for VMS that can be distributed as a standalone executable... (And yeah, even if the python core works on VMS, that means someone'll have to download two huge distributions, rather than one, since neither the perl nor python distributions are at all small...)

Posted by Dan at 06:41 PM | Comments (5) | TrackBack

April 21, 2004

Romeo, Romeo, wherefore art thou, Romeo

Some days the Shub-internet gives, and other days it takes away. Today's a giving day.

So, in case your day hasn't been sufficiently surreal, I pass along to you... Shakespeare, the programming language.

exeunt omnes

Posted by Dan at 01:03 PM | Comments (2) | TrackBack

April 19, 2004

Why parrot in production?

A good question, and hopefully this is a good answer.

This is a response, of sorts, to some of the feedback generated by the Parrot compiler article. (And yep, it's likely that only one of the core parrot guys could have written the article, which is why I did--now everyone's got a good chance at writing compilers for parrot :)

The compiler article didn't really get into why my work project's targeting parrot, just that it was doing so. No big surprise, since the opening bits were really meant as a way to draw people into the article as well as give the idea that it's both possible and reasonable to write a compiler for an old, crusty language as a way to transition to something that sucks much less than what you presently have. There are a lot of folks stuck with limited Domain-Specific Languages, or a code base written in some antique dialect of something (usually BASIC or PL/I), or some custom 4GL whose whiz-bang features neither whiz nor bang any more, and while you can just dump the whole wad and move to something new, well... that's expensive, very disruptive, and risky. More than one shop has found that moving a currently working "legacy" system to The Next Great Thing is a big and very expensive step backwards.

Not to say that I don't like moving to something new, but the crusty sysadmin in me wants a solid transition path, good fallback contingency planning, and a Plan B, if not a Plan C, just in case. It's often much better to transition the underlying infrastructure first, then refactor, rewrite, or just plain shoot the code in the nasty source language at your leisure. It's also a lot less work in the short term in many cases, which lets you move over to the new system quickly. Often in these cases the problem isn't the language, no matter how crappy it might be. Instead it's the runtime limitations--memory, database, screen handling, or whatever--that really get in the way, so the old crap language can work just fine, at least for a long time, if you can relieve the underlying runtime pressures.

Anyway, on to the explanation of the work project.

As I said in the article, our big issue was the database library for this language--standard ISAM DB with file size limits that made sense at the time but are starting to bite us, with no real good way to fix them. We had to move to something else, or we'd ultimately be in a lot of trouble.

The first plan was to write a compiler for DecisionPlus that turned it into Perl code. We'd run the conversion, have a big mass of somewhat ugly perl code, then shoot the original source and be done with it. All the code would then be Perl, we could work with it, and refactor it as we needed to so it didn't suck. (You can hold the snide comments on perl's inherent suckiness, thanks :) We fully expected the result to look somewhat nasty--hell, the language had no subs, the only conditional statement was if/then/else, and control flow was all gosubs and gotos. To labels. All variables were global, there was no scope at all, and, well... ewww. In a big way.

I was brought on because it's rumored I've got reasonably good perl skills, and I've done compiler work, so... I dove in.

The initial compiler was written in perl because, well, it's a nice language for text processing if you like it, and it's got a lot of powerful tools (in the form of CPAN) handy. Yeah, I could've done it all in C with a yacc grammar, but it's a lot less effort to get that level of pain just smacking myself with a hammer. The first cut of the compiler didn't take too long, as these things go. A few months and I had something serviceable that would run against the whole source repository and work. There were still some issues with the generated code, but nothing bad.

Unfortunately... the output code was nasty. Not last-act-of-Oedipus-Rex nasty, but still... what I had to do in the perl code to maintain the semantics of the DecisionPlus source resulted in awfully ugly code, even with some reasonable formatting by the compiler. Lots of little subroutines (so gosub/return would work), lots of actual gotos, and, because of the little subs, lots of scopes all over the place to make refactoring painful. One thing I hadn't counted on was the extent of the spaghetti in the code--since there wasn't any syntax to restrict things, control flow was an insane mess full of bits and pieces done by people writing Clever Code. (Like labels that were sometimes goto-d and sometimes gosub-d, with exit paths decided by if statements checking global variables)

There was another issue as well--typed data. DecisionPlus has a type system rather more restrictive than perl's, including length-restricted strings and bit-limited integers. And, while these things caused no end of obscure bugs, we had to be bug-for-bug compatible because there was code that used these behaviours to its advantage. To get that with perl meant using tied variables to impose the load/store semantics and overloaded operators to get the binary operations correct. Unfortunately, ties and overloads are two places where perl 5 is really, really slow. Taking a look at what'd be needed to make this work, it became pretty clear there'd be a lot of overhead, and that the result would likely perform badly.
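
For flavor, here's roughly the store semantics that had to be emulated--a hypothetical C sketch, not our actual code: every store to a bit-limited integer silently truncates, and existing code leaned on that wraparound.

    #include <stdio.h>
    #include <stdint.h>

    /* Store x into an unsigned field that's only 'bits' wide. */
    static uint32_t store_bits(uint32_t x, unsigned bits) {
        return x & ((1u << bits) - 1);
    }

    int main(void) {
        printf("%u\n", store_bits(4100, 12)); /* a 12-bit field wraps: prints 4 */
        return 0;
    }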

So, we gave it a shot, and it became clear that the primary goal, getting an editable perl version of the source, wasn't feasible, and that even using perl as a target for the compiler would be sub-optimal. That's when Plan B came in.

Plan B, if you hadn't guessed, was to target Parrot.

Now, I was actually pretty nervous about this--I wasn't sure we were ready for prime time with parrot. It seemed like a good idea, but... we weren't sure. Really not sure. I've done the sysadmin thing, and I've been in the "it must never, ever go down" environment, and I know enough to make a sober risk assessment. And, doing so, the answer was... maybe.

Importantly, though, even if things failed, we wouldn't be wasting all our time. We'd still get a more robust and mature compiler, and a better idea of the issues involved with compiling the language, so... we set a very aggressive goal. If I could make it, we'd know parrot was up to the task, and if not, we'd go to Plan C with a better understanding of the problems and a compiler architecture ready to retarget to another back end. Plus we'd have shaken out many of the issues of switching to a new database system (Postgres rather than an ISAM setup) and screen handling system. (I spent a fair amount of time teasing out escape sequences from primitive character function databases, poring over VT220 and xterm escape sequence manuals, and back-translating them to curses functionality. Now that was fun...)

It worked out, of course. We wouldn't be at this point (writing the article and all) if it hadn't, though it was touch and go for a bit. Still, I beat the deadline by a day and two hours, which was cool. And a lot of bugs in parrot were shaken out, and some new functionality prompted, because of this, which was good too--it's always good to have a real live application.

Oh, and Parrot got a mostly working Forth implementation too. So it was a win all around. :)

Posted by Dan at 07:04 PM | Comments (1) | TrackBack

April 15, 2004

Mmmm, article goodness

Looks like the article on writing a compiler that targets Parrot has hit OnLamp. Probably worth taking a look.

Posted by Dan at 10:57 PM | Comments (1) | TrackBack

April 08, 2004

Taking out the trash

So, I went to the talk on unified garbage collection theory Monday. I'd like to say it was breathtakingly innovative, filled with impressive insights, but honestly it was pretty mundane. There wasn't anything new, though it wasn't a waste of time.

The executive summary, for those who're interested, is that refcounting and tracing, as means of garbage collection, are really the same thing. Or, rather, opposites of one another--refcounting finds the dead stuff, while tracing finds the live stuff. Which is better for you depends on the application and your requirements. Slides for this talk are on the web.

Even though the talk wasn't particularly revolutionary in itself, nor did it really break any new ground, the measure of success of any talk is what you got out of it--whether what you got out of a talk was actually in the talk is secondary. :) By that standard, I was happy to have gone. It was also nice to get a chance to wander around the (only semi-finished) interior of the new CSAIL complex. (Which I shall now, and forever more, think of as the Theodore Geisel Memorial Complex, even though that isn't its name. Jokes about the building are pretty old by now, I expect)

Anyway, here's a barely edited set of notes from the talk. Things in italics are actual talk notes; the rest are things that occurred to me while I was taking the notes. And yeah, some of the things I noted down I did know beforehand, but I tend to have a spotty memory for details, so I figured it was worth noting them down to keep 'em in the front of my brain.

  • I want tracing without stop the world, dammit. Pity I can't have it. :(
  • Can we hijack the MMU to map pages as read-only to halt mutating threads? Not particularly portable, but...
  • Wedge in mutating macro or function or something.
  • Real time goes ref count, but I wonder if that's truly viable. Unbounded (well, proportional to object count) collection time... (Note that this isn't true if you use a deferred collection list, as you only take out as many objects as you have time for)
  • Oooh, a Lisp jab. Keen. :)
  • Refcount commonly tosses count mutation for putting things on the system stack
  • No-mutation costs on tracing tend to go away as you throw in write barriers and such
  • Copying collectors with object structures would be really nice.
  • Trace collector finds live things, refcount collector finds dead things
  • ref and trace are flipsides. Trace goes from live, refcount goes from dead
  • RC starts with ref too high and goes lower, trace starts too low and goes higher
  • Sweep tends to take a lot of time in tracing collectors. Depressingly true. (Fewer things in general help)
  • Wonder if it's worth having a used list as well as a free list, so sweep only sweeps the used list. When a PMC is allocated it's put on the used list, though the used list explicitly isn't part of the root set. Probably only worthwhile if we hit a highwater mark at some point.
  • Nice transition from trace to GC on slide. (Check to see if the powerpoint slide set has this) Non-incremental ref count, though.
  • Interesting note on multiprocessor systems--can't count on atomic and universally visible refcount coherence, so scheduling inc and dec operations is a good thing. Throw a thread at the scheduling buffer.
  • Low pause time stuff -- refcount with occasional trace to find cyclic garbage
  • Add in mutex and condwait on PMC allocation? Or for mutator thingie?
  • Generational heap with lists of generation objects?
  • Dammit, this'd be easier if we could move PMC pointers
  • We can do this with sufficient application of memory. Blows the hell out of cache, though
  • Refcount is good for low mutation rates, tracing for high mutation rates
  • Ulterior ref counting: trace the nursery, refcount the mature space
  • Distributed refcount - trace on node, refcount across nodes
  • Makes a concerted effort to evaluate GC schemes based on time and space use
  • Would it be reasonable to macro the hell out of all the fundamental GC operations and then wedge in things to experiment with schemes of collection? Probably. Worth getting in right now before it's too late.
  • Investigate ways to stop the world, with a reasonably cheap fallback scheme if we can manage it.
  • New playroom in the building sucks. Too much through traffic.
  • Some formulas on space and time usage. In the powerpoint. (Want to instrument Parrot to allow for experimentation and cost estimates)
  • All realistic collectors are a trace/refcount hybrid
  • Cost of refcount isn't just cost to mutate the referenced refcount, but also the cascade through the dead objects
  • Worth investigating a more sophisticated DOD model. Getting the infrastructure in would be well worth it.
  • Count in the size of the metadata for real benchmarks. (Need to do this with the quota mechanism)
  • If you've got pluggable GC, then linking a collector type to an app, probably with some sort of dynamic feedback profiling info, would be really useful. The problem there is that it guarantees a minimum cost to pointer mutation, since at least some of the schemes require pointer mutation tracking. I suppose you could have multiple cores that get dynamically loaded in.
  • Split the heap into "has pointer" and "doesn't have pointer" memory. Could leverage the MMU for it with write barrier stuff.
  • Check costs of statistics gathering.
  • Check to see if circuit theory algorithms tie in any. (Comment from a guy in the audience--something on "Thévenin and Norton equivalence", nodes and branches)

I'm not sure what the net result of all this will be, other than that I really want to get at least macros wedged into parrot to intercept anything GC/DOD related. While full runtime pluggability has some required overhead that we may not want to pay for the flexibility, we can do it in source and just macro out the stuff we don't need for a particular scheme. Instrumenting Parrot to get more stats would be a useful thing, to be sure.
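
Something like this is what I have in mind for the macro wedge--hypothetical names, and only the no-op branch is actually filled in here, but it shows how call sites stay scheme-agnostic while the collector underneath gets swapped at compile time:

    typedef struct Interp Interp;
    typedef struct PMC { struct PMC *slots[8]; } PMC;

    #if defined(GC_SCHEME_REFCOUNT)
    #  define GC_WRITE_BARRIER(i, container, old, new_) \
           do { rc_dec((i), (old)); rc_inc((i), (new_)); } while (0)
    #elif defined(GC_SCHEME_TRACING)
    #  define GC_WRITE_BARRIER(i, container, old, new_) \
           remembered_set_add((i), (container))  /* card/remembered-set style */
    #else
    #  define GC_WRITE_BARRIER(i, container, old, new_) ((void)0) /* no-op */
    #endif

    /* The call site doesn't know or care which scheme is compiled in. */
    void set_slot(Interp *interp, PMC *obj, int idx, PMC *value) {
        GC_WRITE_BARRIER(interp, obj, obj->slots[idx], value);
        obj->slots[idx] = value;
    }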

Posted by Dan at 03:14 PM | Comments (1) | TrackBack