Squawks of the Parrot: June 2005 Archives

June 30, 2005

WCB: Full bytecode metadata

One of the things that was on the list o' things to be put into parrot was fully annotatable bytecode. This would allow you, if you were wondering, to associate any number of sections of metadata with your bytecode, and within those metadata segments associate any data you like with individual positions in the bytecode.

Or, more simply, your code can say "given I'm at offset X from the start of my bytecode segment, what data of type Y is associated with it?"

This facility was going to go in to support proper error message handling, so compilers could associate source line numbers and source line text with the bytecode generated for that source -- this way you could have code like:

  getmeta P1, [segment_offset,] "line_number"
  getmeta P2, [segment_offset,] "filename"
  getmeta P3, [segment_offset,] "source_text"
  getmeta P4, [segment_offset,] "column"

to get the line number, filename, source text, and column offset for either the current position or the position at the given offset (if it's passed in). Assuming, of course, that there are line_number, filename, source_text, and column metadata segments attached to the current bytecode. (This actually makes me think that exceptions should send along as part of themselves a thingie that represents the point in the bytecode that code can query for metadata. Ought to be able to do it for any continuation as well, for walking back up the call stack)
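To make the lookup concrete, here is a minimal Python sketch of the idea, not Parrot's actual design. The class names, the `set_meta`/`get_meta` methods, and the rule that an annotation covers every offset up to the next annotation are all assumptions made for illustration.

```python
import bisect


class MetadataSegment:
    """One named metadata segment: (offset, value) pairs sorted by offset."""

    def __init__(self, entries):
        pairs = sorted(entries, key=lambda pair: pair[0])
        self._offsets = [off for off, _ in pairs]
        self._values = [val for _, val in pairs]

    def lookup(self, offset):
        # Assumption: an annotation applies from its own offset up to the
        # next annotated offset, so find the last entry at or before `offset`.
        i = bisect.bisect_right(self._offsets, offset) - 1
        return self._values[i] if i >= 0 else None


class BytecodeSegment:
    """A bytecode segment with any number of named metadata segments attached."""

    def __init__(self):
        self._meta = {}

    def set_meta(self, kind, entries):
        self._meta[kind] = MetadataSegment(entries)

    def get_meta(self, offset, kind):
        # Returns None when no segment of that kind is attached.
        seg = self._meta.get(kind)
        return seg.lookup(offset) if seg is not None else None


code = BytecodeSegment()
code.set_meta("line_number", [(0, 1), (12, 2), (40, 5)])
code.set_meta("filename", [(0, "demo.src")])

line = code.get_meta(20, "line_number")   # offset 20 falls in the range starting at 12
name = code.get_meta(20, "filename")
column = code.get_meta(20, "column")      # no such segment attached
```

Keeping each kind of metadata in its own sorted table means the executed bytecode itself carries none of this weight, which matches the goal of not fluffing out the bytecode.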

This is one of those things that I'd really prefer to be an object method thing, since I'm not picturing this ever being particularly speed-critical. (Yes, hell has frozen over -- I'm actually advocating an OO solution) The problem here is that when we need this information we likely don't have an object, or are operating at a level below the object system, so we're stuck welding the capabilities into the interpreter's bytecode system itself.

There was also going to be a corresponding setmetadata op to allow annotating a bytecode segment, though that would generally not be used by anything but a compiler module. (And odds are, given what we were seeing with folks writing compilers, that would mean adding directives to the assembler or PIR compiler, and defined annotations to the AST)

Would've been nicely useful for attaching all that info that compilers want to attach, and that runtimes like to have, without fluffing out the actual executed bytecode.

Posted by Dan at 04:12 PM | Comments (0) | TrackBack

June 27, 2005

Hey, look, free time!

Go figure. Anyway, since my free time is a bit more copious than it used to be, I decided to dust off an old module and fix it up a bit. (Well, OK, free time and the fact that I'm using it in a talk Wednesday morning) So now, after a bit of work, Devel::Size is about as close to done as it's going to get. The only way it could get any done-er is to delve into perl's format code, something I'd really rather not do.

Devel::Size 0.61, it's on CPAN (or at least on its way), and it now knows about everything there is to know about code refs. Yeah, it means that you can now see exactly how much space a code ref ultimately uses, including the actual optree, stashes, and the stuff hanging off them.

Posted by Dan at 04:27 PM | Comments (0) | TrackBack

June 24, 2005

A moment of sanity

Dangermouse is out on DVD. You've got to know that there's something essentially right with the world when this is true.

Posted by Dan at 09:24 AM | Comments (1) | TrackBack

June 21, 2005

Ya think?

Yeah, OK, I'm succumbing to the effects of the dreaded BlogLinkDisease, but... from ScienceBlog: Sleep studied as potential insomnia treatment.

Update: Ah, damn, they fixed it. Pity, it was funny while it lasted.

Posted by Dan at 01:56 PM | Comments (1) | TrackBack

June 20, 2005

Allocating registers is fun!

And, it turns out, as I remarked earlier, not that big a deal. I just did a full rebuild of the $WORK sourcebase. 468 individual programs, ~520k lines of source (about 14.3M of it), 175M of PIR, generating 164M of bytecode, and not a function or subroutine in the whole lot. Well, in the original sourcebase, at least.

Total compilation time? 185 CPU minutes. Which is less than it took to compile just the biggest program before I wrote the allocation code. (It's just slightly more than half the time that it took to compile that one monster program)

Interestingly, I never once had to do anything at all extraordinary, or even very interesting, with the register allocation code. Nothing. In all that code, some of it pretty damned ugly to make up for the primitiveness of the language, I never once ran out of registers, and I was restricting myself to just R16-31, so as not to collide with the parameter passing bottom half, even though I could've for a lot of the code. Hell, I didn't have to write any spilling code, since I never had to spill.
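For a sense of how little machinery this takes, here is a rough Python sketch of a no-spill allocator that hands out registers 16-31 and reclaims them when a temporary dies. It is an illustration only, not the allocator from the compiler described here; `RegisterPool`, `emit`, and the `load`/`add`/`mul` pseudo-ops are hypothetical names.

```python
class RegisterPool:
    """Hands out registers 16-31 of one register type; never spills."""

    def __init__(self, first=16, last=31):
        # Stored in descending order so pop() yields the lowest free register.
        self._free = list(range(last, first - 1, -1))
        self._in_use = set()
        self.high_water = 0   # most registers ever live at once

    def acquire(self):
        if not self._free:
            raise RuntimeError("out of registers; a real allocator would spill here")
        reg = self._free.pop()
        self._in_use.add(reg)
        self.high_water = max(self.high_water, len(self._in_use))
        return reg

    def release(self, reg):
        self._in_use.remove(reg)
        self._free.append(reg)
        self._free.sort(reverse=True)


def emit(node, pool, out):
    """Emit pseudo-ops for a nested-tuple expression; return the result register."""
    if isinstance(node, str):              # a variable: load it into a fresh register
        reg = pool.acquire()
        out.append(f"load R{reg}, {node}")
        return reg
    op, left, right = node
    lreg = emit(left, pool, out)
    rreg = emit(right, pool, out)
    out.append(f"{op} R{lreg}, R{lreg}, R{rreg}")   # result reuses the left register
    pool.release(rreg)                     # the right-hand temp is dead now
    return lreg


pool = RegisterPool()
ops = []
result = emit(("add", "a", ("mul", "b", "c")), pool, ops)
```

Because temps are released as soon as their value is consumed, the number of live registers tracks expression depth rather than program size, which is one plausible reason a 16-register window can go a long way.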

Yes, this is a bit of a petty gloating post. (Nyah!) A moment of weakness. I'll make up for it later by going into detail on what you too can do with your own register allocation code and a machine model that actually makes it easy.

But now? Well... Nyah! :-P

Posted by Dan at 04:54 PM | Comments (0) | TrackBack

WCB: Notifications

One of the things planned for parrot was a full notification system. That is, Parrot would support being able to register code that could be called when a PMC was read from, written to, destroyed, when a class instantiated a new PMC, or when an element was inserted into or removed from an aggregate. (Along with a few other things -- methods being added or removed from a class, the MMD tables changing, a PMC being invoked...) Once you register a notification for an action on a thing or class of thing, your notification code gets called whenever that action happens.

The whole point of notifications is to allow you a chance to get your code into the guts of parrot and get access to events you might not otherwise see, and do things in response to those events. There were three big drivers for this one.

First, there's some perl history. One of the common requests in perl 5 (well, not entirely uncommon, at least) is to be able to make the symbol table a tied variable. While there are a number of reasons people want to do this, the biggest is for debugging and monitoring -- they want to know when variables are added or accessed. This is not an unreasonable thing to want to do. Unfortunately it can't be done in perl 5.

Second, there's ruby. Ruby has a number of hooks wedged into it that allow you to install callback functions (well, methods) into a variety of places. These callbacks aren't special-cased -- they're just spots in the various classes where someone added a "Hey, it'd be nice to be notified if X happens" hook you could override. Useful things, and I wanted more.

Finally, there's the issue of instance variables (or slot variables, or attributes, or properties, depending on your language of choice). You know, those things that every object of a particular class has which every language calls by the name some other language uses for something completely different? Right, them. The problem with those is that, to be efficient, you really want to allocate them all as one big wad at the time your object is created, so you can access the instance variables as an array. That's great, until you find that you've added a new instance variable to a class with instantiated objects. With fully static, compiled languages that just can't happen, but with perl, python, and ruby, well... you can do all sorts of things to classes. When code adds (or removes) an instance variable from a class, you need to stop the world and rejig all the instantiated objects.

So... three general cases wherein something happens in the engine and some sort of watching code needs to do something. Three separate systems is awfully wasteful, and leaving people to bodge up an ad-hoc solution (or, more likely, a dozen or so ad-hoc solutions, if history is any guide) goes against one of Parrot's basic philosophies: "If everyone's going to reinvent a wheel, we might as well just provide the damn wheel as part of the stock system"

Hence the notification system. One global system that all this funnels through. If you do X, for any one of a myriad of values of X, then all the watching functions for X get called. The nice thing here is that you can have one unified system with a single interface so you don't need to call functions to register some callbacks, subclass classes for other callbacks, and set global symbols for other callbacks. Instead, one big system, one way to deal with it, fewer hassles to worry about.
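The "one system, one interface" idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Parrot's API: the `Notifier` class, its `register`/`notify` methods, and the string names used for targets and actions are all made up here, and the callbacks run synchronously rather than going through an event queue.

```python
from collections import defaultdict


class Notifier:
    """A single registry for every kind of watched action."""

    def __init__(self):
        # (target, action) -> list of callbacks
        self._watchers = defaultdict(list)

    def register(self, target, action, callback):
        self._watchers[(target, action)].append(callback)

    def notify(self, target, action, **details):
        # Copy the list so a callback may register more watchers safely.
        for callback in list(self._watchers.get((target, action), ())):
            callback(target, action, details)


log = []
notifier = Notifier()

# Two of the motivating cases, both going through the same interface.
notifier.register("symbol_table", "insert",
                  lambda target, action, d: log.append((action, d["name"])))
notifier.register("class:Point", "add_attribute",
                  lambda target, action, d: log.append((action, d["name"])))

notifier.notify("symbol_table", "insert", name="$x")
notifier.notify("class:Point", "add_attribute", name="z")
notifier.notify("symbol_table", "delete", name="$x")   # nobody is watching this one
```

The same `register` call covers symbol-table monitoring and class-layout changes, so there is no mix of callbacks, subclassing, and global symbols to learn.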

Since we're integrating this all together I'll add that this was all to be done, on the callback side of things, with parrot's event handling system. That is, when a notification happened it just fired off an event (potentially a very high priority event) that probably got put into the event queue for later processing. Some notifications would be handled immediately, either because they were very high priority (altering object structures, for example), or could be refused (I can't think of anything in particular here, but there's no reason that a callback couldn't decide that some internal notification wasn't allowed), or had to be finished before the notification could be processed (if you were monitoring an object for destruction), so notifications aren't just a set of internal events, but they're pretty darned close.

On the end of the world doing the monitoring there were going to be a number of different means of monitoring, depending on what was getting watched. A lot of it would be done with vtable method overrides, some with a set of permanent monitoring queues, and a handful with special-purpose checking code.

It would've been pretty darned swell. Ah, well, maybe next time.

Posted by Dan at 10:35 AM | Comments (2) | TrackBack

June 18, 2005

WWIT: Calling conventions

If you've looked you might have noticed that Parrot's calling conventions are somewhat... heavyweight. Not particularly bad as these things go (they're actually very similar to the conventions you see on systems with lots of registers such as the Alpha or PPC) but still, heavier than folks used to only stack-based systems are used to.

As a recap for those not intimately familiar with parrot's calling conventions (as they stood a while ago at least -- things may have changed) the first eleven of each type of argument (PMC, String, Integer, and Float) go into registers 5-15 of the appropriate register type. The count of parameters in each register type goes into integer registers 1-4, Int register 0 gets a true/false value noting whether this is a prototyped call or not (meaning that non-PMC parameters are being passed in basically), P0 gets the sub PMC being invoked put in it, P1 holds the return continuation (this can be filled in automatically for some invocation ops), P2 holds the object the method's being invoked on (if this is a method call), P3 holds an array with any extra PMC parameters if there are more than 11, and S0 holds the name of the sub you're calling (since subs may have multiple names).

Seems complex, doesn't it?

Let's think for a moment before we go any further. When calling a function, what do you need to have? Of course, you need the parameters. You need to have a place to return to. There has to be some indication of how many parameters you're passing in. (At least with perl-like languages, where the parameter list is generally variable-length) You need some handle on the thing you're calling into. Per introspection requirements perl imposes, you need to know the name of the function you're calling: since a function may have several names, you need to know which name you're using when making the call, and if it's a method call you need the name of the method you're calling so you can look it up. If you're calling a method on an object you need the object. (And you thought this was going to be simple...)

The only required elements for a sub call are the count of PMC parameters, the prototyped indicator (which you would, in this case, set to unprototyped), the sub PMC, and the sub name. The parameters themselves aren't required since you don't actually have to have any. The return continuation can be autogenerated for you if you so choose, so it's not on the list.

So. Sub name, Sub PMC, prototype indicator, and parameter count. Not exactly onerous, and unfortunately required. No way around that. The biggest expense you're going to have is shuffling some pointers and constants around. (And, while I admit I resent burning time, it's hard to get too worked up about four platform natural sized integer moves per sub call, one of which, the sub PMC, can potentially be skipped if you fetch it out of the global store into the right spot in the first place)

The extras are just that -- extras. If you choose to do a prototyped call you need to fill in the counts for the other arg types. If you choose to not take advantage of automatic return continuation creation you need to create one and stick it in the right spot. If you've got way too many parameters, you need to put them into the overflow array. That's it, though.
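Here is a small Python model of what a caller fills in for an unprototyped call, following the register layout described above. It is a sketch, not Parrot code: the recap doesn't say which of integer registers 1-4 holds the PMC count, so the choice of I3 is an assumption, as is counting only the arguments that landed in registers. The `setup_unprototyped_call` helper and the dictionary-of-registers model are hypothetical.

```python
MAX_REG_ARGS = 11      # registers 5-15 hold the first eleven arguments
FIRST_ARG_REG = 5


def setup_unprototyped_call(sub, name, args, pmc_count_reg=3):
    """Return a register-file model for calling `sub` under the name `name`."""
    regs = {"I": {}, "P": {}, "S": {}}

    # The four required pieces.
    regs["I"][0] = 0                        # I0: false, so not a prototyped call
    regs["P"][0] = sub                      # P0: the sub PMC being invoked
    regs["S"][0] = name                     # S0: the name the sub is called by
    in_regs = args[:MAX_REG_ARGS]
    regs["I"][pmc_count_reg] = len(in_regs)  # assumed: I3 counts PMC args in registers

    # The parameters themselves: P5 through P15.
    for i, arg in enumerate(in_regs):
        regs["P"][FIRST_ARG_REG + i] = arg

    # An extra, only needed with more than eleven parameters.
    overflow = args[MAX_REG_ARGS:]
    if overflow:
        regs["P"][3] = list(overflow)       # P3: the overflow array

    # P1 (return continuation) is left for the invocation op to fill in,
    # and P2 (the object) stays unset because this is not a method call.
    return regs


regs = setup_unprototyped_call("<sub PMC>", "greet", [f"arg{i}" for i in range(13)])
```

With no arguments at all the function touches exactly four registers, which is the "four moves per call" floor discussed above.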

The first thing anyone does when they look at this is want to start chopping things out. The problem is that there's really nothing to cut out. You can't chop out the object for method calls, that's kinda needed. You can't chop out the PMC for the sub being called, since you need a place to go. You can't skip using PMCs for subs for a number of reasons, which warrant their own topic, so I'll put that in a separate WWIT entry. You can skip the parameter count if you have functions with fixed parameter signatures (which we don't) or if you use a container that keeps count for you, which just pushes the cost off somewhere else (and ultimately makes calling more expensive, since you then need to move parameters out of the container and into registers). You could skip the whole prototyped thing, but in that case you either always use parameter counts or lose the ability to have non-PMC parameters. You can't chop the sub name out, since then you can't properly introspect up the stack to find the caller names (as any particular sub PMC could have multiple names) You can't chop out the return continuation since you need a place to return to when you're done. You can't chop out... well, we've run out of things to consider chopping out, and the best we've managed is to potentially change how the actual parameters are passed, but that doesn't make things cheaper or easier, it just shifts the cost and adds a little extra overhead.

Aren't engineering trade-offs fun?

Oh, and you can't even count on the sub you're calling being singly or multiply dispatched, so you have to leave the dispatching entirely up to the sub/method PMC being invoked. The HLL compilers can't emit code that assumes one or the other dispatching method. ('Specially since the method may change from invocation to invocation of a subroutine, as code elsewhere screws around with the definition of a sub)

Posted by Dan at 09:42 PM | Comments (4) | TrackBack

June 17, 2005

Some days ya gotta wonder...

I dunno, I just wonder some times about people's abilities to read. I really, really do.

It looks like yet another one of those "Everyone can have a webpage that makes the old geocities stuff look downright professional!" web hosts has found the background directory on the server here. (http://www.sidhe.org/backgrounds/ if you really care. I certainly don't) This time around it's myspace. Last time it was xanga, and I lose track back before that.

The index page is clearly tagged at the top with "make a local copy of anything you want and do not link to images on sidhe.org because I will fsck with you" and yet... still they link. And linking gets them a substitute image, which is undoubtedly not the image they're looking for. (And if it is they oughta use the bare link and save me the hassle of putting in the redirect. If you want to use that image then go for it, good luck, you're welcome to my bandwidth for that one. Oh, and I suggest that you try not using white text, 'kay?)

Mostly it looks to be teenage girls and who knows, given how popular yaoi stuff is with 'em maybe it's what they want. Somehow I'm thinking the punk/metal skater dudes are probably looking for something else. (And again, if they're not, well... dark text, guys, it works better)

Ah, well, I suppose it gives me a chance to help make people's web surfing experience just a little more surreal.

Posted by Dan at 04:43 PM | Comments (1) | TrackBack

June 16, 2005

WCB: Loadable opcode libraries

Extensibility (to an extreme, perhaps) had always been one of the design goals of Parrot. This was on purpose -- if we learned nothing else from history it's that people will take whatever you've got, break out the mutagens and gamma ray projectors, and have at it, because there's just no way you can anticipate everyone's needs in the future. So, rather than try and do that (we just looked at their needs in the present) we left a bunch of really big "WEDGE CLEVER THINGS IN HERE" spots in parrot.

Loadable opcode libraries are one of those spots.

A loadable opcode library is, basically, a library of opcode functions which are not built into parrot. The intention was that you could have a bunch of these sitting on disk as part of parrot's library, and load them on demand. (Either explicitly with code or, more likely, have parrot automatically load them for you based on metadata in the bytecode files) This dovetails nicely with the view that most of the opcode functions are just library functions with fast-path calling conventions. It also makes it possible to keep parrot's in-memory footprint as small as possible -- if you don't need the math or networking libraries, for example, you won't load them in, and don't pay the startup cost for them. (And yes, even if they're already in memory for other processes, there is a cost associated with loading them into your process)

What would you use them for?

Well, there were three big use cases.

First, there's the 'ancillary opcode/runtime library' case. The transcendental math ops were in this case -- they'd look like they were in the base set, but they'd only get loaded in if your bytecode used them, otherwise they wouldn't.

Second was the 'extras for languages/alternate bytecode loaders' case. That is, if you were a language compiler writer and you found that there were some operations that you needed that were essentially fundamental you could package them up into an opcode library and make sure code you emitted loaded the library up. (Again, probably using the metadata embedded in the bytecode files) This does require that the libraries be available to whoever gets your bytecode files, but that's not really a big deal -- this isn't going to be too common, and I don't think it's particularly onerous to have to install the, say, Prolog runtime libraries to run programs that have Prolog components. The same thing goes if you're writing an alternate bytecode loader -- it may well be a lot easier for the JVM/.NET/Z-machine bytecode loader to have a full library of JVM/.NET/Z-machine ops in an op library and just use those instead of recompiling from the source bytecode to parrot's bytecode.

Third was the fast extension function case. This is one where your extension module explicitly declared that a number of its functions were actually opcode functions rather than traditional parrot functions. This was supposed to be fully supported. It wasn't supposed to be the general case, of course, since in general there's too much uncertainty around perl / python / ruby programs to do this as a regular optimization, but if you explicitly declare that a function is fixed at load time and can't be changed, well... that's OK.

Additionally, and very importantly, the list of opcodes was supposed to be per-sub. That is, rather than having one big table of opcode functions, mapping opcode number to function pointers, each subroutine would have its own table. (A table that might be shared, of course -- there's no point in having separate tables if they're all the same) This is a requirement for precompiled bytecode libraries to work, since it'd be really bad if you had a global opcode table but two separate bytecode libraries that each used separate extra opcode function libraries that mapped to the same opcode numbers. (That could be avoided by rewriting the bytecode when we load it, which we don't want to do, or by having a global opfunc registry, which we don't want either. Doing it this way is safest and easiest overall)
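The collision problem and the per-sub fix can be shown with a toy interpreter in Python. Everything here is hypothetical: the op numbers, the two "libraries", and the `Sub` class are made up, and a small stack stands in for Parrot's registers purely to keep the example short.

```python
def op_push(stack, operand):
    stack.append(operand)


def op_add(stack, operand):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def op_double(stack, operand):      # from hypothetical opcode library A
    stack.append(stack.pop() * 2)


def op_negate(stack, operand):      # from hypothetical opcode library B
    stack.append(-stack.pop())


CORE_OPS = {0: op_push, 1: op_add}


class Sub:
    """A subroutine that carries its own opcode-number-to-function table."""

    def __init__(self, code, extra_ops=None):
        self.code = code
        # Subs with no extra ops share the core table instead of copying it.
        self.optable = {**CORE_OPS, **extra_ops} if extra_ops else CORE_OPS

    def run(self):
        stack = []
        for opnum, operand in self.code:
            self.optable[opnum](stack, operand)
        return stack.pop()


# Both libraries claimed opcode number 100, and the bytecode is identical.
code = [(0, 3), (0, 4), (1, None), (100, None)]
sub_a = Sub(code, {100: op_double})
sub_b = Sub(code, {100: op_negate})
plain = Sub([(0, 1), (0, 2), (1, None)])

results = (sub_a.run(), sub_b.run(), plain.run())
```

Neither bytecode stream was rewritten and no global registry was consulted; each sub simply resolves opcode 100 through its own table.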

Posted by Dan at 08:40 PM | Comments (0) | TrackBack

June 14, 2005

Fie on the register allocator

One of the things that was plaguing me with $WORK_PROJECT was the interaction of parrot's register allocator with some of my... degenerate code. (Assuming you consider a single subroutine with 1.6M of source text and 20K+ temps degenerate. I certainly do) On the fast machine at the office it topped out at around a gig of memory consumed and somewhere around 360 minutes of CPU time. Needless to say... ick. Far from acceptable, and nearly all the time's in the register allocator.

I'd taken a shot at patching that up a while back, but ran into some issues. (Entirely internal to my compiler) I left the infrastructure in place, though, and this week I dove into it again. Took all of a day and a half to patch up properly. Now I use no virtual registers at all.

Now my big program takes 32 minutes to run through parrot to generate bytecode, and I think most of that time is due to bugs in the current register allocator (since that's where the time's spent, though there are no registers that need allocating at all). I may well toss the PIR entirely and generate pasm directly. Right now the only PIR features the compiler's using are function pre/postamble generation, function call generation, and easy keyed access, and all that's being generated in single subroutines. (I abstracted it all out, so changing, say, the function call code emission is simple -- change it in one spot in the compiler and every function call is fixed up) Switching to PASM generation's no big deal, and ought to get me a damned significant compilation time speedup, since PASM is just bytecode turned to text.

What're the takeaways here?

1. Parrot's register allocator as it currently stands gets degenerate pretty quickly
2. Completely bypassing parrot's register allocator isn't that big a deal if you're writing a compiler
3. Parrot's register count (32 of each type, with 16 of those not touched by the calling conventions) is sufficient for all my needs with space to spare (I never needed more than 13, and the Evil Program has some twisted code in it)
4. PIR isn't actually all that useful to a compiler, though it is tremendously useful for hand-written code

Point 4 was the most surprising of the lot. I really expected to get more of a win from PIR for the compiler, but the only advantage it offered, register spilling, turned out to be both not much of an advantage (because of how quickly the code turned the spiller degenerate) and not at all troublesome to completely ignore.

I should sit down and write up how parrot looks as a compiler target, as I'm the only person with a significant compiler targeting it with any time actually spent doing it. (And let me tell ya -- DecisionPlus ate nearly two years of my life and I wouldn't mind them back... :) Some of the design changes that were proposed to parrot when I cut myself off were a bit off-target, at least compared to my experience. I'll probably discuss that as well with Chip come YAPC::NA.

Posted by Dan at 10:16 PM | Comments (4) | TrackBack

That new contact info over there --->

As part of the whole "stepping away from Parrot" thing, I've unsubscribed from all the parrot and perl related mailing lists (including the cabal list), I'm not logging into perlmonks or use.perl, and I'm not doing IRC any more. Maybe someday in the future, but for right now... nope.

If you need to get in touch with me, e-mail works and I've usually got an AIM client (AdiumX -- quite nice) running, so you're welcome to ping me that way. No promises I'll answer -- I'm wrapping up the $WORK_PROJECT and looking for a new one -- but it never hurts to try.

Posted by Dan at 10:29 AM | Comments (4) | TrackBack

June 13, 2005

WCB: Overridable opcodes

One of the things originally planned for parrot was the capability of overriding the functions attached to most of the opcodes at runtime, lexically scoped. That is, for any particular block (or, more likely, subroutine or method) your code could change the definition of, say, the tangent opcode, or the read opcode.

That sounds silly, doesn't it? I mean, to be able to change, at runtime, the basic components of the interpreter. That's insane!

Or not. First, because you're not allowed to override them all. The basics (any of the ops that can be JITted) are fixed, so you can't go changing how bsr or exit works. Second, remember, as far as parrot is concerned, an opcode is just a low level function with a fixed signature and fast calling scheme. That's it. Nothing at all fancy. "Opcodes" are just a combination of core engine functions and basic (but extendable) runtime library. (Since you do, after all, want your low-level runtime library functions to be as fast to call as absolutely necessary)

Sure, you could, if you want, use the more generic function call scheme to call those functions instead of making them callable using the opcode function mechanism, but that just means that the function calls are slower. (As parrot doesn't have a faster call scheme than the one opcode functions use. Even if you chopped bits out of the current calling conventions there's still more overhead, and you can't get rid of it) Somehow just doesn't make sense to me...

Anyway, on top of private opcode definitions (in those cases where you want to have your own ops) this allows for a lot of instrumentation to be applied. With the exception of that core set that you can't change, everything else is potentially up for grabs, and that means that if you do want to get in the way of how code executes (lexically!) you can. While this certainly isn't something you'd want to do a lot, actually being able to do it can come in handy. (Granted, in those situations you hope you're never in. Alas those are the ones you inevitably end up dealing with)

The cost is that you can't JIT those ops, nor can you inline their bodies in the switch or computed goto cores. This is generally an acceptable cost, since the ops which you do this with are ones that you don't execute often enough for the performance penalty to outweigh the potential utility of overriding them. (Especially since this will most often be done with runtime library functions, in which case they're probably not JITtable anyway, and even with the slowdown from the indirect low-level function call it's still faster than a parrot function call)
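As an illustration of scoped overrides with a protected core, here is a hedged Python sketch. The `Scope` class, the op names, and the use of `noop` as a stand-in for the fixed, JITtable ops are assumptions for the example, not anything from Parrot's source.

```python
import math

BASE_OPS = {"tan": math.tan, "sqrt": math.sqrt, "noop": lambda x: x}
FIXED_OPS = {"noop"}     # stand-in for the non-overridable core (bsr, exit, ...)


class Scope:
    """Opcode lookups for one block: local overrides first, then the base table."""

    def __init__(self, overrides=None):
        overrides = overrides or {}
        blocked = FIXED_OPS.intersection(overrides)
        if blocked:
            raise ValueError(f"cannot override fixed ops: {sorted(blocked)}")
        self._overrides = dict(overrides)

    def call(self, name, arg):
        # This extra lookup models the indirect call that keeps an
        # overridable op from being JITted or inlined.
        fn = self._overrides.get(name, BASE_OPS[name])
        return fn(arg)


calls = []


def traced_tan(x):
    """Instrumented replacement: record the argument, then do the real work."""
    calls.append(x)
    return math.tan(x)


instrumented = Scope({"tan": traced_tan})
plain = Scope()

instrumented.call("tan", 0.0)    # goes through the override and is recorded
plain.call("tan", 0.0)           # other scopes are unaffected
```

Only code running in the instrumented scope sees the replacement, which is the lexical part of the design, and an attempt to override a fixed op is rejected up front.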

Posted by Dan at 04:05 PM | Comments (3) | TrackBack

WWIT: Fast interpretation

Parrot is, as an interpreter goes, pretty damn fast. It's not as fast as it could possibly be, but it's a lot faster than many interpreters at what it does, and it can be faster still. (Heck, you can enable optimizations when building Parrot and get a good boost -- they're off right now since it's a pain to pull a core file for an optimized binary into a debugger and do anything useful with it) A lot of thought went into some of the low-level design specifically to support fast interpretation.

There are a couple of reasons.

The first, and probably biggest (though ultimately not the most important) is that I thought that building a cross-platform JIT was untenable. That turned out not to be the case, at least partly. Building a framework to allow this isn't as big a deal as I thought. That doesn't mean you get a JIT everywhere, though. (You want to write a Cray JIT?) Getting a functional JIT on a platform and maintaining it is definitely a big undertaking and, like any other all-volunteer project, Parrot's engineering resources are limited and somewhat unreliable. Getting an interpreter working relatively portably was a better allocation of those resources, leaving the JIT an add-on.

The second, and more important by far, reason is one of resource usage. The rest of this entry's about that.

Perl, Python, Ruby, and PHP are used heavily in server-based environments, and if you've ever done that you know they can be... slow. Oh, not all the time, and there are ways around the slowdown, but... slow. Slower by a factor of 200 in some cases. (Though if your code's that much slower it's normally a sign that you're really not playing to your language's strengths, but sometimes you can't do that) Needless to say, you'd prefer not to have that hit -- you want to go faster. I mean, who doesn't?

The normal answer to that is "JIT the code". It's a pretty good answer in a lot of cases. Except... except it's amazingly resource-heavy. First there's the CPU and transient memory cost of JITting the code in the first place. There are things you can do to make JITting cheaper (Parrot does some of those) but still... it's a non-trivial cost. Second, JITting the code turns what could be a shared resource (the bytecode) into a non-shared one. That's a very non-trivial cost. Yes, in a single-user system it makes no difference. In a multi-user system it makes a huge difference.

As a for example, $WORK_PROJECT has a parrot executable that's around 2M of bytecode. Firing it up without the JIT takes 0.06 seconds, and consumes about 10M of memory per-process. (15M total, but 5M is shared) Firing it up with the JIT takes 9 seconds and consumes 100M of memory per-process. (106M total, with 6M shared) On our current production machine we have 234 separate instances of this program running.

Needless to say, there's no way in hell we can possibly use the JIT. The server'd need somewhere around 23G of memory on it just for this one application alone. (Not to mention the memory needed for the other 400 programs being run, as when I last checked there were a bit over 600 total parrot-able programs running) The only way to make this feasible is to interpret the bytecode. (And is another reason for us to have directly executable bytecode, since it means we can mmap in the bytecode files and share that 2M file across those 200+ processes, and fire up Really Fast since it's already in memory and we don't have to bother reading in all 2M of the file anyway, just fault in the bits we need) Note that this isn't exclusively a parrot issue by any means -- any system with on-the-fly JITting (including the JVM and .NET) can suffer from it, though there are things you can do to alleviate it. (Such as caching the JITted code to disk or having special OS support for sharing the code, which general-purpose user-level programs just can't count on being able to do)
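The arithmetic behind that 23G figure is easy to check. The short Python calculation below uses only the numbers quoted above (234 instances, 10M private plus 5M shared when interpreted, 100M private plus 6M shared with the JIT); the function name is made up for the example.

```python
INSTANCES = 234


def total_memory_mb(private_mb, shared_mb, instances=INSTANCES):
    # Private memory is paid once per process; shared memory is paid once.
    return private_mb * instances + shared_mb


interpreted_mb = total_memory_mb(private_mb=10, shared_mb=5)
jitted_mb = total_memory_mb(private_mb=100, shared_mb=6)

print(f"interpreted: {interpreted_mb}M, JITted: {jitted_mb}M")
```

That works out to 2345M interpreted against 23406M with the JIT, so roughly ten times the memory for this one application.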

(Alternately you could use this as an argument for generating executables. That'd be reasonable too, assuming your program is amenable to compilation that way, which it might not be)

Then you've got the issue of enforcing runtime quotas and other sundry security issues. Some of these need to be done as part of regular interpretation (that is, you need to wedge yourself in between ops) and if you don't interpret quickly, well... you're doomed. It still isn't blazingly fast, as there's a cost involved you can't get around, but you can at least minimize that cost.

So what does fast interpretation get you? It gets you a performant portable engine, it gets you some significant resource savings, and it allows for fast security. Not a bad thing, overall. And good reasons to not be JIT-blind.

Posted by Dan at 03:46 PM | Comments (2) | TrackBack

June 11, 2005

WWIT: All those opcodes

This is one that comes up with some frequency -- why the hell does Parrot have all those opcodes? It's wasteful!

Bullshit. What it is is fast.

Opcode functions have two points. The first is to provide basic functionality. The second is to provide a fast function call interface. We'll take the second point first.

Parrot's got three official ways to call a function. The first is with the basic parrot function call. You fill in the registers with your parameters, set the parameter counts, and invoke the sub PMC. Not slow, but not horribly fast either. People tend to gripe about that, but it is, bluntly, the lowest-overhead general purpose solution that we could get. Perl puts a lot of requirements on function calls, as do the other dynamic languages, and providing that information takes a little work. That's fine, it's not that big a deal. Languages also are under no obligation to respect the calling conventions for any function or subroutine that's not exposed as a parrot-callable function. That is, if you're writing a Java compiler, say, and don't like parrot's overhead then... don't respect our calling conventions. Or, more reasonably, internally use your own, whichever's best, and then provide versions with shim wrappers that do respect the conventions for other languages to call.

The second way to call is as a method. That's got all the overhead of a function call plus the search for the actual thing being called. This is not a problem, that's what you want with method calls -- it's what you asked for, after all. Given the dynamism inherent in perl, python, ruby and friends, there's no way around it.

Then there's the unofficial way, the bsr/ret pair. Only suitable for internal functions, it's still quite fast and damned useful. There's no reason your compilers shouldn't use this internally where needed -- it's handy.

The fourth way (and the third official one) is the opcode function. This is the absolute lowest-overhead way to invoke a function. Unfortunately those functions (right now at least) have to be written in C, but that's just fine. The important thing here is that they are very, very fast to invoke, since there's essentially no overhead. No putting things in registers, no setting up calling conventions, nothing -- they're just invoked.

In fact, one thing that was planned was that modules could provide things that looked like functions but were actually exported ops (using the loadable opcode library system). That is, your code did a foo(bar), but the compiler knew that foo was an op and emitted:

  foo P16

or wherever the bar happened to be. Moreover, module authors could slowly migrate their code from an HLL, to C code via NCI, to opcode functions. In some cases compilers could actually generate opcode functions from the source, though that does require the compiler in question to be able to generate C code. (But, conveniently, parrot can generate C code from bytecode...) When you think about it, and you really should, there's no difference between an opcode function and a regular HLL function with a compile-time fixed signature (except for the difficulty in generating them from bytecode).

Just to be real clear, since it's important, opcodes are just library functions with very low call overhead. That's it. Nothing fancier than that. They're not massively special internal anything. They're just functions that are really cheap to call. Cutting down the number of opcode functions is not sensible -- it's foolish. Any library function that could reasonably have a fixed number of arguments and not need the calling conventions (and not need to be overridden) should be an opcode function.
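To illustrate the "cheap function call" view, here's a rough Python sketch contrasting the two styles. None of this is parrot's real machinery: the op table, the register names, and call_sub's marshalling steps are all invented stand-ins for "look the op up and call it" versus "set up the calling conventions and invoke":

```python
# Hypothetical sketch: ops are plain functions in a table, called with
# their operands directly. Not Parrot's real structures.
import math

def op_add(regs, dest, a, b):
    regs[dest] = regs[a] + regs[b]

def op_sqrt(regs, dest, src):
    # A library function with a fixed signature makes a fine op.
    regs[dest] = math.sqrt(regs[src])

OP_TABLE = {"add": op_add, "sqrt": op_sqrt}

def call_op(regs, name, *operands):
    """Opcode-style call: one table lookup, then straight into the function."""
    OP_TABLE[name](regs, *operands)

def call_sub(regs, sub, args):
    """Calling-convention-style call: marshal the arguments into parameter
    slots, record the count, invoke, then restore the caller's registers."""
    saved = dict(regs)
    for i, value in enumerate(args):
        regs[f"param{i}"] = value
    regs["param_count"] = len(args)
    result = sub(regs)
    regs.clear()
    regs.update(saved)
    return result

def sqrt_sub(regs):
    # The callee has to dig its argument back out of the parameter slots.
    return math.sqrt(regs["param0"])

regs = {"N0": 16.0, "N1": 9.0}
call_op(regs, "add", "N2", "N0", "N1")   # N2 = N0 + N1
call_op(regs, "sqrt", "N3", "N2")        # N3 = sqrt(N2)
via_sub = call_sub(regs, sqrt_sub, [25.0])
```

Both routes compute the same square root; the difference is all the bookkeeping call_sub does around the actual work, which is the overhead an op skips.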

Concentrate on the function part. Not the opcode part.

Posted by Dan at 06:43 PM | Comments (0) | TrackBack

June 10, 2005

WWIT: Generating executables

One of the things parrot is supposed to be able to do, and currently does (albeit with on and off breakage), is generate standalone executables for parrot programs. That is, you feed in source or bytecode and you get back something that you can run the linker against to give you a standalone executable, something you can run without having parrot installed anywhere.

Interestingly, this is a somewhat controversial decision. I'll get to that in a minute.

The upsides to doing this are several:

Distribution is easier
No versioning problems
Execution's faster
Fewer resources used in multiuser situations

And of course the downsides:

You can get a lot of big executables with a lot of overlap
Some of the dynamic features (on the fly compilation, for example) are somewhat problematic
Bugfix upgrades don't happen easily

Now, it's very important to keep in mind that generating executables is not a universal solution. That is, there are times when it's the right thing to do, times when it's the wrong thing to do, and times when it's kind of a wash.

Building executables has never been a general purpose solution, and in most cases the correct thing to do is to either run a program from the source, or run it from the compiled bytecode. (and there are plusses and minuses to each of those) However...

The problem with all the 'scripting' languages is packaging and distribution. Not just in the commercial sense, which is what a lot of people think of (and which causes a lot of the knee-jerk reactions against it, I think), but in the general sense. If I have a program I want to distribute to multiple users, it's a big pain to make sure that everything my program needs is available, especially if I've followed reasonably good programming practice and actually split my code up into different files. In that case I've all the source to my program that needs to be distributed, and a list of modules that the destination system has to have installed, along with their prerequisites, along with possibly the correct version (or any version) of the driving language.

This isn't just a problem with people looking to distribute programs written in perl commercially, or looking to distribute them in a freeware/shareware setting. It happens institutionally, a lot. You may have ten, or a hundred, or ten thousand desktops that you need to distribute a program out to. The logistics of making sure everything is correct on all those desktops is a massive pain in the ass, made even more complex by the possibility that you've got multiple conflicting requirements across multiple apps. (That is, you've got one app that must have perl 5.6.1, another that has to have perl 5.8.x but not 5.8.0, a third that requires one particular version of GD, and a fourth that doesn't care but has been tested and certified with one particular set of modules and you can't, either by corporate policy or industry regulation, use anything else)

That's when the whole "just install [perl|python|ruby] and the requisite modules" scheme really sucks. A lot. Pushing out large distributions with lots of files is a big pain, and pushing out several copies is even worse. Then there's the issue of upgrading all those desktops without actually breaking things. Ick.

This is where building standalone executables is a big win. Yeah, the resulting file may be 15M, but it's entirely self-contained. No worries that upgrading some random module will break things, no need to push out distributions with a half-zillion files, and if you want to hand your app to Aunt Tillie (or random Windows or Mac users) you've got just a single file. No muss, no fuss, no worries.

Yes, it does mean that end users can't upgrade individual modules to get bugfixes. Yes, it does mean the executables are big. Yes, it does mean there may be licensing issues. Yes, it does mean that pulling the source out may be problematic. Those are all reasons it's not a good universal solution, not reasons to withhold the facility for the times it is. (That people have felt the need to roll their own distribution mechanisms to address this problem in the current incarnations of the languages is an indication that it is a real problem that needs addressing)

Like many other problems that there were multiple implementations for (like, say, events) Parrot provides a solution as part of the base system so folks can use their time reinventing other wheels more productively.

Posted by Dan at 12:27 PM | Comments (6) | TrackBack

June 09, 2005

Continuing ever onward

As I go typing up notes and such, I figured I'd write this up as well.

A couple of days ago I made vague reference to the large number of continuations $WORK_PROJECT creates when running reports, and its heavy use of them in general. Since it's a pretty good example of places that're worth using continuations, I figured I'd go into some detail.

Assume, for a moment, that you've got an interactive application that has a built-in menu system. The user chooses a menu option, a subroutine is called, and at some point control gets dropped back to the menu. (Your basic modal application. We shall set aside the modal / non-modal argument here, since that way lies vi vs emacs fights, and we just don't want to go there. Well, not right now at least) At any point in the execution of the code, even deep in nested subroutine calls, the program can bail back to the menu. Basically you want to dump whatever you're doing and just wait on input again.

Now, how would you normally do this? You might think that since you're calling into the menu selection handling sub, you could do a plain return from the sub call and get back to the menu. That doesn't work, though, since you may be two or three (or ten, or fifty) calls deep when the return to the menu happens, so something more complex is needed.

What's needed, essentially, is to backtrack all the way to the menu code whenever there's a bail-to-menu statement executed.

There are a few ways to do this. We'll assume for the moment that you've got full control over the compiler for the language in question. That's not necessary for some of these things, but it helps with others.

1) You could set things up so that there's a status code of "exit to menu", and the code for every call to a subroutine checks the status. This is, needless to say, really error-prone, not to mention tedious. Yech.

2) Since we've got control of the compiler, you could force it to emit that code for you. This isn't actually all that bad, as it just adds a little bit of boilerplate in one or two spots to note that the status is set and do an automatic return (propagating the status) if a called sub returns because of an exit to the menu.

3) You could throw an exception, if the system supports it. (I could do this, as it's parrot, but we're speaking generically) This can be an issue since you may have intermediate exception handlers that might catch and have to rethrow the exception. There's also the cost of unwinding the stack looking for exception handlers. That's not a huge problem, though, since you're then going to be waiting on user input, and if the unwinding takes long enough to notice you've really screwed up somewhere.

4) The compiler, since it knows about all the internal structures, can just mark where the stack is (amongst other things) and generate code to put things back in shape, skipping right back to the menu code.

5) Just take a continuation that'll get you back to the menu input code

Now, #4 is almost #5. Arguably it's the same thing, but it depends on the sort of low-level access you've got. And generally #5 is easier for app code, since something else is doing the work.
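For comparison, here's roughly what option 3 (the exception route) looks like, sketched in Python since it has exceptions but no first-class continuations. BailToMenu, menu_loop, and the handlers are hypothetical names, and the scripted list of choices stands in for waiting on user input:

```python
# Option 3 from the list: bail to the menu with an exception.
# BailToMenu, menu_loop and the handlers are hypothetical names.

class BailToMenu(Exception):
    """Raised anywhere below the menu to abandon the current action."""

def deep_work(depth, log):
    # Stand-in for nested subroutine calls, many levels deep.
    if depth == 0:
        raise BailToMenu()
    deep_work(depth - 1, log)
    log.append("never reached")  # skipped: the bail unwinds past this

def run_report(log):
    log.append("report started")
    deep_work(50, log)

def quit_app(log):
    log.append("quit")
    return "quit"

def menu_loop(choices, handlers, log):
    """Feed scripted menu choices through the handlers."""
    for choice in choices:  # stands in for waiting on user input
        try:
            if handlers[choice](log) == "quit":
                break
        except BailToMenu:
            # However deep the handler was, we land back here.
            log.append("back at menu")

log = []
menu_loop(["report", "report", "quit"],
          {"report": run_report, "quit": quit_app}, log)
```

The catch mentioned in option 3 shows up if any intermediate code wraps its calls in a broad handler: it would have to recognize BailToMenu and re-raise it, or the bail never reaches the menu.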

As a for example, the code I'm using looks like this to set up a continuation to return to the menu and save it in the global store for later fetching:

  $P3486 = new Continuation
  set_addr $P3486, MainLoop
  store_global 'decision::prompt_continuation', $P3486
  MainLoop:

and to jump back to the menu because code said bail:

  $P3508 = global 'decision::prompt_continuation'
  invoke $P3508

The $Pxxx things are just parrot temps. As you can see, it's... simple. Really simple. No muss, no fuss, works just fine. And the nice thing is that I can reuse the continuation over and over, since continuations can be reusable. Woohoo! Efficiency even, combined with (as you can see) dead-simple code. It's a good thing.

We could certainly take the continuation after the MainLoop label, but taking it before the label worked out better for me in this case. (There's some elided code between the continuation and the MainLoop label -- $WORK_LANGUAGE declares that initialization code runs before the main loop and can bail to the main loop at any time)

Isn't simple nice? Even better is that you can do this and not know squat about how continuations work. They just do, and it can all be magic.

Posted by Dan at 01:56 PM | Comments (3)

June 07, 2005

Snapping the Tiger

On IRC today, Chris Nandor was remarking on how Spotlight was chewing up vast amounts of CPU time constantly reindexing Eudora mailboxes when they're being downloaded. I was thinking he was wrong, since Eudora's traditionally pinned the CPU when downloading mail for me, and 100% is 100%, right?

Well, turns out Chris was right. The importer does suck down a lot of CPU time, especially on large mailboxes. I dunno about you, but Spotlight indexing is pretty useless for me with Eudora, what with everything globbed into big boxes. (Finding that a 10K message mailbox matches my search does not help) Chris just nuked Spotlight. I like it enough to keep it around, but putting Eudora's mail folder into the forbid list definitely speeds things up. And, if like me you've got a half-gig or so of mail folder, it saves some on Spotlight database size too.

Posted by Dan at 08:59 PM | Comments (0) | TrackBack

June 04, 2005

It's a funny old world...

Almost five years ago (five years next month) Perl 6 got set in motion, at whichever TPC it was. (4? 5? I don't remember) We had a whole bunch of people volunteer to handle bits of the project -- I took on the dev lead hat. (At this point I think, of the original volunteers, only Larry remains) A couple of months ago I gave up the hat, and now I'm giving up parrot development altogether. Chalk up another developer driven away.

I may spend some time writing up explanations of why things were done the way they were over the next few months, along with designs for some of the systems that never got (and likely never will get) implemented. I'll probably let the blog peter out to nothing after that. I'll leave it up since there are links into it, but if I feel the urge to blog I'll start up something fresh elsewhere, since it likely won't be particularly technical and it'll forestall the whining about me writing about things nobody cares about.

And no, my post-mortem on things won't be public, though I am going to put one together. It's always a good idea to look back on why you've bailed on a project.

Posted by Dan at 09:52 AM | Comments (31) | TrackBack

June 01, 2005

Harnessing Evil for the power of... well, less evil

There are days I think the $WORK_PROJECT is an exercise in extended programming irony.

The parrot code my compiler generates for $WORK_LANGUAGE makes heavy use of continuations. Really heavy use of continuations, to the point where reports are taking (and discarding) three continuations per record plus another two or three per page of the output.

For a language which doesn't have functions, blocks, or lexical variables, and whose idea of sophisticated control flow is goto or gosub to bare labels.

Go figure.

Posted by Dan at 11:16 AM | Comments (2)