July 29, 2003

Why go with CPS for Parrot?

This came up on the perl6-internals list a while ago. I ought to reply there, so I will, but I figure the reasons might be useful to other people, so I'm going to put it up here as well.

If you read back through the blog, it's pretty clear that while I like CPS, I wasn't convinced that CPS was the way to go, but if you look at the parrot code now, we use CPS. Why?

First, it's important to know why I wasn't going to go with CPS. That's pretty straightforward--fear and uncertainty. I was uncertain they were sufficiently useful, and afraid that if we did go with them we'd end up with an engine that nobody would create code for and nobody'd be able to work on, because we were using a scheme (pardon the pun) that no one understood. I don't mind pieces of something that only a dozen or two folks at any one time feel comfortable working on--you don't need that many people poking at the internals of the garbage collector--but we need to make sure that things are intelligible to at least some folks.

The uncertainty about their utility has been dealt with, partly because of some of the needs that parrot has (I'll talk about that a bit later), and partly because I've just gotten more comfortable with them--I had a fair amount of lingering discomfort that needed dealing with. The fear about scaring people off is gone as well: I understand them, I know how to explain them to people quickly and reasonably simply, and in a way that can defuse the knee-jerk fear often associated with them. (I'm convinced at this point that all of the fear people have about continuations is a direct result of how they're taught and what's associated with them, though that's a rant for another day.)

The reasons to use them fall out of two needs--the need for optimizations, and the amount of context that needs to be preserved across function calls.

Optimizations are easy--having a CPS makes doing tail calls dead-simple. We can't always do tail calls, of course--they can have some issues with destruction timing guarantees and while we don't explicitly make those, it's not nice to surprise people when they haven't asked for it. Still, it's there for the using, and it means the difference between a deeply recursive algorithm dying a horrible death and running to completion. Besides, in a managed environment (and we are managed--while everything will ultimately be open to inspection there'll be some limits to what you can actually change) it can be quite difficult to implement tail calls without engine help. It's trivial to provide it, difficult to make it happen if we don't, and useful for the optimizer. Not a tough choice.
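To see what's at stake, here's a quick Python sketch (the names are mine, and Python stands in for any engine without tail call support): the recursive version dies when the call chain gets deep enough, while the tail-call-eliminated equivalent--which is effectively just a loop--runs to completion.

```python
def countdown(n):
    """Deeply recursive; each call burns a stack frame."""
    if n == 0:
        return 0
    return countdown(n - 1)  # a tail call, but Python doesn't eliminate it

def countdown_iter(n):
    """What tail-call elimination effectively produces: a loop."""
    while n != 0:
        n = n - 1
    return 0

try:
    countdown(100_000)           # blows past the default recursion limit
except RecursionError:
    print("recursive version died a horrible death")

print(countdown_iter(100_000))   # prints 0: runs to completion
```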

Calling context is another big one. We have gobs of stacks, registers, context, and whatnot that need preserving, putting us far past the "stick the return value on the stack and branch off to wherever" style of calling code. Not only is there this stuff that's required at the user level, the interpreter has a lot of stuff it potentially needs to save--what runloop you had, what your security context was, what flags were in effect, and so on. What we were going to do, pre-CPS, was, in the caller:

call

and the callee would do a:

ret

The call pushes all the interpreter info onto the call stack, along with the return point, and the return uses that information to put everything back the way it was. With a CPS style, that turns into:

callcc

with the callee doing

invoke Px

with Px being the register holding the return continuation, generally P1, though that's not required. (In both cases the sub/method to be invoked has to be in P0, so no parameters are needed for the call.) You'll note two differences: we spell return "invoke" in the CPS version and give it a parameter, and we give call the "cc" suffix. Ooohhh, scary!

We could, of course, make the state saving explicit, rather than rolling it up with the call in the first example, but since we must save the state in all cases, there's little point in doing that. Mandatory state saving is fine, making it mandatory but not automatically doing it is silly. Someone'll forget and things will go bang, or we'll get odd security holes.

Interestingly, it also means that if you squint a bit and don't think too hard about it (and generally you won't) the only difference between a CPS scheme and the old scheme is we pass in the "return object" as a parameter, rather than put it on the stack. We could even rename the ops if we so chose, though the old jsr/ret style of calling is being preserved for a variety of reasons, so we're not going to.
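If you squint along with that, the difference looks something like this toy Python sketch (all the names here are invented, and the saved interpreter state is waved away as a string): the only real change is where the return point travels.

```python
# Old style: the return point and interpreter state go on a call stack.
call_stack = []

def old_call(sub, arg):
    call_stack.append("saved interpreter state + return point")
    result = sub(arg)      # sub eventually does the "ret"
    call_stack.pop()       # "ret" restores state from the stack
    return result

# CPS style: the return continuation is just another argument.
def cps_call(sub, arg, k):
    sub(arg, k)            # callee finishes by invoking k(result)

def double(x, k):
    k(x * 2)               # the "invoke Px": call the return continuation

print(old_call(lambda x: x + 1, 41))   # prints 42

results = []
cps_call(double, 21, results.append)
print(results[0])                      # prints 42
```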

Security is something of a concern--since a sub/method boundary is also a privilege boundary (which is to say that code on one side of a sub may have more or fewer privs than the other side) all the state saving, restoring, and respecting has to be handled by the interpreter. That's a good reason to not even give the option of managing interpreter state manually, since that's dangerous. It may be possible with sufficient bytecode verification and proper security design, but that's a lot more work than just managing it all ourselves and not giving user code the opportunity to screw it up. (That, after all, is the job of the interpreter writers! :)

Going CPS isn't entirely without extra effort, but it's effort that's hidden under the interpreter covers, where it ought to stay. Compiler writers and people generating assembly by hand don't have to deal with it, as they ought not have to.

Posted by Dan at 09:59 AM | Comments (2) | TrackBack

July 28, 2003

Parrot op count

I just did a quick scan through parrot's opcode files. We have, as of now...

261 ops.

Yep, that's it, 261. Barely more than .NET or the JVM. That's counting multiple variants of the same op as one op, which isn't an unreasonable thing to do, as the only difference between the open op that takes a string and the one that takes a PMC is that the PMC version needs to extract the string from the PMC before opening.

Now, if you count actual op bodies, we're at 527 ops, which isn't an unreasonable number to count either, as that's the number of functions that need maintaining. Even at that, at least for right now, we're still somewhat below the number of op bodies you'd need for a .NET interpreter. (While .NET has fewer opcodes, many of those opcodes are untyped, and need to check the types of the elements on top of the stack, so while it has one add op we might have 9. We're just making explicit what .NET does implicitly, but the amount of code's the same, since we both need to have 9 types of addition. (Or 18, if you consider immediate constants as separate types))
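The types-squared arithmetic is easy to sketch. Here's a hypothetical Python snippet (the type letters are loosely modeled on an int/num/PMC split, not taken from Parrot's source) showing how one logical add fans out into nine bodies:

```python
from itertools import product

# Hypothetical operand type tags: integer, number, PMC.
types = ["i", "n", "p"]

# One generic "add" op fans out into one body per operand-type combination.
op_bodies = {f"add_{a}_{b}" for a, b in product(types, types)}

print(len(op_bodies))        # prints 9: one logical op, nine typed bodies
print(sorted(op_bodies)[:3])
```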

If you go full-on and count actual opcodes in the generated source, we're currently at 1097. That number's mostly irrelevant, as nobody ever sees those op bodies--they're all generated from the input data files. The only people who have to deal with that are folks writing JIT versions of the ops. (This is a place where we win a bit over .NET, since our ops are simpler, so we don't have to do nearly as much analysis to get the equivalent results) Since we're as concerned with interpreted performance as with JITted or compiled performance--we can't count on the JIT everywhere, nor on the JIT when running with enhanced security--the fanout's a win for us.

Not nearly so many as I'd expected, which is cool.

Update Of course there's always the question "Why do I care?" Well, a few reasons.

First, folks seem to like throwing the "You have a zillion ops and that's insane! We have a small number and that's just swell!" thing at me. I don't care all that much, but it's nice to be able to say "Well, actually, that turns out not to be the case."

Second, there's the issue of brain space. If we really did have a zillion ops, that makes the interpreter potentially more difficult to master. Not a huge issue, as anything we didn't make an op would just be a function, and there's not a huge difference between a zillion ops and a zillion functions (except that the ops, even non-JITted, would run faster) but still, it's there.

Finally there's a maintenance issue. That's a lot of op bodies to maintain, fix if some fundamental structural flaw pops up, secure when it's time to enable security, and generally thump about. But, again, not much of a difference there between a zillion ops and a zillion functions.

Posted by Dan at 01:53 PM | Comments (0) | TrackBack

July 21, 2003

More cool Cocoa dev tools

In the past I think I've ranted about the Cool Thing that is Cocoa Browser. (If not, I should--it's damn useful) Well, there's a companion of sorts to it, AppKiDo. While Cocoa Browser lets you browse the docs for individual classes in the Foundation and AppKit frameworks (showing the hierarchy, of course), AppKiDo goes one step further and allows you to browse the docs for all the parent classes of a class simultaneously. This is useful, as the Cocoa frameworks make heavy use of inheritance, and it's not uncommon to have to bounce up a class or three to find a method.

While AppKiDo's not quite as Visually Swell as Cocoa Browser, it means you don't have to dig through a dozen parent classes to find the method you're looking for--it shows (if you ask) all the methods that an object of class NSX can perform, regardless of which class in the hierarchy is providing the method.

I like 'em both, and when I'm doing CamelBones stuff, I keep them both in my Dock.

Posted by Dan at 12:03 PM | Comments (0) | TrackBack

July 20, 2003

New cellphone

The dying batteries in my current cell, the lousy service from my current carrier, and the upcoming trips to Europe have finally pushed me over the edge--I've gotten a new cell phone, complete with new phone number. It's a mild pity that I couldn't keep my current number, but since it's not a local call from my house it's no big deal.

So, if you have my current cell number, you might want to get in touch to get the new one. I should probably get some new business cards, too...

Posted by Dan at 09:53 AM | Comments (0) | TrackBack

July 18, 2003

Un-pigging Acrobat Reader 6.0

Acrobat Reader 6.0, as it ships by default, is an utter and total pig. Takes forever to start, eats great gobs of memory, and its search capabilities are, for simple searches, significantly sub-par relative to old versions.

News from MacOSXHints is that a lot of the gook can be turned off, and you can actually get Acrobat 6.0 to load faster and slimmer than 5.x by turning off most of the plugins. (To the point where it launches in about a quarter the time it used to for me)

If you're on OS X, click on the Acrobat icon, choose Get Info from the Finder menu, and open up the plugins section. Uncheck anything you don't want to load. (Personally I have just EFS, PrintMe, Reflow, and Search loading; all the rest are off) In Acrobat's application menu there's an "About plugins" menu item that tells you what each bit does. What I have disables a lot, but that's just fine by me, as I can turn it back on if I really want. (The only even mildly interesting one is the update checking, but since I generally find that more annoying than useful, no great loss)

Posted by Dan at 03:05 PM | Comments (1) | TrackBack

Articles online

Looks like the Parrot and Perl 6 articles Damian and I wrote for Linux Magazine (which appeared, amusingly, in the April issue) are online. If you don't have the magazine, go take a look.

Posted by Dan at 02:38 PM | Comments (0) | TrackBack

July 17, 2003

Pie-thon update

Apparently Guido's not particularly worried about Parrot. (And Miguel thinks Parrot's based on religion, though that's a second-hand quote) Works for me. Proof'll be in the pudding, or in this case the pie.

Posted by Dan at 11:27 AM | Comments (6) | TrackBack

July 14, 2003

The Python vs Parrot challenge is on!

I should've posted this during OSCON, but you'll notice a distinct lack of posting in that period. :)

Anyway, the Python vs Parrot challenge is official, and the rules are simple.

  1. The python folks will put together a benchmark program and compile it to Python 2.3 bytecode by December 2003. The base program should run for at least 30 seconds, so that the inevitable cache wobbles don't significantly affect the timing
  2. Both sides can massage the bytecode as needed into a native format, done off-line.
  3. Latest stable CVS is used for the tests
  4. Builtin functions only, no non-python libraries allowed.
  5. No I/O. All data has to be static
  6. Best of 3 runs, on the same x86 Linux box, provided by the conference
  7. Output is compared to a stock python running the reference bytecode, and should be the same

The stakes are $10, a round of drinks, and a pie at 10 paces. Guido's not chosen his pie flavor yet. :)

The reasons for a few of those might be non-obvious, so some explanations.

#1 is because I want a benchmark, and don't know enough python to write one. Besides, this way there's no squawking about a bogus benchmark.

#2 is useful for both of us--me because I'll need to massage, Guido because he may have The Next Best Thing by then.

#3 so we get the latest Fast Thing in case either of us has done better since the last big release

#4 because this is a test of the interpreter speed, not our C compilers

#5 was Guido's but that's fine, I don't want to get penalized for disk latency

#6 is to cover the possibility that something fires off in the background -- it's not either of our fault if a cron daemon decides to go checking its data and blowing caches

#7 ought to be pretty obvious :)

Posted by Dan at 03:27 PM | Comments (3) | TrackBack

Direct and indirect feature usage

Had dinner wednesday night with a whole bunch of folks in the .NET and dynamic languages community at OSCON here. It was interesting, though I'm sorry I missed the .NET/Dynamic Languages BOF earlier in the day. (I was setting up food for a party and, after all, one must have priorities :) It was a good opportunity, as the static and dynamic language folks tend to talk past each other, and .NET is definitely in the static language camp.

Anyway, one of the things that came up at the dinner was the issue of "Why do you need that feature? What are you doing with it?" It's the standard question, one people ask when you're looking to do something that they either don't understand or do understand but don't see the point of because almost nobody will use it. It's a good question, too, one that should be asked. But...

The question misses the mark in some ways, because people tend to merge the use of a feature in general with writing code that explicitly uses a feature. For example, take Perl's source filter feature. (I bet you were thinking I'd trot out continuations) It lets you install an arbitrary piece of code that can perform any sort of transformation on source code before it gets fed to the parser. It's cool, but fiendishly difficult to use. Even with the Filter::Simple module it's a major pain. And it was added in for the sole purpose of making on-the-fly decompression/dearchiving of source files possible. (So you could keep compressed and/or tarred source)
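As a toy illustration of that decompression use, here's a hedged Python sketch (Python has no real source-filter hook, so this just mimics the idea by transforming the source text before compiling it; the names are mine):

```python
import zlib

def decompress_filter(blob: bytes) -> str:
    """A stand-in for a source filter: decompress the source before parsing."""
    return zlib.decompress(blob).decode()

# Pretend this is a compressed source file sitting on disk.
compressed = zlib.compress(b"x = 6 * 7\nprint(x)")

# The "parser" (compile/exec here) only ever sees the filtered text.
exec(compile(decompress_filter(compressed), "<filtered>", "exec"))  # prints 42
```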

Now, there are a bunch of modules on CPAN that use source filters. All but one, so far as I know, are gag modules. (Many of the Acme modules, including Acme::Bleach and Acme::Pony, are source filters) Actually using source filters to do real things is massively difficult, so arguably they're of no use, because nobody uses them, right?

Well... Nobody uses them directly. Indirectly, though, they get used by a lot of people, because the Switch module uses them, and people use that. Switch provides a very nice switch statement for perl, nice enough that it got bundled in with 5.8.0 as part of the standard library. The same thing can be said about Lisp macros (though I don't see what the big deal there is), continuations, coroutines, threads, and even closures. Nobody, to a first approximation, uses them in general, and certainly nobody uses them in languages that don't have them. And they're all hard concepts in many ways. (Yes, even closures--you've just gotten used to them) So shouldn't they be left out?

The answer is "Not because they're hard concepts." Which isn't to say they shouldn't be left out, just not because they're hard. Each of those concepts has a number of very interesting things you can do with them that's between difficult and impossible to do without them. (Yes, yes, I know--with continuations and Lisp macros you can do anything, as long as it still looks like Lisp...) Many, perhaps most, perhaps even the overwhelming majority, of programmers will never use them well. That's fine.

They aren't there for the majority of the programmers using the language.

That's the point people miss. You don't provide source filters, continuations, macros, or whatever because everyone will use them, you provide them so the two or three people who will use them can produce things that would otherwise be insanely difficult. What people do use is what those few folks made, and that is what makes the difficult stuff worth providing.

Posted by Dan at 03:06 PM | Comments (8) | TrackBack

July 12, 2003

Interesting things found

Gameboy Advance cartridges are 32MBits in size, but you can get 64 and 128MBit developer carts. That's 4, 8, and 16MBytes each. And there are GBA devtools for OS X. (I just found a 256MBit cartridge, which is 32MBytes of space)

Hmmm... a stripped parrot is only 1.5M...

Posted by Dan at 05:57 PM | Comments (0) | TrackBack

Woo! OSCON's over

And that's just fine by me. I like conferences, but they're awfully draining, even when I'm dodging most of them. (I always have delusions that I can get some sort of useful work done, which alas never seems to be the case)

Posted by Dan at 02:17 PM | Comments (2) | TrackBack

July 03, 2003

The first high-level(ish) Parrot CGI program

In BASIC, but parrot nonetheless. Go take a look.

Posted by Dan at 06:33 PM | Comments (0) | TrackBack

Cool new book

I've been hitting Borders with some regularity, about every week or two, for new reading material. Usually the latest New Scientist is enough, but if my timing's bad or I'm bored, I'll wander over to the graphic novel section to browse. (One of the pleasant side-effects of the surge in popularity of Anime in the 'states is the surge in popularity of manga collections, which has helped drive American graphic novels as well. That's the subject of a separate entry, though)

Anyway, I'm always up for a good book. I stumbled upon Courtney Crumrin and the Night Things (Oni Press) by Ted Naifeh. The cover's very reminiscent of the work of Edward Gorey and Charles Addams, two illustrators I very much like, so I picked the thing up. (Invader Zim, too, and who can go wrong there?)

All I can say is... buy it. Right now. Go out, find the thing, and take it home with you. It's great.

Posted by Dan at 03:51 PM | Comments (4) | TrackBack

July 01, 2003

CPS and tail calls--two great tastes that taste great together

Or something like that, at least.

Using tail calls is a nice way to not obliterate your stack, and close to a requirement for the functional language family. In a traditional calling scheme, where stuff goes onto and off of a stack, implementing tail calls can be a bit of a pain. Not a huge pain, mind, but a bit of one, as you have to screw around with the stack some, which is often awkward, painful, or just annoying. I hate complex stack manipulation. Fooey.


If, rather than using a stack calling system you use a CPS system instead, almost all of the pain goes away, which is always nice. (If you don't guarantee timely destruction, all of the pain goes away, which would be nice if timely destruction wasn't actually nicer) How, you ask?

Well, if you remember with CPS, when you call a function you pass in a continuation that represents the place you're going to return to when you're done. In the normal case, a continuation is taken immediately before the call (with a resume address for the code right after the call, otherwise you'd call over and over and over...) and passed into the function you're calling. If you remember with tail calls, when you make a tail call you pretend to be whoever called you, so it's like you never existed.

If you think for a second, how they work together is pretty obvious. In a CPS system, making a tail call is as simple as calling a function and passing in the continuation you got, rather than generating your own. Which is about as dead simple as you can get, unless your code generation system loses the passed-in continuation, in which case you've got bigger problems. In pseudocode, a function making a tail call that looks like:

sub a (int foo) {
  // Do some stuff
  return b(foo + 1);
}

gets turned into

sub a (int foo, continuation hidden_continuation_thingie) {
  // Do some stuff
  b(foo + 1, hidden_continuation_thingie);
}

Since function calls in a CPS system are just gotos (and remember, we're working at the implementation/machine level here--gotos are generally necessary, normal, and not particularly evil) there's no need to clean up a's state before making the call, as the GC system will clean up after it soon enough.

Without CPS, it'd look rather messy, and a plain source transform won't do it (which is another nice thing about CPS--while there isn't really a source transform before compiling, there could be) so it'd look rather confusing and I'm not going to try and sketch it out. (Besides, I've already done that in the tail call entry) With CPS, well... tail calls are remarkably simple and elegant. And, while simple and elegant doesn't necessarily transform to fast (in fact, it often transforms to slow, as the complex messy stuff generally runs faster) in a language where continuations are a requirement it falls out with no extra cost, something that I always like, being something of a speed fiend. In fact, with a CPS system in place, doing tail calls makes things faster, which is something I definitely like.
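For the curious, here's a runnable Python version of that transform (names are mine; plain Python functions play the continuations, and a little trampoline stands in for the interpreter's gotos, since Python won't let us really goto):

```python
def a(foo, k):
    # Do some stuff, then tail-call b: hand along OUR return continuation k
    # instead of making a new one. a's frame is never needed again.
    return lambda: b(foo + 1, k)

def b(x, k):
    # b returns "through" the same continuation a was given.
    return lambda: k(x * 10)

def trampoline(thunk):
    # Stands in for "function calls are just gotos": keep bouncing until
    # we hit a final, non-callable value. The stack never grows.
    while callable(thunk):
        thunk = thunk()
    return thunk

print(trampoline(a(4, lambda result: result)))  # prints 50
```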

Posted by Dan at 03:36 PM | Comments (0) | TrackBack