May 07, 2003

What the heck is: Continuation Passing Style (CPS)

Before you go any further, make sure that you understand, at least a bit, about continuations. I wrote about them earlier, and there's plenty of material around the internet to confound and confuse you on the subject. (And bemuse and horrify, but that's the 'net for you, I suppose)

So, we'll assume you understand continuations. As a refresher, they're a sort of super-closure, one that remembers both variables and the call chain. When you take a continuation you get a thingie that remembers all the lexicals that were in scope when it was created, as well as remembering the call sequence you took to get there. (And if the places you were had lexicals in scope, the continuation captures them too, since they'll need to be put back in place again if you use the continuation)

Now, consider the humble function call:

foo(1, 2, 3);

What happens?

Well, in a standard system, when your code gets to the function call it does a few things. First, it pushes an address onto the stack. Then it puts the parameters either on the stack or into registers, depending on how many registers your CPU architecture had handy when the calling conventions were established. Finally it calls the function.

When the function does its thing it yanks the parameters out of wherever they were (on the stack, or in registers) and then it does what it's supposed to. When the function is done and has to return, it puts its return value somewhere (if it has one) then pops the address off the top of the stack and jumps there. No biggie.

Got that? Caller pushes the return address, the address of the instruction to execute when the function call is done, onto the stack. Caller then calls the function. Function does its thing and, when it's done, removes that return address from the stack and goes there.
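If it helps to see that push-jump-pop dance in motion, here's a toy sketch in Perl. The little instruction set (log, jsr, ret, halt) and the run_machine sub are made up for illustration; no real CPU or VM looks exactly like this, and parameters are left out entirely.

```perl
use strict;
use warnings;

# A made-up program: each entry is [ opcode, optional argument ].
my @program = (
    [ 'log', 'before call' ],   # 0
    [ 'jsr', 4 ],               # 1: push return address (2), jump to 4
    [ 'log', 'after call' ],    # 2: execution resumes here after ret
    [ 'halt' ],                 # 3
    [ 'log', 'inside foo' ],    # 4: body of the subroutine
    [ 'ret' ],                  # 5: pop the return address and jump there
);

sub run_machine {
    my @prog = @_;
    my (@stack, @trace);
    my $pc = 0;
    while (1) {
        my ($op, $arg) = @{ $prog[$pc] };
        if    ($op eq 'log')  { push @trace, $arg; $pc++ }
        elsif ($op eq 'jsr')  { push @stack, $pc + 1; $pc = $arg }
        elsif ($op eq 'ret')  { $pc = pop @stack }
        elsif ($op eq 'halt') { last }
        else                  { die "unknown op: $op" }
    }
    return @trace;
}

my @trace = run_machine(@program);
print join(', ', @trace), "\n";   # before call, inside foo, after call
```

The only state the call itself needs is that one saved address on @stack, which is exactly why the scheme is so easy to build into hardware.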

It's actually pretty simple, and a lot of hardware actually has built-in support for this function calling mechanism--there's the jsr (and sometimes bsr) instruction that pushes a return address onto the stack and jumps into a subroutine, and the ret instruction that pops an address off the top of the stack and jumps there. Different CPUs may have different names for these operations (and some don't have bsr, just jsr) but they do the same thing, and they make subroutines easy for the assembly language programmer. Unless you need to pass parameters in and out on the stack, in which case they get really annoying (since the parameters are beneath the address on the top of the stack) but that's computers for you.

Now, you can probably see a potential issue here. "What," you might say, "happens if I make a massively nested and/or recursive set of sub calls?" Well, bucko, you blow your stack and crash. Or scribble over heap memory, which is arguably worse. Not fun, as you might expect, though in these days of 32- and 64-bit address spaces it's not as common an occurrence in non-threaded programs as it was back in the 8-bit days, when you had maybe 48K of RAM, total. (And, of course, we had to walk barefoot five miles uphill in the snow to pop the tape with our program into the 300 baud cassette loader. Ahem. Sorry, old fogey moment) Still, it can happen, and in circumstances where your stack is much smaller, such as when running with threads (you often get only 20-30K of stack space per thread) it can be darned common. The stack method does have the twin advantages of hardware support and conceptual simplicity, though.

The stack method of calling is a bit of a pain for some uses--when you have to use the stack to pass parameters you end up twiddling which entries are where so you don't prematurely pop off the return address, or cover it with the return data.

You may be thinking "But what about lexicals? How does this interact with lexicals?" The answer is.... it doesn't. This is, remember, an old method of doing things, and it predates lexicals. (Yes, I know, it doesn't really, but it predates lexicals in common use. And no, Lisp at MIT doesn't count as common anything) Generally if you want to save your current lexical state, you push a pointer to your lexical pad onto the stack before you make a function call, and restore it after the call is done. If you're working with a language that doesn't do lexicals, as most don't, it's not a problem since they're just not around to worry about. (The problem with lexicals is that they can't be allocated on the stack if there are closures or potential closures being taken)

Continuation Passing Style, or CPS for short, is completely different. With CPS, no address is pushed onto the stack before a function call is made. Strictly speaking, the whole jsr/ret scheme isn't used at all. What happens with a simple function call like:

foo(1, 2, 3);

is a little different than what happens in the stack style. First, a continuation is taken, one that resumes just after the function call. (Remember that, in addition to the lexicals and call stack, continuations have to hold the place you are going to go if they are invoked) Then the parameters are put somewhere, often on the stack. One of the parameters, generally unspecified, is the continuation that was just taken. (This is the continuation passing part) Then we just jump to the start of the function.

When the function executes, first it takes the parameters and puts them wherever it needs to. Then it stores the continuation someplace. The function does its thing, whatever that is. When it's finished, it generally puts its return parameters someplace, and invokes the continuation that was passed in. That continuation, conveniently, puts us back in the calling function at the spot right after the function call was made, with its environment intact--just like in the stack style.
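Here's a rough source-level sketch of that sequence in Perl, with the continuation written out as an ordinary anonymous sub. That's an assumption made for the sake of illustration: a real CPS system would take a true continuation behind the scenes. The name foo_cps and its body (adding up its arguments) are invented; the original foo does who knows what.

```perl
use strict;
use warnings;

# Hypothetical CPS version of foo: the last parameter is the continuation.
sub foo_cps {
    my ($x, $y, $z, $k) = @_;
    my $sum = $x + $y + $z;   # "does its thing"
    return $k->($sum);        # hand the result to the continuation
}

my @log;

# The anonymous sub stands in for "everything the caller does after the call".
foo_cps(1, 2, 3, sub {
    my ($result) = @_;
    push @log, "foo produced $result";
});

print "$_\n" for @log;   # foo produced 6
```

Note that foo_cps never decides where to go when it's done; it just invokes whatever it was handed.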

The cool thing about using a CPS for function calls is that it makes taking a continuation much, much easier. When taking a continuation, you only have to take a snapshot of the current function/sub's lexicals and control stack--you don't have to care at all about the function's caller, or the whole stack or anything. Why not? Because when the function you were in was called, its caller conveniently took a continuation and passed it into you. You don't have to look outside your own routine when taking the continuation because the caller already noted that stuff for you. That makes taking a continuation much faster and simpler. It can even be optimized some--you're taking note of just the info in your own sub/function/method, so the compiler knows exactly what is and isn't important, so it can ignore anything that you're not using.

CPS also makes what're called "tail calls" (and tail recursion) much easier and faster. I'm going to talk about them in a later post, but consider this:

sub foo {
    # Do some stuff
    return bar();
}

That is, we have a sub that does stuff and, as its last two actions, calls a function and then exits with its return value. And also remember that foo got its caller's continuation passed in. And that we pass in the continuation of where the called function should go when it finishes. Which all adds up to, in this case, foo not bothering to take a continuation at all, just passing in the continuation it got into bar. Which is a nice optimization. (And in cases of recursion, avoids allocating a continuation structure per call, which if you're looking for the millionth prime number can be a not inconsiderable savings)
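A small sketch of the difference, again using plain Perl closures as stand-in continuations (an assumption, as are the names foo_k, bar_k and foo_plus_one_k, and the value 42). Perl itself still pushes its own call frames here, so this only shows the continuation being reused, not a real tail call.

```perl
use strict;
use warnings;

# Hypothetical CPS version of bar: passes its result to the continuation it got.
sub bar_k {
    my ($k) = @_;
    return $k->(42);
}

# Tail call: nothing is left to do after bar_k, so foo_k passes along
# the continuation it received instead of building a new one.
sub foo_k {
    my ($k) = @_;
    # Do some stuff
    return bar_k($k);
}

# Non-tail call, for contrast: work remains after bar_k (adding 1), so a
# fresh continuation wrapping $k has to be created.
sub foo_plus_one_k {
    my ($k) = @_;
    return bar_k(sub { my ($v) = @_; return $k->($v + 1) });
}

my $identity = sub { return $_[0] };
my $plain    = foo_k($identity);
my $plus_one = foo_plus_one_k($identity);
print "$plain $plus_one\n";   # 42 43
```

In foo_k no new closure is allocated at all, which is the saving that adds up over a million recursive calls.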

Unfortunately, CPS does have a down side. Two, actually. (Ignoring the whole "Aaaah! Continuations! Make the pain stop!" issue, since they're really not that bad)

The first thing that springs to mind is return values from functions. Since invoking a continuation is supposed to put everything back the way it was--stack and lexicals--you'd wonder where the called function is supposed to put them. Can't go on the stack, as in the stack system, since we put the stack back. Generally in a CPS system there's a single return value, and it goes into a register. (Registers, generally, aren't restored) This makes returning multiple values a bit tricky, though most languages that do CPS don't do multiple return values. But, then, neither do most languages that do a stack system. CPS is a bit of a hindrance for languages that do have multiple return values, since then you need to either have a special area that isn't restored on continuation invocation, or build a little stack of return values and pass back a pointer to it as your return value. (If you're an old hardware wonk you might wonder what you'd do in the case of a system, such as the 6502, where the registers are both too small to hold the largest data element the processor can handle and are too small to hold a pointer (all the registers are 8 bits, a pointer is 16). The answer is "You can't use CPS at all easily there. But you can play M.U.L.E., so it's a net win")
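For what it's worth, when continuations are written out by hand at the source level, as in the sketch below, the return values simply become the arguments of the continuation, so several of them are no harder than one. This sidesteps the register question above rather than answering it; it is not how a VM would have to do things. The divmod_k sub is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical CPS sub with two "return values": quotient and remainder.
sub divmod_k {
    my ($n, $d, $k) = @_;
    return $k->(int($n / $d), $n % $d);
}

my @seen;
divmod_k(17, 5, sub {
    my ($quotient, $remainder) = @_;
    push @seen, $quotient, $remainder;
});

print "@seen\n";   # 3 2
```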

The second problem with CPS is that it involves continuations. Not in a terror-inducing way, but rather in a speed hit way. The one downside to supporting continuations is that it means you can't use the stack to hold both control information and data. So things like C's auto variables can't be put on the stack, which slows things down as you need to allocate a frame for them somewhere out of heap memory. While not slow, it's definitely much slower than just incrementing/decrementing (depending on which way the stack goes) the stack pointer by a few dozen words. If you're shooting for speed, well, you just missed a bit. If that's not a problem, or the semantics of the language you're using require the heavier-weight variable allocation system (or, like Lisp, has so few variables in use that it doesn't matter much how long it takes to allocate them) it can be a useful win.

Still, in a register-rich system CPS is definitely pretty darned cool. I'll admit, after this, I'm almost tempted to shift Parrot over to a full CPS system. Almost....

Posted by Dan at May 7, 2003 09:30 AM

Nice explanation. When I was learning about CPS, though, we always talked about the code _explicitly_ passing and invoking continuations, which can then be plain functions. It seemed pretty easy to understand that way, especially with examples. So, a trivial example:

The first sub is vanilla, the second is its CPS equivalent.

sub fac {
    my ($n) = @_;
    if ($n == 0) { return 1 }
    else { return $n * fac($n - 1) }
}

sub fac_k {
    my ($n, $k) = @_;

    if ($n == 0) { return $k->(1) }
    else {
        return fac_k($n - 1, sub { my $v = shift; return $k->($n * $v) })
    }
}

Hope this is of some use.

Posted by: John at May 7, 2003 05:21 PM

Thanks for a very nice explanation. I am curious why you don't have Parrot work as a full CPS system.

Since Parrot will have to support continuations anyway, and many of the optimizations that CPS disallows cannot be done, it seems like a fine idea. Of course I could be all googly-eyed over a new thing that I just learned about...

But anyway, more of your logic with respect to not having Parrot be a full CPS system would interest me.

Posted by: Boots at May 7, 2003 06:04 PM

Thank you for these explanations of all kinds of things.
I thought "lexicals" were the auto (in C terms) allocated variables, but I'm beginning to get the feeling I'm totally wrong about this. Am I?
Maybe another idea for What the heck is:?

Posted by: Klaas-Jan at May 8, 2003 08:34 AM

Lexicals aren't quite the same as auto variables, if for no other reason than lexicals generally persist as long as there's an outstanding reference to them, while auto variables don't. (Hence all the fun in C when returning the address of an auto variable from a function :)

That is sort of an implementation detail, I suppose--I'm not sure there's anything about lexicals that, strictly speaking, requires this sort of behaviour. I'll add it to the "What the heck is:" list, though.
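A quick Perl sketch of that persistence, for the curious. The make_counter sub is invented for the example; the point is only that $count outlives the call that created it, because the returned closure still refers to it.

```perl
use strict;
use warnings;

sub make_counter {
    my $count = 0;                    # a lexical, private to this call
    return sub { return ++$count };   # the closure keeps $count alive
}

my $c1 = make_counter();
my $c2 = make_counter();

$c1->();
$c1->();
my $first  = $c1->();   # third call on $c1
my $second = $c2->();   # first call on $c2, which has its own $count

print "$first $second\n";   # 3 1
```

Try the equivalent in C with the address of an auto variable and you get the fun mentioned above.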

Posted by: Dan at May 8, 2003 10:19 AM