March 31, 2003

Continuations and VMs

Part 3 (well, 3a, since I haven't gotten to the VM bits) in a series of "Why the JVM and .NET have issues with some classes of languages" posts. This one's about continuations.

Now, before I go any further, do note that what this bit talks about does not affect python, perl 5, or a host of other languages. They don't have continuations, which is fine. It does affect Perl 6, Ruby, Scheme, Lisp, and a few others, though I don't know that anyone's going to be porting unlambda to .NET. But, then, I didn't expect a befunge or BrainF*** interpreter for Parrot, either. Nor Threaded INTERCAL, for that matter.

Anyway, a continuation is essentially a closure that, in addition to closing over the lexical environment, also closes over the control chain. (Well, OK, control stack, but if you're doing continuations odds are it's a singly linked list rather than a real stack) CS texts generally go on about it continuations being "the rest of the program" or "gotos with arguments" or suchlike stuff. If those float your boat, great--they never made sense to me.

I doubt that a single phrase is suitable description--if it was far more people would understand the things--so it's time for some text.

Assume, for a moment, that we have a language, like Perl (or C, for that matter) that has lexical variables. Each block has some variables attached to it, and (unlike C in practice, though not in theory) those variables are each stored in a separate data structure somewhere. A scratchpad, if you like. Those pads are linked together, so an inner block's pad has a link to the immediate containing block's pad, and so on. In this example:

{
  my $foo;
  {
    my $bar;
    {
      my $baz;
      return sub { eval $_[0] };
    }
  }
}

three blocks, three pads, and each block's pad is linked to the containing block's pad. So the pad with $baz in it has a link to the pad with $bar in it, and the pad with $bar in it links to the pad with $foo in it. Got that? Good. If we made a closure inside the inner block (like the return does--returning an evil closure but that's neither here nor there) then that returned closure object has a link to the innermost pad, and through the link chains all the pads out to the top level of pads. When we call into the closure, it has access to all the lexicals that were in scope when the closure was created.

Got that? Closures are subroutines that capture their lexical scopes. When you call them they (temporarily) restore their lexical scopes.

Now, think about how control flow works in a program. Whenever something "controlish" happens--you make a function call, an exception handler is established, a new lexical scope is put in place--a marker for that event is put on the control chain. When you leave the scope of the item on the chain you remove it. (And removing it might have some action associated with it--when the runtime removes the element for a function call it fetches the return address out of it)

So, we have this control chain. It's a singly linked list (conceptually) of control elements. We can capture it any time we want, if we've got a mechanism to make sure what we've captured doesn't get altered. (Making the chain elements copy-on-write, or just cloning them all off, both work) We also have a closure--a sub that has a chain of lexical scratchpads. And if you take that one step further, we can divorce the lexical pads for the closure from the closure itself, leaving us a chain of control and a chain of lexical variable scopes.

Now...

Imagine what would happen if we bound together that chain of lexical variables, the chain of control, and an address into one thingie, such that when we invoked that thingie we'd jump to the saved address, put in place the associated control chain (discarding the current chain), and put in place the lexicals?

Well... we'd call the thingie a continuation. And that's what it is.

Now, in practice continuations aren't made of a random lexical chain, control chain, and address. When you make a continuation you capture the current control chain and lexical chain, and generally the current or next instruction in the instruction stream, and they're often bound together with special function calls ("call-with-current-continuation") but they don't have to be.

One of the cool thing with continuations is that, since they're supersets of closures, the variables they capture keep their values from invocation to invocation of a continuation, just like variable values persist across multiple invocations of a closure. And like multiple closures that close over the same scope, multiple continuations that share scope will see variable changes that other continuations that capture the same scope make.

And, just to top it off, someone (no, I don't know who, so this might be CS Legend) once proved that you can build any control structure you can think of with continuations. Though often you shouldn't, but that's a separate thing. Exception handlers are conceptually a form of continuation, for example. When the handler is established a continuation is taken, and when an exception is thrown you just invoke the exception handler continuationa nd *poof!* you're at the spot in the code that represents the exception handler, complete with its lexical scope.

There's a lot of other evil stuff you can do, too--there's no reason, strictly speaking, that the destination you jump to when invoking a continuation has to be anywhere near where you were in the program when the lexical and control chains were captured (and they don't have to be related either). Granted, doing Weird Evil Magic like this is a lot of work, but hey, if it was easy it wouldn't be fun. :)

Posted by Dan at March 31, 2003 01:55 PM | TrackBack (9)
Comments

I still don't have a handle on that :)

Is there any more concrete code/usage example that you can give us?

Posted by: Dougal Campbell at April 3, 2003 05:11 PM

Right... as I understand that, a continuation is basically introducing dynamic scoping to part of a program, then when that part is over we revert back to lexical (static) scoping?

And normal closures could be implemented by making a deep copy (for how ever many frames you need to get all the used variables in the closure) of the "variable stack" and placing it in the heap. Then whenever the closure is executed, "reattaching" the vars in the heap to the "end" of the stack frame??

Posted by: Mark at April 3, 2003 05:33 PM

Hrm. I'll see what I can come up with, and post an example.

As for the second comment, lexicals are a bit more complex than that. Continuations don't introduce dynamic scoping as such, they just make it much easier to have multiple versions of a particular static scope around at once. Deep stack copies aren't really good enough to make normal closures, unfortunately, since closures can be invoked multiple times and may create closures that close themselves. It gets... complex. But static copies are inadequate for the task.

Posted by: Dan at April 3, 2003 06:34 PM

Ahh, cheers. Thinking about it now I don't know where the idea of dynamic scope came from. I didn't think of closures inside closures ... tricky problem indeed. Time to re-read SICP again for the nice intro into Scheme and the "environment" trees (along with all the other stuff that I've forgotten, but part relearned in this semesters courses :) ) I'd recommend it to anyone, it's certainly worth percivering.

Posted by: Mark at April 3, 2003 08:05 PM

A couple things I remember from my grad school days...Yes, its true that any control construct can be implemented with continuations, but I doubt I could reproduce a proof.

More interestingly, if continuation are true first-class objects (i.e., you can assign them to variables), then it's no longer true that control can be represented by a linear chain; you need a tree. Consider: You process along, save continuation A, continue awhile, save continuation B, and then invoke continuation A, returning to that state. As the program continues, it is creating a new control chain beyond A, but it doesn't (and can't) _replace_ the original chain (from A to B) since B is still around waiting to be (maybe) invoked. If after a bit you save continuation C and then invoke B, you'll have to keep the tree around until the variables those continuations have been saved in go out of scope.

Posted by: John Nienart at April 4, 2003 01:47 PM

Just today I tried to explain continuations to two fellow Perl programmers, and I came up with the notion that it's just:

$continuation = \&return;

They grokked it instantly.

Posted by: Autrijus Tang at June 17, 2004 02:39 PM