May 01, 2003

What the heck is: a coroutine

Well, OK, grammatically it ought to be "what the heck are: coroutines" but we'll make do for the sake of easy searching. (I probably ought to see about putting categories in, but I don't know that it's that important)

Anyway, what is a coroutine? Well, a coroutine is a subroutine or function that can pause in the middle and return a value. When you call it again it picks up right where it was when you paused, environment intact, and keeps on going. Coroutines are almost always closures, though strictly speaking they don't have to be--if you had a language that didn't have closures, or lexical variables, you could still have coroutines. (Though generally speaking nobody does that. You could, though, if you were so inclined) Often coroutines are called iterators or generators, but that's not strictly correct--while you can make iterators and generators with coroutines, you don't have to. Nor are coroutines limited to being iterators or generators, as some systems use coroutines to implement threading, and coroutines can make really nifty callbacks.

From the outside you can't tell if a function is a coroutine any more than you can tell that a function is a closure. Your code calls it and gets back a value (or values) and that's all that's important. (Yay, encapsulation)

Like normal subroutines or closures, the first time you call a coroutine it starts at the beginning. If a coroutine hits a return, or falls off the end of the code (assuming you're using a language that supports just falling off the end of a function) the next time you call into the coroutine it also starts back at the beginning. In this regard it's no different from any other subroutine or function. It's when you call yield (or your language's equivalent), though, that things get interesting.

yield, like return, returns both a value and control to the code that called the coroutine. What it does, though, is remember where in the coroutine's code it was, so that the next time the coroutine is called it picks back up right where it was the last time you were in the coroutine.

For example, if you have this code:

  coroutine foo {
    yield 1;
    yield 2;
    yield 3;
  print foo(), "\n";
  print foo(), "\n";
  print foo(), "\n";
  print foo(), "\n";
  print foo(), "\n";
  print foo(), "\n";



Now, you may ask "What about all those pesky parameters? What happens with them?" Well, the answer is "It depends".

Coroutines can certainly take parameters. While they wouldn't be completely useless without them, they certainly would be less useful than they might otherwise be. How languages handle parameters and coroutines can differ, though. The issue isn't passing parameters in the first time you call a coroutine, as that's easy, but rather what happens when you call a suspended coroutine with potentially different parameters than you called it the first time. Languages generally have three different things they do in this case. Let's alter our coroutine a bit:

coroutine foo (int a) {
yield a;
yield a + 1;
yield a + 2;
print foo(1), "\n";
print foo(2), "\n";
print foo(1), "\n";
print foo(1), "\n";

The first way is to just ignore passed-in parameters on the second and subsequent invocations. Yeah, it's nice that you called the coroutine with 1 the first time and 2 the second time, but the coroutine was happy with 1 and it'll stick with that. If you happen to call it when the coroutine's reset then the parameters will be respected. So in our example above, it would print "1 2 3 1".

The second way is to remember what the coroutine was invoked with and create a new instance of the coroutine if you pass in different parameters. Whether the language retains multiple versions of a coroutine depends on the language--some do, some don't. Our example would print "1 2 2 3" if the language does retain multiple versions of a coroutine, or "1 2 1 2" if it doesn't. (The change from 1 to 2 as a parameter would discard the old coroutine, as would the change from 2 back to 1, if the language discards on parameter changes)

The third way is to just pass in the parameter and alter the variable that it's tied to on each coroutine entry. (This only works if you're actually using the passed in parameters directly--in the case of perl, where everyone copies them out of @_, you'd probably not see the changes because you copy out of @_ at the beginning of a sub) So our example would print "1 3 3 1".

Which way is right? Well, they all are, really. Each way has its advantages and disadvantages, both from a language comprehension standpoint and an implementation standpoint. Whoever's designing the language decides which way they like best, or which way is doable with their current implementation, or which way grinds the right ideological axe, and goes with that.

One thing to remember is that you can yield out of a coroutine no matter how deeply you're into it. Our examples are really simple, but you could be nested ten or twenty levels of scope deep in a coroutine and still yield out--when you re-invoke the coroutine you'll be dropped back where you were, ten or twenty levels deep, with all your lexicals put back in place. (If you're using a language without lexicals it makes things easier)

Co-routines are definitely very powerful and often are used as iterators, tied to some data structure where re-invoking the coroutine walks through the structure a step, or generators, where each invocation of the coroutine returns the next value in a series. It can be much easier to do both when you can retain both variable and position in code state, depending on the type of iterator or generator. Closures are often sufficient for simple iterators and generators, but sometimes just retaining data isn't sufficient, or if it is sufficient it's quite inconvenient to make sure you retain enough data to pick up where you left off.

The big downside to coroutines is in their implementation. To make coroutines work properly requires relatively complex calling conventions and state keeping, though not really any worse than what you might need if an engine supports continuations, so if you have those it's not much more work. It really wants chained call frames rather than a call stack and doesn't want parameters passed on the stack, amongst other things.

Implementation, though, is a mildly complex thing, and discussion is definitely a task for another day.

Posted by Dan at May 1, 2003 04:47 PM | TrackBack (3)

Just some nitpicking from a perlish perspective, if you have
coroutine foo { yield 1; }
shouldn't the return sequence be ( 1, undef, 1, undef, .. )? Or is it assumed that the compiler treats the last yield with no following code as a return?

Posted by: Olli at May 2, 2003 06:56 AM

Good point, though my bet is that Larry will declare that if the last executable statement in a block is a yield it's treated the same as a return.

Posted by: Dan at May 2, 2003 08:50 AM

Another excellent addition to the "What the heck is..." series. Thanks! Encore! What's next?

Posted by: Gnomon at May 2, 2003 09:03 AM

Regarding coroutine parameters, it seems like the issue of how to treat changes could be easily resolved if, given
coroutine foo (arg1, arg2) { body ... }
a call like
$cr = foo(val1, val2);
didn't run the body code, but instead returned a coroutine object - a type of closure, closed over the parameters bound to the supplied values. The coroutine could then be called like any anonymous sub.

This would also allow the caller the choice of whether to create a new copy of the coroutine or not, rather than forcing the choice into the language. This would make it easy for multiple instances of the same coroutine (using different data) to work with each other.

Posted by: John at May 2, 2003 04:50 PM

The problem with returning coroutine objects is that it forces the calling code to know that it is calling a coroutine, which limits the utility of coroutines in some cases. Making it completely hidden from the caller makes coroutines useful in more circumstances.

Posted by: Dan at May 2, 2003 05:43 PM

You might look at the support for closures and iterators in the next C# compiler.

They were implemented with no changes to the CLR - they're a compiler feature.

Posted by: Don Box at May 3, 2003 09:58 PM

If you had a coroutine object based thing it wouldn't be that hard to do:

  my &coro;
  my coroutine foo {...}
  sub coro_wrapper {
    &coro //= foo(*@_);

Of course, in Perl 6 you could wrap all that up in a macro, but it seems like a lot of fuss.

Posted by: Piers Cawley at May 5, 2003 06:21 AM

1. There is another way of dealing with passing the arguments when a coroutine is resumed - have the arguments be the value retuned from the yield operator. Using

@_ = yield @these_results;

in the body of the coroutine would reassign the new argument to @_ (or allow the code to be written to do other things with new args, like ignore them).

2. While you state that coroutines can be used for more than just iterators, the yield operator is limited to essentially that capability. It "returns" from "the current coroutine" to "the invoking (co)routine)" while retaining the ability for "the current coroutine" to continue from where it is by being "reinvoked". This still presumes a sub(ordinate)routine relationship instead of a co(operating)routine relationship. It is useful to be able to choose which coroutine is being resumed at the same time that the current coroutine is being suspended. Something like:

@newargs = yieldto $coro, @values;

would let you choose which coroutine is being resumed. In fact, resume would be a better name than yieldto for this specific purpose; although the frequency of iterators with the implicit "whoever resumed me" choice justifies having a yield operator as well, in which case yieldto is tolerable. That requires some sort of coroutine object, which you try to avoid.

3. Having the called code be either a subroutine or a coroutine, transparently to the calling code, but always the same for any caller, is not sufficient. You refer to the problem when you talk about yielding from a deeply nested context. If a routine always acts as a coroutine, then it cannot have a nested (recursive) context - the "recursive" call is creating a new coroutine (or getting confused by trying to resume the existing coroutine which is already running as the caller). With an operator to "create a coroutine object that will call subroutine X", and a resume operator, you get to choose whether an invokation is a recursive subroutine call or a new coroutine creation and resumption. If a single function can be called both as a coroutine and as a subroutine, it is hard to imagine how the choice can be made without the participation of the caller.

Posted by: John Macdonald at November 14, 2003 05:08 PM

Can you point me to more info on Co-routines.


Posted by: Doug Little at February 4, 2004 09:11 PM

From a Python perspective, coroutines are generators.

def foo(a):
  yield a
  yield a + 1
  yield a + 2

for x in foo(1):
print x

# Prints 2, 3, 4

The nice thing about the Python implementation is that you do not call the coroutine multiple times in order to get multiple return values (instead, you extract values from the iterator).

To pass new parameters into the coroutine, use a mutable object as an argument to the function.

Posted by: Connelly Barnes at August 2, 2004 05:07 PM