June 26, 2003

What the heck is: A tail call

Tail calls, and their more restricted sibling tail recursion, are one of those clever optimizer tricks that were invented many years ago, then promptly forgotten by most of the mainstream language implementors. (Naughty them!)

Consider, for a moment, this simple function:

sub foo (int a) {
  a += 15;
  return bar(a);
}

Not a big deal at all--takes a parameter, adds 15 to it, then returns the result of calling bar on the new value of a. The important thing to notice here is that the very last statement calls a function then immediately returns the result of that function.

Now, think of how code's generally generated. When you enter a function, some stack space is allocated. When you exit from that function, the stack space is deallocated and a value is generally returned. So the sequence of events looks like:

allocate stack space for foo
(do stuff)
call bar
get bar's return value
deallocate foo's stack space
return the value to foo's caller

There are several interesting things here. If you look at it, the stack space that foo has isn't used after the bar function is called. While this isn't a huge amount of space, it is space, and space that isn't needed. If it was freed before bar was called, your program would be a little more space-efficient. For one function this isn't a big deal, but imagine what happens if you're 200 levels deep in a call (perhaps because there's some indirect recursion) and 100 of those levels look like return somefunc();. That's 100 chunks of stack that are being used for no reason. If your stack only had space for 199 chunks, not using those 100 chunks of stack would mean the difference between your program running and your program crashing.
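Python is one of those mainstream languages that never got this optimization, so it makes a handy demonstration of the crash described above. Here's a minimal sketch (the function name `countdown` is just illustrative, not from the post): the recursive call is in tail position, but every level still gets its own stack frame, so deep enough recursion dies.

```python
import sys

def countdown(n):
    # The recursive call is in tail position, but Python keeps a
    # stack frame alive for every level anyway.
    if n == 0:
        return "done"
    return countdown(n - 1)

sys.setrecursionlimit(1_000)   # a small stack, like the 199-chunk example
try:
    countdown(10_000)          # needs 10,000 frames; only ~1,000 fit
except RecursionError:
    print("ran out of stack")
```

With tail call elimination, all 10,000 levels would share a single frame and the call would succeed.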

There's also that pesky "return what I just got" action. In our example all foo does at the end is collect the return value from bar and then immediately return it. The act of collecting and returning the value is essentially wasted time. If somehow foo could make itself go away so bar returns its value to whoever called foo, we'd save the time it took to do that extra value collection and return. Is this a whole lot of time? No, not really, unless there's something really bizarre about the language you're using. But even if it's only a few microseconds, over thousands of calls it can add up.

So, what can you, or rather your language compiler, do? Well, what it can do is turn code of the form return somefunc(); into the low-level sequence pop stack frame; goto somefunc(); In our example, that means before we call bar, foo cleans itself up and then, rather than calling bar as a subroutine, we do a low-level goto operation to the start of bar. Foo's already cleaned itself out of the stack, so when bar starts it looks like whoever called foo has really called bar, and when bar returns its value it returns it directly to whoever called foo, rather than returning it to foo which then returns it to its caller.

This, simply, is a tail call. Tail because it has to happen as the very last operation in a function, or as the function's tail operation. Call because that last operation is calling a function. Making tail calls can sometimes (though not by any means always) require support from your platform's calling conventions, linker, and runtime library. It also needs a non-trivial amount of effort in the compilers that support it.
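If your compiler doesn't do this for you, you can get the same effect by hand with a trampoline: instead of making the tail call, a function returns a zero-argument thunk standing in for it, and a driver loop invokes thunks until a real value comes back. The stack never deepens, even across mutually recursive functions. This is a sketch of the general technique, not anything from the post; `even`/`odd` are just illustrative names.

```python
def trampoline(fn, *args):
    """Drive a computation whose steps return either a final
    value or a zero-argument thunk standing in for a tail call."""
    result = fn(*args)
    while callable(result):
        result = result()  # perform the "tail call" in a flat loop
    return result

def even(n):
    # Instead of tail-calling odd directly, hand back a thunk for it.
    return True if n == 0 else (lambda: odd(n - 1))

def odd(n):
    return False if n == 0 else (lambda: even(n - 1))

print(trampoline(even, 100_000))  # prints True; one stack frame throughout
```

Note this handles full tail calls, not just tail recursion: `even` and `odd` bounce through each other, which is exactly the "jump to the start of some other function" case.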

There's also a more restricted version, called tail recursion, which only happens if a function, as its last operation, returns the result of calling itself. Tail recursion is easier to deal with because rather than having to jump to the beginning of some random function somewhere, you just do a goto back to the beginning of yourself, which is a darned simple thing to do. That means code that looks like:

sub foo (int a, int b) {
  if (b == 1) {
    return a;
  } else {
    return foo(a*a + a, b - 1);
  }
}

gets quietly turned into:

sub foo (int a, int b) {
  label:
    if (b == 1) {
      return a;
    } else {
      a = a*a + a;
      b = b - 1;
      goto label;
    }
}

What was a recursive function, chewing up lots of stack, gets turned into a loop that chews up just one stack frame. And the compiler can do it as a simple under-the-hood transformation that needs no special knowledge of any other function's internals, no special function entry points, or anything. This is a much easier piece of source transformation than full-blown tail calls for many languages, so it's more likely (though still not common, alas) to be implemented in a compiler.
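CPython is one of the compilers that doesn't do this rewrite, but it's easy to apply by hand, which also makes the transformation concrete. Here are both versions of the `foo` above in Python; `foo_loop` is the hand-applied sketch of what a tail-recursion-eliminating compiler would emit, with the `goto` expressed as a `while` loop.

```python
def foo_recursive(a, b):
    # The pseudocode above, as Python; the tail call still
    # consumes one stack frame per level here.
    if b == 1:
        return a
    return foo_recursive(a * a + a, b - 1)

def foo_loop(a, b):
    # The compiler's transformation applied by hand: assign the
    # new argument values, then "goto" the top of the function.
    while True:
        if b == 1:
            return a
        a, b = a * a + a, b - 1

print(foo_loop(2, 5) == foo_recursive(2, 5))  # prints True
foo_loop(0, 1_000_000)  # a million "recursions" in a single stack frame
```

The last line would blow the stack in the recursive version; the loop version runs it in constant stack space, which is the whole point.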

Now, if you were taught to program using one of the more common procedural or OO languages, you were probably warned to avoid recursion because it was very easy to blow your stack out and die--which it is. If your compiler supports at least tail recursion (if not full tail calls), though, some of the recursive algorithms that used to be unsafe are now perfectly safe, since you're not allocating a chunk of stack for each level of recursion, which is a Good Thing. This can make some problem solutions easier, since you don't have to do the goto-style transformation yourself. (Especially since everyone these days seems to have a Pavlovian-conditioned aversion to gotos--a shame, since they're quite useful in a limited range of places--so you'll almost inevitably do bizarre contortions with infinite loops or something equally awkward.)

How big a deal are tail calls and tail recursion? In a language like Lisp, where just about everything requires a function call and it's really easy to have a dozen or more functions in progress just in your own code (not to mention the library functions your code calls, and the library functions they call), it makes a huge difference. Doing tail call optimization is also really easy for Lisp, just because of the way the language works and is structured. Once you know about tail calls, it just falls out of the compiler, which is cool.

In procedural and OO languages the benefits are a bit smaller, since there are generally fewer function calls, more code executed between calls, and many returns that don't just hand back the result of a function call--but even then it makes a difference. Stack frames for procedural languages are usually larger than Lisp frames, since you have more stack-based variables: a C or Java stack frame may well be 10 or 20 times larger than a Lisp stack frame (C and Java have stack-allocated variables, something Lisp generally doesn't have). So even if you can only do a tail call optimization 5% or 10% of the time, you'd save as much space as you would for an entire Lisp run, give or take a lot of statistical handwaving. They probably won't happen that often (I'd bet the number's more like 1% or 0.1%) but still, a win's a win.

Is there a downside to tail call optimization? Well... no, not really. The only place it causes a problem is if you have a language that supports enough introspection to let you inspect the call stack, since you won't see any of the functions that tail-called out of themselves--but this isn't generally a big deal, as the space and runtime wins are considered worth the hit to introspection capabilities. Tail calls also put a bit more complexity into a compiler, which is likely already a pretty damn complex piece of code, so there's a burden on the language implementor, but if you think about it at the beginning it's not too big a deal to implement. (And if you have a continuation-passing style of calling functions, it turns out to be essentially free, which is really cool, though that's the topic of another WTHI entry.)

So, tail calls. Your friend if you're a programmer, and a darned useful thing to implement as an optimization if you're a compiler writer.

Posted by Dan at June 26, 2003 01:48 PM | TrackBack (5)

The only place it causes a problem is if you have a language that supports introspection sufficiently to allow you to inspect the call stack

One other place it can cause a problem is when you are using a debugger to try to debug optimized code. Unless your compiler is very careful, it can lead to somewhat confusing stack backtraces.

Posted by: Doug L. at June 28, 2003 03:00 AM

Dan, I know that you're a regular Lambda: the Ultimate reader, so undoubtedly you've noticed this already, but others may benefit: http://lambda.weblogs.com/discuss/msgReader$7485 links to a paper entitled "Compilation of Functional Programming Languages using GCC -- Tail Calls" about the work done to support tail-call and tail-recursion optimization in GCC (since it's used as the back-end for the Glasgow Haskell Compiler and many other "(functional-language-foo)-to-C" language converters).

Caution: the paper is a 376 kilobyte gzipped PostScript file, and the relevant explanatory bits about tail calls and tail recursion begin between pages 7 and 9.

Posted by: Gnomon at July 1, 2003 04:52 AM

Just like everyone, I treasure the smooth explanations in the "What the heck" series. Many thanks! But because I am in the habit of re-reading your articles for greater insight, it should be no secret that by now I have also become more picky:

I suspect that the benefits of the microsecond speed-ups due to tail calls (or, for that matter, of any other speed-ups) have not been clearly put into perspective. At least, not explicitly.

While the micro speed-ups might otherwise be just a "win", if they occur inside the bottleneck points, as per the profiler, these wins are more important; and given the tendency of tail recursion to occur at the important parts of a program (for non-I/O-bound programs), these tiny wins could easily turn out to be more significant than was previously implied.

Posted by: Ioannis Tambouras at August 8, 2003 03:03 AM