March 30, 2004

Adventures in Garbage Collecting

One of the nice things about being within driving distance of people who're phenomenally smarter than you are is that you sometimes get the benefits of their research. (Though Citeseer's darned nice too)

For the interested, there's a seminar on garbage collection at MIT on Monday April 5th 2004. The announcement follows:

"A (Grand?) Unified Theory of Storage Reclamation"
Speaker: David F. Bacon
Host: Professor Martin Rinard
Host Affiliation: Computer Architecture Group

Date: 4-5-2004
Time: 3:30 PM - 5:00 PM
Refreshments: 3:15 PM
Location: Gates 7th Floor Lounge

The two basic forms of automatic storage reclamation, tracing and reference counting, were invented at the dawn of the high-level language era over 40 years ago. Since then there have been many improvements and = optimizations, but all systems are based on one or the other of these methods, which are uniformly viewed as being fundamentally different and possessing very distinct performance properties. We have implemented high-performance collectors of both types, and in the process observed that the more we optimized them, the more similarly they behaved -- that - they seem to share some deep structure.

We present a formulation of the two algorithms that shows that they are in fact duals of each other. Intuitively, the difference is that tracing operates on live objects, or "matter", while reference counting operates on dead objects, or "anti-matter". For every operation by the tracing collector, there is a precisely corresponding anti-operation by the reference counting collector.

Viewed in this light, we show that all high-performance collectors (for example, deferred reference counting and generational collection) are in fact hybrids of tracing and reference counting. We are developing a uniform cost-model for the collectors to quantify the tradeoffs that - result from choosing different hybridizations of tracing and reference counting. This will allow the correct scheme to be selected based on system performance requirements and the expected properties of the target application.

Dunno if I'll make it, but I'm sure going to try...

Posted by Dan at March 30, 2004 03:10 PM | TrackBack (6)
Comments

It would be nice if GC could be gotten right. For example in Python, if you have objects in a cycle and they have explicitly defined destructors, then the objects are not garbage collected. (There are all sorts of theoretical claims as to why this has to be). Even Java makes no guarantee that finalizers will be called.

This means that all the languages can claim to be garbage collected automatically, except when it really matters (file handles, sockets etc) which have to be manually closed/destroyed etc.

Posted by: Anonymous Coward at March 30, 2004 08:51 PM

Sweet. Will there be any material online? This is definitely something I'd like to hear more about.

Posted by: Aristotle Pagaltzis at March 31, 2004 08:24 AM

Slides are apparently at http://www.research.ibm.com/people/d/dfb/talks/Bacon04Grand.ppt if you want 'em. Having looked, though, I'm not sure they're useful by themselves.

Posted by: Dan at March 31, 2004 11:47 AM

Well, Parrot'll guarantee that we'll call all the destructors. We won't guarantee that we'll call them in the right order, at least for circular structures, nor when exactly we'll call them, but... one takes what one can get.

One thing that does help is to split the finalizer calling from the low-level cleanup. Often times there'll only be one object in a circular structure with a destructor (be honest, how many times have you really written one?) so calling all the finalizers and then going and tossing all the dead objects works reasonably well and makes more cases work right.

Posted by: Dan at March 31, 2004 11:51 AM

well, at least you got GC for that litle pesky thing, memory, even if you do not have sockets autocollected :)

BTW, I believe that it's up to the language to provide sensible abstraction to avoid this kind of problems.

One example is the RAII idiom in C++, where the object is inialized when one scope in entered and it's destructor is called when the scope end, not that different from refcount.

Another interesting approach is the ruby one, passing the code to be executed to a routine that takes care of creating and reclaiming the resource.

Posted by: gab at April 3, 2004 04:44 AM

gab -- RAII is nothing like a refcount.

RAII is a fully static technique of using scopes, and if the programmer fails to obey scopes (for example, stores a pointer to an RAII'ed stack object) it fails mmiserably.

Ref counting, OTOH, is dynamic in the sense that you can't determine statically (at compile time, without knowing what inputs your program will have) when the object is destructed.

Posted by: Ziv Caspi at April 6, 2004 05:42 PM