April 08, 2004

Taking out the trash

So, I went to the talk on unified garbage collection theory Monday. I'd like to say it was breathtakingly innovative, filled with impressive insights, but honestly it was pretty mundane. There wasn't anything new, though it wasn't a waste of time.

The executive summary, for those who're interested, is that Refcounting and Tracing, as means of garbage collection, are really the same thing. Or, rather, opposites of one another--refcounting finds the dead stuff and tracing finds the live stuff. Which is better for you depends on the application and your requirements. Slides for this talk are on the web.

Even though the talk wasn't particularly revolutionary in itself, nor did it really break any new ground, the measure of success of any talk is what you got out of it--whether what you got out of a talk was actually in the talk is secondary. :) By that standard, I was happy to have gone. It was also nice to get a chance to wander around the (only semi-finished) interior of the new CSAIL complex. (Which I shall now, and forever more, think of as the Theodore Geisel Memorial Complex, even though that isn't its name. Jokes about the building are pretty old by now, I expect)

Anyway, here's a barely edited set of notes from the talk. Things in italics are actual talk notes, the rest are things that occurred to me while I was taking the notes. And yeah, some of the things I noted down I did know beforehand, but I tend to have a spotty memory for details so I figured it was worth noting down to keep 'em in the front of my brain.

  • I want tracing without stop the world, dammit. Pity I can't have it. :(
  • Can we hijack the MMU to map pages as read-only to halt mutating threads? Not particularly portable, but...
  • Wedge in mutating macro or function or something.
  • Real time goes ref count, but I wonder if that's truly viable. Unbounded (well, count of object) collection time... (Note that this isn't true if you use a deferred collection list, as you only take out as many objects as you have time for)
  • Oooh, a Lisp jab. Keen. :)
  • Refcount commonly tosses count mutation for putting things on the system stack
  • No-mutation costs on tracing tends to go away as you throw in write barriers and such
  • Copying collectors with object structures would be really nice.
  • Trace collector finds live things, refcount collector finds dead things
  • ref and trace are flipsides. Trace goes from live, refcount goes from dead
  • RC starts with ref too high and goes lower, trace starts too low and goes higher
  • Sweep tends to take a lot of time in tracing collectors. Depressingly true. (Fewer things in general help)
  • Wonder if its worth having a used list as well as a free list, so sweep only sweeps the used list. When a PMC is allocated its put on the used list, though the used list explicitly isn't part of the root set. Probably only worthwhile if we hit a highwater mark at some point.
  • Nice transition from trace to GC on slide. (Check to see if the powerpoint slide set has this) Non-incremental ref count, though.
  • Interesting note in multiprocessor systems--can't count on atomic and universally visible refcount coherence, so scheduling inc and dec operations are a good thing. Throw a thread at the scheduling buffer.
  • Low pause time stuff -- refcount with occasional trace to find cyclic garbage
  • Add in mutex and condwait on PMC allocation? Or for mutator thingie?
  • Generational heap with lists of generation objects?
  • Dammit, this'd be easier if we could move PMC pointers
  • We can do this with sufficient application of memory. Blows the hell out of cache, though
  • Refcount is good for low mutation rates, tracing for high mutation rates
  • Ulterior ref counting; Trace the nursery, refcount the mature space
  • Distributed refcount - trace on node, refcount across nodes
  • Makes a concerted effort to evaluate GC schemes based on time and space use
  • Would it be reasonable to macro the hell out of all the fundamental GC operations and then wedge in things to experiment with schemes of collection? Probably. Worth getting in right now before it's too late.
  • Investigate ways to stop the world, with a reasonably cheap fallback scheme if we can manage it.
  • New playroom in the building sucks. Too much through traffic.
  • Some formulas on space and time usage. In the powerpoint. (Want to instrument Parrot to allow for expermientation and cost estimates)
  • All realistic collectors are a trace/refcount hybrid
  • Cost of refcount isn't just cost to mutate the referenced refcount, but also the cascade through the dead objects
  • Worth investigating a more sophisticated DOD model. Getting the infrastructure in would be well worth it.
  • Count in the size of the metadata for real benchmarks. (Need to do this with the quota mechanism)
  • If you've got pluggable GC, then linking a collector type to an app, probably with some sort of dynamic feedback profiling info, would be really useful. The problem there is that it guarantees a minimum cost to pointer mutation, since at least some of the schemes require pointer mutation tracking. I suppose you could have multiple cores that get dynamically loaded in.
  • Split the heap into "has pointer" and "doesn't have pointer" memory. Could leverage the MMU for it with write barrier stuff.
  • Check costs of statistics gathering.
  • Check to see if circuit theory algorithms tie in any. (Comment from a guy in the audience--something on "thevinin and norton equivalence", nodes and branches)

I'm not sure what the net result of all this will be, other than I really want to get at least macros wedged into parrot to intercept anything GC/DOD related. While full runtime plugability has some required overhead that we may not want to pay for the flexibility, we can do it in source and just macro-out the stuff we don't need for a particular scheme. Instrumenting Parrot to get more stats would be a useful thing, to be sure.

Posted by Dan at April 8, 2004 03:14 PM | TrackBack (0)
Comments

There's an article on OSNews, "A Glance At Garbage Collection In Object-Oriented Languages" by Zachary Pinter, for those that are, like me, a little lost about all that stuff.

Posted by: NoName at April 27, 2004 09:57 PM