March 05, 2004

More destruction!

So, I wasn't too clear in the last post about the problems with finalization. So here's some more detail.

If you remember, there are three ways to handle finalization. You can call every finalizer method in a dying object's hierarchy, you can call the finalizer method like any other method and count on code to properly redispatch, or you can disallow finalization methods altogether and instead pass in relevant bits of the object to an external cleanup routine.

Now, let's first assume that we can do redispatch properly. That means with a class hierarchy like:

    A   B
     \ /
      C

If A, B, and C all have a method foo, calling foo on a C object invokes C's foo; if C's foo redispatches we call A's foo, and if that method then redispatches we'll invoke B's foo. (Note that perl 5's SUPER doesn't do that; you need to use NEXT instead) This is an important assumption for systems with multiple inheritance. (With a single-inheritance system it isn't an issue)
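As a concrete stand-in for the diagram above (Python classes, not Parrot), Python's super() already does this NEXT-style redispatch: with C inheriting from A and B, a super() call in A's foo continues on to B's foo rather than stopping.

```python
# Cooperative redispatch through Python's MRO: super() in A.foo
# continues to B.foo because B follows A in C's linearization.
calls = []

class A:
    def foo(self):
        calls.append("A")
        super().foo()       # redispatches to B, the next class in C's MRO

class B:
    def foo(self):
        calls.append("B")   # end of the chain

class C(A, B):
    def foo(self):
        calls.append("C")
        super().foo()       # starts the chain: A first, then B

C().foo()
print(calls)                # → ['C', 'A', 'B']
```

Perl 5's SUPER, by contrast, would stop at A, never reaching B.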

And, so we have a nice diagram to refer to, let's assume our class hierarchy looks like:

A   B  A   E
 \ /    \ /
  C      D
  \     /
   \   /
    \ /
     F
Note that A is in the hierarchy twice. We'll assume, for the sake of argument, that all the classes have DESTROY methods.

So, how do we finalize an object of class F?

Well... we could call the DESTROY method on F like any other method, redispatching as we go. That means we call F's DESTROY first. If all the methods redispatch properly, we call them in the order F, C, A, B, D, E, or F, C, A, B, D, A, E, depending on whether we prune the tree or not. For the sake of argument we'll assume that we do prune the tree, as that's the common thing to do.

There is, right there, a problem. If we call finalizers in normal method order, then A gets cleaned up before D has a chance to clean itself up--if D depends on its parent classes being still in good working order (not an unreasonable assumption, even in a finalizer) then you're hosed because it isn't--it already cleaned up after itself. The sensible thing to do in this case is to have a separate traversal scheme for finalization. Yeah, special cases are sub-optimal, but it beats screwing things up. Of course, if the DESTROY method in E reparents the object into the root set, well... you're in some serious trouble anyway.
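To make the ordering concrete, here's a sketch of that pruned depth-first traversal over the example hierarchy (the parents table is invented to match the diagram). Note that A lands before D in the result, which is exactly the problem: A is already finalized by the time D's finalizer runs.

```python
# Pruned depth-first (method-order) traversal of the example
# hierarchy: F inherits C and D, C inherits A and B, D inherits A and E.
parents = {
    "F": ["C", "D"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "A": [], "B": [], "E": [],
}

def pruned_order(cls):
    seen, order = set(), []
    def walk(c):
        if c in seen:           # prune: visit each class at most once
            return
        seen.add(c)
        order.append(c)
        for p in parents[c]:
            walk(p)
    walk(cls)
    return order

print(pruned_order("F"))        # → ['F', 'C', 'A', 'B', 'D', 'E']
```

A (a parent of D) is finalized at position 3, two slots before D, so anything D's finalizer needs from A is already gone.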

The big problem with this scheme is one of trust--what happens if some DESTROY method, say C's, decides not to call SUPER.DESTROY? Well... then you get an object that's only partially finalized, and it's possible it'll leak some resources. (This, interestingly, is not common in practice, since usually each resource that needs active finalization has a wrapper object, so the worst that happens is that those objects get cleaned up by their own DESTROY rather than the finalizer of the wrapping object. This is a good reason not to have one object wrap multiple bare things)

The big advantage to this scheme is that you can actually call your parent finalizers before you're done with your own finalization. At least, I assume this is an advantage. I dunno, I don't do objects.

The alternative means of doing finalization is to make sure we call all the finalizers for the object. No redispatch or anything; the object cleanup code just scans the object's class hierarchy and calls all the DESTROY methods. For our example class, we'd likely call the finalizers in F, C, D, A, B, E order, as we'd prune the tree and call from shallowest to deepest. If a class is in the hierarchy at multiple depths we use the deepest version of it.
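That shallowest-to-deepest ordering can be sketched like so (same invented parents table as the diagram); each class is placed at its deepest occurrence in the tree, then classes are finalized in order of increasing depth:

```python
# Shallowest-to-deepest finalization order, pruning duplicate
# occurrences of a class to its deepest position in the hierarchy.
parents = {
    "F": ["C", "D"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "A": [], "B": [], "E": [],
}

depth = {}
def assign(c, d):
    depth[c] = max(d, depth.get(c, 0))   # deepest occurrence wins
    for p in parents[c]:
        assign(p, d + 1)
assign("F", 0)

order = sorted(depth, key=depth.get)     # stable sort: shallowest first
print(order)                             # → ['F', 'C', 'D', 'A', 'B', 'E']
```

Here both C and D are finalized before their shared parent A, avoiding the ordering problem from the redispatch scheme.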

This alternate method actually works pretty well, since you're insulated from potential problems in the tree--you don't have to worry that your child classes might forget to redispatch or something. It can also be done reasonably quickly, as you can actually cache the traversal order and methods in the class somewhere and not have to do any dynamic lookups, but that's not likely a huge issue. (If you have issues with dispatch speed for finalization you've probably got bigger issues) There's still the issue of reparenting--it's possible once again for that pesky E class to reparent the object that's now mostly dead.

The third way is to attach a closure of some sort to the object, or to each class, that takes some of the data out of the object. When an object is slated for destruction you yank out the bits and call the closure. The advantage here is that, unless your object is self-referential, there's no way to reparent the thing. (And if it is, the assumption is that you're really out of luck and reparenting will just fail, or die horribly, or something of the sort)
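Python's weakref.finalize is a real-world instance of this scheme: you register a cleanup callable with extracted bits of the object, never the object itself, so the callback can't resurrect the dying thing. A minimal sketch (the Wrapper class and handle value are made up):

```python
# Closure-style finalization: the registered callback captures only
# the extracted data (handle), not self, so reparenting is impossible.
import weakref

cleaned = []

class Wrapper:
    def __init__(self, handle):
        # weakref.finalize holds the object only weakly; strong refs
        # go to the callable and its arguments
        self._finalizer = weakref.finalize(self, cleaned.append, handle)

w = Wrapper(42)
del w          # on a refcounting Python (CPython) the callback fires now
print(cleaned)             # → [42]
```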

Each of these methods is fine. Personally I prefer the automatic calling of the finalize methods, so I don't have to worry about forgetting or screwing things up. (There's that whole 'encapsulation' thing--how can a child class have any clue as to whether my parent class should or shouldn't clean up? It shouldn't know, as it's just none of its business) The nasty bit for Parrot is the mix'n'match issue.

I'd love to choose just one, but I can't--we've committed to making perl 5, perl 6, python, and ruby work. Perl 5 and python use the "you better delegate right" scheme, Perl 6 uses the "call 'em all!" scheme, and ruby uses the closure scheme. So... what, then, do we do if we mix it up?

For example, in our diagram, let's say that F and B are Python classes, A is a Ruby class, and the rest are perl 6 or C++ classes. And, just to make it more difficult, B doesn't redispatch its DESTROY method. (After all, why should it? It's a top-level class) Contrived? Maybe. We are pushing Parrot's interoperability, though, so someone'll do it. And even if it doesn't get so bad, there'll be plenty of perl 5/perl 6 mixing, and you'll likely see perl 6 on the top and bottom with perl 5 in the middle in a lot of cases. (As perl 6 is used for new code, and people start refactoring old code from the bottom up)

The table looks like:

    Class   Language   Finalization style
    F       Python     manual redispatch
    C       Perl 6     auto
    D       Perl 6     auto
    E       Perl 6     auto
    A       Ruby       closure
    B       Python     manual (doesn't redispatch)

The question, then, is... what gets called? And in what order?

The first thing that springs to mind is taking them by group--call the Python-style finalizer, then call the automatic finalizers, then call the ruby finalizers. That, though, will destroy the object out of order--there'll be classes where the parent class attributes are gone before the child class gets to do its thing. That's A Bad Thing. No joy there.

The next thing to do is assume that the C++/Perl 6 finalizers just automatically redispatch, as does the ruby finalizer. But... in that case, B's failure to redispatch means we never call E's finalizer. That's bad too.

The third thing to do is redispatch anyway if there are automatic finalizers left to run, even if a manual finalization doesn't redispatch. But... what if the finalization didn't redispatch on purpose? There may be a reason. (I'd not bet on a good one, but I don't get to not implement things based on judgement calls on other people's code, even if it really sucks. Still gotta make it work)

You could continue the redispatch when a manual method didn't redispatch, but only go for the automatic methods. That wouldn't be too bad, but it still sits poorly with me. Ick. I don't, I'm afraid, have a good solution. There may not be one, in which case it's a matter of choosing the least bad. Or punting this to Larry, Guido, and Matz and letting them hash it all out.

Finalization. Who knew death would be so darned complex and annoying?

Posted by Dan at March 5, 2004 04:55 PM | TrackBack (5)

Slightly off-topic but I'm wondering: when you say A appears twice in F, does F actually have 2 instances of each of A's attributes, or is it like C++ virtual inheritance, where there's only one copy.

An on-topic thing: when you say it would be wrong to automatically do a SUPER:: to a base class (because there may be a good reason for not doing so (or for doing so twice!!)), would it necessarily be wrong to automatically call the NEXT:: dtor?


Posted by: Dave Whipp at March 5, 2004 07:24 PM

Argh! I was writing a long and thorough reply and then my browser crashed about 90% of the way there. I'll start over tomorrow… *grumble*

Posted by: Aristotle Pagaltzis at March 5, 2004 11:54 PM

When I said A was in there twice, there's really only one instance of A's attributes--the second and subsequent instances are virtual and generally pruned out. (So if you were doing a redispatch thing you'd not call A's methods more than once, even if it could satisfy the call. There are languages where this isn't the case, but I'm not going to worry too much about those at the moment. Heck, there are languages that do automatic guaranteed redispatch of methods too, and we're not doing that. :)

Anyway, when I was talking SUPER I meant real, proper redispatch, which corresponds to perl 5's NEXT, rather than its (broken) real SUPER.

Posted by: Dan at March 6, 2004 12:06 PM

Reparenting is baaad, ng-kaaay? Just say no.

There. Solved. Next problem?

(I wish it were that simple...)

Posted by: Eric TF Bat at March 7, 2004 02:03 AM

Oh, I'd be so happy if I could skip reparenting. That'd make my life so much easier. Alas, forbidding reparenting requires the same sort of hassles as proper reparenting does, so arguably the only real difference is that if we allow it we need to redo the DOD sweep after the end of the DESTROY methods for the object that reparented, in case other things now fail to be dead.

Handling it properly does have the odd upside--it means we have to have the infrastructure in place for generational garbage collection, and that's not a bad thing.

Posted by: Dan at March 7, 2004 10:41 AM

Okay, let's have at it again. I'll be using AF for "auto-finalization" and MR for "manual redispatch".

First, it should be established that a programmer working with AF may rely, and has every right to rely, on their classes being finalized at destruction. Moreover, they don't even have a means to manually redispatch, even if they knew they were going to run on Parrot: there's nothing like that in the language. So Parrot has no choice here, really; AF classes have to be finalized, period.

Now that that's out of the way, there are 4 edge cases to be decided upon, and we'll start with the simple ones:

1) MR subclass, MR superclass: typical MR language scenario; we let the subclass decide.

2) AF subclass, AF superclass: likewise typical; we finalize both.

The first non-trivial case is still simple:

3) MR subclass, AF superclass: we can draw on the above expectation here; the AF class expects / relies on finalization, so we make sure it is eventually finalized.

So far, there is a very simple strategy: look for all AF classes in the tree and make sure they're eventually finalized. But what do we do about the final case we will encounter?

4) AF subclass, MR superclass: we can imply that the subclass redispatched, or we can imply that it didn't. Which is more sensible? If you put yourself in the subclass's programmer's view, it's clear: s/he does not expect to have to do anything about their superclass, and s/he also expects their superclasses will be finalized. So, we imply redispatch.

Now the simple strategy gets a little ugly: we have to walk the tree and make note where an AF class inherits from an MR class.
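That strategy can be sketched roughly like this (toy parents/style tables for the example hierarchy; for simplicity the Ruby closure on A is treated like an AF finalizer, since closures always run). Phase one honours the manual chain as written; phase two sweeps up any AF class the chain never reached:

```python
# Mixed-scheme finalization: run the manual-redispatch (MR) chain,
# then guarantee every auto-finalize (AF) class is finalized anyway.
parents = {"F": ["C", "D"], "C": ["A", "B"], "D": ["A", "E"],
           "A": [], "B": [], "E": []}
style = {"F": "MR", "B": "MR",                  # Python-style, manual
         "C": "AF", "D": "AF", "E": "AF",       # Perl-6-style, auto
         "A": "AF"}                             # closure, treated as auto

def mro(cls):
    seen, order = set(), []
    def walk(c):
        if c not in seen:
            seen.add(c)
            order.append(c)
            for p in parents[c]:
                walk(p)
    walk(cls)
    return order

def finalize(cls, redispatches):
    ran = []
    # phase 1: honour the manual chain; it dies where an MR class
    # declines to redispatch (here, B)
    for c in mro(cls):
        ran.append(c)
        if style[c] == "MR" and c not in redispatches:
            break
    # phase 2: guarantee every AF class is finalized regardless
    for c in mro(cls):
        if style[c] == "AF" and c not in ran:
            ran.append(c)
    return ran

# B refuses to redispatch, yet D and E still get finalized
print(finalize("F", redispatches={"F"}))    # → ['F', 'C', 'A', 'B', 'D', 'E']
```

B's failure to redispatch no longer strands D and E, at the cost of the extra bookkeeping pass over the tree.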

Posted by: Aristotle Pagaltzis at March 9, 2004 06:35 AM