April 14, 2003

Efficient objects for dynamic languages

Hadn't planned on this, but since I've been mulling over for the presentation at tonight's Boston Perlmongers meeting, I figure it's worth posting.

Consider, for a moment, the humble method call, the building block of any OO system. To wit:

SomeObject foo;
result = foo.bar(12)

Now, you might think that's simple, and in a statically typed language it is. You know at compile time what the type of foo is, you know where in its list of methods bar lives, and probably even can precalculate the multimethod dispatch if your language does that sort of thing, so the final executable at best needs to fetch the class base pointer for foo's class (which it knows at compile time, and thus might even be resolvable by the linker) take an offset to find the bar method (who's offset you know at compile time) and call it. If you've a horrible runtime or linker it's at worst a search for foo's class, a pair of pointer fetches and an addition, plus the ultimate method call. If you have a good linker or runtime (so you know the offset of foo's class in your master class offset array) it's two adds and two pointer fetches, plus the method call. If you're really good, it can all be resolved at link time so there is no runtime cost at all over the method call.

Generally we're looking at the good, but not best case, since if you're going static you might as well do it right but you don't want to touch your linker, so figure in most cases making a method call costs two pointer fetches and two additions to find the method. That's not bad. I'd love to be able to do that with parrot.

Alas, not, because all our method calls are essentially this more interesting thing:

$someobject $foo;
randomstring $bar;
$result = $foo.$bar(12);

Not only is the class of the method unknown at compile time, so is the method name. And, in the true spirit of perl (not to mention Ruby--not Python, though, as they frown on this sort of thing generally) it's possible that $foo changes types. Heck, it's possible that the call to the $bar method, whatever that is, changes $foo's type. But that's not really an issue at the moment. (Later, yeah, but it's just a specific case of the more general problem)

Now, let's throw one more monkeywrench in the works here, since $foo's class can get in the act and decide how (or even if) to dispatch $bar, and can do it differently every time if it wants.

Given this, the only sane thing to do is to punt and delegate the method call entirely to the object. That means the degenerate case is a pointer fetch, addition, function call, and insanity as the class code does bizarre things, but we won't worry about that, since you can be degenerate in your own time.

The common case, then, will be a pointer fetch, addition, function call, then method lookup. And that method lookup's the killer. This is also the best case, and the only way to make the speed acceptable is to have a fast method lookup on the back end.

This back end lookup is nasty for a number of reasons, not the least of which is the fact that the inheritance hierarchy is mutable at runtime, the methods in the various classes in the tree are mutable at runtime--not just their bodies, but their very existence--and we may well end up having to redispatch so a method body that satisfies our lookup may not be the end of it, and we may need to keep on going. (Which is so much fun... though really useful)

Anyway, for that to all work out means we need a lot of support infrastructure. Method caches, a notification and event system so we can invalidate those caches, and fast optimized code available for the normal case, so most of the flexible stuff can just be tossed because we don't need it.

Hrm. This is long, so I think I'll go on about parts of this later. But I have decided that to support the common case, which is calling a method that's a compile-time constant name, we'll have a method call op like

callmeth Ix

as well as the

form (Since remember that parrot's calling conventions specify the method name in one of the string registers, so we don't have to put it in the instruction stream) where in the first case Ix is the hashed value of the method name, using parrot's default provided hash scheme. At least that way we don't have to recalc the hash every time to go look up the method in the cache...

We probably should have a

callmeth Px

in whatever PMC register might be free for this, in the case where the method PMC can't change, so we don't have to go look it up. Perl couldn't use this in many cases, but something like C# might. (Though, arguably, not when calling into perl code, but we could have some method/object property check to see if things are runtime fixed. Hrm)

Posted by Dan at April 14, 2003 04:49 PM | TrackBack (0)

Dan, have you considered using the inline cache technique that some SmallTalks use for fast method lookup. My understanding is that its best case is faster than a typical C++ VTABLE method dispatch.

Posted by: Jim Weirich at April 16, 2003 09:12 PM

I plan on diving into a lot of the research the Smalltalk folks have done on efficient method caching and dispatch, but I've not done it yet. I'm hoping that it can be deferred until later and used as a transparent backend optimization. I expect I'll find that a few tweaks to the opcode semantics (which by that time I won't be able to do) would end up making things faster if I do that, though.

Posted by: Dan at April 17, 2003 12:09 PM