September 21, 2005

WWIDD: Assertions on code's properties


One of the things we could've put into Parrot, had we given it much thought, was a lot more metadata and the facilities to use it automatically. (And I think this is one of the things I'm going to add to Cola.)


In this case I'm talking about function signatures, the enforcement of function signatures, and the capability of the engine to manage them. This also touches on a mis-design in namespaces.


Parrot was designed for dynamic languages, and one thing dynamic languages do is change. (Yeah, I know -- Duh!) The code is mutable at runtime and potentially unknown at compile time. That's a lot of uncertainty. It can be really nice for the programmer (it brings a fair amount of flexibility), but it's a pain in the neck for code generation, since that uncertainty makes it really tough to do any sort of optimization, take any shortcuts, or generally cheat in reasonable ways. It also makes it tough to put any sort of guarantee on what code does, which means that to behave properly there needs to be a lot of metadata passed around and checked, which is costly.


In this case, I'm talking specifically about parameters for subroutines and functions, as well as their return values. Now, you'd figure that this stuff would all be known at compile-time and it wouldn't be a big deal, right? Well...


There are a couple of issues here.


The first is one of parameter counts. Parrot's calling conventions specify the number of parameters which are passed, so that you don't have to know at compile time how many return values you got, nor do you have to know at compile time how many parameters you're passing in. The sending code sets the parameters and parameter counts, and the receiving code checks the counts. That's great, except that it's a little slow. Not a lot slow, but a little for each call, and that adds up. It's a lot nicer if you can just depend on what you're getting without having to check.
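To make the per-call cost concrete, here's a minimal sketch of count checking in C. The CallFrame layout, the status codes, and the call_with helper are all made up for illustration; this is not Parrot's actual calling convention, just the shape of "caller sets the counts, callee checks them".

```c
/* Hypothetical call frame: the caller records how many arguments it
   passed and how many return values it expects. Not Parrot's layout. */
typedef struct {
    int  arg_count;     /* set by the caller */
    int  ret_expected;  /* set by the caller */
    long args[8];
} CallFrame;

enum { CALL_OK = 0, CALL_BAD_ARGC = 1, CALL_BAD_RETC = 2 };

/* Callee for "(int, int) = some_func(int, int, int)". It validates the
   counts on every single call, which is the small cost that adds up. */
int some_func(const CallFrame *frame, long rets[2])
{
    if (frame->arg_count != 3)
        return CALL_BAD_ARGC;
    if (frame->ret_expected != 2)
        return CALL_BAD_RETC;
    rets[0] = frame->args[0] + frame->args[1];
    rets[1] = frame->args[2];
    return CALL_OK;
}

/* Caller-side helper (illustrative): fills in the counts and calls. */
int call_with(int argc, int retc, long rets[2])
{
    CallFrame frame = { argc, retc, {1, 2, 3, 4} };
    return some_func(&frame, rets);
}
```

A caller that passes three arguments and expects two results gets CALL_OK; one compiled against a different signature gets an error code back, but only after paying for the check at call time.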


As a for example, if your compiled code assumes a function signature of:


(int, int) = some_func(int, int, int)


you'd really hate to have that replaced at runtime with a function that looks like


(int) = some_func(int, int, int, int)


since there's a darned good chance your code is going to fail.


Another issue is one of types, which is a twofold issue for parrot. Firstly, because parrot's got multiple types of registers, these two functions:


some_func(int, float)

some_func(string, pmc)


pass in parameters in entirely different places, and if you're assuming you know where things are going, you'd be wrong if you compiled for the first but had the second at runtime.
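A rough sketch of why that goes wrong, assuming a made-up Registers struct with one small register file per low-level type (the sizes and field names are mine, not Parrot's):

```c
#include <stddef.h>

/* Hypothetical register files, one per low-level type. */
typedef struct {
    long        int_reg[4];
    double      num_reg[4];
    const char *str_reg[4];
    void       *pmc_reg[4];
} Registers;

/* Caller compiled against some_func(int, float): the arguments go into
   the first integer register and the first float register. */
void pass_int_float(Registers *r, long i, double f)
{
    r->int_reg[0] = i;
    r->num_reg[0] = f;
}

/* Callee compiled as some_func(string, pmc): it looks in the string and
   PMC registers, which the caller above never touched. Returns 1 only
   if both of the arguments it expects are actually there. */
int string_pmc_args_present(const Registers *r)
{
    return r->str_reg[0] != NULL && r->pmc_reg[0] != NULL;
}
```

The caller did everything right for the signature it was compiled against, and the callee still finds nothing where it's looking.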


Even if the low-level types are the same, a lot of languages make some guarantees at compile time based on higher-level types, so these two functions:


some_func(Foo, Bar)

some_func(Bar, Foo)


are not equivalent in most cases. If your language depends on compile-time type checking as part of its guarantees of correctness, you really do need to make sure that those guarantees aren't violated at runtime. (With Parrot you can't be sure that library code somewhere hasn't messed around with the symbol table, since a lot of Parrot's languages are allowed, or even encouraged, to do that.)


Sure, for all of these you can generate the code to check at runtime, but that's pricey (since you're checking every time) and in most cases you want things to fail as soon as possible, not as late as possible. The best failure is one where, when you load the program bytecode and the library bytecode, you immediately get type failure errors and the program dies before it can do anything. (Which is much better than getting halfway through screwing with your production database, say, then dying because of a signature error.)
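As a sketch of the fail-at-load idea: compare the signatures the program was compiled against with what the library actually provides, once, before running anything. The SigEntry type, the string encoding of signatures, and the function name are assumptions for illustration only.

```c
#include <string.h>

/* Hypothetical table entry: a function name plus its signature in some
   agreed-on text encoding, e.g. "(int,int)=(int,int,int)". */
typedef struct {
    const char *name;
    const char *signature;
} SigEntry;

/* Run once at load time. Returns the index of the first import whose
   name and signature have no exact match among the exports, or -1 if
   everything lines up. A missing export counts as a mismatch. */
int first_signature_mismatch(const SigEntry *imports, int n_imports,
                             const SigEntry *exports, int n_exports)
{
    for (int i = 0; i < n_imports; i++) {
        int ok = 0;
        for (int j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0 &&
                strcmp(imports[i].signature, exports[j].signature) == 0) {
                ok = 1;
                break;
            }
        }
        if (!ok)
            return i;
    }
    return -1;
}
```

A loader would refuse to start the program on anything other than -1, so the check is paid once rather than on every call.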


This sort of failure's not correct for every language, of course. For some they can handle the runtime errors, and for others it's OK to wait since you might not actually hit the problem code path. I mean, what the heck -- how often do you run the error handler code anyway, right?


So, to the actual solution. What I'd do differently is threefold.


First, I'd have a defined, though not mandatory, function signature property. PMCs which can act as a function may have this attached, so the interpreter knows where to get at it.


Second, I'd take function PMCs out of the generic global namespace. fetch_global and store_global would no longer deal with these; instead, separate fetch/store_global_function (or something similar) would be used to put functions into the global namespace, splitting the global data and function namespaces apart as part of that. Yes, this has some issues that need dealing with, but generally speaking most of the languages Parrot cares about make at least a mild attempt to split functions and data into separate namespaces, and at this point I don't think it's unreasonable to just make it mandatory. Possibly there'd be some interpreter support (store_appropriate or something) to query the PMC being stored for its type, but even then I'm not sure.
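Something like the following is what I have in mind for the split, as a C sketch. The fetch/store names follow the ones floated above; the Table and GlobalNamespace types, the fixed table size, and the linear lookup are all stand-ins I made up to keep it short.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

typedef struct { const char *name; void *value; } Entry;
typedef struct { Entry entries[MAX_ENTRIES]; int count; } Table;

/* Two independent tables: the same name can live in both. */
typedef struct { Table data; Table functions; } GlobalNamespace;

static Entry *table_find(Table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return &t->entries[i];
    return NULL;
}

/* Returns 1 on success, 0 if the (fixed-size) table is full. */
static int table_store(Table *t, const char *name, void *value)
{
    Entry *e = table_find(t, name);
    if (e == NULL) {
        if (t->count == MAX_ENTRIES)
            return 0;
        e = &t->entries[t->count++];
        e->name = name;
    }
    e->value = value;
    return 1;
}

static void *table_fetch(Table *t, const char *name)
{
    Entry *e = table_find(t, name);
    return e ? e->value : NULL;
}

int store_global(GlobalNamespace *ns, const char *name, void *v)
{ return table_store(&ns->data, name, v); }

int store_global_function(GlobalNamespace *ns, const char *name, void *v)
{ return table_store(&ns->functions, name, v); }

void *fetch_global(GlobalNamespace *ns, const char *name)
{ return table_fetch(&ns->data, name); }

void *fetch_global_function(GlobalNamespace *ns, const char *name)
{ return table_fetch(&ns->functions, name); }
```

With the tables split, storing a data value under a name can never clobber the function of the same name, which is most of the point.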


Third, of course, is the ability to set mandatory properties on names. That is, be able to say:


make_mandatory "name", "prop_name", "prop_val"


such that any store into that named slot will fail unless it's got a property of the given name with the given value. Probably with a different name, since that one's not that good.
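Here's a minimal C sketch of that behavior. The Slot and Value types and the store_checked function are hypothetical, and each value carries only a single property to keep things short; a real version would need a proper property list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical value with one attached property (name/value pair). */
typedef struct {
    const char *prop_name;
    const char *prop_val;
} Value;

/* Hypothetical namespace slot with an optional mandatory property. */
typedef struct {
    const char  *name;
    const char  *required_prop;  /* NULL means no constraint */
    const char  *required_val;
    const Value *value;
} Slot;

/* Mark the slot so later stores must carry prop_name == prop_val. */
void make_mandatory(Slot *slot, const char *prop_name, const char *prop_val)
{
    slot->required_prop = prop_name;
    slot->required_val  = prop_val;
}

/* Returns 1 if the store happened, 0 if it was rejected because the
   value lacks the mandatory property or has the wrong value for it. */
int store_checked(Slot *slot, const Value *v)
{
    if (slot->required_prop != NULL) {
        if (v->prop_name == NULL || v->prop_val == NULL)
            return 0;
        if (strcmp(v->prop_name, slot->required_prop) != 0)
            return 0;
        if (strcmp(v->prop_val, slot->required_val) != 0)
            return 0;
    }
    slot->value = v;
    return 1;
}
```

In the real thing a rejected store would throw an exception rather than return 0, but the check itself is the same: it runs at store time, not at call time.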


Alternately, something could be done by putting a store notification on the namespace slot and having the notification code do the checking and throw the exception. That has the advantage of not requiring another facility to be built into the interpreter (a win) but makes it more difficult to do metadata checks (a loss). Or the interface could use the notification system under the hood and hide it all, which has a certain appeal as well.

Posted by Dan at 04:21 PM

July 09, 2005

WWIDD: MMD everywhere

On the top of the list of things I'd do differently in Parrot's design is fully embrace multiple dispatch. I mean fully. Any operation with two or more operands would be multiply dispatched, and this includes assignment.

When Parrot started, MMD just wasn't on the table. Oh, sure, there were overloaded operators, but that was all left-side-wins stuff, wedged into the PMC vtable. We introduced a standard MMD system, with the assumption that PMCs which wanted to do it would have their vtable slots use the standard system so there'd be cross-compatibility. That then, after a while, led to the realization that we could shrink PMC vtables a lot, simplify the internals, respect a default left-side-wins anyway, make things faster, and generally make things work nicely if we just went entirely MMD for binary operations with some proper defaulting. Unfortunately this was after we'd tossed the keyed versions of the different binary ops, a loss I still think is a waste. That's a rant for another day, though.

The one binary operation I failed to consider was assignment, and it's an important one, and parrot didn't multiply dispatch it. Big mistake. (One that's fixable with the current parrot design, interestingly enough, the same way that we could make the binary operations use MMD without affecting the bytecode in any way)

Now, the whole point of MMD, at least as far as I'm concerned, is to allow you to cheat like hell when you know it's safe. That is, you provide a nice, generic, safe interface that, while potentially a little slow, lets you have those nice black-box data structures we're always told we really want (and never believe until it bites us hard, usually a year or two too late to fix the problem) while still being as fast as it possibly can be in those cases where we know we don't need to be careful.

For example, if we're assigning an Integer to an Integer there's really no need to go jumping through any sorts of hoops -- the Integer class knows what the internal structure of an Integer looks like (we can hope, at least, and worry about the authors of the class if it doesn't) so making function calls to get values is silly. That is, the assignment function for an Integer -> Integer assign should look like:

dest_pmc->cache.intslot = source_pmc->cache.intslot

rather than

dest_pmc->cache.intslot = source_pmc->vtable->get_integer(interpreter, source_pmc)

(modulo wrong off-the-cuff C indirect function call syntax). Of course you can't have the first form as the standard assignment function, since it's wrong for so many things. Indeed, the only thing you can really do is have the standard assignment look like:

dest_pmc->vtable->set_pmc(interpreter, dest_pmc, source_pmc)

with the set_pmc function for the destination then:

dest_pmc->cache.intslot = source_pmc->vtable->get_integer(interpreter, source_pmc)

Which is definitely sub-optimal in specific cases, while OK in the more generic case. We could, of course, throw a flag test in the assignment function for the Integer class, but we know from experience that flag tests are more expensive than they're generally worth, are an inextensible pain since they lead to if ladders at the top of functions, and are an indication that we should be doing MMD anyway.
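For the curious, here's roughly what a multiply dispatched assignment could look like, as a self-contained C sketch. None of this is Parrot's real PMC layout: the PMC struct, the single get_integer function pointer standing in for a vtable slot, the type codes, and the table are all simplified assumptions, and I've dropped the interpreter argument.

```c
#include <stddef.h>

enum { TYPE_INTEGER, TYPE_FLOAT, TYPE_COUNT };

typedef struct PMC PMC;
typedef long (*get_integer_fn)(const PMC *self);

struct PMC {
    int            type;
    get_integer_fn get_integer;  /* stand-in for a vtable slot */
    union { long intslot; double numslot; } cache;
};

typedef void (*assign_fn)(PMC *dest, const PMC *src);

static long integer_get_integer(const PMC *self) { return self->cache.intslot; }
static long float_get_integer(const PMC *self)   { return (long)self->cache.numslot; }

/* Generic path: ask the source for its integer value via the vtable. */
static void assign_generic_to_integer(PMC *dest, const PMC *src)
{
    dest->cache.intslot = src->get_integer(src);
}

/* Fast path: both sides are Integers, so copy the slot directly. */
static void assign_integer_integer(PMC *dest, const PMC *src)
{
    dest->cache.intslot = src->cache.intslot;
}

/* Dispatch table indexed by [destination type][source type]. Only
   Integer destinations are filled in for this sketch. */
static const assign_fn assign_table[TYPE_COUNT][TYPE_COUNT] = {
    [TYPE_INTEGER] = {
        [TYPE_INTEGER] = assign_integer_integer,
        [TYPE_FLOAT]   = assign_generic_to_integer,
    },
};

/* Returns 1 if an assignment function was found and run, 0 if not. */
int mmd_assign(PMC *dest, const PMC *src)
{
    assign_fn fn = assign_table[dest->type][src->type];
    if (fn == NULL)
        return 0;
    fn(dest, src);
    return 1;
}

PMC make_integer(long v)
{
    PMC p;
    p.type = TYPE_INTEGER;
    p.get_integer = integer_get_integer;
    p.cache.intslot = v;
    return p;
}

PMC make_float(double v)
{
    PMC p;
    p.type = TYPE_FLOAT;
    p.get_integer = float_get_integer;
    p.cache.numslot = v;
    return p;
}
```

The Integer-to-Integer entry gets the cheating version, everything else gets the safe one, and there's no flag test or if ladder anywhere in the assignment functions themselves.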

There you go. Straight assignment should've been MMD, but it wasn't, as much for hysterical raisins and evolving understanding as anything else, and it could be made multiply dispatched if

That then leads to the question 'should the full binary operation be MMD on the destination?' That's a valid question, since parrot requires (or did require) that the destination for a binary operation exist. There's been much bitching about that on the list, but it's a pretty significant win in terms of temporary objects created (or, rather, not created) and going on about that's a topic for another WWIT anyway, so I'll stop with the explanation there.

So. Should "a = b + c" dispatch on just the types of b and c for the addition, and then on the result type and a for the assignment, or should there be just one big dispatch on the types of a, b, and c?

That's a good question. I dunno what the right answer is. Or, rather, both Yes and No are perfectly fine answers, and the best one is a matter of figuring out which would be the common usage and going with that.

My gut feel is that generally it's not a win, so the two-step dispatch is better. On the other hand, a good case could be made for doing the three-arg dispatch. Pleasantly, since as far as the bytecode is concerned It's All Magic Anyway, either one could be chosen and later on the other could be switched in.
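To show the two options side by side, here's a small C sketch. The Val type, the two type codes, and all the table and function names are invented for the example. The two-step version dispatches the add on (b, c) into a temporary and then dispatches the assignment on (a, result); the three-arg version looks up (a, b, c) in one go and, in this sketch, falls back to two-step when there's no specialized entry.

```c
#include <stddef.h>

enum { T_INT, T_NUM, T_COUNT };
typedef struct { int type; long i; double n; } Val;

static double as_num(const Val *v) { return v->type == T_INT ? (double)v->i : v->n; }

/* Step 1 of two-step: addition dispatched on the types of b and c. */
typedef Val (*add_fn)(const Val *b, const Val *c);
static Val add_int_int(const Val *b, const Val *c)
{ Val r = { T_INT, b->i + c->i, 0.0 }; return r; }
static Val add_mixed(const Val *b, const Val *c)
{ Val r = { T_NUM, 0, as_num(b) + as_num(c) }; return r; }
static const add_fn add_table[T_COUNT][T_COUNT] = {
    { add_int_int, add_mixed },
    { add_mixed,   add_mixed },
};

/* Step 2 of two-step: assignment dispatched on a and the result type.
   The destination keeps its own type. */
typedef void (*assign_fn)(Val *a, const Val *r);
static void assign_int_int(Val *a, const Val *r) { a->i = r->i; }
static void assign_int_num(Val *a, const Val *r) { a->i = (long)r->n; }
static void assign_num_int(Val *a, const Val *r) { a->n = (double)r->i; }
static void assign_num_num(Val *a, const Val *r) { a->n = r->n; }
static const assign_fn assign_table[T_COUNT][T_COUNT] = {
    { assign_int_int, assign_int_num },
    { assign_num_int, assign_num_num },
};

void add_two_step(Val *a, const Val *b, const Val *c)
{
    Val tmp = add_table[b->type][c->type](b, c);
    assign_table[a->type][tmp.type](a, &tmp);
}

/* Three-arg dispatch: one lookup on (a, b, c). Only the all-integer
   case is specialized here; it skips the temporary entirely. */
typedef void (*add3_fn)(Val *a, const Val *b, const Val *c);
static void add3_int_int_int(Val *a, const Val *b, const Val *c)
{ a->i = b->i + c->i; }
static const add3_fn add3_table[T_COUNT][T_COUNT][T_COUNT] = {
    [T_INT] = { [T_INT] = { [T_INT] = add3_int_int_int } },
};

void add_three_arg(Val *a, const Val *b, const Val *c)
{
    add3_fn fn = add3_table[a->type][b->type][c->type];
    if (fn != NULL)
        fn(a, b, c);
    else
        add_two_step(a, b, c);
}
```

Both entry points have the same signature, which is the "either one could be switched in later" property: callers can't tell which dispatch scheme is behind the operation.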

Who knows, if Chip wants it, this could go into parrot now. Certainly's going to in my tree.

Posted by Dan at 09:18 AM