September 21, 2005

WWIDD: Assertions on code's properties


One of the things we could've put into parrot, had we thought of it much, was a lot more metadata and the facilities to use it automatically. (And I think this is one of the things I'm going to add to Cola.)


In this case I'm talking about function signatures, the enforcement of function signatures, and the capability of the engine to manage them. This also touches on a mis-design in namespaces.


Parrot was designed for dynamic languages, and one thing dynamic languages do is change. (Yeah, I know -- Duh!) The code is mutable at runtime and potentially unknown at compile time. That's a lot of uncertainty. It can be really nice for the programmer, since it brings a fair amount of flexibility, but it's a pain in the neck for code generation. It makes it really tough to do any sort of optimization, take any shortcuts, or generally cheat in reasonable ways. It also makes it tough to put any guarantees on what code does. To behave properly, then, the system has to pass a lot of metadata around and check it, and that's costly.


In this case, I'm talking specifically about parameters for subroutines and functions, as well as their return values. Now, you'd figure that this stuff would all be known at compile-time and it wouldn't be a big deal, right? Well...


There are a couple of issues here.


The first is one of parameter counts. Parrot's calling conventions pass along the number of parameters being passed. That way you don't have to know at compile time how many return values you got back, nor how many parameters are being passed in. The sending code sets the parameters and parameter counts, and the receiving code checks the counts. That's great, except that it's a little slow. Not a lot slow, but a little on every call, and that adds up. It's a lot nicer if you can just depend on what you're getting without having to check.
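To make the per-call cost concrete, here's a toy Python model of count-checked calls. It is not Parrot's actual calling convention, just its shape as described above: the caller passes a count along with its arguments, and the callee checks it on every invocation. CountedCallee, call, and ParamCountError are made-up names for illustration.

```python
class ParamCountError(Exception):
    """Raised when the callee receives a different number of args than it declared."""


class CountedCallee:
    """A toy function object that checks the passed-in count on every call."""

    def __init__(self, name, declared_params):
        self.name = name
        self.declared_params = declared_params
        self.checks_done = 0  # how many runtime count checks have been paid for

    def invoke(self, passed_count, args):
        # Receiving side: compare the count the caller sent against our declaration.
        self.checks_done += 1
        if passed_count != self.declared_params:
            raise ParamCountError(
                f"{self.name}: expected {self.declared_params} params, got {passed_count}"
            )
        return sum(args)


def call(callee, *args):
    # Sending side: pass the arguments along with how many there are.
    return callee.invoke(len(args), list(args))


some_func = CountedCallee("some_func", 3)
for _ in range(1000):
    call(some_func, 1, 2, 3)
print(some_func.checks_done)  # 1000 checks for 1000 calls
```

Each check is cheap, but the count grows linearly with the number of calls. That is the cost a trusted signature would let you skip.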


For example, if your compiled code assumes a function signature of:


(int, int) = some_func(int, int, int)


you'd really hate to have that replaced at runtime with a function that looks like


(int) = some_func(int, int, int, int)


since there's a darned good chance your code is going to fail.
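Here's a minimal sketch of spotting that particular mismatch. It assumes a hypothetical Signature record that holds the ordered return and parameter types. It compares only counts; types come next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    returns: tuple  # types of the return values, in order
    params: tuple   # types of the parameters, in order


def mismatches(assumed, actual):
    """List ways the function found at runtime differs from what the caller was compiled against."""
    problems = []
    if len(assumed.params) != len(actual.params):
        problems.append(f"param count: compiled for {len(assumed.params)}, found {len(actual.params)}")
    if len(assumed.returns) != len(actual.returns):
        problems.append(f"return count: compiled for {len(assumed.returns)}, found {len(actual.returns)}")
    return problems


# (int, int) = some_func(int, int, int)
compiled_against = Signature(returns=("int", "int"), params=("int", "int", "int"))
# (int) = some_func(int, int, int, int)
found_at_runtime = Signature(returns=("int",), params=("int", "int", "int", "int"))

for problem in mismatches(compiled_against, found_at_runtime):
    print(problem)
```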


Another issue is types, which cuts two ways for parrot. Firstly, because parrot's got multiple types of registers, these two functions:


some_func(int, float)

some_func(string, pmc)


pass their parameters in entirely different registers. If you compiled against the first but got the second at runtime, your assumptions about where things are would be wrong.
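Here's a rough Python illustration of why the register layout differs. It assumes a simplified scheme in which each parameter takes the next free register in its own set: Parrot's I, N, S and P registers for integers, floats, strings and PMCs. The zero-based numbering and the assign_registers helper are illustrative only. This is not Parrot's actual layout.

```python
# Simplified mapping from parameter type to Parrot register set:
# I = integer, N = floating point, S = string, P = PMC.
REG_CLASS = {"int": "I", "float": "N", "string": "S", "pmc": "P"}


def assign_registers(param_types):
    """Give each parameter the next free register in its own register set.

    Numbering from 0 is illustrative only; it is not Parrot's actual layout.
    """
    next_free = {cls: 0 for cls in REG_CLASS.values()}
    slots = []
    for t in param_types:
        cls = REG_CLASS[t]
        slots.append(f"{cls}{next_free[cls]}")
        next_free[cls] += 1
    return slots


print(assign_registers(["int", "float"]))   # ['I0', 'N0']
print(assign_registers(["string", "pmc"]))  # ['S0', 'P0']
```

The two calls share no registers at all. Code compiled to read I0 and N0 would find nothing useful there if the other function were swapped in.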


Secondly, even if the low-level types are the same, a lot of languages make some guarantees at compile time based on higher-level types, so these two functions:


some_func(Foo, Bar)

some_func(Bar, Foo)


are not equivalent in most cases. If your language depends on compile-time type checking as part of its guarantees of correctness, you really do need to make sure those guarantees aren't violated at runtime. (With Parrot you can't be sure that library code somewhere hasn't messed around with the symbol table, since a lot of parrot's languages are allowed, or even encouraged, to do that.)
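Here's a small sketch of the difference. It assumes Foo and Bar are both PMC types, so both calls use the same register set. Param and the two comparison helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    reg_class: str  # low-level register set the value travels in ("P" for PMC)
    hl_type: str    # language-level type the compiler checked against


def same_low_level_layout(a, b):
    # Only the register sets matter here: would the arguments land in the same places?
    return [p.reg_class for p in a] == [p.reg_class for p in b]


def same_high_level_signature(a, b):
    # The compiler's type guarantees need the full, ordered language-level types to match.
    return a == b


foo_bar = (Param("P", "Foo"), Param("P", "Bar"))  # some_func(Foo, Bar)
bar_foo = (Param("P", "Bar"), Param("P", "Foo"))  # some_func(Bar, Foo)

print(same_low_level_layout(foo_bar, bar_foo))      # True
print(same_high_level_signature(foo_bar, bar_foo))  # False
```

A check at the register level alone would let this swap through. Only the higher-level comparison catches it.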


Sure, for all of these you can generate code to check at runtime, but that's pricey because you're checking every time. And in most cases you want things to fail as soon as possible, not as late as possible. The best failure happens as soon as you load the program and library bytecode: you get type errors immediately and the program dies before it can do anything. (That's much better than getting halfway through screwing with your production database, say, and then dying because of a signature error.)
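Here's a sketch of that load-time check, with signatures reduced to strings for brevity. link_check, load_and_run, and LinkError are hypothetical names. The point is only that every imported signature gets compared before any of the program's own code runs.

```python
class LinkError(Exception):
    """Signature problems found while loading, before any program code runs."""


def link_check(imports, exports):
    """Compare every function the program was compiled against with what the library provides."""
    errors = []
    for name, wanted in sorted(imports.items()):
        have = exports.get(name)
        if have is None:
            errors.append(f"{name}: not defined")
        elif have != wanted:
            errors.append(f"{name}: compiled against {wanted}, library has {have}")
    if errors:
        raise LinkError("; ".join(errors))


def load_and_run(imports, exports, main):
    link_check(imports, exports)  # fail at load time...
    return main()                 # ...so main() never starts with a bad signature


program_imports = {"some_func": "(int, int) = (int, int, int)"}
library_exports = {"some_func": "(int) = (int, int, int, int)"}

touched_database = []
try:
    load_and_run(program_imports, library_exports, lambda: touched_database.append("UPDATE ..."))
except LinkError as err:
    print(err)
print(touched_database)  # [] -- the program died before doing anything
```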


This sort of failure's not right for every language, of course. Some languages can handle the errors at runtime. For others it's OK to wait, since you might never actually hit the problem code path. I mean, what the heck -- how often do you run the error handler code anyway, right?


So, to the actual solution. What I'd do differently is threefold.


First, I'd have a defined, though not mandatory, function signature property. PMCs that can act as functions may have it attached, and the interpreter would know where to find it.
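Here's a minimal sketch of an optional signature property. It assumes a hypothetical FunctionPMC with a props dictionary and a made-up well-known property name, "signature". If the property is present and matches what the caller was compiled against, the runtime checks could be skipped. If it's absent, you're back to checking.

```python
from dataclasses import dataclass, field
from typing import Callable

SIGNATURE_PROP = "signature"  # hypothetical well-known property name


@dataclass
class FunctionPMC:
    """Toy stand-in for a PMC that can be invoked; props holds attached metadata."""
    body: Callable
    props: dict = field(default_factory=dict)


def signature_of(pmc):
    # The property is optional: None means the function makes no promises.
    return pmc.props.get(SIGNATURE_PROP)


def needs_runtime_checks(pmc, compiled_against):
    """Skip per-call checks only when the function declares exactly the signature we compiled for."""
    return signature_of(pmc) != compiled_against


declared = FunctionPMC(lambda a, b, c: (a + b, c), {SIGNATURE_PROP: "(int, int) = (int, int, int)"})
undeclared = FunctionPMC(lambda *args: args)

print(needs_runtime_checks(declared, "(int, int) = (int, int, int)"))    # False
print(needs_runtime_checks(undeclared, "(int, int) = (int, int, int)"))  # True
```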


Second, I'd take function PMCs out of the generic global namespace. fetch_global and store_global would no longer deal with them. Instead, separate fetch/store_global_function (or something similar) would put functions into the global namespace, splitting the global data and function namespaces apart in the process. Yes, this has some issues that need dealing with. But generally speaking, most of the languages parrot cares about make at least a mild attempt to keep functions and data in separate namespaces, and at this point I don't think it's unreasonable to just make the split mandatory. There could possibly be some interpreter support (store_appropriate or something) that queries the PMC being stored for its type, though even then I'm not sure.
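Here's a toy model of the split namespace. The method names follow fetch_global/store_global and fetch/store_global_function from the text, and store_appropriate is the tentative convenience mentioned there. Using Python's callable() to decide whether something is a function PMC is an assumption of this sketch.

```python
class GlobalNamespace:
    """Toy global namespace with data and functions kept in separate tables."""

    def __init__(self):
        self._data = {}
        self._functions = {}

    def store_global(self, name, value):
        # Data only: functions must go through store_global_function.
        if callable(value):
            raise TypeError(f"{name}: use store_global_function for functions")
        self._data[name] = value

    def fetch_global(self, name):
        return self._data[name]

    def store_global_function(self, name, fn):
        if not callable(fn):
            raise TypeError(f"{name}: not a function")
        self._functions[name] = fn

    def fetch_global_function(self, name):
        return self._functions[name]

    def store_appropriate(self, name, value):
        # Optional convenience: ask the value what it is, then route it.
        if callable(value):
            self.store_global_function(name, value)
        else:
            self.store_global(name, value)


ns = GlobalNamespace()
ns.store_global("count", 42)
ns.store_global_function("count", lambda: "I'm a function")
print(ns.fetch_global("count"), ns.fetch_global_function("count")())  # 42 I'm a function
```

The same name can hold both a datum and a function without either clobbering the other. That separation also gives a function-only slot where signature checks can hook in.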


Third, of course, is the ability to set mandatory properties on names. That is, be able to say:


make_mandatory "name", "prop_name", "prop_val"


such that any store into that named slot fails unless the thing being stored has a property of the given name with the given value. It would probably need a different name, since that one's not that good.
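Here's a minimal sketch of make_mandatory. It assumes a hypothetical Namespace class and a PMC stand-in that carries a props dictionary. A store is rejected unless every mandatory property for that name is present with the required value.

```python
from dataclasses import dataclass, field

_MISSING = object()  # distinguishes "property absent" from any real value


class MandatoryPropertyError(Exception):
    pass


@dataclass
class PMC:
    value: object
    props: dict = field(default_factory=dict)


class Namespace:
    def __init__(self):
        self._slots = {}
        self._required = {}  # name -> list of (prop_name, prop_val) pairs

    def make_mandatory(self, name, prop_name, prop_val):
        self._required.setdefault(name, []).append((prop_name, prop_val))

    def store(self, name, pmc):
        # Reject the store unless every mandatory property is present with the right value.
        for prop_name, prop_val in self._required.get(name, []):
            if pmc.props.get(prop_name, _MISSING) != prop_val:
                raise MandatoryPropertyError(
                    f"{name}: stored PMC must have {prop_name}={prop_val!r}")
        self._slots[name] = pmc

    def fetch(self, name):
        return self._slots[name]


ns = Namespace()
ns.make_mandatory("some_func", "signature", "(int, int) = (int, int, int)")
ns.store("some_func", PMC(len, {"signature": "(int, int) = (int, int, int)"}))  # accepted
try:
    ns.store("some_func", PMC(len, {"signature": "(int) = (int, int, int, int)"}))
except MandatoryPropertyError as err:
    print(err)
```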


Alternately, something could be done by putting a store notification on the namespace slot and having the notification code do the checking and throw the exception. That has the advantage of not requiring another facility to be built into the interpreter (a win) but makes it more difficult to do metadata checks (a loss). Or the interface could use the notification system under the hood and hide it all, which has a certain appeal as well.
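Here's a sketch of the notification-based alternative, with the mandatory-property interface layered on top so callers never see the watcher. It assumes store watchers run before the store commits, so raising an exception vetoes it. watch_store and the other names are hypothetical. The rule now lives inside a closure where the interpreter can't readily inspect it, which is one way to read the "harder metadata checks" trade-off.

```python
from collections import defaultdict

_MISSING = object()  # distinguishes "property absent" from any real value


class Namespace:
    """Toy namespace whose slots can have store watchers attached."""

    def __init__(self):
        self._slots = {}
        self._watchers = defaultdict(list)

    def watch_store(self, name, callback):
        self._watchers[name].append(callback)

    def store(self, name, value):
        # Assumption: watchers run *before* the store commits, so raising vetoes it.
        for callback in self._watchers.get(name, []):
            callback(name, value)
        self._slots[name] = value

    def fetch(self, name):
        return self._slots[name]


def make_mandatory(ns, name, prop_name, prop_val):
    """The mandatory-property interface, built entirely on store watchers."""
    def check(slot, value):
        props = getattr(value, "props", {})
        if props.get(prop_name, _MISSING) != prop_val:
            raise ValueError(f"{slot}: needs {prop_name}={prop_val!r}")
    ns.watch_store(name, check)


class Func:
    """Minimal stand-in for a function PMC with attached properties."""
    def __init__(self, props):
        self.props = props


ns = Namespace()
make_mandatory(ns, "some_func", "signature", "(int, int) = (int, int, int)")
ns.store("some_func", Func({"signature": "(int, int) = (int, int, int)"}))
try:
    ns.store("some_func", Func({}))
except ValueError as err:
    print(err)
print(ns.fetch("some_func").props)
```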

Posted by Dan at September 21, 2005 04:21 PM