September 28, 2005

Concessions to hardware

Now that I'm back poking around inside cola, some things that've been idly kicking around are coming back to the foreground. And yeah, it means more ops, which I'm sure people would whine about except that the people who'd whine don't care about cola. Yay for marginal projects!

Anyway, the kicking around things. In this case, arithmetic.

If you've looked you've seen, I'm sure, that there are provisions for basic math built into the interpreter, which is just fine. What isn't there, though, are size-limited and checked math operations. That is, there's no way to say "add I1 and I2, but assume they're 8 bit integers" or "multiply I4 by I6 and throw an exception on over/underflow". This is something that both the JVM and .NET have, and is useful to have -- I know I've been annoyed on more than one occasion writing C code and having over/underflow problems that caused subtle bugs. You don't always want that semantic, of course, but sometimes you do.

This is the sort of thing that could potentially be library code, of course, but it's pretty darned low-level, and making a function call to do it adds an order of magnitude or more to the cost, which is really counterproductive for something like this. It's also worthwhile enough that the JVM and .NET folks were willing to sacrifice their very limited set of basic opcodes to implement at least some of it.

This could also be left to compilers to implement -- it isn't too tough to emit checking code, and that's certainly a possibility, but checked and size-limited arithmetic is usually supported at the hardware level (size-limited certainly, and on some architectures you can get exceptions for over/underflow) so it'd be kind of stupid to not potentially take advantage of it where it exists. Means some probing at the configure level and macros (Yeah, I know. Still...) to support it reasonably easily.

So... in go 8/16/32 bit math variants, _checked variants, and the combination of the two.

As yet there are no 64 bit variants. That's something that's sort of problematic to do portably, so it's still being punted on.

Posted by Dan at 11:34 AM

September 22, 2005

Which language is it anyway?

I've started a new section, Cola notes, to track some of the things I'm putting into Cola. They should get documented in the cola distribution itself, but this is as good a spot as any to make notes on what's going on.

Things were always designed with multiple languages in mind, but one of the things we never really did deal with was how to actually get an executable of the appropriate name to act as the language in question, rather than as parrot. Kind of an important omission, but since parrot was so far from being production-ready it didn't much matter. Given that I've decided that it was a mistake to not have parrot actively hosting a HLL for real and quickly (that's a matter for the post-mortem) I want to rectify it with cola.

Ultimately we'd like to be able to do:


where foo is some language, and have cola invoke the foo compiler and do whatever it is that language does when invoked.

In this case, I'd like cola to treat this the same as if it were invoked as:

cola -L foo

that is, invoke the foo language compiler. This is something that's also not currently in cola, but is going to be added. Like all the other languages we care about, there's going to be a defined on-disk layout of the library, including a defined spot for language compilers.

Basically whenever cola gets invoked it checks the name under which it was invoked and, if it isn't cola, it'll assume it's been invoked as a language and go load up that language's compiler module and Do The Right Thing. (For carefully and well-defined values of "right thing", of course, but that's a metadata issue for later, and another post)
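
The name check itself is tiny. Here's a minimal sketch in C, assuming Unix-style paths in argv[0]; the function name language_from_argv0 is made up for illustration, and the actual loading of the compiler module isn't shown since that layout isn't defined yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the language name implied by argv[0],
 * or NULL when the interpreter was invoked as plain "cola".
 * Assumes '/' is the path separator. */
static const char *language_from_argv0(const char *argv0)
{
    assert(argv0 != NULL);

    /* Strip any leading directory components. */
    const char *slash = strrchr(argv0, '/');
    const char *name  = slash ? slash + 1 : argv0;

    if (strcmp(name, "cola") == 0)
        return NULL;   /* behave as the interpreter itself */
    return name;       /* behave as if run with: cola -L <name> */
}
```

So an executable installed (or symlinked) as /usr/local/bin/foo gets "foo" back and goes looking for the foo compiler, while /usr/bin/cola gets NULL and carries on as usual.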

I'm kind of hoping something like this can be ready for Quarte reasonably soon. While it isn't anywhere near the language that Parrot was originally designed for, it'll be good to have a working language to futz with.

Posted by Dan at 01:23 PM

September 21, 2005

WWIDD: Assertions on code's properties

One of the things we could've put into parrot, had we thought of it much, was a lot more metadata and the facilities to automatically use it. (And I think this is one of the things I'm going to add into Cola)

In this case I'm talking about function signatures, the enforcement of function signatures, and the capability of the engine to manage them. This also touches on a mis-design in namespaces.

Parrot was designed for dynamic languages, and one thing dynamic languages do is change. (Yeah, I know -- Duh!) The code is mutable at runtime and potentially unknown at compile-time. Lots of uncertainty, which can be really nice for the programmer (it brings a fair amount of flexibility) but is a pain in the neck for code generation, since that uncertainty makes it really tough to do any sorts of optimizations, take any shortcuts, or generally cheat in reasonable ways. It also makes it tough to put any sorts of guarantees on what code does, which means that to behave properly there needs to be a lot of metadata passed around and checked, which is costly.

In this case, I'm talking specifically about parameters for subroutines and functions, as well as their return values. Now, you'd figure that this stuff would all be known at compile-time and it wouldn't be a big deal, right? Well...

There are a couple of issues here.

The first is one of parameter counts. Parrot's calling conventions specify the number of parameters which are passed, so that you don't have to know at compile time how many return values you got, nor do you have to know at compile time how many parameters you're passing in. The sending code sets the parameters and parameter counts, and the receiving code checks the counts. That's great, except that it's a little slow. Not a lot slow, but a little for each call, and that adds up. It's a lot nicer if you can just depend on what you're getting without having to check.

As a for example, if your compiled code assumes a function signature of:

(int, int) = some_func(int, int, int)

you'd really hate to have that replaced at runtime with a function that looks like

(int) = some_func(int, int, int, int)

since there's a darned good chance your code is going to fail.
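
To make that concrete, here's a minimal sketch in C of what comparing the signature the code was compiled against with the one actually present might look like. The signature struct, MAX_ARITY, and signature_matches are all hypothetical, and types are just compared by name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARITY 8   /* arbitrary limit for this sketch */

/* Hypothetical signature record: counts plus type names. */
typedef struct {
    size_t      n_params;
    size_t      n_returns;
    const char *params[MAX_ARITY];
    const char *returns[MAX_ARITY];
} signature;

/* True only if both counts and every type name agree. */
static bool signature_matches(const signature *expected,
                              const signature *actual)
{
    assert(expected != NULL && actual != NULL);

    if (expected->n_params != actual->n_params ||
        expected->n_returns != actual->n_returns)
        return false;

    for (size_t i = 0; i < expected->n_params; i++)
        if (strcmp(expected->params[i], actual->params[i]) != 0)
            return false;

    for (size_t i = 0; i < expected->n_returns; i++)
        if (strcmp(expected->returns[i], actual->returns[i]) != 0)
            return false;

    return true;
}
```

Fed the two versions of some_func above, the counts differ (three parameters and two returns versus four and one), so the check fails before any type names get looked at.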

Another issue is one of types, which is a twofold issue for parrot. Firstly, because parrot's got multiple types of registers, these two functions:

some_func(int, float)

some_func(string, pmc)

pass in parameters in entirely different places, and if you're assuming you know where things are going, you'd be wrong if you compiled for the first but had the second at runtime.
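
Here's a sketch in C of why the placement differs, assuming the simplest possible rule: each parameter goes in the next free register of its own kind. The names (reg_kind, reg_slot, assign_slots) are made up, and the zero-based numbering is an assumption for illustration, not parrot's actual calling convention.

```c
#include <assert.h>
#include <stddef.h>

/* The four register kinds; REG_KINDS is just the count. */
typedef enum { REG_INT, REG_FLOAT, REG_STRING, REG_PMC, REG_KINDS } reg_kind;

/* Where one parameter ends up: which register file, which register. */
typedef struct {
    reg_kind kind;
    unsigned index;
} reg_slot;

/* Assign each parameter the next free register of its own kind.
 * Numbering from 0 is an assumption made for this sketch. */
static void assign_slots(const reg_kind *params, size_t n, reg_slot *out)
{
    unsigned next[REG_KINDS] = { 0 };

    assert(params != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(params[i] < REG_KINDS);
        out[i].kind  = params[i];
        out[i].index = next[params[i]]++;
    }
}
```

Under that rule (int, float) lands in the first integer and first float registers, while (string, pmc) lands in the first string and first PMC registers, so a caller compiled for one leaves nothing where the other expects to find it.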

Even if the low-level types are the same, a lot of languages make some guarantees at compile time based on higher-level types, so these two functions:

some_func(Foo, Bar)

some_func(Bar, Foo)

are not equivalent in most cases. If your language depends on compile time type checking as part of its guarantees of correctness, you really do need to make sure that those guarantees aren't violated at runtime. (With Parrot you can't be sure that library code somewhere hasn't messed around with the symbol table, since a lot of parrot's languages are allowed, or even encouraged, to do that.)

Sure, for all of these you can generate the code to check at runtime, but that's pricey (since you're checking every time) and in most cases you want things to fail as soon as possible, not as late as possible. The best failure is one where when you load the program bytecode and the library bytecode in you immediately get type failure errors and the program dies before it can do anything. (Which is much better than getting halfway through screwing with your production database, say, then dying because of a signature error)
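
A fail-at-load check could be as simple as walking everything the program bytecode expects and comparing it against what the library bytecode provides, before running a single op. Here's a minimal sketch in C. The symbol struct, link_check, and the idea of a signature flattened to a string are all assumptions for illustration; nothing like this exists in the bytecode format yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a function name plus its signature as a string. */
typedef struct {
    const char *name;
    const char *sig;
} symbol;

/* Returns how many imports are missing from the exports or present
 * with a different signature. Zero means it's safe to start running. */
static size_t link_check(const symbol *imports, size_t n_imports,
                         const symbol *exports, size_t n_exports)
{
    size_t failures = 0;

    assert(imports != NULL && exports != NULL);
    for (size_t i = 0; i < n_imports; i++) {
        int ok = 0;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0 &&
                strcmp(imports[i].sig, exports[j].sig) == 0) {
                ok = 1;
                break;
            }
        }
        if (!ok)
            failures++;
    }
    return failures;
}
```

A loader would refuse to start the program if this came back non-zero. The linear search is fine for a sketch; a real one would want a hash lookup.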

This sort of failure's not correct for every language, of course. For some they can handle the runtime errors, and for others it's OK to wait since you might not actually hit the problem code path. I mean, what the heck -- how often do you run the error handler code anyway, right?

So, to the actual solution. What I'd do differently is threefold.

First, I'd have a defined, though not mandatory, function signature property. PMCs which can act as a function may have this attached, so the interpreter knows where to get at it.

Second, I'd take function PMCs out of the generic global namespace. fetch_global and store_global would no longer deal with these, instead separate fetch/store_global_function (or something similar) would be used to put functions into the global namespace, splitting the global data and function namespaces apart as part of that. Yes, this has some issues that need dealing with, but generally speaking most of the languages parrot cares about make at least a mild attempt to split function and data into separate namespaces, and at this point I don't think it's unreasonable to just make it mandatory. Possibly with some interpreter support (store_appropriate or something) to query the PMC being stored for its type, but even then I'm not sure.

Third, of course, is the ability to set mandatory properties on names. That is, be able to say:

make_mandatory "name", "prop_name", "prop_val"

such that any store into that named slot will fail unless it's got a property of the given name with the given value. Probably with a different name, since that one's not that good.
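
As a rough illustration of the store-time check, here's a minimal sketch in C. Everything in it is a stand-in: the pmc struct is not parrot's PMC, and property, slot, and slot_store are hypothetical names. Only make_mandatory comes from the pseudo-op above, and here it takes a slot rather than a name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in types for this sketch; not parrot's real structures. */
typedef struct {
    const char *name;
    const char *value;
} property;

typedef struct {
    const property *props;
    size_t          n_props;
} pmc;

typedef struct {
    const char *name;
    const pmc  *value;
    const char *required_prop;   /* NULL means no requirement */
    const char *required_val;
} slot;

/* Mark a slot so later stores must carry prop == val. */
static void make_mandatory(slot *s, const char *prop, const char *val)
{
    assert(s != NULL && prop != NULL && val != NULL);
    s->required_prop = prop;
    s->required_val  = val;
}

/* Store v into the slot; returns false (and stores nothing) if the
 * slot has a mandatory property that v lacks or has with another value. */
static bool slot_store(slot *s, const pmc *v)
{
    assert(s != NULL && v != NULL);
    if (s->required_prop != NULL) {
        bool found = false;
        for (size_t i = 0; i < v->n_props; i++) {
            if (strcmp(v->props[i].name, s->required_prop) == 0 &&
                strcmp(v->props[i].value, s->required_val) == 0) {
                found = true;
                break;
            }
        }
        if (!found)
            return false;
    }
    s->value = v;
    return true;
}
```

Once a slot has been marked with a required signature property, storing a function without it (or with a different one) is refused and the old value stays put. A real version would throw an exception rather than return false.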

Alternately, something could be done with putting a store notification on the namespace slot and having the notification code do the checking and throw the exception. That has the advantage of not requiring another facility to be built into the interpreter (a win) but makes it more difficult to do metadata checks (a loss). Alternately the interface could use the notification system under the hood and hide it all, which has a certain appeal as well.

Posted by Dan at 04:21 PM

Back in the saddle again

Well, now that time's passed, and feelings have cooled, it's time to start back into some serious coding. And yeah, it's parrot-related. (You knew I couldn't stay away, and so did I) Well, semi-parrot-related, at least.

A while back I got a parrot repository dump, and yanked it into a new subversion repository on my home server box, where it's just sat, waiting on time and interest. Two things that I have at the moment. So... it's time to code again. I'm not entirely happy with the structure of the repository as it's checked out from subversion (It has branch and trunk directories for some reason, which I assume come from the cvs2svn script that built it -- not complaining, I'm happy to have all the revision notes) but hey, it works, I can deal.

First up is a better forth compiler than what's in the original repository, with some architectural changes made to it based on some of the things I've learned working on the DecisionPlus compiler for $CONSULTING_GIG. (And yeah, I do have some free time for that -- if anyone needs deep perl or C magic done, get in touch) This won't be a pure forth, since I'm not going to even try and preserve some of the more architecture-dependent behaviours, so I think I'll call it Quarte just because.

After that, I've a hankering for a working Scheme compiler. Again, I doubt it'll be a pure and compatible scheme (though I'd like it to be, unlike the forth implementation) so it'll get called Plot.

And yeah, I have learned my lesson. Having working HLLs makes doing VM development a damn sight easier.

I've a couple of perl 5 related projects in the works as well. NCI, to provide a parrot-style NCI interface to perl 5. StrictSubs, something I promised Tom Phoenix ages ago and tried (but failed) to get done, which will be a perl module that'll scream if you use any subs that don't exist at compile time (lexically scoped). Devel::Size will get some abuse as well, as there are a number of bugs and limitations that need lifting.

Plenty of swamping out of Leo crap in the repository that desperately needs throwing out and rewriting as well, and implementation of some of the features that I wanted implementing, like async IO, the IO streams system, events, and proper exceptions. That should be fun.

If you're keeping track at home, the new project is going to be called Cola. It's got to be called something, and that's as good a name as any.

Who knows, if I get ambitious I might leverage the PPI module to build a working perl 5 compiler for it. That'd be fun.

Posted by Dan at 11:53 AM

September 02, 2005

Hanging up the shingle

As I'm sure folks have noticed, I've not been posting much lately. That's because... the $WORK project has wound down, at least my part of it. This is a good thing! It means I have a 99% working system -- grammar, parser, compiler, data types, runtime libraries, database conversion scripts, and ISAM to Postgres shim system. Woohoo!

That also means we're in the testing and getting ready for deployment part of the program, things that are better left to other people, especially in a small IT shop like we are. There's less for me, as the guy writing the compiler, to do, and more for the people who're doing the rollout and setup to do. As such, me staying on full-time's kinda pointless, so I'm not.

And so... I'm freelancing, and it's about time.

I'll be updating my skillset list over the next week or two as I get things settled in, but if you need interfaces to C libraries, or low-level speed-critical C code written for perl, ruby, or python code; unusual programming jobs done; or have an antique 4GL that you desperately need an inexpensive and low-risk migration path away from... drop me a line and we'll see what we can do.

In the meantime, I may finally get all the pending blog entries finished. Woohoo! :)

Posted by Dan at 12:01 PM