May 18, 2006

Saying bye to dynamism, and hello to some ground rules

It's always useful to frown at your assumptions when you're doing design work, since they so often go unexamined, and often force you to build in limitations you hadn't expected in your software. (And sometimes they force you to build in features you hadn't expected, which is nice) Those assumptions also force tradeoffs, which are inevitable -- nothing's free, alas.

One of the base assumptions that Parrot had was extreme dynamism, in part because the languages it was targeted for are extremely dynamic. Code could change at runtime, variable types could change at runtime, and array sizes could change at runtime. (Indeed, the structure of your program could change while the code was inside a single opcode) That brought in all sorts of nifty things, but it also made life somewhat slow and annoying. Or relatively slow, at least, since if you embrace your assumptions rather than fighting against them you generally do pretty well.

Tornado, on the other hand, doesn't have the same sorts of assumptions that Parrot has, because it's not targeted at the same sorts of problems Parrot was. Parrot needed flexibility, while Tornado needs speed, security, and concurrency. Tornado's really more of a co-processor than a CPU -- something you call from your main program to perform some relatively specialized tasks, while you leave the general purpose code to your mainline code.

Because of that we can make some assumptions. They shoot the sort of full dynamism that perl has way down, but that's OK -- you'd call Tornado from perl, rather than running perl on Tornado. (And yeah, there's a perl interface module for it, though the engine itself doesn't yet function. Go figure)

So here are the ground rules for Tornado.

  1. Vectors are of fixed length
  2. Variables don't change type
  3. All the elements of a vector are the same type
  4. Metadata's not around
  5. Vectors are one-dimensional
  6. Operations on multiple vectors (addition or modulus or whatever) require two vectors of the same size
  7. We don't much care about text
  8. Data conversions must be explicit
  9. Lossy cross-type conversions aren't allowed
  10. Conversion rules are type-dependent, not value-dependent

#1 means that we can check vector lengths on variable fetch and be happy. Vectors may be of indeterminate length until their first assignment or use (so we can mark a parameter as being a vector but not knowing how long the vector might be at compiletime) but once it has a size the size is fixed.

#2 means that attempts to turn an integer variable to a bigint will fail

#3 means we don't deal with mixed-type vectors. All the elements are ints, or floats, or whatever, and we can store them as efficiently as possible

#4 means that we don't carry around metadata in the bytecode. No variable names, no attributes beyond type, no annotations, nothing. Maybe we'll lift that later, but probably not. The only thing we keep around is the minimum amount of information to allow potential bytecode verification.

#5 is one of those 'duh' things -- if it's a vector it's only got one dimension. (Otherwise it'd be a tensor) This limitation'll probably be lifted later, though it affects #6.

#6 is a flat-out dodge of a lot of hassle. If you add two vectors together and one's longer than the other, what do you do? Repeat? Zero fill? One fill? Well, Tornado pitches an exception. When (well, if) vectors become tensors, then we'll probably have to do something fancier, what with the way multidimensional things operate, but that's for later.

#7 is because, well, we're not a text processing engine. That's really not the point, though I can certainly see doing Clever Things with parallel algorithms and text searching, but... maybe another time. As far as Tornado's concerned there is no text at all, and if you want to do text-y things you can do them elsewhere. (And yeah, I know, Tornado's well-suited for some specialized text processing applications, but let's face it, crypto doesn't deal with text, it deals with streams of numbers that happen to be also text. I don't think anyone's got a crypto algorithm that does Special Things with multi-codepoint characters and if they do I really don't want to know)

#8 means that if we want to turn a float to an Int4 you have to do it explicitly. None of Perl's mighty morphing power scalars. They're good for some things, but not what we do. We might allow you to do operations with two types where one type is a subset of the other (Int8 and BigInt operations, or Int4 and Float operations), but the destination assignment had darned well better be lossless. Try and store a BigInt into an Int4, or a BigNum into a Float and you lose unconditionally.

#9 is there to allow code to get around the rules for #8 -- sometimes you can know that your results will fit into an Int4, even though there are floats and bigints involved, but if you do (or don't care) then you have to force an explicit conversion.

Finally #10 means that promotion rules depend on the type of the variables involved, not on their values. If you multiply an Int4 and a BigInt you'll get a BigInt, even if the Int4 has a value of 3 and the BigInt a value of 2.

The one potential caveat here is that we may allow for temporary casting of byte vectors to other types. That is, while a vector may hold a wad of bytes, you may want to treat it as a vector of Int2, Int4, or Int8s. (No Int3s -- If you're doing video data transforms, go for a 4-element format like CMYK or RGB with an alpha channel)

Yeah, not having dynamism, and code that changes based on circumstance, and dynamically loadable code, and a handy language compiler built in are all limitations, but they're limitations I'm fine with, since they serve a purpose, and Tornado's very much not a general-purpose VM.

Posted by Dan at May 18, 2006 07:17 PM | TrackBack (0)