January 31, 2006

Banging on blocks of data

Continuing on with the multithreaded vector related VM stuff, I've been thinking about vector operations. That (thanks to a reminder from someone) has also got me thinking about APL. Well, OK, really J, since thinking about APL tends to make me think of Unicode, and when I do that I need a good lie-down until it passes.

Anyway, vector operations. Useful things in a small subset of problems, and at the moment I'm interested in that particular subset. The question, then, is what vector operations should be permitted?

As a start, there's the question of how many dimensions are allowed? And yeah, I know, vectors are one-dimensional, but once you move past scalars the issue of dimensionality raises its head. Should two or three (or four, or N) dimensional matrices be allowable? At the moment, I'm saying no, since that complicates things, but I'll freely admit it's a tentative no, so it may be allowed in the future.

Next there's the issue of machine operations. Should the VM treat scalar values and vector values identically? That is, should:

dest = source1 + source2

work just fine regardless of whether source1 and/or source2 is a vector or scalar value? (We'll ignore the underlying machine representation, whether we're register or stack based, and all that, for right now. We're talking abstract here) There are certainly arguments to be made for both requiring explicit declaration of intent and allowing for implicitly figuring things out at runtime.

Being implicit is certainly possible -- there'll have to be sufficient information at runtime to determine that a variable is a scalar or a vector, as well as how large a vector it is. (Remember we're dealing with an embeddable VM, and we want some measure of safety, so there can be some runtime checks to make sure things are what they seem so as not to allow malicious code to do Evil Things(tm). There's been more than enough crap done that way in various systems that there's no way I'd design a system that didn't allow for proper checking) And as a number of languages have shown, just Doing The Right Thing is very useful.

On the other hand, explicitness does have its virtues. When you're writing code of the sort we're specializing in here you're reasonably sure (or at least had better be sure) what types of data you've got. Yes, you might not know the dimension of a vector, but you certainly know whether you have a vector or a scalar. And if you're explicit there's the potential to do more error checking, as well as to be a bit more efficient in the generated code, if you can do some static analysis of the bytecode. (Whether you want to do this or not depends on the use that you'll put the code, of course, and in many cases can be inferred from the data anyway) But basically if the information's in the source language, it ought to be in the bytecode if at all possible, since the more information you have the better job you can do making efficient choices. Information loss is a Bad Thing.

So, we're going to do only scalars and one-dimensional vectors, and we're going to have the operations explicit. What, then, would the actual operations be?

Well, that's a good question. For my purposes I need basic math (addition, subtraction, multiplication, division, and modulus) as well as the basic transcendental operations (log, sin, cosine, tangent, and exponentiation), so that's a good start. Ten operations, and two or four permutations (scalar or vector parameter for the unary operators, scalar or vector on each side for the binary) leaves thirty basic operations, which isn't that bad, all things considered.

There's also the issue of logical operations, but right now I'm inclined to leave those scalar-only.

Posted by Dan at January 31, 2006 01:59 PM | TrackBack (0)
Comments

As for multiplications: your math isn't quite sound. There are a couple of (useful) ways to define multiplication for single-dimensional scalars, even.

In general, I'm all for making things work in the most general case, so I'd tend to allow for tensors. That's something I originally intended to add to the Math::Symbolic Perl 5 module. Then, I thought about it some more and realized that compared to the amount of thought a user has to put into the math and algorithms related to objects with dimensionality beyond matrices, the programming is highly insignificant.

Steffen

Posted by: Steffen Müller at January 31, 2006 05:43 PM

Huh, I'm pretty sure I did my math right -- five operations with four permutations (ss, sv, vs, and vv) and five operations with two permutations (s and v) gets thirty operations. Or so my math goes, but it's been an odd day and I may miss something. Wouldn't be the first time... :)

The big reason to not go all the way to tensors (and, I'm ashamed to admit I had to go look that up. Been far too long) is that I'm not sure what the right thing to do with them is. Well, that and I don't need them now, so I'm more than willing to punt until later and leave my options open.

It is, after all, a lot easier to allow something previously forbidden than it is to forbid something previously allowed (or change the way something previously allowed works). :)

Posted by: Dan at January 31, 2006 07:06 PM

Ouch. It was late when I posted that. "single-dimensional scalars" should, of course, be "single-dimensional vectors". A scalar has no dimension. :)

I guess, this makes two kinds of multiplication obvious: a three component vector has scalar and cross products.

Steffen

Posted by: Steffen Müller at February 1, 2006 02:58 AM

Mixed scalars and vectors can be very nice. I use them regularly in R, so if I'd ever become a user of your new project, I would vote for them.


Posted by: tom at February 1, 2006 05:57 AM

Have you considered going wild and providing general support for array operations on vectors of any kind of object ? Like in http://www.fscript.org/download/OOPAL.pdf

?

Posted by: Victor at February 1, 2006 04:17 PM

You have looked into Morgan Stanley's APL variant, A+?

Posted by: Jarkko Hietaniemi at February 7, 2006 01:04 AM

Dot product or summation along a vector would be useful to support. Maybe there are a few other functions lurking around too, which aren't exactly vectorised versions of standard operations.

Posted by: Tim Lovell-Smith at May 2, 2006 06:08 AM

I'll throw it on the list. I'm going to be getting back to Tornado again in a few days, so gathering up useful things to do is, well... useful. :)

Posted by: Dan at May 2, 2006 06:32 AM

You and Leo need to sign my book. It says you had something to with it right there on the cover.

Posted by: C.J. at August 22, 2006 09:41 PM