November 29, 2003

What the heck is: A type

Update: I forgot to finish the type inferencing section. D'oh! Fixed.
It's a good excuse for a fight, at least in some circles. And papers. Lots and lots of papers.

But, more usefully, a type is a notation on some piece of data that marks a promise as to how that data behaves. (If you're looking for sheep, try elsewhere) They may be an indication of the storage that a data needs, the class an object belongs to or inherits from, or the values that data may take.1

Types, when used well, allow you to avoid, or at least reduce the incidence of, a whole class of errors in your programs, so called "type errors". These occur, as you've probably figured, when you're using the wrong type--trying to call a method on an integer, or do addition with two filehandles or something like that.

With a properly designed type system it's possible to prove at compile-time that type errors are impossible in your program. How useful this is depends both on how rich your type system (if the only type you have is "int", it's of limited use) and how much use your compiler can put the type information.

Static and Dynamic Typing

These two are the poles (or perhaps lightning rods) of type checking. Simply put, static typing assigns and checks types at compile time, while dynamic typing assigns and perhaps checks types at runtime.

Before going any further, it's important to realize that Pure Static and Pure Dynamic typechecking are two ends of a continuum. With a very few, generally pedagogical, exceptions no language is on one end or the other. Pretty much all languages fall somewhere in between. Whether this is a good or bad thing is a matter of opinion, but we'll dodge that question for now.

Note that a language can be statically typed yet still have some level of runtime dynamism, such as function or method redefinition or instantiation, reference or pointer based function or method calling, and runtime compilation of code. The language just needs to ensure that any assertions made at compile time remain correct in the face of runtime changes--function signatures may not change with a redefined function, for example, and in the face of compile time uncertainty (such as dispatching through a function pointer of unknown origin) that the constraints of the type system are enforced at runtime. More on this later.

C is an adequate, or at least nearly ubiquitous, example of a statically typed language. The types of all variables must be declared, the return types of all functions must be declared, and the parameters to all functions must be declared. (That it's possible to lie in many profound ways to the C compiler is a separate matter we'll deal with in a bit) The compiler knows how everything should be treated, and can act accordingly. For better or worse.

Python (and Ruby, and Smalltalk), on the other hand, is a good example of a dynamically typed language. In python the types of variables, values, parameters, functions, and methods are not known to the compiler, and in fact may change as the program runs. (Indeed, with these languages there arguably aren't any "variables" as such--just names which are bound at runtime to different things as the program runs.2) In languages such as this, if the type of something is important generally the programmer has to explicitly check.

Before going off on the "But the behaviour of an object is more important than its type!" rant, assuming you conflate classes and types (which is OK by me), I should note here that with a sufficiently rich type system the behavior of a thing is reflected in its type. Which is to say that methods and interfaces that a class implements are as much a part of its type as the class' parents or the class itself. That's partial types, and we'll go there later. Very few languages support them as part of the static type information, unfortunately.

Strong and Weak Typing

Another quality of a type system is its strength--basically an indicator of how easily and to what extent your program can lie to the type checking and validation code and get away with it. A type system is considered strong if you can't lie, and weak if you can lie. (And broken (or at least laughed at by its peers) if it starts out strong and then gets officially hacked as time goes on to allow it to be weakened. A lot of languages fall into this category, as practicality asserts itself)

C is a good example of a weakly typed language. Yeah, sure, C has data types, but they're more suggestions, or indications of current usage, than anything else. As an easy example, look at this:

  int foo;
double bar = 12.5;
foo = (int)bar;

It's a simple enough little chunk of code. foo is an integer, and bar is a double-precision floating point value. A straight assignment from one to the other is arguably illegal, as they're different types. (C will allow it, but it generally yells) That (int) bit in there, though, tells C to treat what follows as an int, rather than whatever type it is.

This is generally called type casting and there are two ways to do it. In this case, typecasting performs a conversion--C knows how to get the integer value from a floating point number and does so. With a conversion bar never loses its inherent float-ness--the bits that the name refers to are still considered part of a floating point number, we just then translate it to an integer. Casts can also change the way that the bits themselves are interpreted, for example:

  int foo;
double bar = 12.5;
foo = *(int *)&bar;

will, rather than treating bar as a float and convert it to an integer, assume that bar is, in fact, really holding an integer and interpret the first four bytes of it as if it were an int. (More or less 4 bytes at least, depending on the size of an int) We have to play pointer deref and cast games here because C won't let you directly change the interpretation of the underlying data storage for a variable, which is probably for the best.

Perl, oddly enough, is a good example of a strongly typed language. (The rich--and arguably excessive--behaviour of its base types tends to confuse folks on that account) A scalar is a scalar is a scalar--no matter what you do you'll never convince perl to treat it as an array, and when you have a reference to an array you can't, unlike in C, treat it as a reference to a hash or scalar or filehandle. There's no casting possible, either directly or through references.

Most languages that do the C-style "interpret the bits differently" cast are low-level languages operating on either primitive types or explicit relatively primitive structures. Once you get abstract enough (and most OO languages are abstract enough) this low-level direct bit hitting stuff just isn't allowed as there's no way for it to be useful without an understanding of a number of very low-level implementation dependent things (like class and object layouts) that's outside the range of the normal. (Not to mention the fact that the folks writing those sorts of languages tend to be opposed, on general principles, to that sort of low-level access)

Higher-level type casting as conversion is more commonly supported, as its often very useful--if for no other reason than to be able to turn strings (like, say, from user input) into structured data (like real integers, floats, or widgets). Opinions seem mixed as to whether this is a good thing or not, but practicality usually wins and it is supported, though generally it needs to be explicit. Very few languages will implicitly autoconvert or call autoconversion routines for you since it's assumed that if you're assigning a value of one type to a variable of another that you've made a mistake and its up to you to tell the compiler otherwise.

Partial Types

As we alluded to earlier, a type system can be far richer than just a plain "what class does this object belong to" or "Which structure are we allocating for you" thing. Types can also express behavior--what methods you have or interfaces you export--and ranges of permissible values. Since this means a full type may have more than one "thing" that defines it, you have cases where you don't really care about the full type--just part of the type.

For example, let's consider GUIs for a moment. It's common when doing GUI programming in a language that supports objects to have a class for each thing that can appear in a window, classes for windows, classes for events, classes for sounds, classes for actions--classes all over the place. As we're in an OO mood, each class is a type.

Now, consider the case where we're doing at least some level of static typing in our OO GUI language. The common thing to do, in a static type system, is to require each parameter to be a class or child of a class, which is fine, but... wait until you find yourself building objects that need to inherit from several parent classes at once. This isn't too tough to do--just throw serialization into the mix, and you'll immediately feel the pain. (The drawing methods require the parameter to inherit from some base Drawable class, and the serialization methods require their things to inherit from the base Serializable class)

The common way to deal with this is to ignore the heck out of all but one type constraint (Generally Serializable things take generic objects and probe methods at runtime) but, well.. ewww. The second-most common way to deal with this is multiple inheritance, but folks tend to get their rant on when dealing with MI. Partial types solve the problem. (And MI is a particular case of partial typing but Shh! don't tell)

With a type system that has partial types you declare that your drawing method must have a type of Drawable, but that doesn't preclude the object from having a type of Storable, Readable (if you've got a text-to-speech system), and maybe PhenomenallyUgly. (Though, with GUI systems, the last type is often a freebie)

With a type inferencing system (we'll talk about that in a bit) often you'll get partial types whether you know it or not, as they're handy things for optimizers--any information, even incomplete information, is better than nothing.

Runtime type checking

With some languages, compile-time typechecking is insufficient--there just isn't enough information at compile time to be sure that the rules of the type system are being followed. (This is especially true of type systems that include the value as part of the type) Languages like this generally provide runtime type checking, either automatically or via calls you can make. Runtime type checks are exactly what they sound like--as your program runs it checks the types of variables and values that get passed around.

This does, as you might expect, have a cost associated with it, since checks like this aren't free. (They're often quite cheap but even cheap adds up when you do it a million times or more) Runtime type checking also isn't restricted only to languages with some measure of dynamic typing associated with them--even the terribly strictly typed languages can have it. Rather it tends to be associated with languages with some measure of strong type checking. It's difficult to write a broad range of Interesting Programs without having at least some interface to the real world, and the real world tends to be a messy, typeless place. Runtime checks are used at the borders to make sure the data coming in conforms to the type information it's declared.

The more dynamic languages do tend to have more runtime type checking, though, since the more dynamically typed a language is the harder it is to check at compile time that the type invariants are enforced. If you can't tell, and you want to be sure, you have to resort to runtime checks, throwing type errors or exceptions if the invariants vary.

You'll often see, in dynamically typed languages (as well as statically typed languages with insufficiently expressive type systems), hand-rolled type checking at the start of functions and methods to make sure that the invariants the programmer wants expressed are expressed. Personally I've always disliked those, as they're both messy and error-prone, but often there's no good way to do anything else in the language at hand.

Type Inferencing

Type inferencing is when the compiler figures out what type most variables are and what functions return without the programmer necessarily having to explicitly type variables or functions. It's one of those neat tricks that a compiler can do because there's so much redundant information in most typed source code.

What the type inferencing engine does is trace the source of your program and makes type annotations based on the code, and uses these annotations to find the places where you've made type errors, without you having to actually put type notations everywhere.

The first thought that might spring to mind is "what about that pesky halting problem thing?" and yes, it's generally impossible for a type inferencing engine to be 100% correct for 100% of the programs you'll feed it, but that's fine--it doesn't have to be perfect to be useful. For example, this:

sub foo(a, b) {
declare c;
c = a.some_method(b);
return c**4;
}

q = foo(1, 4)
print substring(q, 1, 4);

That code has two type errors in it. First, foo returns a number of some sort, but we're doing a substring on it--unless we've got a language with lots of autoconversion that's not allowed. Second, the first parameter to foo must be an object, because we're calling a method on it, but when we invoke foo we're passing in an integer, and unless we've a language that assumes everything's an object that's not allowed.

It's clear, by inspection, that those errors exist, but in a dynamic language they won't be caught until runtime. Or... will they? A type inferencing pass on the source will pick them up, and they can be reported at compile time. That's a good thing, since generally the earlier an error is caught the better.

Of course, you can throw some uncertainty into the mix:

sub foo {
if (rand(4) >2) {
return "a";
} else {
return 12;
}
}

Still, that's OK. With a simple type inferencer it can just decide that the return type of foo is unknown and let that chunk of uncertainty flow through, in which case you're no worse off than if you'd not had the inferencing engine. Or... Very Clever inferencing engines can see that it returns either an integer or a string and, while it can't be sure which, it can certainly know that if you use the return value as, say, a matrix that your code's broken.

Types and objects

Static typing gets somewhat interesting when objects are involved. Or, rather, they get somewhat interesting when subclassing is involved, if the type notation on a value might indicate just a parent type rather than the actual type.

If your type system allows for subtypes and your object system allows child methods to have different method signatures from the parent methods they obscure, it makes it very difficult to do full static type checking. In this case you have to resort to runtime type checking if you want to maintain those type invariants, which has a performance penalty, and tends to make it more difficult to make sure your programs are right to begin with.3 This is one of the reasons the more dynamic OO languages tend to not bother with type checking, as without a good type inferencing system you very quickly get to the point where there's so much uncertainty that types are more hindrance than help. (Even with a good type inferencing system in somoe cases)

What's the point?

Ultimately, what the heck's the point of types? (Besides the excuse for an argument, of course) There are three big ones.

First, from a theoretical standpoint, they may, in the right system, allow you to prove your code is correct4 at compile time.

Second, they help catch some typos, provide a bit of help catching API changes (the ones that involve changing the types of the parameters, though that's arguably a Horribly Bad Thing), and provide something of a memory crutch for using libraries. (I personally find that last one useful, since I find details like argument ordering occasionally dribble out of my brain for less-used functions)

Third, they help the optimizer, and this is the one I like the best. Having types available at optimization time can mean the difference between knowing that foo is a generic thing and foo is a platform-native integer, something that can mean an order of magnitude or more difference in speed of use and memory usage. Not, perhaps, a huge difference for one variable, but it's something that can add up quickly.

In object-oriented languages typing also helps tremendously with method lookup. Code like this:
b = a.foo
can have the foo method pre-looked-up with a typed system. Without sufficient information at compile time, methods lookups all have to be be deferred to runtime. That means that every time the code snippet above is executed the runtime has to take the string "foo" (or perhaps the magic method token that corresponds to "foo", if your language can do that) and search the class hierarchy looking for that method. A good method cache can let you skip all but the first heirarchy search for a method, but you can't skip them all, and a lookup, even with a precomputed hash of the method name, is somewhat expensive.

On the other hand if, at compile time, you know the structure of a class hierarchy and the types (even just the base type) of a variable, you can build a vtable for each class, with one slot per method, and hang it off each object (or the class, at the cost of an extra dereference) and the runtime cost of the method dispatch is just an indirect array access, which is definitely a lot cheaper than any sort of string lookup. All you need to do to make that work is make sure that methods of the same name are always in the same slot for each branch of the tree, which isn't tough. (Just a bit of compiletime bookkeeping, something computers are reasonably good at)

Good optimization, however, doesn't require the source to be statically typed, it just requires the parsed representation of the source to be typed. (Type information reduces the uncertainty in a program, and uncertainty is the enemy of optimization) It's possible to have a dynamically or minimally typed source language that still has some type information made available to the optimizer--this is one of the places where type inferencing is really, really handy.

Why the fights?

Well, bluntly, because some people like fighting, and this is a good excuse. That's not a helpful answer, though.5 The battle is generally between the tinkerers and the mathematicians. Tinkerers (and hackers in the classic sense fall into this group) just like to get going, fiddle around, do their thing, and have the compiler get out of their way

Which is the right answer? Well... it depends, of course. Depends on the people involved, as the differing styles of typing appeal to different people. It also depends on the problem, since some problems require more stricture than others. (I, for one, want the code running a plane I'm in to have all the checks run on it as possible) And it depends on the performance requirements you have, since the more statically typed a language is the faster the code can be. (Not necessarily will be, of course, but the less uncertainty the optimizer has the more it can optimize) There really isn't One Right Answer. That'd be too easy. :)

1 Yes, a type could be "even numbers between 1 and 6" or "words that don't have the letter Q in them", though value based type systems are reasonably uncommon.
2 That's picking nits, and those "things" are really variables. But this isn't the place to get into the differences between names, variables, and values
3 You have to have a really good set of unit tests to catch this stuff in development or testing and, while unit tests are handy they're sometimes difficult to get 100%, and can't always check things right in really complex systems. Not that I dislike unit tests--I think they're great. They just solve a different, albeit partially overlapping, problem domain than types
4 For a very restricted definition of correct. Arguably anything's better than nothing.
5 Well, OK, it actually is a helpful answer, but for a more general problem.

Posted by Dan at 03:25 PM | Comments (16) | TrackBack

November 26, 2003

Doomed to repeat history

Just when you thought people would've learned from the travesty that was formmail.pl...

Movable Type comes, by default, with a program mt-send-entry.cgi that's supposed to allow pretty much anyone to send a blog entry to pretty much anyone else. Except, interestingly, it allows anyone to send anything to just about anyone else. (With a little tag that says its from your blog, of course)

It would seem that deleting or chmod -xing mt-send-entry.cgi would be a good idea if you've got a Movable Type installation.

Posted by Dan at 09:15 AM | Comments (1) | TrackBack

November 25, 2003

Obnoxious "help"

Important usability tip: If I'm running a command or program from the command line,the OS should not pop up any damn GUI boxes for what I'm doing, especially if I've not connected to any of the GUI libraries.

Today's culprit is Windows XP, which insists on popping up the "Your program has crashed! Would you like to report this to microsoft?" box when parrot dies in particularly violent ways. No, dammit, I would not like to report it, nor would I like to find my process blocked until I connect in and click the little button either, thanks.

Needless to say this makes automated text-only builds (like, say, Parrot's tinderbox system) less than 100% reliable.

I don't suppose anyone knows how to turn this helpful "feature" off?

Posted by Dan at 10:34 AM | Comments (4) | TrackBack

November 24, 2003

Cool But Bizarre

I got two copies of Perl 6 Essentials in the mail from O'Reilly today. In Polish. Which is really cool, albeit terribly bizarre. ('Specially as I don't speak or read any polish) It does drive home issues with internationalization and mixed-language character handling, though. (I see how Unicode is useful with this in ways that dealing with Asian character sets doesn't show nearly so much)

I should dig out the digital camera and put up a snapshot of the cover just for kicks. Well, that and the ego stroke, of course :)

Posted by Dan at 02:14 PM | Comments (3) | TrackBack

November 21, 2003

The mixed bag of syntax highlighting

Like so many other programmers, I use a syntax-highlighting editor. (Well, OK, I'm an emacs user, so technically I'm using a syntax highlighting operating system, but we'll let that one slide) I have noticed one interesting thing about using it, though--I comment my code less.

I noticed this the other day when I printed out the code for the compiler module I'm working on. In the editor, with colored highlighting enabled, it makes sense what's going on, and everything's reasonably obvious. On paper, though, in black and white with no highlighting the lack of comments was much more obvious, and the code itself didn't make a whole lot of sense. It's not even the amount of code that's visible, as I've got a near-obscenely sized set of monitors on my desk.

No, I don't know if this means anything. And yes, I know about a2ps, and I do use it, though it's highlighting's not as good. (Though I'm not sure now whether I want a good PIR higlighting mode or not, now)

It might be an indication that your code isn't quite as well documented as you think it is. Mine certainly isn't, though I'm going to go fix that.

Posted by Dan at 03:03 PM | Comments (10) | TrackBack

November 19, 2003

Full of sound and fury...

Well. Yesterday Massachussetts Supreme Judicial Court declared that the current marriage statutes in that state had sex-based discrimination built in and, as such, violated the protections granted by the Commonwealth's constitution. (I'd link, but if you can't find something on it you're not looking) This has, as you might expect, generated quite a lot of furor. As a mildly interested observer, I figured I'd weigh in on it too.

No, not on the decision itself. If you're a citizen or resident of the Commonwealth of Massachussetts and are in favor of the decision, write your congresscritter and voice your support. Good for you. Contrariwise, if you're a citizen or resident of Massachussets and are against the decision, write your congresscritter and voice your dissent. Good for you, I suppose. Either way you've gotten off your otherwise lazy political butt and participated in the process.

On the other hand if, like me, you are not a resident or citizen of the Commonwealth of Massachussetts, well... shut up. This is purely a state matter and if you're not a resident or citizen you have no say,

Yeah, I know "Oooh, full faith and credit! We'll have to honor them! All the Good Christian Marriages will dissolve and everyone will burst into flames!" or some other such crap. Yeah, whatever. If you don't want two men or two women getting married in your state, make it illegal or keep it illegal. If you don't want it happenning in Massachussets, tough. Welcome to the wonderful world of States Rights. And arguably the full faith and credit act won't apply if people really freak out. (State prohibitions on slavery or alcohol certainly didn't stop the practice in other states, nor were the covenants of slavery nor bands on distilling honored in those states that didn't agree)

The last time we had a big dust-up over this was in the mid 1860s. We could try that one again, I suppose. Or maybe people could just shut the fuck up and cope.

Nah, this is America--what am I thinking?

Posted by Dan at 09:47 AM | Comments (1) | TrackBack

November 12, 2003

It's never a good sign...

when the hold music for tech support is a lullaby. Though I suppose it beats the Dead Kennedys.

Posted by Dan at 08:46 PM | Comments (2) | TrackBack

November 11, 2003

It's the googleware, stupid!

It's never good to leave things hanging (nor, I suppose, is it that great an idea to link to yourself) so I did the sensible thing--empirical testing.

Threw up a new image, mentioned it on IRC, and a few folks went to look. No googlebot.

Dowloaded Opera and installed it, telling it that I was OK with google's adware/spyware stuff. Threw up another image, and looked at it with the new install of opera, which I then shut down and haven't fired up since.

The result? Five minutes and 38 seconds after my look at the image, here comes googlebot! There are exactly two hits on the URL, one from me with opera and one from crawler9.google.com, if the PTR record for 64.68.87.66 is to be believed. With a browser string of Mediapartners-Google/2.1 no less. (plus the googlebot URL, as normal from googlebot hits) So...

It's the googleware.

Which, FWIW, I'm fine with. The opera splash was pretty explicit about sending info to Google, and it's pretty clear they have. I don't otherwise use Opera, so it's not like any other SuperSecretInfo is leaking out. (Not that I have any, of course, and claims to the contrary are lies. Lies, I tell you, lies!)

OTOH, it might be worth noting if you do have SuperSecretInfo and use the free version of Opera, which would potentially be Somewhat Unwise.

Posted by Dan at 02:36 PM | Comments (5) | TrackBack

LL3 is over

And it went pretty well, all things considered. Most of the talks were interesting, and we had fewer clunkers than we had in the past, though what counts as a clunker's always pretty subjective. We had robots, too, which is darned cool, though I'm somewhat biased there. (And the poster session on PIC chips was interesting. You can get a fully programmable 8-bit microcontroller with 64K of RAM for $11. (This would be the spot to cue the "Imagine a beowulf cluster..." splashblot posts) You can do a lot with 64K of RAM on an 8-bit system, and at that price you can gang together a half-dozen affordably)

Unfortunately my talk was, well... sloppy. Kim was fairly kind in her assessment--the talk desperately needed a good edit and runthrough, and I should've had more concrete examples in it. I was trying not to pull Parrot into it and be more general, and I think that was a mistake.

Nah, that's not true. I know it was a mistake. Bleah. Next year maybe.

Posted by Dan at 11:06 AM | Comments (2) | TrackBack

November 10, 2003

Google spyware

I just had an interesting experience, which I'm not 100% thrilled with.

Over on IRC, I made a casual reference to a URL. http://www.sidhe.org/backgrounds/gif_for_folks_who_link_unasked.gif specifically. It's a GIF I swap in for the real image when I find folks linking to images in the background collection. (For the record, it's the background image from the nifty.org archive of gay/lesbian/bi fiction (though work-safe) and has the twin advantages of being very light (so it tends to obscure white text) and subtly subversive, while still being tasteful)

It's not that I care that people look at the backgrounds, nor that they grab them for their own use. I found them ages ago in a free collection of web graphics and threw them up there myself for my own use, and I'm certainly not going to complain when other folks grab them and use them--go for it and good luck. I'd just rather you use the copy on your server, not mine. :)

Anyway, I made an offhand comment on IRC with the URL for the real image (I hardlink it on the server, rather than playing Apache redirect games) and three minutes later... Pow! There was googlebot, looking at it. Turns out that one of the folks on the channel'd looked at it, and they run Opera with the google stuff on it so presumably that's how google got the URL. I will admit to being very impressed with the speed that the crawler struck out with (and it was the crawler, at least according to the log data and the PTR record for the IP address) but still...

Makes me wonder how much interesting info Google;s got indexed that's on public webservers but not linked to, because someone's doing work with a browser that's got the google plugins. Yet another argument for not putting private data on public webservers, no matter how obscure the URL is. (As if we really needed another)

Also makes me wonder how long before Google starts snagging the contents as seen in your browser in case the pages it tries to crawl aren't accessible...

Posted by Dan at 01:07 PM | Comments (14) | TrackBack

November 06, 2003

Rising blog-spam

While it's sad, and more than a little pathetic, blog spam, like all the other sorts of spam, seems to be on the rise. I've been getting more and more comments posted that aren't anything more than links to some pill site or other. Having some of the antispam MT plugins helps,but still, there's a bunch I need to go hand-delete. (After which I generally IP-ban the poster, which has worked as well to cut down on the spam, though I worry about the collateral damage)

It's really sad, though. Yeah, it means more work for me, and more maintenance, which is annoying, but all this blog-spam seems to be manually entered. (Heck, someone went to the trouble of expressing sympathy for my dog, though I'm pretty sure Random doesn't really need a link to a rape porn site) You've really gotta wonder, how many of these links do you need to put in to make a buck, and is it really worth it?

Posted by Dan at 05:03 PM | Comments (1) | TrackBack

November 05, 2003

Da Matrix

Well.

I just got a chance to see The Matrix Revolutions.

I don't want to spoil it for you, so all I can say is...

You know that weird SF geek friend of yours? The one that bought the complete set of Star Trek movies in the box set and not only unwrapped, but actually watched Star Trek V?

When it comes out on DVD and he buys it, borrow it. That way when you go to clean your apartment and need some background noise, you'll have something handy.

Pretty good Wrath of God music, though. Gotta say that for it. (But a CD of Wagner will cost you less than a ticket, and it lasts longer)

Posted by Dan at 05:17 PM | Comments (4) | TrackBack