March 31, 2003

Continuations and VMs

Part 3 (well, 3a, since I haven't gotten to the VM bits) in a series of "Why the JVM and .NET have issues with some classes of languages" posts. This one's about continuations.

Now, before I go any further, do note that what this bit talks about does not affect python, perl 5, or a host of other languages. They don't have continuations, which is fine. It does affect Perl 6, Ruby, Scheme, Lisp, and a few others, though I don't know that anyone's going to be porting unlambda to .NET. But, then, I didn't expect a befunge or BrainF*** interpreter for Parrot, either. Nor Threaded INTERCAL, for that matter.

Anyway, a continuation is essentially a closure that, in addition to closing over the lexical environment, also closes over the control chain. (Well, OK, control stack, but if you're doing continuations odds are it's a singly linked list rather than a real stack) CS texts generally go on about continuations being "the rest of the program" or "gotos with arguments" or suchlike stuff. If those float your boat, great--they never made sense to me.

I doubt that a single phrase is a suitable description--if it were, far more people would understand the things--so it's time for some text.

Assume, for a moment, that we have a language, like Perl (or C, for that matter) that has lexical variables. Each block has some variables attached to it, and (unlike C in practice, though not in theory) those variables are each stored in a separate data structure somewhere. A scratchpad, if you like. Those pads are linked together, so an inner block's pad has a link to the immediate containing block's pad, and so on. In this example:

{
  my $foo;
  {
    my $bar;
    {
      my $baz;
      return sub { eval $_[0] };
    }
  }
}

there are three blocks, three pads, and each block's pad is linked to the containing block's pad. So the pad with $baz in it has a link to the pad with $bar in it, and the pad with $bar in it links to the pad with $foo in it. Got that? Good. If we make a closure inside the inner block (like the return does--returning an evil closure, but that's neither here nor there) then the returned closure object has a link to the innermost pad, and through that chain of links to all the pads out to the top level. When we call into the closure, it has access to all the lexicals that were in scope when the closure was created.
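
If it helps to picture that as data, here's a toy sketch of the pad chain in plain Perl--purely an illustration of the shape of the thing, not how perl actually lays out its scratchpads:

  my $foo_pad = { vars => { '$foo' => undef }, outer => undef };
  my $bar_pad = { vars => { '$bar' => undef }, outer => $foo_pad };
  my $baz_pad = { vars => { '$baz' => undef }, outer => $bar_pad };

  # The closure the inner block returns just hangs onto the innermost pad;
  # following the outer links reaches $bar and then $foo when it's called.
  my $closure = { code => sub { "evil eval goes here" }, pad => $baz_pad };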

Got that? Closures are subroutines that capture their lexical scopes. When you call them they (temporarily) restore their lexical scopes.

Now, think about how control flow works in a program. Whenever something "controlish" happens--you make a function call, an exception handler is established, a new lexical scope is put in place--a marker for that event is put on the control chain. When you leave the scope of the item on the chain you remove it. (And removing it might have some action associated with it--when the runtime removes the element for a function call it fetches the return address out of it)

So, we have this control chain. It's a singly linked list (conceptually) of control elements. We can capture it any time we want, if we've got a mechanism to make sure what we've captured doesn't get altered. (Making the chain elements copy-on-write, or just cloning them all off, both work) We also have a closure--a sub that has a chain of lexical scratchpads. And if we take that one step further, we can divorce the lexical pads for the closure from the closure itself, leaving us with a chain of control and a chain of lexical variable scopes.
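
Same toy-data style for the control chain (again, just an illustration, not Parrot's actual structures)--a singly linked list with the newest entry at the head:

  my $control = undef;                   # empty chain
  $control = { type => 'sub_call',  return_to => 'caller address',  prev => $control };
  $control = { type => 'handler',   on_error  => 'handler address', prev => $control };
  $control = { type => 'new_scope', prev => $control };

  # Leaving a scope pops the head entry. Capturing the chain just means
  # saving a pointer to the current head, cloned or marked copy-on-write
  # so later pushes and pops can't disturb what we captured.
  my $captured_chain = $control;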

Now...

Imagine what would happen if we bound together that chain of lexical variables, the chain of control, and an address into one thingie, such that when we invoked that thingie we'd jump to the saved address, put in place the associated control chain (discarding the current chain), and put in place the lexicals?

Well... we'd call the thingie a continuation. And that's what it is.
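
As a data structure it's nothing exotic. Reusing the toy pads and control chain from the sketches above (and, to be clear, this is my illustration of the shape, not Parrot's actual continuation layout), it's just the three pieces glued together:

  my $continuation = {
    lexicals => $baz_pad,          # head of the captured scratchpad chain
    control  => $captured_chain,   # head of the captured control chain
    resume   => 'saved address',   # where execution picks up when invoked
  };
  # Invoking it means: throw away the current chains, install these two,
  # and jump to the saved address.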

Now, in practice continuations aren't made of a random lexical chain, control chain, and address. When you make a continuation you capture the current control chain and lexical chain, and generally the current or next instruction in the instruction stream, and they're often bound together with special function calls ("call-with-current-continuation") but they don't have to be.

One of the cool things about continuations is that, since they're supersets of closures, the variables they capture keep their values from invocation to invocation of a continuation, just like variable values persist across multiple invocations of a closure. And like multiple closures that close over the same scope, multiple continuations that share scope will see variable changes made by other continuations that captured the same scope.
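
The closure half of that is easy to show in plain, runnable Perl 5 (no continuations required), and continuations captured in a common scope share variables the same way:

  sub make_counter_pair {
    my $count = 0;
    return (sub { ++$count }, sub { $count });
  }

  my ($bump, $peek) = make_counter_pair();
  $bump->();
  $bump->();
  print $peek->(), "\n";    # prints 2--both closures see the same $count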

And, just to top it off, someone (no, I don't know who, so this might be CS Legend) once proved that you can build any control structure you can think of with continuations. Often you shouldn't, but that's a separate thing. Exception handlers are conceptually a form of continuation, for example. When the handler is established a continuation is taken, and when an exception is thrown you just invoke the exception handler continuation and *poof!* you're at the spot in the code that represents the exception handler, complete with its lexical scope.
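
Perl 5 can't capture a real continuation, but you can fake the flavor with continuation-passing style--hand "the rest of the program" around as ordinary closures. In this little sketch (mine, not anything out of Parrot), the failure closure plays the part of the exception-handler continuation:

  sub search {
    my ($target, $list, $found, $fail) = @_;
    for my $item (@$list) {
      return $found->($item) if $item == $target;
    }
    return $fail->("$target isn't in the list");
  }

  print search(42, [1, 2, 3],
               sub { "found $_[0]" },     # the "rest of the program"
               sub { "caught: $_[0]" }    # the "handler" we escape to
  ), "\n";                                # prints "caught: 42 isn't in the list"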

There's a lot of other evil stuff you can do, too--there's no reason, strictly speaking, that the destination you jump to when invoking a continuation has to be anywhere near where you were in the program when the lexical and control chains were captured (and they don't have to be related either). Granted, doing Weird Evil Magic like this is a lot of work, but hey, if it was easy it wouldn't be fun. :)

Posted by Dan at 01:55 PM | Comments (6) | TrackBack

March 28, 2003

Or not...

Getting the spec and searching it shows no references to closures, other than transitive closures of classes. If C# does do them, it's new and hasn't hit the ECMA spec. (Or they're calling it something very different, in which case, if someone does know of it and can fill me in...)

Posted by Dan at 03:31 PM | Comments (7) | TrackBack

C# may do closures now

Joys of scanning for referrers in the weblog. Following through from here leads to an indication that C# does do closures in some way, and through it to another link that indicates that they're looking to add them to .NET. I'm not sure if C# does 'em now (there was no indication in any of my .NET or C# books that I found, but I might've missed it) but it looks like they're being added.

I suppose, but it looks like the .NET folks are going to be trading speed for kitchen-sinkery.

Their performance, I suppose. As long as they're not laboring under the illusion that Moore's Law actually holds.

Posted by Dan at 03:27 PM | Comments (2) | TrackBack

Important safety tip!

Never delete the default network route when ssh'd in from a remote machine, unless said remote machine is on the same subnet.

D'oh!

Posted by Dan at 01:35 PM | Comments (0) | TrackBack

March 27, 2003

(closures) The reason for Parrot, part 2

Okay, the previous entry wasn't titled part 1, but...

Anyway, I promised Jon Udell more of an explanation as to why perl's not a good fit for .NET and the JVM. This applies to Python and Ruby too, in their own way, though I'm not going to speak directly for them.

This time around, let's talk about closures and their relationship to lexical variables.

A closure is reasonably simple, at least in perl terms. It's a subroutine that captures its environment. Generally they're anonymous, and generated at runtime. A sample might look like:

  sub make_closure {
    my $foo = 0;
    return sub { return ++$foo; }
  }

This sub is desperately simple, and all it does is return a reference to an anonymous subroutine that will return a monotonically increasing integer, one per invocation of the subroutine. The anonymous sub is a closure because it closes over its environment, capturing what was in scope at its creation and remembering it. Well, OK, arguably that's not desperately simple, so here's how it acts. If you had code that looks like:

  $sub_ref = make_closure();
  print $sub_ref->(), "\n";
  print $sub_ref->(), "\n";
  print $sub_ref->(), "\n";
  print $sub_ref->(), "\n";

It would print out the sequence 1, 2, 3, 4, each on its own line.

How it works is pretty simple. The first line calls make_closure, which returns a reference to an anonymous subroutine which increments and returns the value of the $foo in scope when the anonymous sub was created. The next four lines call it and print the results.

That's not the interesting part. The interesting part is code that looks like:

  $sub_ref = make_closure();
  $another_ref = make_closure();
  print $sub_ref->(), "\n";
  print $sub_ref->(), "\n";
  print $another_ref->(), "\n";
  print $another_ref->(), "\n";

because this prints out 1, 2, 1, 2. The reason this happens is that when make_closure is called, it instantiates a fresh version of $foo each time. Not too surprising, as this is what you'd expect from a sub--it instantiates the lexicals that belong inside it. The sub { ++$foo } part returns a reference to an anonymous subroutine, which for perl is inherently a closure, that accesses $foo. $foo is a lexical declared inside of make_closure, so since it's created anew each time make_closure is invoked, each new anonymous sub references a new version of $foo. There's no collision there.

How does this complicate, and slow down, the system?

Well, remember how variable allocation works in non-closure languages, such as C. Ponder the following C function:

  int *foo() {
    int bar = 1;
    return &bar;
  }

What this function does is allocate a variable bar, give it a value of 1, and return a pointer to it. If you've ever done this in C, or languages that allocate lexicals like C does, you know what happens--the variable whose address this returns will soon turn to mush. Why? Because of the way C handles space allocation for lexical variables.

What C does is to use a stack, generally the system stack, to store lexicals. When you enter a C function, the function preamble reserves space for all its variables by adjusting the top of stack pointer a bit. The lexical variables are all in that chunk of stack that was just reserved. When the function exits, the stack pointer is readjusted and the stack space is freed up for the next function call.

Obviously, for closures to work, it means that lexical variables, what most folks think of as stack-allocated variables, can't be allocated on the stack. Or if they are, it means that the stack has to be a series of call frames, with each call allocating a new frame. Otherwise the variables the closure captures would just keep getting stomped on.
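
In Perl terms it's as though every call conjured up a fresh pad from the heap, and any closure made during the call keeps that pad alive long after the call has returned--which a reused stack slot simply can't do. A rough sketch of the lifetime issue (illustration only, not perl's actual internals):

  sub fake_call {
    my $frame = { '$foo' => 0 };           # a heap-allocated frame, not a stack slot
    return sub { ++$frame->{'$foo'} };     # the closure keeps $frame alive
  }

  my $counter = fake_call();   # the "call" is long gone, but its frame isn't
  print $counter->(), "\n";    # prints 1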

This slows down function calls in languages that support closures. In a non-closure language, all you have to do to reserve space is twiddle the stack pointer. Often there's not even a check to see if there's enough space on the stack--if too much space is reserved you fall off the end of the stack into unmapped memory, your program segfaults, and that's that. Leaving the function means you just decrement the stack pointer and you're done. (In a language with variables that have active destructors, like C++, destructors may get called)

For a language with closures, calling a function requires actively allocating a chunk of memory from the heap. While not horribly expensive, it does cost a lot more than just incrementing the stack pointer. Freeing up the frame for the lexicals also has to go through the heap freeing system, which is also more expensive than just decrementing the stack pointer. A language with closures makes garbage collection obligatory as well, though since the JVM and .NET already do this at least that's no extra expense.

Is this cost huge? No. But it does make function calls more expensive, which can be felt if you're using a system with a lot of small leaf functions, a common scheme in OO programming.

Closures are terribly useful, but there is that cost. If your language uses them, then of course the cost is justified, since it gets you the feature (closures) that you want. If you're working in a language that doesn't do closures the cost is useless overhead--you pay it because you must pay it (there's no good way to optimize the cost away) but you get no benefit from it.

That's why it's silly for the JVM or .NET to add support for closures. Java doesn't do closures, and none of the current .NET languages do closures. Wedge them into the engine and all the programs running on those VMs suddenly run slower, but get no benefit from it. Could they add that feature in? Sure. But why? To run perl or ruby? Is that really their target audience, and do they (Sun and/or Microsoft) really get any benefit from it?

Personally, I think not and not, respectively.

Posted by Dan at 11:09 AM | Comments (13) | TrackBack

March 25, 2003

(Perl|python|Ruby) on (.NET|JVM)

InfoWorld has an article on scripting languages, and Jon Udell has an entry in his blog about it. The main bit in there is the postulation that the big reason we're going with Parrot rather than using the JVM or .NET is a cultural choice, rather than a technical one. The rather flip answer in the Parrot FAQ (which I wrote--it's a good bet that any of the flip answers in there are mine) doesn't really explain things, so it's time I sat down and did so. Then I should go update the FAQ. (There's a sidebar about it in the April Linux Magazine, but it's not on their website yet, and doesn't really go into details anyway)

The easy answer for why we're not using .NET is that it wasn't out when we started the design, at least not such that we knew anything about it. IIRC, and I may not, it hit public beta in summer 2000. Regardless, I didn't know about it until 2001 sometime. .NET has major portability issues as far as we're concerned, since we have to run on any of a half-zillion platforms, and .NET is windows only. Mono makes that somewhat better, but still... got Mono for a Cray system, a VMS system, or a Palm? Probably not. I certainly don't.

Regardless of its portability issues, .NET has the same fundamental problems as the JVM does for our purposes, that is to run Perl. (Both perl 5 and perl 6) That's what I want to address.

First things first--both the JVM and .NET are perfectly capable of being target machines. They're fully Turing-complete, so it's not an issue of capability. But, like the Infocom Z machine, which is also Turing-complete, the issue is one of speed.

Perl 5 has two big features that make using the JVM or .NET problematic--closures and polymorphic scalars. Perl 6 adds a third (which Ruby shares) in continuations, and a fourth (which Ruby doesn't) of co-routines. (Though arguably once you've got continuations, everything else is just a special case) Python has similar issues, though I'm not the guy to be making statements about Python, generally.

To do closures means capturing and maintaining persistent lexical state. Neither .NET nor the JVM have support for this, as they use a simpler stack-based allocation of lexical variables. To handle lexicals the way perl needs them means we'd have to basically ignore the system variable allocation system and do it ourselves.

The same goes for the polymorphic "It's a string! No, an integer! No, an object reference! No, wait, a filehandle!" scalar that perl has. They're really useful, and a mostly-typeless (Perl is strongly typed, it just has very few types) language makes some things quite easy. To do that with .NET or the JVM would require a custom type capable of doing what perl needs.
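
For the non-perl folks, that scalar shapeshifting is ordinary, runnable Perl 5:

  my $thing = "a string";                 # it's a string!
  $thing    = 42;                         # no, an integer!
  $thing    = bless {}, 'Some::Class';    # no, an object reference!
  $thing    = \*STDOUT;                   # no, wait, a filehandle!
  print ref($thing), "\n";                # prints "GLOB"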

So, to make perl work means completely rewriting the system allocation scheme, and using our own custom polymorphic object type. In JVM/.NET bytecode. Doable? Sure. Fast? No way in hell.

And continuations. Yow. To do continuations is non-trivial, and I don't think it's possible to do in the JVM or .NET without treating them as glorified CPUs and using none of their control and stack features. We'd essentially write all the functionality of the interpreter and target the JVM the way we do now with C and hardware CPUs, including complete stack management, completely ignoring any features at all of the VMs. I don't want to think about how slow that would go. It's bad enough doing it all in twisted, insane C targeting real hardware. Another layer of indirection would kill us dead.

With a custom interpreter, we can write the code to support perl's features, and have them run as fast as we can manage. Will it necessarily be as fast as, say, C# code targeting .NET? Probably not. (Though that'd be really cool... :) The required functionality we have forces a certain amount of overhead on us, and there's just no way around it. Give me a budget of $30M a year, three or four years, and plenty of office space and maybe we could change that, but until then...

The other question is "could .NET or the JVM change to support features perl needs?" In this case mostly closures and continuations, which are the biggies. (Weakly typed, runtime-typed variables are less of a problem, though still a problem) The answer is yes, but they'd be stupid to do so. As I said, those features have unavoidable overhead. Running perl faster at the cost of running C# slower is not, at least in my estimation, a good tradeoff.

All features have costs associated with them, and nothing is free. You design your feature set, then the software to run it, and it's all a huge mass of tradeoffs. This feature lets you do something, but has that cost. Wanting something to be fast means something else is very slow, or effectively impossible, and sometimes two features are mostly incompatible. You make your list, make your choices, and do what you can.

That's engineering, and there ain't no such thing as a free lunch. Neither is there any such thing as a language-neutral VM. Or a language-neutral real machine, for that matter. Anyone who tells you different is either lying, a fool, or ignorant. Real hardware doesn't like closures and continuations--VMs that don't do closures and continuations running on top of hardware that doesn't like them is not a recipe for speed.

Posted by Dan at 06:08 PM | Comments (11) | TrackBack

March 24, 2003

Done! Finally!

Finished my chapter for Allison's "Perl 6 Essentials" book, and about damn time. 23 pages, and desperately in need of a good edit.

Now all I need to do is finish all the chapters for my own book before I die. (Either from natural causes or at the hands of my ever-patient editor...)

Posted by Dan at 04:12 PM | Comments (2) | TrackBack

Blog propagation and death to polling

I've been thinking on and off about propagating blog update notifications, in an attempt to get some solution that sucks far less than the current "poll the list of things you're interested in" scheme. Granted, shouting across string-connected tin cans would be better than that, but I was shooting for something a touch more sophisticated. I think I've found one. And, even better, there's a pretty good template to build the system design around.

Usenet News.

More specifically, Usenet News served by INN and other stream-capable servers.

Now, I'm not proposing that blogs get served over NNTP, though that's actually pretty darned close to what's needed, but think about it. What we really need is twofold. First, there needs to be a way to get pinged when things change. And second there needs to be a way for clients to get notification of all the blogs of interest that have changed without having to go and ping each blog in turn. NNTP handles both scenarios just fine.

Most people haven't run news servers, or have at best run leaf servers that don't have any special connection to their upstream servers, but I have (albeit a small one). There are a number of nice features in the system that could easily be pulled into the core feature set, things that are both simple and reasonably scalable, which'd allow for a distributed system of a size sufficient to manage what we're seeing now. Things like known subscribers with cached content change notifications, so a reconnect gets a blitz of queued change notices (which in a blog scheme doesn't even require separate spool directories if there's a server subscription list).
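
The per-subscriber bookkeeping for that is about as simple as data structures get--something along these lines (a hypothetical sketch of the idea, not code that exists anywhere):

  my %subscribers = (
    'news.example.com' => { connected => 0, queued => [] },
  );

  sub blog_updated {
    my ($blog_url) = @_;
    for my $peer (values %subscribers) {
      if ($peer->{connected}) {
        # push the change notice down the open connection right away
      } else {
        push @{ $peer->{queued} }, $blog_url;   # blitz these on reconnect
      }
    }
  }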

Plus, of course, usenet's shown us all sorts of ways not to do things, which is often as important as showing what to do.

I think it's doable. Heck, I think an updated NNTP (BTP, anyone? :) scheme could work for distribution of blog content, as well as change notification, but that strikes me as a Phase 2 sort of thing.

Maybe I'll bodge something together at some point. Shouldn't actually be all that tough, other than the shortness of bandwidth I have here at home if it actually works and starts taking off...

Posted by Dan at 01:48 PM | Comments (7) | TrackBack

There's really only one thing to do

At least when you've been mugged by Girl Scouts with leftover boxes of Girl Scout cookies.

And that's make ice cream. Mmmmm, ice cream!

In this case, artery-clogging mint ice cream, made with frozen crushed thin mint cookies, heavy cream, and whole milk. Creamy and delicious. Plus I finally figured out how to make the ice cream right. (The trick is to get the mix as close to freezing as possible before adding it, and then just leaving the machine alone)

Posted by Dan at 10:46 AM | Comments (0) | TrackBack

March 21, 2003

Time for do-it-myself internet

When I'm not in the office, I find I spend most of my time at the little coffee shop down the street from my apartment. It's a nice little place, locally owned, and they've got a nicer variety of coffees than Starbucks does. They've also heard of roasts other than french (sorry--is that "freedom roast coffee" now?). I like the flavor of coffee, and it's nice to get some that isn't brewed from burned beans. The one thing that it doesn't have that Starbucks does is wireless internet.

At the moment, that's no big deal, since I don't use the wireless internet at Starbucks (it's from T-Mobile), but I'm looking to start, as I'm spending a good piece of my time working from home, and the coffee shop's less distracting than my home office. So that means wireless internet would be good.

Interestingly, SBC business DSL, for a reasonably slow feed (384K down, 128K up) runs $34.99 a month. T-Mobile's wireless costs $39.95 a month for month-to-month, or $29.95/mo for a year commitment.

Are you thinking what I'm thinking, Pinky?

An Airport base station, which does NAT, is only $199. Methinks it's time to chat with the owners of said coffee shop...

Posted by Dan at 04:18 PM | Comments (0) | TrackBack

March 18, 2003

Americans do understand irony (part 2)

The US, as part of our "blow the heck out of Iraq" policy, is pumping billions of dollars into the plan, including multiple billions into Saudi Arabia, where we have lots of ground troops. Many highly-placed Saudi businessmen are making huge bucketloads of cash from this.

You remember the Saudis. The ones who fund Al-Qaeda and fly planes into large buildings.

We're invading Iraq to do what for world terrorism, again? I forget.

Posted by Dan at 07:30 PM | Comments (0) | TrackBack

So the world must be coming to an end

Because I now have ICQ and AIM accounts.

Or maybe it's just the damn fever. Bleah.

Update: my AIM username is DanSugalski, ICQ # of 271718977 (though I'm almost never running an ICQ client). I do have a Jabber client, but I don't know of any servers to connect to, nor of any reason to do so, so there's no account there at the moment.

ICQ, AIM, and IRC. Not much left...

Posted by Dan at 11:04 AM | Comments (4) | TrackBack

March 17, 2003

Cost of the war

Well, Shrub's going to announce shortly that we're going to blow the fuck out of Iraq. Yeah, Hussein could leave, but he's not left yet, and I'm not picturing that happening. Everyone's on the news prattling on about the "cost of the war". But nobody's actually running the numbers.

We've got an enormous deficit in the US right now. Any money spent on the war is going to be borrowed, and current thinking is that the Pentagon'll spring an $80B bill for the big kaboom. That's a lot of money.

What's really a lot of money is what we're ultimately going to pay. The current 30 year T-bill rate, according to Bloomberg, is 5.375. To a first approximation, a 30 year repay cycle at that rate costs about twice the original borrowing.
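
A quick sanity check on that, treating the $80B as a 30-year amortized loan at the quoted rate (Treasury borrowing doesn't actually get paid down that way, but it'll do for a first approximation):

  my ($principal, $rate, $years) = (80e9, 0.05375, 30);
  my $r = $rate / 12;                                      # monthly rate
  my $n = $years * 12;                                     # number of payments
  my $payment = $principal * $r / (1 - (1 + $r) ** -$n);
  printf "total repaid: \$%.0f billion\n", $payment * $n / 1e9;   # roughly 161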

So, in addition to all the US soldiers we're going to cripple or permanently disable (serious casualty rates for the last kaboom in the area ranged between 40K and 80K people), and all the lost productivity from the National Guard units mustered up (and I really don't understand that one--is Iraq the 51st state or something?), and the terrorism losses (this will end up spawning terrorism on US soil, something that until now Iraq hasn't actually been involved with), we're going to drop at least 160 billion dollars.

Wheee.

The Shrub is definitely making a splash with this one. He will be remembered for it, though I take no bets as to how.

There are days when I wonder if the Shrub's an Apocalyptic...

Posted by Dan at 07:57 PM | Comments (2) | TrackBack

March 16, 2003

Heard on CNN

"40% of people think the risk will be higher if we invade Iraq. 50% think the risk will be lower if we invade Iraq."

Unfortunately, 90% of the people polled had no clue what the fsck they were talking about (and admit it, neither do you), and thus the numbers were entirely meaningless.

Yay America. Through the magic of polling, we have definitive answers on how we feel about a whole wide range of things for which we are entirely unqualified to make meaningful judgments on.

Posted by Dan at 09:09 PM | Comments (0) | TrackBack

Perl in Linux Magazine

Just got my comp copy of the april issue of Linux Magazine. There's a big article on Perl 6 from Damian, and one on Parrot by me. (Well, by Martin and me, really. I gave him an article about twice the size of what's in the magazine, and he viciously and brutally edited it down to something that makes me look literate. Yay, Martin! :)

Dunno when it'll hit the stands, as the march issue got there a week or two ago, but it shouldn't be too long. I expect there to be rather a lot of... interesting mail from it. We'll see.

And yes, the irony of a major article on Parrot appearing in the April issue of a magazine is not lost on me...

Posted by Dan at 02:24 PM | Comments (0) | TrackBack

March 15, 2003

Polling sucks

Actually, I'd go further than that--a system that polls has something about it that's fundamentally broken.

Now, occasionally (very occasionally) that fundamental brokenness is the point of the system. Operations management tools that track the health of remote machines and devices work like this, since you generally can't count on a machine to let you know when it's died unexpectedly. That's a rare and very specific case. Some hardware devices require polling, but they're generally broken as well, often by design. Polling as a means of hardware interfacing is prone to data loss, and is usually done for cost, complexity, or competence reasons. (I.e. it costs too much, would be too complex, or someone's not competent enough to do it properly)

For software, though, if you're polling, you're busted.

What prompts this is a quick scan through my webserver logs. I've gotten into a number of people's aggregator systems, which is fine, but some of these damn things are checking every 10 or 15 minutes to see if there's anything new, and almost universally there isn't. This is a huge waste of everyone's time, bandwidth, and resources. Yeah, sure, I'm sure it was the easy way to do things, but it's not the right way to do things.

When you design software and you think polling's the right way, what you should really think of is how to use a push or ping method instead, as they're near-universally (modulo the busted hardware case) better. Yeah, I know, RSS feeds are a newish thing, and nobody'd given much thought to them, but maybe it's time to do so. Some of the aggregators, like blo.gs and weblogs.com, take a ping to reread, and that's fine. And it wouldn't be at all tough to set up a subscription system to sign up for blog update pings, or set a central server to subscribe to, or an NNTP-style push system with feeds, or something. Anything's got to be better than what we have now.

Posted by Dan at 05:59 PM | Comments (5) | TrackBack

March 14, 2003

D'oh! Petulance strikes

Okay, so the last entry was a bit petulant. A virtual, albeit small, foot stomping snit.

I shall, for the moment, assume that Guido's got reason to not worry since, as I said earlier, he's not stupid. That means I need to hunker down and get some real benchmarks and timings, and see where things stand. More importantly, see what I need to do to make things faster.

Hate to get beaned with a pie at OSCON. That'd be embarrassing.

Posted by Dan at 07:05 PM | Comments (0) | TrackBack

Now I know how the Lisp folks feel

And it's not a great feeling.

I've an outstanding bet with Guido van Rossum (and that is the right spelling for you non-Python folks. Lowercase van (not von, that's a name from another language) uppercase Rossum) that Parrot can execute pure Python bytecode faster than the Python interpreter can. I put the bet up to show that I was both serious and had some reason to back up some claims I was making about Parrot, as Parrot's not quite in a state to demonstrate the claims yet. While hard numbers are the best way to prove you're right, being willing to publicly embarrass yourself is a way to demonstrate that you're confident you've some backing for what you're saying.

Guido took me up on that bet, which is fine. Details of the challenge get announced at OSCON 2003 in Portland, OR, at the Python Lightning talks, with the actual showdown at OSCON 2004. (Also in Portland, so you have a chance to sample all the microbrews you didn't have time to drink in 2003. And Powell's--one should never forget Powell's) Guido isn't, as you might expect, taking the challenge at all seriously. He's even gone on record as such. (I don't have a link to the python-dev discussion handy, as I'm off-line) That doesn't bother me--I've met Guido a few times, and while I don't know him as such, I like him well enough, and I'm not in the least surprised. It's a very Guido thing, and that's fine. Who knows, maybe this is a Clever Plan to make me think he underestimates me so I underestimate him. Guido isn't, after all, stupid.

Some folks really don't care, and that's cool. There's more to a language engine than raw speed, and raw speed's not as important as other things to many. No problem.

What bugs me is the off-hand dismissal of the challenge, with the stated assumption that Parrot has no chance, and there's no way we can win. Not with any investigation, mind--none of the folks doing the dismissal have actually looked at Parrot, so far as I can tell. On the other hand, this may be a massive Cunning Ploy to lull me into a false sense of security, or annoy me into doing something stupid. (It's the Python Cabal at work again, I'm sure :) If so, they've got the arrogant dismissal thing down pretty well, and I'm prepared to be suitably impressed and embarrassed at OSCON 2004.

Boys and girls, let's get this straight. I'm only going to say this once.

Parrot is an order of magnitude faster than perl 5 doing equivalent things. Without enabling any extraordinary measures.

You know how Python's performance rates against Perl 5.

Do the math.

I now understand why the Lisp folks are so peevish--they get this sort of dismissal constantly, despite the demonstrable strengths of Lisp. (Which, in the spirit of full disclosure, I still find profoundly uncomfortable as a programming language. That, though, is between me and Lisp, and isn't any sort of judgment on the language itself)

Posted by Dan at 03:59 PM | Comments (34) | TrackBack

Safe Queues

One annoying downside to dealing with an interpreter, or other generally 'low-level' infrastructure, is the grotty details that you must deal with. IO details, interrupt-level code, signals, platform calling conventions... We don't have the luxury of ignoring them the way that a utility or application might. Today's fun task is to design a generic thread-safe and interrupt-safe queue. That is, a queue that you can safely insert into from normal code or a signal handler/interrupt handler/AST, while user code is potentially removing elements.

Some platforms, like VMS, make it easy. Guaranteed-safe queues are provided by the platform and we don't have to do any work at all. They're even fancier than we need for parrot, since our needs are specific. We can guarantee that only user mode code is draining the queue, only user mode and one interrupt mode piece of code can possibly be simultaneously inserting into the queue, and user mode code won't be simultaneously inserting and removing elements from the queue, so we don't have to worry about conflicting AST routines trying to jam into the queue at once, or about different AST, user, and thread routines inserting and draining simultaneously.

Still... try this one in generic, ANSI C. You find yourself playing all sorts of evil games with volatile, double-checked modifications, and mutexes. Mutexes almost make it easy, as they let you force single-threading in user code, but they aren't the complete solution since signal/interrupt/AST code can't do mutexes. The last thing you want is your interrupt handler blocking on a mutex, after all. That's... bad.
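
The shape of the thing is simple enough, though: a singly linked list with a dummy head node, where the inserting side only ever touches the tail (and the old tail's next pointer) and the draining side only ever touches the head. Here's the idea sketched in Perl, purely as illustration--the hard part is doing exactly this in C, with volatile and carefully ordered updates, so neither side ever sees a half-built node:

  my $dummy = { payload => undef, next => undef };
  my $queue = { head => $dummy, tail => $dummy };

  sub queue_insert {                    # the signal/interrupt/AST side
    my ($q, $payload) = @_;
    my $node = { payload => $payload, next => undef };   # build it fully first
    $q->{tail}{next} = $node;           # ...then link it in
    $q->{tail} = $node;                 # ...then advance the tail
  }

  sub queue_drain {                     # the user-mode side
    my ($q) = @_;
    my $first = $q->{head}{next};       # real entries live past the dummy
    return undef unless $first;         # nothing there
    my $payload = $first->{payload};
    $q->{head} = $first;                # $first becomes the new dummy
    return $payload;
  }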

Anyway, annoying code, but code that must be done. When I've got it, I'll post or link to it, since it's useful. After I throw it into the public domain, because I hate license issues.

Posted by Dan at 03:30 PM | Comments (0) | TrackBack

March 13, 2003

Apocalypse away!

Well, Apocalypse 6 is out. I'd link to it, but I'm off-line and don't have the URL handy. It's on www.perl.com somewhere, and you can always drill through dev.perl.org for it and all the other perl 6 and parrot documentation.

Reaction to this Apocalypse is a lot better than I'd originally expected. Maybe it's because the thing addresses a number of well-known, long-standing issues that perl has with subroutines, so it's giving people what they really want and they're happy. Maybe everyone who's going to get cranky about language design has left already, as it's just the same old thing all over again. Or maybe people just haven't finished it, as the darned thing masses a good 37+ pages when printed in a really tiny font. :)

Posted by Dan at 03:49 PM | Comments (0) | TrackBack

March 12, 2003

The secret is finding where to draw the line

One of the 'fun' things about trying to build a semi-generic[1] object system--heck, the fun thing about almost any system--is knowing where to draw the line between things. What's in the system, what's not in the system, and where do you put the things that are in the system? Figuring out those three things is important, and once you've got them most of the hard work is done.

Deciding what to leave out is definitely separate from deciding what to put into a system. It may seem like they're essentially the same, but they're not. For example, we've decided specifically to include classes in the object model, but we've chosen specifically to not include classless objects in the object model. (Though, honestly, you can still do it if you want) A difference? I think so.

Of course, it's important to decide what goes into a system. That almost goes without saying, though some people's judgements are different than others as to what goes in. Putting a feature in is a conscious decision.

Leaving things out, though, can be done by accident. What's not in a system can be a matter of omission, encompassing all the things you didn't think to include. I'd argue that this is a bad thing. Sure, with any large system there's a huge number of things you can choose to do or not do, and sometimes you have to prune by category, but it's important that you put thought into what you leave out.

When you exclude things on purpose, it means you've put at least a little thought into what's getting excluded. Might not be enough, but at least it's something--the exclusion is purposeful. The feature has been considered and discarded. When you exclude by accident, because you didn't think about it, you don't know for sure whether the feature would've been good or not. You might've left it out anyway, but you don't know--you didn't look at it, and that's bad. You can also look at a feature and decide to defer it rather than discard it, and build the design so that it can be added afterward.

Where it really comes out is later, when you're working on a system to add new things, and you find that you can't, because what you're looking to add has already been excluded. Threads often fall into this category, as does multi-character-set or Unicode text processing. It wasn't thought about, and was excluded by default rather than on purpose, and in a way that was bad because you didn't really want to exclude it, just defer it.

[1] I'm firmly convinced that a truly generic object system is either completely untenable, or doomed to be so slow without an insane amount of engineering work as to be untenable, given the number of things that get lumped into the group 'object system'.

Posted by Dan at 03:26 PM | Comments (0) | TrackBack

March 09, 2003

Objects, objects, objects

I'm in the middle of trying to nail down the semantics of the object systems I want to support in Parrot. What I thought was going to be simple is turning out... not to be.

I'm not a big object fan, and never really have been. I joke about it at times (my initial exposure to objects was less than wonderful), but the religious fervor and "OO is the way to go!" mantras that people seem to have really grate. There's far too much knee-jerk reaction going on. Mine included, of course.

That, oddly enough, isn't actually a problem--it's a help, but it's not enough of a help.

The bigger problem is the definition of what an object and an object system really are. Somewhere on the web there's a list of the characteristics of an OO system, but the joke is that it's a "pick 4" list. And it's true. It's easy to round up a half dozen languages that all unarguably (or mostly unarguably) do OO-style programming, but they all do things differently. While that's not considered a huge problem for most people--it's not often that you'll be inheriting from a C++ class in Smalltalk--it's a problem for me. (And I'd argue that mutually incompatible object systems are a big problem, but that's a rant for another time)

Parrot, you see, is supposed to unify this sort of stuff, at least well enough to allow minimal functionality. Sure, it's not really Larry's primary goal for Perl 6, but Parrot's long ago moved past that, at least as the driving design goal. And even if we were sticking with just Perl, a Perl 6 style class inheriting from a perl 5 style class (or vice versa. Or worse, from each in turn several times, with MI) is an interesting and potentially non-trivial thing if you want to do more than just give a passing nod at the problem with a "Don't do that" warning. And I hate those warnings.

In addition to both styles of perl objects, we have to throw Python and Ruby into the mix, though luckily they don't expand stuff much. (Python objects are reasonably primitive as objects go, though they're a bit closer to traditional objects than perl 5's are. Ruby's a full-blown OO system, but as a pretty direct descendant of Smalltalk it fits in well with most other OO systems) Plus, of course, C++, .NET, and Java objects, at least in some form or other. I'd also like Objective-C in the mix, as I'm an OS X guy now, but as that's a C/Smalltalk mix it doesn't pose any semantic issues.

I think, though, that I've finally done it, at least well enough to get by. It's taken a while to distill down the necessary semantics, and it's been at least as much work to figure out which semantics we aren't implementing (sorry classless OO languages--you're still on your own), but I've got it boiled down, I think. More to the point, I think I have a good front-end API so that even if things don't work on the back end (inheritance across styles of objects) they will work on the front end, in user programs.

The first goal, as always, has been to be able to transparently use an object of any type from within parrot. That's nailed, for our problem domain, though it's taken a while to realize. The second goal, to transparently use classes, is nailed enough to write the spec. And so, off to write the spec I go.

Posted by Dan at 11:51 AM | Comments (0) | TrackBack

March 07, 2003

Mmmmm... Pizza

There's nothing quite like homemade pizza, if you plan ahead a bit. While I don't have the sort of oven you really need to get good pizza (not too many folks have a coal burning or beehive wood oven) I do know how to make a good dough. With a heavy-duty stand mixer, it's trivial--throw in all the ingredients (3C flour, 1C water, .25C oil, 1 t salt, 1 t yeast), let it knead with the bread hook until it's a nice consistency (10 minutes or so), then throw the dough in the refrigerator overnight. The cold slow rise gives a really nice texture to the dough, which is really important.

Then pop it into a 425 degree oven with stuff on top, cook until done, then enjoy. Yum!

Posted by Dan at 06:57 PM | Comments (2) | TrackBack

March 06, 2003

Who says Americans don't understand Irony?

Though maybe one of his speechwriters is British. Last week, Bush made a speech chiding the world community for not taking action on Iraq, his current obsession. Not a problem, everyone's pretty much aware of his feelings on the matter. The part that amused me was the analogies in the speech--he warned the world community not to act as it did in 1939 with Hitler. (This is, mind you, a president that gets incensed at being compared to Hitler)

For those folks who are fuzzy on their WWII history, in 1939 the democratically elected president of a major industrial world power whipped his populace into a patriotic and mildly paranoid frenzy and invaded a weaker country that wasn't actually a threat.

Ah, I love the smell of irony in the morning...

Posted by Dan at 10:15 AM | Comments (3) | TrackBack

March 03, 2003

Back again

I just resubscribed to perl6-language again. (Yes, this does mean an apocalypse is pending release) Hopefully I don't regret it as much as I did the last time...
Posted by Dan at 06:02 PM | Comments (0) | TrackBack

More holes

Some days I loathe computers.

Apparently sendmail has a vulnerability such that a carefully crafted mail message can root a machine. Joy. Updates available, and installed, but still...

Posted by Dan at 03:56 PM | Comments (2) | TrackBack

March 02, 2003

Creaky old website

I really need to get around to revamping the website here. It's always been something of a catchall server, where I threw things up in pieces and chunks as I had something interesting, or had stuff I wanted accessible to someone or other. Nothing fancy, just serviceable. There's the VMS perl stuff, Jane's old gallery (since moved to its own domain, also living on this server), various icons, buttons, backgrounds, mailing list archives, and whatnot. (And no, I'm not linking to them, they're mostly crap) I've even got a mildly up to date set of home pages for myself, though I should dig out the CSS book and get them less ugly.

Still, there's nothing coherent, and it's all so ad hoc. Maybe it's time to work out some sort of general look, bodge out some CSS, and start updating things so they don't look so early web.

Posted by Dan at 04:27 PM | Comments (0) | TrackBack

Snappy new log toys

Like a lot of other folks, I run Analog as my web logfile analyzer. I don't do anything fancy with it, just rip it through the logs with the proxy stuff stripped out. (Yeah, I should use Squid or something, but I don't. Apache's mod_proxy, locked down hard by IP, is good enough to not bother changing)

And, being the stats junkie I am, I run it with DNS lookup turned on. On the past few years' worth of logs. Why? Well... I have the disk space, and the server's not that busy, so why not? (Though starting a blog has upped the hit rate. OTOH, Amanda's Buffy Review Page still gets a lot more hits than I do, which puts it all in perspective :)

The one big why not is the length of time it takes to build up the report, since there's a lot of data in the log files. More to the point, there are a lot of IPs in the log files and the reverse lookups take ages (like hours), even with caching. Plus the cache file's crept up to ~30M.

Anyway, I just threw in an upgrade to Analog, since I was running 4.16, which predates all the "throw malicious crap in the referrer" hacks and it was time to upgrade. In doing so, I found a nifty little utility, DNSTran, which does the reverse lookups for you, potentially translating and compressing the log files in the process. While I ought to do that, I tried it just in the 'build Analog DNS cache' mode and... wow. This thing screams. What used to take Analog up to 10 hours to do took this thing somewhere on the order of 10 minutes, and that's with a brand new cache file. (And spending a lot of time waiting for the straggler lookups to finish on each of the 30 or so log files--I expect it'd be much faster end to end if I was working on a single source file) Sweet! Plus it got named to peg the server CPU, something I've never seen before. Guess my creaky hardware's not that happy doing 50-80 lookups a second, but that's fine.

Running Analog on the resulting prebuilt DNS cache took all of two minutes, 44 seconds. This is down from ~15-20 hours if it was starting from scratch. Yow!

Posted by Dan at 03:08 PM | Comments (0) | TrackBack

Cognitive Dissonance and economic policy

I'm no economist, but even I can see that some of the economic theories and plans getting thrown around by the various pundits and administration advisers are just full of crap.

The biggest fact that is presented in any report on the economy is that two thirds of the economy, at least the US economy, is driven by consumer spending. That's fine, and not too much of a surprise, since there are a lot of consumers, and everyone needs to eat. But, given that one statistic, the rest of the policies just make no sense.

For example, the push to shift the tax burden from the current income tax to more of a consumption tax that rewards savings and investment. Huh? If 65%+ of the economy is driven by people spending stuff, a shift to consumption taxation is stupid, since all it'll do is slow down consumption. If you want the economy larger, money needs to flow faster, and that just won't happen if there's less money flowing. Duh.

Then there's the dismantling of taxes paid by folks with vast amounts of cash, including the inheritance and dividend taxes, not to mention lowering of the income taxes in the top bracket. The point of these taxes is, in part, to prevent the buildup of capital in non-circulating places, which is generally what wealth is--capital that's not circulating. Not universally true, certainly, but commonly true, and in many cases much of the remaining wealth is used in a way to siphon money out of the economy. (In the form of rents, leases, and interest charged on loaned money)

And let's not mention the policies that encourage moving the production jobs--those jobs that actually make something, or do something useful--out of the US economy. Tell me again--how's the economy, the one driven by consumer spending, supposed to grow if there are fewer consumers with cash?

Methinks that it's been so long since the folks in power have actually done anything useful that they've forgotten how things work. Or perhaps everyone's just working on moving back to a more feudal system, which works just fine (for some versions of fine) regardless of how poorly off most of the population is.

Posted by Dan at 10:29 AM | Comments (0) | TrackBack