February 28, 2003

Systems, second systems, and system failure

I was going to go rant on a bit about all the folks who think perl 6 and/or parrot is going to fail miserably, but I think I won't. Most of them don't know what they're talking about, and many of the rest are obnoxiously snide just because, well, they're obnoxiously snide. At this point those folks can just go jump, I don't much care. (And for all the snide python folks, well, I shall have to make sure I have a nice, hi-res image of Guido getting pied when it happens. And it will.)

Instead I think I'll go on about one of the few valid points people do have about the plan as it stands--the complete throwing out and rewriting of the perl 5 core. The big objection thrown at us is "rewrites always fail," and that we should refactor parts of the core instead. Well, we're not doing that, for a number of reasons; I ought to explain them, but I'll do that elsewhere.

Instead, I want to tackle the "complete rewrites always fail" meme that seems to be out there, because it's just not true.

The problem is that most large software projects fail. Period. Start a big project--heck, any project--and odds are you'll never finish. The larger the project, the less likely success is. That's not an intrinsic property of large projects so much as an intrinsic property of the way we manage large projects, but things being rewritten tend to be big, so it crops up. If you figure the chance of a large project succeeding is 1/X, then the chance of a project surviving two full development rounds--original and rewrite--is more or less 1/X^2. Since rewrites are often larger undertakings than the original, the odds are a touch worse than that, though it's offset a bit by the fact that the organizers have already shown at least some of the skills needed to get a large project to completion.

Sometimes the "Second System Syndrome" gets thrown into the mix, and it's a valid thing to worry about. Fred Brooks talks about it in The Mythical Man-Month, a great book about large project management. (One of a number of project management and project design books I have that I sometimes pull out and read) Basically, the second system someone designs is the one most likely to fail through overreaching: you try to do all the stuff you couldn't do in the first system, and since you haven't failed at those things yet, you don't know which of them are infeasible.

If you realize all this, though--that Second System Syndrome is real, and that project management which avoids the pitfalls gives you a higher chance of success--a rewrite is often in a better position to succeed than a brand-new project. You've got people familiar with the original code base and functionality, and you know much better what you do and don't want from a design, both of which help when doing the rewrite.

More than anything else, especially with a volunteer project, you need to sit on the folks who are bitten by (or are perpetually reveling in) Second System problems, but that just means that your lead designers and project managers have to keep things from getting out of control.

Writing a large system a second time isn't easy, by any means, but it's far from the kiss of death you'd think it was.

Posted by Dan at 06:25 PM | Comments (0) | TrackBack

February 23, 2003

Digging in to the upgrade hell

Well, I just went to try and install SpamAssassin 2.5. No joy. I'm pretty sure (though not 100% positive) it's unhappy with the stock 5.005_03 install of perl I have. Yeah, I know it's old, but it works OK. More to the point, other stuff depends on it, and upgrading to a newer version is going to be interesting. I think I only have to upgrade mod_perl, but I'm not sure, and I'm definitely not sure what background stuff might go bang.

OTOH, this version of Redhat is going out of support soon, and is so far behind the times that I suppose I might as well give up on the idea of going with RPMs and just handle everything standalone. Or completely rev the server, but that's... unappealing.

Posted by Dan at 03:59 PM | Comments (2) | TrackBack

A. Nony Moose, at your service

So, I went ahead and put in the trackback bits to the templates. Clever things, those templates. Anyway, I now know that nobody's actually referred to anything I've said anywhere.

Man, the web is an awful lot like my kids... :)

Posted by Dan at 02:35 PM | Comments (0) | TrackBack

Tracking back

Well, I made my first link to someone else's blog. (Which feels just so weird, somehow, like I just teetered on the edge of the archetypical slippery slope, and am now headed inexorably to wanking self-indulgence) I figure I ought to enable trackback myself in that case since, after all, fair's fair. It's showing me how out of date a lot of my templates and such are, though. I feel like I really ought to go and update them all.
Posted by Dan at 02:27 PM | Comments (1) | TrackBack

Poking under the hood

I've been peeking around under the hood of perl again, which is always an interesting undertaking. It was ultimately prompted by this blog entry of Jeremy Zawodny's, which had an off-hand comment about Devel::Size, a module I wrote a while back. (It hangs off my CPAN directory) It calculates the size used by data structures in perl, so you can figure out how much memory your multidimensional nested tied hash widget really uses.

Anyway, the results Jeremy got were less than stellar, which wasn't good, since it meant that either the module had a problem or something really odd was going on. That prompted a lot of poking about, since I don't really like the idea of having broken modules out there. The reality, as it so often is, was rather more complex.

Part of the problem that Jeremy was seeing was due directly to Devel::Size. When it runs it uses a hash to track structures that it's seen so it doesn't count them multiple times, and so it doesn't get into infinite loops chasing the tails of circular references. Hashes use up great gobs of memory--on the order of 60 bytes per hash entry, not counting the size of the key or value--so if you're looking at a large structure Devel::Size will use up a lot of temporary memory. The docs made minor reference to this, but not nearly strongly enough (since fixed in the 0.55 release).

This is a problem if you go to double-check Devel::Size by looking at the output of ps: do a ps after a Devel::Size check and you'll find your memory usage is extraordinary relative to what Devel::Size reported, because your process' memory space has been blown out to handle all the (since freed) temporary storage Devel::Size uses. This is arguably a problem with Devel::Size itself--you could easily say it ought not use nearly that much memory, and that's valid--but it's beside the point at the moment.

Another issue turned up as I went poking around for the memory I couldn't account for: how the C RTL handles and tracks memory allocation, and how perl gets memory for its structures. When you ask malloc for a chunk of memory, it has to hand you a naturally aligned chunk, which in the general case means the allocation gets padded out to a natural alignment boundary, usually 4 or 8 bytes depending on the CPU. And, since the RTL needs to track the size, an extra 4 or 8 byte length field is generally prepended, silently, to the chunk of memory you asked for. In the worst case this means the allocation is 15 bytes larger than you asked for. On a 10M allocation that's noise, but on a 17 byte allocation it's a darned sight more than just noise.

And that's part of what's going on with memory usage. Each key has a structure attached to it, and because this structure's variable-length, perl has to ask the system for a separate chunk for each one. The structure's only two ints, two bytes, and the string data, so for short keys you can imagine that there's a lot of overhead. Even on a minimal, no-waste key (one that's two characters) you see 25% overhead on 32-bit machines. Larger keys slowly ramp the overhead down, but a 3 character key has 13 bytes in the struct and another 7 bytes in overhead (padding and the length field), which is more than 33% wastage. Ick.

I'm not sure what, short of completely redoing the key allocation system, can be done for this, at least in perl 5. The key structs really can't come from an arena, as they're variable length, and throwing in a pointer to the key string just bloats out the struct by 4 bytes which isn't much of a win either. Something to ponder for Parrot, I guess.

Posted by Dan at 01:45 PM | Comments (2) | TrackBack

February 13, 2003


Just for chuckles, and to see if I could see the dancing kami (and I couldn't--connection refused) I decided to turn on IPv6 on my laptop, as detailed here. While it worked, alas there was no dancing, since the remote server refused connection. Still, traceroute6 did show that I could get there, which was cool. I think I'll go enable IPv6 on all the macs here. Hopefully it'll stick across reboots. Enabling it on the Linux box, running 2.2.20, might be a bit more interesting, but still... It was nifty to see that it worked. I really ought to go get some V6 addresses for the server and get DNS handing them out, if that works. (I'm still not entirely sure that the IPv6 packets are actually going anywhere, and that I'm not just tunneling over IPv4--I haven't looked into v6 to see how things are supposed to work)
Posted by Dan at 12:10 PM | Comments (1) | TrackBack

February 12, 2003

Another dead end

Well, just heard back from Amazon, where I went for an interview this week. No joy. "We liked you, interview went well, we're not hiring you." Fuck.
Posted by Dan at 04:13 PM | Comments (0) | TrackBack

I really need to put server filters into the mail...

The one downside to going away is the mail backlog when you return. 1025 mail messages, in large part mailing list mail. I really ought to filter the less-interesting mailing lists with server-side rules, rather than client-side ones. (Eudora has a nice feature of opening up any mailbox that has new mail filtered into it, which is how I keep track of what pending list mail I need to read)
Posted by Dan at 02:12 PM | Comments (1) | TrackBack

Sleepless after seattle

I'm sitting here in the Seattle airport, waiting to fly back home. It's Seatac to Dulles (Washington, DC) then on to Bradley for a really late arrival. Again. Doesn't seem possible to fly home from the west coast and get in before 10PM. Today I get in (well, will have gotten in) at 10:41. Assuming, of course, that I don't get lucky and bump in DC. Since I've no place in particular to be the next day or so, I don't see any reason to not take a bump if it's offered. I've been flying enough that it seems worth it.

I really need to start signing up and keeping track of frequent flyer miles. I never figured I'd need to, but over the past year I think I've racked up a hundred thousand or so air miles. Mostly on Southwest, so it doesn't actually count, but it's stupid to not track things when I can. (Well, I've signed up for American's program for the flight last year to Europe, but...)

Travel is just extraordinarily tiring. When I'm on the road I feel constantly tired and dodgy, which I really hate. I can manage to keep going for parts of the day if I'm on stage somewhere (in Seattle it was for the Amazon interview, in Sebastopol it was the perl 6 design meet) but that makes the crash afterward even worse. Jet lag, I guess. Or maybe I'm just really out of shape. I'd better start exercising, since I've at least one more big trip this year--OSCON--and maybe more if things work out for YAPC::EU.

One thing I always forget is that travelling is, in itself, a lot of work. You'd not think so--heck, most of it is just sitting around, in cars, cabs, busses, planes, and hotel rooms. It is, though, and it means I end up not getting nearly as much done as I figured I would, since I discount the amount of work it takes to just travel. This time out, for example, I figured I'd finish the menus chapter for the cocoa book, write the rest of the parrot internals chapter for Allison's book, and catch up on my perl6-internals e-mail. Or... not, as it turned out.

There's also always this hope that the flight will have laptop power, and there'll be enough room to actually use the laptop. Unfortunately that almost never happens in coach (and I can't afford to fly any other way) where there just isn't laptop power, and I'm lucky if there's enough room to sit down, let alone actually unlimber the laptop. This leg now (since I'm on the plane) is a pleasant exception for, while there's no power, at least the person in the seat in front of me has left her seat up so I can put the laptop on the tray table and actually open the screen. Will wonders never cease. (Who knows, maybe I'll unlimber one of the DVDs I brought and watch a movie or some anime, either of which would be better than the movie they're showing on the plane)

Ah, well, time to work on the menu chapter, then dig into the sum-up of the perl 6 workshop.

[Time passes, though my connecting flight through Dulles isn't]

Or, now that I keep going, maybe not. I actually got most of the rest of the menus chapter written, though it only worked out to three pages. Still, it's three more pages than I started with, so that's fine.

System delays have held up the flight to Bradley. It's the last leg for a flight originating somewhere else, and if there's a delay on the other end we get a delay, since it's hard to fly without a plane. (Well, for long at least) We went from a 9:20 PM departure to an 11:00 PM "We're figuring the plane will be on the ground then" time. We'll see if that pans out.

There's a good chance, since the plane's en route, but the bigger question is whether the flight crew can actually take us to the end, since there are limits on how many hours a day a crew can fly. Which, honestly, is a good thing--while the computers can take off and land with some skill, they don't work too well if the crew's too damn tired to turn them on and let them take over. And I'm pretty sure that while they can handle the landing, the dull middle bits, and probably even the takeoff, there are the post-takeoff and pre-landing bits to consider, and of course if the crew punches the wrong data into the autopilot, well... splat. (It's safe to assume that I didn't splat if anyone's actually reading this)

Probably should try to finish the menu chapter. Or watch a Godzilla movie. I could go either way, and giant monster movies are awfully hard to pass up.

[More time passes]

Ah, another delay. Plane's landed, and it should only be another 45 minutes before it's ready to go, since the mechanic just needs to go over the thing for a check. Yeah, suuuuure... The ground crew insists that there's more than enough flight time left in the flight crew, but I'm somewhat dubious. The fact that about half an hour after the 45-minute wait started they broke out the free soda and chips isn't what I'd generally consider a good sign.

[Even more time passes]

Well, the 45 minute time has just hit, with another 25 minute delay announcement. Whee! Who'dve thunk it?

I hope they pull our luggage when they put us up for the night, because there's no way we're getting off the ground tonight. Or, rather, if we do I'll be damned surprised.

At least my batteries'll be fully charged. That's something.

[less time passes]

No, wait! Only five minutes! Really, this time, they mean it!

And they did, too. After 20 minutes or so sitting on the ground, we're off. Got in only 4 hours late...

Posted by Dan at 11:33 AM | Comments (1) | TrackBack

February 03, 2003

And the challenge is on!

Okay, it's not much of a challenge, but last week I put up a challenge on python-dev to a speed shootout between python and parrot, running a bytecode compiled python program. Well, today Guido took me up on the offer, so we're on!

If I lose, it's $10, a round of beer for the pythonlabs/zope folks (we'll define who they are later). Plus, courtesy of this little gem, a cream pie at 10 paces. It's not yet settled what Guido's wagering, but being able to beat him in public and pitch a pie at him's definitely worth it. :)

Posted by Dan at 04:39 PM | Comments (0) | TrackBack

Network speeds

I've got my apartment wired up with twisted pair, which makes things nice. I've considered wireless, but since only one of the machines I've got running has wireless access, it seems like a bit much. Besides, I find I tend to work better when offline, and enabling wireless is just so tempting... :)

It's all 10Mb, due to a rather unpleasant event a year or so back with a defective machine and my (now dead) 10/100 switching hub. Normally it just isn't that big a deal--I'm not slinging enough data around to make a difference. And the link to the outside world's either 192Kb or 1.5Mb, so either way the LAN outruns the DSL.

The one time I do feel it, though, is when getting ready to go places. Y'see, I've got our CD collection ripped and sitting on the central house server. It's a fair amount of data, given my wife and I have been collecting CDs since 1985, and I just don't keep it all on any local machine. A copy of some of it, sure, but since there's never enough space on the laptop drives, the MP3s are the first things to get tossed when I'm short of space. And, as I wanted to do a backup of the system recently (due to an unfortunate accident with some coffee and the laptop) I'd tossed them from the laptop so they wouldn't have to be backed up.

Well, now I'm heading off for a week of travel and work, so it's time to throw them all back onto the laptop, and that's when the speed difference really shows. Dropping 3G of MP3s over 100Mb takes a lot less time than doing it over 10Mb. Dammit. :(

Posted by Dan at 01:02 PM | Comments (0) | TrackBack