April 19, 2004

Why parrot in production?

A good question, and hopefully this is a good answer.

This is a response, of sorts, to some of the feedback generated by the Parrot compiler article. (And yep, it's likely that only one of the core parrot guys could write the article, which is why I did--now everyone's got a good chance at writing compilers for parrot :)

The compiler article didn't really get into why my work project's targeting parrot, just that it was doing so. No big surprise, since the opening bits were really meant as a way to draw people into the article as well as give the idea that it's both possible and reasonable to write a compiler for an old, crusty language as a way to transition to something that sucks much less than what you presently have. There are a lot of folks stuck with limited Domain-Specific Languages, or a code base written in some antique dialect of something (usually BASIC or PL/I), or some custom 4GL whose whiz-bang features neither whiz nor bang any more, and while you can just dump the whole wad and move to something new, well... that's expensive, very disruptive, and risky. More than one shop has found that moving a currently working "legacy" system to The Next Great Thing is a big and very expensive step backwards.

Not to say that I don't like moving to something new, but the crusty sysadmin in me wants a solid transition path, good fallback contingency planning, and a Plan B, if not a Plan C, just in case. It's often much better to transition the underlying infrastructure first, then refactor, rewrite, or just plain shoot the code in the nasty source language at your leisure. It's also a lot less work in the short term in many cases, which lets you move over to the new system quickly. Often in these cases the problem isn't the language, no matter how crappy it might be. Instead it's the runtime limitations--memory, database, screen handling, or whatever--that really get in the way, so the old crap language can work just fine, at least for a long time, if you can relieve the underlying runtime pressures.

Anyway, the explanation of the work project.

As I said in the article, our big issue was the database library for this language--standard ISAM DB with file size limits that made sense at the time but are starting to bite us, with no real good way to fix them. We had to move to something else, or we'd ultimately be in a lot of trouble.

The first plan was to write a compiler for DecisionPlus that turned it into Perl code. We'd run the conversion, have a big mass of somewhat ugly perl code, then shoot the original source and be done with it. All the code would then be Perl, we could work with it, and refactor it as we needed to so it didn't suck. (You can hold the snide comments on perl's inherent suckiness, thanks :) We fully expected the result to look somewhat nasty--hell, the language had no subs, the only conditional statement was if/then/else, and control flow was all gosubs and gotos. To labels. All variables were global, there was no scope at all, and, well... ewww. In a big way.

I was brought on because it's rumored I've got reasonably good perl skills, and I've done compiler work, so... I set in.

The initial compiler was written in perl because, well, it's a nice language for text processing if you like it, and it's got a lot of powerful tools (in the form of CPAN) handy. Yeah, I could've done it all in C with a yacc grammar, but it's a lot less effort to get that level of pain just smacking myself with a hammer. The first cut of the compiler didn't take too long, as these things go. A few months and I had something serviceable that would run against the whole source repository and work. There were still some issues with the generated code, but nothing bad.

Unfortunately... the output code was nasty. Not last act of Oedipus Rex nasty, but still... what I had to do in the perl code to maintain the semantics of the DecisionPlus source resulted in awfully ugly code, even with some reasonable formatting by the compiler. Lots of little subroutines (so gosub/return would work), lots of actual gotos, and, because of the little subs, lots of scopes all over the place that made refactoring painful. One thing I hadn't counted on was the extent of the spaghetti in the code--since there wasn't any syntax to restrict things, control flow was an insane mess full of bits and pieces done by people writing Clever Code. (Like labels that were sometimes goto-d and sometimes gosub-d, with exit paths decided based on if statements checking global variables.)
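To make the shape of that problem concrete, here is a minimal sketch of one common way to emulate label-based gosub/goto flow with small subroutines. It is written in Python rather than the Perl the compiler emitted, and it is an illustration under assumptions, not the code the compiler actually generated: the `run` driver, the block names, and the `state` dictionary standing in for global variables are all hypothetical.

```python
def run(blocks, start, state):
    """Drive label-based control flow: each block returns the next action."""
    return_stack = []          # labels to resume at after a "return"
    op = ("goto", start)
    while op[0] != "end":
        if op[0] == "goto":
            label = op[1]
        elif op[0] == "gosub":
            return_stack.append(op[2])   # remember where to resume
            label = op[1]
        elif op[0] == "return":
            label = return_stack.pop()
        else:
            raise ValueError(f"unknown action: {op[0]!r}")
        op = blocks[label](state)
    return state


# Each label becomes its own little function over shared global state.
def main(state):
    state["total"] = 0
    state["in_sub"] = True
    return ("gosub", "bump", "after_call")


def bump(state):
    # A label that is sometimes gosub-d and sometimes goto-d:
    # its exit path depends on a global flag.
    state["total"] += 1
    if state["in_sub"]:
        return ("return",)
    return ("goto", "done")


def after_call(state):
    state["in_sub"] = False
    return ("goto", "bump")


def done(state):
    return ("end",)


BLOCKS = {"main": main, "bump": bump, "after_call": after_call, "done": done}

if __name__ == "__main__":
    print(run(BLOCKS, "main", {}))
```

Even in this toy version, every label is a separate scope and the real control flow lives in the returned actions and the global flag, which hints at why generated code of this kind is hard to read and refactor.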

There was another issue as well--typed data. DecisionPlus has a type system rather more restrictive than perl's, including length-restricted strings and bit-limited integers. And, while these things caused no end of obscure bugs, we had to be bug-for-bug compatible because there was code that used these behaviours to its advantage. To get that with perl meant using tied variables to impose the load/store semantics and overloaded operators to get the binary operations correct. Unfortunately ties and overloads are two places where perl 5 is really, really slow. Taking a look at what'd be needed to make this work, it became pretty clear there'd be a lot of overhead, and that the result would potentially perform badly.
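As a rough illustration of what "impose the load/store semantics" and "overloaded operators" mean here, the sketch below is a Python analogue, not Perl tie/overload code. The exact DecisionPlus rules are not given above, so the behaviour shown (signed 16-bit wraparound, silent truncation of over-long strings) is an assumption, and the class names `LimitedInt` and `LimitedStr` are hypothetical.

```python
class LimitedInt:
    """Integer that wraps to a fixed signed bit width (assumed semantics)."""

    def __init__(self, value, bits=16):
        self.bits = bits
        self.value = self._wrap(value)

    def _wrap(self, value):
        value &= (1 << self.bits) - 1          # keep only the low bits
        if value >= 1 << (self.bits - 1):      # top bit set: negative
            value -= 1 << self.bits
        return value

    def __add__(self, other):
        # Overloaded operator: every result goes back through the wrap.
        other_value = other.value if isinstance(other, LimitedInt) else other
        return LimitedInt(self.value + other_value, self.bits)


class LimitedStr:
    """String slot that truncates on every store (assumed semantics)."""

    def __init__(self, max_len):
        self.max_len = max_len
        self._value = ""

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # Store hook, playing the role a tied variable's STORE would in Perl.
        self._value = str(new_value)[: self.max_len]


if __name__ == "__main__":
    counter = LimitedInt(32767) + 1
    print(counter.value)

    name = LimitedStr(5)
    name.value = "DecisionPlus"
    print(name.value)
```

Every store and every arithmetic operation now goes through an extra method call, which is the kind of per-operation overhead the paragraph above is worried about.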

So, we gave it a shot, and it became clear that the primary goal, getting an editable perl version of the source, wasn't feasible, and even using perl as a target for the compiler would be sub-optimal. That's when Plan B came in.

Plan B, if you hadn't guessed, was to target Parrot.

Now, I was actually pretty nervous about this -- I was not sure we were ready for prime time with parrot. It seemed like a good idea, but... we weren't sure. Really not sure. I've done the sysadmin thing, and I've been in the "it must never, ever go down" environment, and I know enough to make a sober risk assessment. And, doing so, the answer was... maybe.

Importantly, though, even if things failed, we wouldn't be wasting all our time. We'd still get a more robust and mature compiler, and a better idea of the issues involved with compiling the language, so... we set a very aggressive goal. If I could make it, we'd know parrot was up to the task, and if not, we'd go to Plan C with a better understanding of the problems and a compiler architecture ready to retarget to another back end. Plus we'd have shaken out many of the issues of switching to a new database system (Postgres rather than an ISAM setup) and screen handling system. (I spent a fair amount of time teasing out escape sequences from primitive character function databases, poring over VT220 and xterm escape sequence manuals, and back-translating them to curses functionality. Now that was fun...)

It worked out, of course. We wouldn't be at this point (writing the article and all) if it hadn't, though it was touch and go for a bit. Still, I beat the deadline by a day and two hours, which was cool. And a lot of bugs in parrot were shaken out, and some functionality prompted, because of this, which was good too--always good to have a real live application.

Oh, and Parrot got a mostly working Forth implementation too. So it was a win all around. :)

Posted by Dan at April 19, 2004 07:04 PM
Comments

(Hunting for an unlocked, Forth-related posting to comment on...)

Have you looked at Joy? There's a tiny implementation for Squeak that might be useful for ideas.

There's also the horse's mouth: The History of Forth by Chuck Moore himself, and it was good to see Dan mention colorForth (I really like Chuck's tail recursion enlightenment/reductionism).

I think as long as your end result is either fun or functional, no one will really care about quirks. I'm glad this implementation isn't entirely dead!

Posted by: Jack at November 6, 2004 03:18 AM