June 13, 2005

WWIT: Fast interpretation

Parrot is, as interpreters go, pretty damn fast. Not as fast as it could possibly be, but a lot faster than many interpreters at what it does, and it can be faster still. (Heck, you can enable optimizations when building Parrot and get a good boost -- they're off right now since it's a pain to pull a core file for an optimized binary into a debugger and do anything useful with it.) A lot of thought went into some of the low-level design specifically to support fast interpretation.
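To see why dispatch speed matters so much, here's the inner loop every interpreter lives in: fetch an op, look up its handler, run it, repeat. This is a minimal toy sketch in Python, purely illustrative -- the op names and register layout are invented, and Parrot's actual C dispatch is far more elaborate:

```python
# A toy register-based interpreter loop -- illustrative only, not Parrot's design.
# Each op is (opcode, dest, a, b); handlers live in a lookup table.

def run(bytecode, nregs=4):
    regs = [0] * nregs
    ops = {
        "set": lambda d, a, b: a,                  # regs[d] = immediate a
        "add": lambda d, a, b: regs[a] + regs[b],  # regs[d] = regs[a] + regs[b]
        "mul": lambda d, a, b: regs[a] * regs[b],  # regs[d] = regs[a] * regs[b]
    }
    pc = 0
    while pc < len(bytecode):
        opcode, dest, a, b = bytecode[pc]   # fetch
        if opcode == "end":
            break
        regs[dest] = ops[opcode](dest, a, b)  # decode + execute
        pc += 1                               # advance
    return regs

program = [
    ("set", 0, 6, 0),
    ("set", 1, 7, 0),
    ("mul", 2, 0, 1),   # regs[2] = regs[0] * regs[1]
    ("end", 0, 0, 0),
]
```

Every single op pays the fetch/lookup/call overhead of that loop, which is why shaving cycles out of it pays off across the whole program.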

There are a couple of reasons.

The first, and probably biggest (though ultimately not the most important) is that I thought that building a cross-platform JIT was untenable. That turned out not to be the case, at least partly. Building a framework to allow this isn't as big a deal as I thought. That doesn't mean you get a JIT everywhere, though. (You want to write a Cray JIT?) Getting a functional JIT on a platform and maintaining it is definitely a big undertaking and, like any other all-volunteer project, Parrot's engineering resources are limited and somewhat unreliable. Getting an interpreter working relatively portably was a better allocation of those resources, leaving the JIT an add-on.

The second, and more important by far, reason is one of resource usage. The rest of this entry's about that.

Perl, Python, Ruby, and PHP are used heavily in server-based environments, and if you've ever done that you know they can be... slow. Oh, not all the time, and there are ways around the slowdown, but... slow. Slower by a factor of 200 in some cases. (Though if your code's that much slower it's normally a sign that you're really not playing to your language's strengths, but sometimes you can't do that) Needless to say, you'd prefer not to have that hit -- you want to go faster. I mean, who doesn't?

The normal answer to that is "JIT the code". It's a pretty good answer in a lot of cases. Except... except it's amazingly resource-heavy. First there's the CPU and transient memory cost of JITting the code in the first place. There are things you can do to make JITting cheaper (Parrot does some of those) but still... it's a non-trivial cost. Second, JITting the code turns what could be a shared resource (the bytecode) into a non-shared one. That's a very non-trivial cost. Yes, in a single-user system it makes no difference. In a multi-user system it makes a huge difference.

As a for example, $WORK_PROJECT has a parrot executable that's around 2M of bytecode. Firing it up without the JIT takes 0.06 seconds, and consumes about 10M of memory per-process. (15M total, but 5M is shared) Firing it up with the JIT takes 9 seconds and consumes 100M of memory per-process. (106M total, with 6M shared) On our current production machine we have 234 separate instances of this program running.
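Working the numbers above through as rough back-of-the-envelope arithmetic (the figures are the ones just quoted; the totals are mine):

```python
# Rough arithmetic from the figures above: 234 instances of one program.
instances = 234

# Interpreted: ~10M unshared per process, plus ~5M shared once across all of them.
interpreted_total_mb = 5 + instances * 10

# JITted: ~100M unshared per process, plus ~6M shared once.
jit_total_mb = 6 + instances * 100

print(interpreted_total_mb)  # roughly 2.3G for everything
print(jit_total_mb)          # roughly 23.4G for everything
```

The shared portion barely matters either way; it's the per-process unshared pages, multiplied by a couple hundred processes, that blow the budget.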

Needless to say, there's no way in hell we can possibly use the JIT. The server'd need somewhere around 23G of memory just for this one application. (Not to mention the memory needed for the other 400 programs being run -- when I last checked there were a bit over 600 total parrot-able programs running.) The only way to make this feasible is to interpret the bytecode. (It's also another reason for us to have directly executable bytecode, since it means we can mmap in the bytecode files and share that 2M file across those 200+ processes, and fire up Really Fast since it's already in memory -- we don't have to bother reading in all 2M of the file, just fault in the bits we need.) Note that this isn't exclusively a parrot issue by any means -- any system with on-the-fly JITting (including the JVM and .NET) can suffer from it, though there are things you can do to alleviate it. (Such as caching the JITted code to disk or having special OS support for sharing the code, which general-purpose user-level programs just can't count on being able to do.)
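The mmap trick is plain old OS machinery. Here's a sketch using Python's `mmap` module standing in for the C `mmap()` call Parrot would actually make (the scratch "bytecode file" here is invented for the example): a read-only, file-backed mapping means every process mapping the same file shares the same physical pages, and pages get faulted in on demand instead of being read up front.

```python
import mmap
import os
import tempfile

# Scratch stand-in for a bytecode file -- in real life this would be
# the 2M compiled-bytecode file sitting on disk.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x2a\x00\x01\x02" * 1024)  # 4K of fake "ops"
os.close(fd)

def map_bytecode(path):
    with open(path, "rb") as f:
        # Read-only mapping: the OS shares these pages across every process
        # that maps the same file, and only faults in pages actually touched.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

code = map_bytecode(path)
first_op = code[0]   # touching a byte faults in just that page
```

Startup is cheap for the same reason: nothing is copied into process-private memory, so the 201st process to map the file pays almost nothing.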

(Alternately you could use this as an argument for generating executables. That'd be reasonable too, assuming your program is amenable to compilation that way, which it might not be)

Then you've got the issue of enforcing runtime quotas and other sundry security issues. Some of these need to be done as part of regular interpretation (that is, you need to wedge yourself in between ops) and if you don't interpret quickly, well... you're doomed. It still isn't blazingly fast, as there's a cost involved you can't get around, but you can at least minimize that cost.
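In a fast interpreter that wedge can be as cheap as a counter decrement and a branch between ops. A sketch of the idea (illustrative only -- the names and the accumulator machine are invented, and this is not Parrot's actual security hook mechanism):

```python
class QuotaExceeded(Exception):
    """Raised when a program burns through its op budget."""
    pass

def run_with_quota(bytecode, op_quota):
    """Interpret a trivial accumulator machine, charging one quota unit per op."""
    acc = 0
    for opcode, arg in bytecode:
        # The security wedge: one decrement-and-test between every pair of ops.
        # Cheap per op -- but only worthwhile if the loop around it is cheap too.
        op_quota -= 1
        if op_quota < 0:
            raise QuotaExceeded("op budget exhausted")
        if opcode == "add":
            acc += arg
    return acc

def hits_quota(bytecode, quota):
    """Helper: does this program blow its budget?"""
    try:
        run_with_quota(bytecode, quota)
        return False
    except QuotaExceeded:
        return True

program = [("add", 1)] * 10
```

Since the check runs on every op, its cost is multiplied by the total op count -- which is exactly why a slow dispatch loop makes per-op quota enforcement untenable.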

So what does fast interpretation get you? It gets you a performant, portable engine, it gets you some significant resource savings, and it keeps the cost of per-op security checks bearable. Not a bad thing, overall. And good reason not to be JIT-blind.

Posted by Dan at June 13, 2005 03:46 PM | TrackBack (0)

Great article, Dan: the problem (that JITs are expensive) is quite obvious, but the connection to the server space wasn't obvious to me at all. I'm a big JIT fan, but after reading your article, it's obvious that you still want a fast non-JIT VM too. Perhaps this is a reason why Java Servlets are doing so well in the server/enterprise area? Perhaps HotSpot is quite efficient considering what it's doing ...

Posted by: Andre Pang at June 13, 2005 05:09 PM

Java Servlets have the same problems -- that's one of the big reasons for the architectures you see in java systems. There's one large server process and all the individual chunks of java code run within it. The JITted code is shared between the different threads that run within the server and persist from connection to connection, conserving resources. Basically it's a webserver or some other 'persistent network appliance' type application, which is fine. (Hence the -let suffix -- applet, servlet, thinglet, etceteralet... :)

That's playing to JITted JVM strengths. It falls down really badly in the sort of environment I was putting up as an example, and has the same problems that Parrot does. That is, when used as an application language, rather than an applet language, you've got all that unshared per-process overhead. I think it may well be one of the reasons you don't see Java used so much in that way or, when you do, it tends to be Java code compiled to an executable rather than to bytecode.

Posted by: Dan at June 13, 2005 07:29 PM