May 12, 2006

Distribution of processing

So, I've been busy with other things, and Tornado's been ignored of late. That's done with, and it's time to get back to the machine. The delay has let some interesting ideas percolate, though, which I'm happy about -- specifically ideas about distributed computing, which dovetail nicely with some of the other things that Tornado does.

Tornado's basic architecture -- a strongly sandboxed VM with isolated threads communicating via passed messages -- extends nicely to a distributed system. If the system's sandboxed well, it's safe to fling code around to remote machines (and safe to receive it). If individual threads are mostly isolated, there's not much time lost synchronizing threads and their data. And if you have thread pools that share a message port, it's easy to fling a lot of data at a remotely hosted pool so it's there waiting when the threads need it.
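The shared-message-port idea can be sketched with an ordinary thread pool. The port here is just a queue, and none of the names below are Tornado's actual API -- it's a minimal sketch of the shape being described, where a batch of data is flung at the port up front and whichever worker is free picks up the next message:

```python
import queue
import threading

# A shared "message port": any worker in the pool may pick up the next
# message. Names (port, worker, results) are illustrative, not Tornado's API.
port = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        msg = port.get()
        if msg is None:          # sentinel: shut this worker down
            break
        results.put(msg * 2)     # stand-in for real work on the message

pool = [threading.Thread(target=worker) for _ in range(4)]
for t in pool:
    t.start()

for n in range(10):              # fling a batch of data at the pool up front
    port.put(n)

for _ in pool:                   # one shutdown sentinel per worker
    port.put(None)
for t in pool:
    t.join()

total = 0
while not results.empty():
    total += results.get()
print(total)                     # 2 * (0+1+...+9) = 90
```

The point of the shared port is that the sender never needs to know which worker (or, in the distributed case, which machine) ends up handling any given message.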

Tornado, handily, is strongly sandboxed, its threads are pretty damn isolated, and it's got thread pools. Woohoo! :)

Silly exuberance aside, it's not really that easy. While multithreaded programming and distributed systems have some things in common, you can't really treat remote and local consumers and producers identically. Bringing in remote threads adds a whole new set of potential failure modes (most of which are actually recoverable, which is nifty if you're building reliable systems), and of course there's the issue of latency to take into account. Sending a message to another thread in the same process may take a microsecond; sending it to a thread in another process on the same machine may take a dozen or more microseconds; and sending it to a remote thread could take milliseconds (or seconds, or possibly minutes if you're really unlucky and have lots of data). Sure, you could treat it all the same, but... I dunno, I think it's a bad idea to ignore potential sources of latency.
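One concrete consequence of treating remote sends differently: a send that may cross the network should carry a deadline, with timeout as a recoverable failure rather than a hang. A sketch, modeling a slow remote peer with a bounded queue (all names here are illustrative assumptions, not Tornado's API):

```python
import queue
import threading
import time

# Model a remote peer with a bounded port: the peer drains messages slowly,
# so a send can block behind it.
port = queue.Queue(maxsize=1)
port.put("first")                # fills the port; the next put must wait

def slow_peer():
    time.sleep(0.05)             # stand-in for network + processing delay
    port.get()                   # peer finally drains the first message

peer = threading.Thread(target=slow_peer)
peer.start()

try:
    # A purely local send could reasonably block indefinitely; a send that
    # may be remote gets a deadline and a recoverable failure path instead.
    port.put("second", timeout=1.0)
    sent = True
except queue.Full:
    sent = False                 # recoverable: peer too slow, caller decides
peer.join()
print(sent)
```

Here the peer drains the port within the deadline, so the send succeeds; had it not, the sender gets a catchable failure instead of a silent multi-second stall.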

There are also quite a number of applications where you really do want to know where your code is running. If you're doing crypto work (some of the algorithms parallelize and vectorize really nicely) you'd certainly not want to be slinging plaintext or keys across the network -- or if you do, it should be because you explicitly chose to. If you're doing game programming you probably don't want the latency of communicating with a remote process. On the other hand, if you're doing numerical simulation work you may well want to take advantage of remote systems -- the extra millisecond or three to send data to a remote system is trivial if you've got a computation that takes minutes, hours, or days, and being able to easily bring extra computing resources online is a very useful thing.

So, anyway, that argues for designing in the capability but making it explicit -- programs can specify that a spawned thread or threadpool is allowed to be remote, and the VM can manage distributing them around as need be. (Which brings up the interesting idea of welding ZeroConf into the VM so it can track what other execution engines are around, but that's for another time.) Possibly also allowing message ports to be specified as external resources, though there's always the issue of malice there -- you can't necessarily tell what the remote system's running, and allowing it opens up a network port, with the possibility of malicious bytecode attacking remote systems, which is exactly the sort of thing Tornado's trying to prevent.
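The "explicit placement" idea might look something like this from the program's side. Every name here (spawn, Placement) is a hypothetical sketch, not anything Tornado actually defines; the point is only that remoteness is an opt-in declaration, with local-only as the safe default for things like key material:

```python
import queue
import threading
from enum import Enum

class Placement(Enum):
    LOCAL_ONLY = "local"      # must never leave this process (e.g. crypto keys)
    MAY_BE_REMOTE = "remote"  # the VM is free to farm this out to another machine

def spawn(fn, *args, placement=Placement.LOCAL_ONLY):
    # A real VM would consult its registry of known execution engines here
    # (discovered via something like ZeroConf) whenever placement allows it.
    # This sketch always runs locally and synchronously, but records the
    # declared intent so the caller's choice is visible.
    result = queue.Queue()
    t = threading.Thread(target=lambda: result.put(fn(*args)))
    t.start()
    t.join()
    return placement, result.get()

where, value = spawn(lambda x: x * x, 7, placement=Placement.MAY_BE_REMOTE)
print(where.value, value)   # remote 49
```

The default being LOCAL_ONLY matches the argument above: latency and data-exposure surprises only happen when the program explicitly asked for them.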

Dunno if it'll make it in. It's an interesting thing to think about, though.

Posted by Dan at May 12, 2006 05:46 PM

I've been waiting for tornado. Please, don't delay it for this. Yes, it's exciting, but it sounds like a nice 2.0 feature :-)

Posted by: Max Lybbert at May 15, 2006 01:50 PM