June 03, 2006

That whole 'outside world' thing sounds like a good idea

While I don't know that I've explicitly documented it yet, the Tornado engine has no built-in way to do disk IO, network IO, send signals, or otherwise interact with the outside world. This is on top of the engine monitoring memory use, thread creation, and potentially CPU use, with it being able to nuke a thread safely at a moment's notice. A nice sandbox, as secure and manageable as I can reasonably design in.

That's all great, since it helps provide you with a safe engine, and since we're really shooting to be able to run untrusted code that we get from someplace we don't entirely trust. Agents and calculation engines and macro programs and whatnot, and it's nice to have some assurance you're not going to have your system compromised because you downloaded and installed the WhizBang Dive Control plugin for the Aerial Deuces flight simulator.

Of course, the bad part about making it impossible to talk to the outside world is that you can't talk to the outside world. Which is, sometimes, a problem. Sometimes getting data at function start and returning data at function exit isn't enough -- you really want to call out into your environment or send back intermediate results. And while that's possible to do this way, you have to have the environment essentially send in seeker threads looking for data to return. (Which is actually kind of a cool concept, if you think about it. Damned awkward to actually write code for, though)

To get around this, Tornado has a mechanism where the environment can register callback functions that threads can invoke to do things. What the callbacks do is up to the environment, so it's a good idea to make sure that your Tornado programs match up with what your environment exports, but since we tag things that need verification we can at least check function signatures for callbacks. This way the code can have some access, and the host environment can regulate that access and enforce any sort of policy it might want. (Rate limiting, destination filtering, or whatever) When in doubt, punt, right?

This does have some issues, of course. The host does have to register and manage things for the engine, but something's gotta do that, so too bad there, especially since in the simple case it doesn't have to actually do much, just proxy the request. That's the simple case.

The complex case is when the host program actually needs to do some work to satisfy the request, and the things that makes that difficult are, of course, threads. Tornado code probably can't just invoke the callback function -- Tornado's heavily threaded and most host programs... aren't. I fully expect one of the common cases is for a single-threaded main program to embed a Tornado engine to do its thing, and even in a multithreaded case you don't generally want to have random threads calling into yourself. (Yes, I know, if you're embedding a threaded engine and providing callbacks you should be careful. But you should floss after every meal and exercise regularly too, and we all know how often people do that)

For that, then, Tornado allows you to note that a callback isn't thread-safe. If you do, Tornado gets mildly clever.

While it's not made clear to the calling program, each Tornado engine instance has a master thread attached to it. When you start an instance the thread is created, and when you make a call to have the engine do something the call's made to that thread, which then dispatches the call. When any thread running under that engine makes a callback, the callback request is routed to that master thread, which then makes the call into the embedding program but... only when the embedding program has said it's ready for the engine to execute its callbacks.

Basically the embedding program calls a Tornado function that drains the queue of waiting callback functions, so those functions get called from within the context of whatever host thread is draining the queue. (This way the embedding program can do the things that have to be done in a specific thread, like GUI or database calls, from the right thread)

Alternately the embedding program can note that a callback is always safe (because, for example, the embedding program is multithreaded and has properly protected the code), in which case Tornado code will just call them from whatever thread in the engine that happens to need to make it.

How the embedding program waits, and how it manages callbacks, is a matter for another post.

Posted by Dan at June 3, 2006 04:12 PM | TrackBack (0)
Comments

There is a way to ask for only specific callbacks to run when a queue flushed is requested by the host, right? Otherwise if a single-threaded caller has more than one possible context things would get quite awkward to coordinate.

Posted by: Aristotle Pagaltzis at June 4, 2006 08:44 PM

There is some control over what callbacks get run, yep. That's a matter for a separate blog post, though. :)

Posted by: Dan at June 4, 2006 09:04 PM