March 15, 2003

Polling sucks

Actually, I'd go further than that--a system that polls has something about it that's fundamentally broken.

Now, occasionally (very occasionally) that fundamental brokenness is the point of the system. Operations management tools that track the health of remote machines and devices work like this, since you generally can't count on a machine to let you know when it's died unexpectedly. That's a rare and very specific case. Some hardware devices require polling, but they're generally broken as well, often by design. Polling as a means of hardware interfacing is prone to data loss, and is usually done for cost, complexity, or competence reasons. (That is, doing it properly would cost too much, would be too complex, or is beyond someone's competence.)

For software, though, if you're polling, you're busted.

What prompts this is a quick scan through my webserver logs. I've gotten into a number of people's aggregator systems, which is fine, but some of these damn things are checking every 10 or 15 minutes to see if there's anything new, and almost universally there isn't. This is a huge waste of everyone's time, bandwidth, and resources. Yeah, sure, I'm sure it was the easy way to do things, but it's not the right way to do things.

When you design software and you think polling's the right way, what you should really think of is how to use a push or ping method instead, as they're near-universally (modulo the busted hardware case) better. Yeah, I know, RSS feeds are a newish thing, and nobody'd given much thought to them, but maybe it's time to do so. Some of the aggregators, like blo.gs and weblogs.com, take a ping to reread, and that's fine. And it wouldn't be at all tough to set up a subscription system to sign up for blog update pings, or set a central server to subscribe to, or an NNTP-style push system with feeds, or something. Anything's got to be better than what we have now.
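To make the subscription idea concrete, here's a minimal sketch of the "register interest, get pinged on change" shape. Everything in it is an assumption for illustration: the `PingHub` name, the in-memory registry, and plain callbacks standing in for whatever network notification a real system would use.

```python
from typing import Callable, Dict, List

# A subscriber is anything callable with the URL of the feed that changed.
# (Hypothetical: a real system would send a network ping instead.)
Subscriber = Callable[[str], None]


class PingHub:
    """Hypothetical in-memory registry mapping feed URLs to subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Subscriber]] = {}

    def subscribe(self, feed_url: str, callback: Subscriber) -> None:
        # Register interest once; no repeated checking afterwards.
        self._subscribers.setdefault(feed_url, []).append(callback)

    def publish(self, feed_url: str) -> int:
        """Ping every subscriber of feed_url; return how many were pinged."""
        callbacks = self._subscribers.get(feed_url, [])
        for callback in callbacks:
            callback(feed_url)
        return len(callbacks)
```

The blogging software would call `publish` once per new entry, so the traffic scales with the number of posts rather than with how often each aggregator feels like checking.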

Posted by Dan at March 15, 2003 05:59 PM
Comments

well, look on the bright side.

at least with a modern http server and rss aggregator, each hit isn't necessarily downloading the entire rss feed. many aggregators (NetNewsWire comes to mind) will send an If-Modified-Since header, and apache will reply with a 304 response if nothing has changed since the last time the feed was downloaded.

it still kind of sucks that they hit your site so often, but at least they aren't sucking down all that bandwidth each time.
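for the curious, here's a rough sketch of the decision the server makes when that header shows up. this is not apache's actual code, just the idea; the function name `conditional_status` is made up, and it assumes the feed's last-modified time is a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(if_modified_since: Optional[str],
                       last_modified: datetime) -> int:
    """Return 304 if the feed is unchanged since the client's copy, else 200.

    Assumes last_modified is timezone-aware.
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the whole feed
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the feed
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # not modified: headers only, no body
    return 200


# Example: the client echoes back the Last-Modified value it saw earlier.
feed_changed = datetime(2003, 3, 15, 17, 59, tzinfo=timezone.utc)
header = format_datetime(feed_changed, usegmt=True)
status = conditional_status(header, feed_changed)
```

the 304 case still costs a connection and a round trip, which is the part that polling can't get rid of.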

Posted by: garrett at March 15, 2003 08:14 PM

There are a couple proposed methods for this -- the blo.gs folks have a Cloud interface described at http://www.blo.gs/cloud.php, and UserLand has rssCloud, linked to from the blo.gs page. But SOAP and XML-RPC are more complicated than GET, so...

Posted by: Todd Larason at March 15, 2003 11:18 PM

Yep, conditional GET helps a lot. Maybe better support for skipHours would help, too. Your application can gather the hours in which there are actual postings, taken over some longer period, and put hour elements for all the other hours into a skipHours container in the channel description. Aggregators can use that information to connect and check only during hours when you actually post, and so respect your usual posting times. This would at least help if you don't post 24h a day ;-)

Ok, and aggregators shouldn't check more often than every 30 minutes, with 60 minutes being better, I think.
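A rough sketch of how that could be computed, assuming the posting times are available as timezone-aware datetimes and that hours are counted in GMT. The function names are made up for the example.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET
from typing import Iterable, List


def skip_hours(post_times: Iterable[datetime]) -> List[int]:
    """Return the GMT hours (0-23) in which no posting was ever made.

    Assumes every datetime in post_times is timezone-aware.
    """
    active = {t.astimezone(timezone.utc).hour for t in post_times}
    return [hour for hour in range(24) if hour not in active]


def skip_hours_element(hours: Iterable[int]) -> ET.Element:
    """Build a skipHours container with one hour element per skipped hour."""
    container = ET.Element("skipHours")
    for hour in hours:
        ET.SubElement(container, "hour").text = str(hour)
    return container


# Example: a blog that only ever posted during the 17:00 and 20:00 hours.
posts = [
    datetime(2003, 3, 15, 17, 59, tzinfo=timezone.utc),
    datetime(2003, 3, 14, 20, 14, tzinfo=timezone.utc),
]
xml_text = ET.tostring(skip_hours_element(skip_hours(posts)), encoding="unicode")
```

The resulting element would go inside the channel element of the feed. It only helps with aggregators that actually honor it, of course.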

Posted by: Georg Bauer at March 16, 2003 06:31 AM

While conditionals help some, it's still polling, and it's still generally busted. There's no reason for aggregators to be polling--they ought to just register their interest and the blogging software should ping those with registered interest when things change.

It's an example of a larger set of problems, though, and the original entry addresses it poorly. I ought to go do it right.

Posted by: Dan at March 16, 2003 10:55 AM

Making the blog ping interested clients might be harder than it sounds. I, for one, poll from my laptop, which frequently changes IP address and is usually behind some kind of NAT device. How do you propose that your blog notify me when you post? Notifying blo.gs or similar services just centralizes the problem, if blog clients are supposed to poll blo.gs (or whatever).

Posted by: Marcus at March 17, 2003 05:14 AM