Like a lot of other folks, I run Analog as my web logfile analyzer. I don't do anything fancy with it, just rip it through the logs with the proxy stuff stripped out. (Yeah, I should use Squid or something, but I don't. Apache's mod_proxy, locked down hard by IP, is good enough that I haven't bothered changing.)
And, being the stats junkie I am, I run it with DNS lookup turned on. On the past few years' worth of logs. Why? Well... I have the disk space, and the server's not that busy, so why not? (Though starting a blog has upped the hit rate. OTOH, Amanda's Buffy Review Page still gets a lot more hits than I do, which puts it all in perspective :)
The one big "why not" is how long it takes to build the report, since there's a lot of data in the log files. More to the point, there are a lot of IPs in the log files, and the reverse lookups take ages (like hours), even with caching. Plus the cache file has crept up to ~30MB.
Anyway, I just upgraded Analog. I was running 4.16, which predates all the "throw malicious crap in the referrer" hacks, so it was time. In doing so, I found a nifty little utility, DNSTran, which does the reverse lookups for you and can also translate and compress the log files in the process. While I ought to do that, I tried it just in its 'build Analog DNS cache' mode and... wow. This thing screams. What used to take Analog up to 10 hours took this thing somewhere on the order of 10 minutes, and that's starting from a brand new cache file. (And it spent a lot of that time waiting for the straggler lookups to finish on each of the 30 or so log files; I expect it'd be much faster end to end if I were working on a single source file.) Sweet! Plus it got named to peg the server CPU, something I've never seen before. Guess my creaky hardware's not that happy doing 50-80 lookups a second, but that's fine.
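For the curious: I haven't looked at how DNSTran works inside, so the following is not its method. It's just a minimal Python sketch, under my own assumptions, of the general idea behind this kind of speedup: pull out each distinct client IP once, skip anything already in the cache, and run the reverse lookups in parallel instead of one at a time. The function names, the 32-worker default, and the plain dict used as a cache are all hypothetical; it doesn't read or write Analog's actual DNS cache file format.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def distinct_ips(lines):
    """Return each distinct client address (the first field of a
    Common Log Format line), in order of first appearance."""
    seen = set()
    ordered = []
    for line in lines:
        fields = line.split(None, 1)
        if fields and fields[0] not in seen:
            seen.add(fields[0])
            ordered.append(fields[0])
    return ordered


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname for ip, or None if the lookup fails.
    socket.herror and socket.gaierror are both subclasses of OSError."""
    try:
        return resolver(ip)[0]
    except OSError:
        return None


def build_cache(lines, cache=None, workers=32, resolver=socket.gethostbyaddr):
    """Hypothetical helper: resolve every IP not already cached, using a
    bounded pool of threads, and return the updated cache as a dict."""
    cache = dict(cache or {})
    todo = [ip for ip in distinct_ips(lines) if ip not in cache]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Lookups are I/O-bound (mostly waiting on the resolver), so threads
        # let many of them be in flight at once; map() preserves input order.
        results = pool.map(lambda ip: reverse_lookup(ip, resolver), todo)
        for ip, host in zip(todo, results):
            cache[ip] = host  # None records a failed lookup so it isn't retried
    return cache
```

Capping the worker count is deliberate: an unbounded flood of lookups is exactly the sort of thing that pegs a local named on modest hardware.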
Running Analog on the resulting prebuilt DNS cache took all of two minutes and 44 seconds. This is down from ~15-20 hours if it was starting from scratch. Yow!
Posted by Dan at March 2, 2003 03:08 PM