January 11, 2004

Like crap, falling from the sky

Well, it's officially happened. Someone wrote an automated comment spammer for Movable Type. As I type this, I'm watching repeated netstat runs show port 80 connections, and seeing the "MT-Blacklist comment denial" messages show up in the log (36 so far), after 37 of the damn things actually made it through (some while I was in the midst of de-spamming things).

Whups, make that 40 of the damn things. A few more just slipped through. And even more on the logs, and I'm not going to bother counting. (Okay, I did. 70 were blocked, 38 made it through (I mis-counted) in a 40 minute period)

This officially counts as insane. I think I'll go analyze the source IP addresses to see if there's any rhyme or reason, but unless they've managed to figure out how to spoof that, it looks like these are coming from all over. Makes me wonder if someone's set off a zombie blog spam network, along the lines of those zombie mail spam nets.

I've added some pretty aggressive entries to the anti-spam list now (one of the joys of working for Northern Light -- I remember some of the common spam domain rules. Things like "two or more dashes and it's probably spam" and "The numeral 4? Probably spam" Probably here meaning 90% or better chance, so if you've got a domain with one of those, well... sorry) so hopefully this'll slow things down. http://www.sidhe.org/~dan/blog/blacklist.txt if you want it, 'specially as the master blacklist isn't being maintained at the moment.
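
For the curious, those two domain rules boil down to something like the following. This is a minimal Python sketch, not what the blacklist itself contains (the real thing is a list of patterns); the function names and the example domains are made up for illustration.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Pull the host name out of a URL; empty string if there is none."""
    return (urlsplit(url).hostname or "").lower()


def looks_spammy(domain: str) -> bool:
    """Apply the two rules of thumb: two or more dashes, or the numeral 4.

    These are heuristics with false positives, not proof of spam.
    """
    return domain.count("-") >= 2 or "4" in domain
```

Something like `looks_spammy(domain_of(comment_url))` would then flag a comment for a closer look. As noted above, a perfectly innocent domain with a 4 in it gets caught too.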

People. Bah. Probably time to disable comments entirely, which is a shame.

Posted by Dan at January 11, 2004 06:13 PM | TrackBack (27)
Comments

Edit your page templates so that the URL the user provides with the comment is *not* hyperlinked, but displayed in ASCII plaintext -- or set up the MT-Redirect plugin, so that hyperlinks in and attached to comments point to your server, thus removing most (90%+) of the Googlejuice associated with spamming you in the first place.

Posted by: Richard Soderberg at January 11, 2004 08:22 PM
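
A rough sketch of the redirect idea from the comment above, in Python rather than as an MT plugin. It is not the MT-Redirect plugin's code; the redirect address, the `url` parameter name, and both function names are assumptions made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical address of a redirect script on your own server.
REDIRECT_BASE = "http://blog.example/redirect"


def redirect_link(target_url: str) -> str:
    """Build the link to show in a comment instead of the raw URL."""
    # safe="" percent-encodes everything, including ':' and '/'.
    return REDIRECT_BASE + "?url=" + quote(target_url, safe="")


def resolve(redirect_url: str) -> str:
    """Recover the real target inside the redirect script."""
    query = urlsplit(redirect_url).query
    return parse_qs(query)["url"][0]
```

The page then links to your own server, and the redirect script sends the reader on to whatever `resolve` returns. How much link credit a search engine still passes through such a redirect is outside what this sketch can show.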

Richard: how would that have prevented this automated attack? Why do you think it would prevent future automated attacks? Spammers don't read blogs, they only write to them.

Posted by: Mark at January 11, 2004 09:46 PM

Time to implement a captcha-type form before people can post to it (so people have to type a word from an image before the post will go through).
Or even better, an email-validation process: you type your comment, and it mails you an activation link.

Posted by: Ian Holsman at January 11, 2004 10:05 PM

Ian, an email-validation link would be much worse than a captcha. First, it's easily automatable. Second, many commenters don't have easy access to their email or don't want to share an address.

Posted by: Aaron Swartz at January 11, 2004 11:10 PM

Thanks for the heads up, and thanks for offering to share your blacklist. But do bear in mind that MT-Blacklist checks all comment input fields, not just the URL. Your blacklist includes several common words that, while often used in offensive (or offensively commercial) contexts, can just as easily appear in legitimate discussions. Those blacklist entries probably need to be tightened up so that they more closely resemble the specific forms that you encountered as spam comments. Obviously, I can't list those blacklist entries because then this comment will not be posted.

I mention this because I naively imported your blacklist into my MT-Blacklist installation, then ran the de-spam command, which immediately flagged a friend's completely innocuous comment in a discussion about a museum, of all things, as spam. :)

Posted by: jacob at January 12, 2004 12:05 AM

Ah, I didn't think about the list covering the body text as well as the URLs. It'd be nice to have that separate, or a separate list for URL-identified things, but that's just picky whining about an otherwise non-sucky solution.

I patched up the blacklist to remove the offending bare words and added in a more restricted regex for them that ought to only catch things in the middle of words with embedded periods and two or more character suffixes. Should be a bit better, I hope.

(I really want to build a regex optimizer for this thing now too--the list is big enough with enough overlap that it's burning a lot of CPU time where it doesn't have to. Something for parrot, I guess)

Posted by: dan at January 12, 2004 10:38 AM

I don't use MT so this is just hearsay, but is it not possible to rename your comment script so it is not mt-comments.cgi? This would presumably cut down on spam from scripts which depend on the comments URL.

Posted by: Damian Cugley at January 12, 2004 11:21 AM

Renaming the comment script will stop spam from those people who are just assuming you are using a normal MT install. If they bother to look at your site they can easily follow the hyperlinks to your renamed script though and change their tool settings appropriately.
Requiring authentication by way of registered users isn't foolproof as the spammers could sign up for an account (unless you moderate it).
Adding an image with embedded text which has to be typed in would stop most spam cold (and such a thing is available as an MT plugin, I think); however, this does mean that users not capable of seeing images are excluded.
Flood control built into MT might help a bit - say "You posted less than x mins ago, please wait another y minutes before posting again". That would also stop those accidental multiple posts caused by hitting the refresh button...

Posted by: Sam Newman at January 12, 2004 12:51 PM

The good news is that the sky isn't quite falling. They didn't just this minute write an automatic spam blaster, since I see by my archives that I first got blasted that way on October 27th. Of 2002.

Unless a spammer finds a clearly unmaintained blog, that sort of thing isn't actually useful, since even the most clueless of spam-leaving bloggers will notice suddenly getting a comment on every single entry, and will probably just shut off the comments, or shut down the whole blog as being too much trouble.

Interestingly enough, that last might well be a win for the spammers, who not only want your PageRank, but are very jealous of it, and very angry that Google thinks so highly of your (and my) maunderings, while thinking so little of their carefully crafted scam-sites. Something to consider, while looking for a way out of the spam nightmare we are in right now: anything (including not linking legit commenters' URLs) which makes you have less impact on Google is a win for them.

Posted by: Phil Ringnalda at January 12, 2004 01:33 PM

It is pretty easy to rename your comments CGI, and I recommend it. In fact, MT even has a placeholder tag for the comment script, so if that's in your MT templates, you only need to change the name of the file itself and change one line in your MT config file. This will help defeat scripts.

While it is imperfect, mt-blacklist also automates removal of comment spam, which was handy the day I got 100 spam comments in less than an hour.

Simon Willison has cleverly created a redirector so that all URLs show up as [his URL]/redirect?[real target URL]. Stops googlejuice cold. I'd like to see this plus a progressive throttle (ie, 30-second wait for the second comment, 60 for the third, etc) and automatic blacklisting for rapid hammering.

Posted by: Adam Rice at January 12, 2004 02:06 PM
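
A minimal sketch of the progressive throttle idea from the comment above (30-second wait before the second comment, 60 before the third, and so on). Nothing here is MT code; the class name, the one-hour reset window, and the per-IP bookkeeping are assumptions for illustration, and the automatic blacklisting part is left out.

```python
import time


class ProgressiveThrottle:
    """Make each commenter wait longer before every further comment."""

    def __init__(self, base_wait=30.0, window=3600.0, clock=time.monotonic):
        self.base_wait = base_wait  # seconds of wait added per accepted comment
        self.window = window        # quiet period after which the count resets
        self.clock = clock          # injectable so the logic can be tested
        self.history = {}           # ip -> (accepted count, time of last accept)

    def try_post(self, ip: str) -> bool:
        """Return True if the comment may be posted now, False if too soon."""
        now = self.clock()
        count, last = self.history.get(ip, (0, None))
        if last is not None and now - last > self.window:
            count, last = 0, None   # quiet for long enough: start over
        if last is not None and now - last < self.base_wait * count:
            return False            # still inside the required wait
        self.history[ip] = (count + 1, now)
        return True
```

Keying on the IP address is the weak spot: a flood coming from many addresses at once, like the one described in the post, would see each address treated as a first-time commenter.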

Why not change the names of the comment form input fields? It'll be a while before a bot can decide that "flint" is really the "name" input field.

This will require the template and the form handling in mt-comments.cgi to be changed, obviously.

Posted by: Charles at January 12, 2004 02:09 PM

Charles: if I was a spammer, I would start comparing input fields with labels/preceding text. It slows them down, but not much...

Posted by: Stephen at January 12, 2004 02:42 PM

The stupid thing about all this is that I don't give a damn about page rank, nor really about the page rank of things I link to. Hell, if there was a way to add some sort of GoogleSlime to pages that actively decreased the ranking of linked-to pages I might well do that for any new comments. (And leave the crap for a bit, appropriately CSS-ed away to invisibility)

Actively modifying the filenames, code, and template contents is something I'd rather not do if I could avoid it, as it makes upgrades and maintenance releases a massive pain in the neck, possibly enough to warrant just killing comments altogether rather than dealing with it. Not that I want to kill comments--I'm making enough declarations of fact that it's good to have a way for corrections and notations of error to be thrown up by folks who aren't me.

Posted by: Dan at January 12, 2004 02:58 PM

I know that many of the MT installs won't be able to benefit from this tip, but I have found that the following robots.txt entry helps quite a bit in hiding from spammers:

User-agent: *
Disallow: /mt

(substitute /mt for your MT directory)

Of course if you're running a default MT install, your comment and trackback popups will be in the MT directory, so that will hurt your google juice. But I have the popups disabled and instead have everything on the individual archive page, thanks to SimpleComments.

Posted by: Scott Johnson at January 12, 2004 03:31 PM

I renamed my comments CGI, and from my access log stats thought that simple change was blocking a ton of attempted posts... until I looked at the real hits in the log and found I had forgotten to rebuild one of my weblogs, and three search engine spiders had been spidering the comments. I also set it up so you can still post with mt-comments.cgi, but instead of posting your comment it quietly adds you to that weblog's IP ban list, so that was especially egregious of me to overlook.

I also used a few other techniques, and while they do no harm I'm aware of and perhaps help for now, I could write a bot that can post in spite of them all. As Mark Pilgrim and others like Stephen above have written, it only delays the inevitable and forces spamming tools to grow more prickly.

Posted by: Mark Paschal at January 12, 2004 03:47 PM

You can block automated spam with a system of rotating form control names. For example:

URL:<br/><input type="text" name="url-srgenrgerfgdg" value="http://"/>

It would probably take a pretty big log to manage which random name was served to whom, but it would probably eliminate mass-spamming, and coupled with an HTTP redirect (as in comment #1) can stop spam altogether.

Posted by: Lenny at January 12, 2004 04:40 PM
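
One way to get rotating field names without the big log mentioned above is to derive the name from a server-side secret and the current time slot, so the server can recompute it instead of remembering it. This swaps in a different technique from the one described in the comment, and it is only a sketch: the secret, the hourly rotation, the 12-character suffix, and all function names are assumptions.

```python
import hashlib
import hmac

# Hypothetical server-side secret; would need to be kept private.
SECRET = b"change-me"


def current_bucket(now: float, period: int = 3600) -> int:
    """Number of the time slot that `now` (seconds) falls into."""
    return int(now // period)


def field_name(logical: str, bucket: int) -> str:
    """Derive the form field name to serve during a given time slot."""
    msg = f"{logical}:{bucket}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{logical}-{digest[:12]}"


def extract(form: dict, logical: str, now: float, period: int = 3600):
    """Read a field from a submitted form, accepting the current or previous
    slot's name so a reader who loaded the page a while ago can still post."""
    bucket = current_bucket(now, period)
    for candidate in (bucket, bucket - 1):
        name = field_name(logical, candidate)
        if name in form:
            return form[name]
    return None
```

The template would render `field_name("url", current_bucket(time.time()))` as the input's name, and the comment script would call `extract` on the submitted form. A bot that fetches the form fresh before every post still gets through, so this only raises the cost a little.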

Renaming the mt-* cgis has worked wonders for me. From what I gather, the spammers build up a big list of CGI urls, akin to an email database, then go from there.

Basically I created a special subdomain for my MT installation and my spam went from 30/day to 0. With literally hundreds of 404s to my old mt-comments.cgi. The spam has just started up again so I'll rename the scripts again and hopefully everything will die down.

Posted by: Koz at January 12, 2004 05:47 PM

I've been using .htaccess to require mt-comments.cgi to be referred from one of my own pages (like the fix to prevent image hotlinking), and it seems to have slowed things down as most bots just call the cgi directly. Of course it has problems if a legitimate user agent doesn't correctly report the referer, but I don't think that's too serious of a problem.

Posted by: John at January 12, 2004 09:42 PM

Add this to robots.txt
User-agent: *
Disallow: /banme.cgi

Then put this on every page:
<a href="banme.cgi"><img alt=""></a>

By banning any robot not honoring your robots.txt, you should get rid of every spambot, leaving only the average cretin doing it manually (against whom I'm afraid you really can't do anything anyway). More interestingly, set the default file to post comments to ban people accessing it and block it from robots.txt as well. There are several ways to do this, but the important thing is: enforce your robots.txt: if they can't play nice, don't let them play at all.

Posted by: Effovex at January 12, 2004 10:22 PM
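
A small sketch of what the trap script from the comment above might do on the server side, written in Python rather than as a CGI. The class name, the in-memory ban list, and the status codes are assumptions for illustration; a real setup would have to persist the list somewhere.

```python
class RobotTrap:
    """Ban any client that requests a path robots.txt told robots to avoid."""

    def __init__(self, trap_paths=("/banme.cgi",)):
        self.trap_paths = set(trap_paths)
        self.banned = set()  # in-memory only in this sketch

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status to send for this request."""
        if ip in self.banned:
            return 403       # already banned: refuse everything
        if path in self.trap_paths:
            self.banned.add(ip)
            return 403       # walked into the trap: ban from now on
        return 200
```

A well-behaved crawler never requests the trap path, so it never lands on the list. A human who follows the hidden link by accident does, which is a good reason to keep the link empty and out of the way.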

John, the htaccess thing is something I was thinking about yesterday. I don't suppose you could share the entries in your .htaccess file?

Posted by: sarah at January 13, 2004 12:24 AM

You could also use htaccess to password protect the mt-comments.cgi file and provide a random password to users.

Example: http://peter.mapledesign.co.uk/weblog

Posted by: sn at January 13, 2004 03:51 AM

Perhaps someone ought to keep a running list of IP addresses, domains, etc that comment spam frequently comes from?

John, I don't know too much about .htaccess files. How does one require mt-comments.cgi to be referred from one of your own pages?

Posted by: Robert at January 13, 2004 11:32 AM

This is just off the top of my head, but the following code should redirect all users that are not referred from within your site to your main page. Not tested, so it might need some tweaking.

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://www\.domain\.com(.*)
RewriteRule ^cgi-bin/mt-comments\.cgi http://%{HTTP_HOST}/ [R,L]

Posted by: Basje at January 13, 2004 12:19 PM

I put in James Seng's CAPTCHA-like extension. It presents a gray on gray with gray pattern image of a random number that must be entered with the comment. It stopped the blog spam cold.

Posted by: Ross at January 15, 2004 05:06 PM

While captcha things do cut down on automated attacks, they do so at the expense of the visually impaired, and if I'm going to cut out one group of folks from legitimately commenting, I probably ought to cut them all out.

I'm going to go install MT 2.661 and see what that buys me for spam reductions. (Though the spam has slowed down a lot -- only a couple of blocked ones today, and a few yesterday) The flood on me looked very much like a googlehack attempt, though I know some other folks have been hit with just plain malicious attacks.

Posted by: Dan at January 15, 2004 05:35 PM

I was thinking about this today.

Posted by: Aristotle at January 19, 2004 06:24 AM

Sorry, I accidentally submitted too quickly, then my browser crashed as I was writing my actual reply. Then I thought this would make a good thought to write on my own "blog", so that's where I'm putting it. ( http://plasmasturm.org Jan 19 )

Posted by: Aristotle at January 19, 2004 06:37 AM

How is it?

Posted by: Kent at February 9, 2004 07:32 AM

Depends on what "it" is. :)

I've got 2.661 installed, though some of its spam features aren't functional with the version of MT-Blacklist I have installed. It seems to be helping. I also renamed the comment script as someone recommended, and that's seemed to do the absolute most for cutting down on spam attempts. The log only shows four blocked tries over the last five days, which is a lot less than it used to be. Dunno if anyone's actually checking to see that they're getting no direct google-linkage, though.

I did have to go twiddle some of the templates--my original MT install is so old that not all of the templates use the substitution variables, so I had to go edit them. Otherwise it seems to be working out OK.

Posted by: Dan at February 9, 2004 08:24 AM

You guys all make great points here. The spam is not going to stop. You just have to create smarter scripts. :) There are a few small steps you can take to make your scripts smarter than the average bot. You can add a referer check to your script. If the script is accessed from anywhere but your site's URL, then you exit. :)

Posted by: fd at February 23, 2004 03:59 AM
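
A minimal sketch of the in-script referer check suggested in the comment above, in Python rather than in MT's own code. The host name and function name are placeholders. Note the Referer header is supplied by the client, so a bot can forge it, and some legitimate browsers don't send it at all.

```python
from urllib.parse import urlsplit

# Placeholder: the host name(s) your own pages are served from.
ALLOWED_HOSTS = {"www.example.com"}


def referer_ok(referer) -> bool:
    """True only if the Referer header points at one of our own pages."""
    if not referer:
        return False  # missing header: this sketch treats it as a failure
    # Compare the parsed host exactly, so look-alike hosts such as
    # www.example.com.evil.example do not slip through.
    return urlsplit(referer).hostname in ALLOWED_HOSTS
```

The comment script would call this first and exit without posting when it returns False.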

I'm going to go install MT 2.661 and see what that buys me for spam reductions. (Though the spam has slowed down a lot -- only a couple of blocked ones today, and a few yesterday) The flood on me looked very much like a googlehack attempt, though I know some other folks have been hit with just plain malicious attacks.

Posted by: Grest Gost at March 26, 2004 03:42 PM

nice site,

Joe

google

Posted by: joe at June 30, 2004 04:09 AM

Blacklists are not the correct approach. It is necessary to create a centralized system for fighting spam, and to close blogs to unregistered users.

Posted by: andry at July 18, 2004 07:21 PM