Tuesday, January 30, 2007

Stopping guestbook spam

My family web site has a guest book that a number of people, mostly family, have signed. Since I put it up a couple of years ago, I've gotten "guestbook spam" every now and again: messages sent by people we don't know, advertising their cheap V1@grA or whatever, though some just talk about what a great site it is, and how informative, and don't actually contain a link. I don't understand those, but whatever. Anyway, in the last few months, the number of spam entries increased astronomically, until a couple of weeks ago I was getting four or five of them a day. I wrote all the code for the website myself, and when someone adds a guestbook entry, I get sent an email saying who did it and when, along with the text of the comment. When the spam started getting out of control, I changed it so that there's now a "delete this entry" link in the email. If I click the link, the entry gets deleted. Very easy, but still annoying.

I have no real idea how these messages were getting created, but I'm quite certain it wasn't someone actually sitting at a browser looking for guestbooks and adding entries when they find one. It had to be a bot of some kind. I figured that if I were writing a bot to do this, I might look at how the majority of guestbooks handle comments, and then write my bot accordingly. My guess was that they simply start requesting pages using POST, and sending "comment=<Some comment>&name=<Fake name>&email=<Fake email>" as the POST body. If it's a guestbook-type page, that may or may not enter a comment, and then the bot can move on to the next page. I suspected that the vast majority of guest books use "email" as the name of the email address field, "name" as the name field, and "comment" as the comment field, so I changed my page so that the names of these fields are hard-coded random strings. (If I wanted to, I could change it so that the strings are not hard-coded, but randomly generated at run-time, but that's just too much work.) The end result is that in the week since I made this change, I have not gotten a single spam entry in my guestbook.
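To make the field-renaming idea concrete, here is a minimal sketch in Python (the actual site is written in PHP, and its code isn't shown here). The field strings, the "/guestbook" action, and all function names are made up for illustration. The handler only accepts a POST body that uses the obscured names, so a bot that blindly sends "comment", "name" and "email" is ignored.

```python
import secrets
from urllib.parse import parse_qs

# Hypothetical hard-coded field names; any hard-to-guess strings would do.
FIELD_NAMES = {
    "name": "f_8c1d2e",
    "email": "f_27ab90",
    "comment": "f_e4f613",
}


def render_form(field_names=FIELD_NAMES):
    """Return the guestbook form HTML using the obscured field names."""
    return (
        '<form method="post" action="/guestbook">\n'
        f'  <input type="text" name="{field_names["name"]}">\n'
        f'  <input type="text" name="{field_names["email"]}">\n'
        f'  <textarea name="{field_names["comment"]}"></textarea>\n'
        '  <input type="submit" value="Sign">\n'
        "</form>"
    )


def parse_entry(post_body, field_names=FIELD_NAMES):
    """Return the entry as a dict, or None if any obscured field is missing or blank."""
    fields = parse_qs(post_body)  # blank values are dropped by default
    entry = {}
    for logical, obscured in field_names.items():
        values = fields.get(obscured)
        if not values or not values[0].strip():
            return None
        entry[logical] = values[0].strip()
    return entry


def random_field_names():
    """Run-time variant: fresh names per call (they would have to be stored per session)."""
    return {key: "f_" + secrets.token_hex(4) for key in ("name", "email", "comment")}
```

With this in place, parse_entry("comment=Cheap+pills&name=Bot&email=bot%40example.com") returns None, while a body built from the real form fields comes back as a normal entry. The run-time variant is the "too much work" option: the server has to remember which names it handed out with each form.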

It's certainly possible for a script to get the (HTML) source for a page, analyze it to find out what the actual field names are, and then submit spam entries that way, but I guess the bots aren't smart enough yet to do that. I'm sure it won't be long though...

Feb 5 update: Got two spam entries this morning. Oh well.

4 comments:

Anonymous said...

Interesting idea.

Speaking of trying to reduce spam... I noticed that you are posting users' email addresses in plain text on your comments page... not really a good idea since lots of bots scrape addresses from websites. If you really want to post the addresses, you can easily obscure them by embedding zero-space HTML within the string.
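One possible reading of the "zero-space" suggestion, sketched in Python: write each character of the address as a numeric character reference and separate the characters with empty HTML comments, which take up no space when rendered. The function name is made up, and this is only an illustration of the general idea, not necessarily what the commenter had in mind. It stops naive pattern-matching scrapers, but a bot that decodes the HTML will still recover the address.

```python
def obscure_email(address):
    """Return HTML that renders as the address but hides it in the page source."""
    # Each character becomes a numeric character reference such as &#64; for "@".
    encoded = [f"&#{ord(ch)};" for ch in address]
    # Empty comments are invisible in the browser but break up the string.
    return "<!-- -->".join(encoded)
```

For example, obscure_email("a@b") produces "&#97;<!-- -->&#64;<!-- -->&#98;", which a browser displays as a@b.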

Graeme said...

Hadn't thought of that, MC - thanks. I have removed email addresses from the guestbook display, since there's no real need to display them.

Anonymous said...

Yep, you removed the addresses, and it looks like you removed everything else from the page - your guestbook page is completely blank?

Graeme said...

Thanks again, MC - somehow the PHP file on the server got overwritten to be 0 bytes. I'm sure I checked it after uploading the new version, so I'm not sure what happened.