I wanted to post an update to the Killing Referral Spam post I made a while back. First, I wanted to correct some misconceptions that I had about the whys and what fors of referral spam. As such, let me quote my update from that post:
UPDATE: Many people, including myself, have come up with far more legitimate reasons why spammers do target referral logs. Among them are:
* Many sites include the “most recent referrers” on their pages, so any referrer spammers will get links and corresponding Google Juice.
* Many sites have public links to their referral pages which can be spidered.
* Site stats can be bloated by referral spammers so that getting an accurate record of your readership is all but impossible (hat-tip to Ruby with a “B”).
* Obviously the more referral spam hits you get the more bandwidth that gets used. For those on limited hosting plans or for hosts themselves, this can be costly and burdensome (hat-tip to glo).
* Some blogs have scripts to pick up the most common referral addresses and link back to them (hat-tip to Shadow).
Just wanted to set the record straight… it’s not as useless as I originally thought.
All of those reasons make keeping ahead of the spammers a little more necessary in my mind. In analyzing my referrer logs for the 6 days since March 3rd, I determined that I had killed 518 referral spam attempts. Given the current size of my index, and given that spammers rarely pull the rest of the content and instead pull only the single page they’re targeting, that’s roughly 25 MB of bandwidth that would have been taken up solely by referral spammers. For those on limited hosting accounts, that 25 MB of bandwidth can make a difference.
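As a rough sanity check on those numbers, the sketch below works backwards from the figures above to the index page size they imply (a bit under 50 KB per hit). The 30-day projection is purely an assumption that the same rate continues; it is not a figure from the logs.

```python
# Figures taken from the post.
blocked_attempts = 518
days = 6
bandwidth_saved_mb = 25  # the post's rough estimate

# Implied size of the index page (assumes 1 MB = 1024 KB).
implied_page_kb = bandwidth_saved_mb * 1024 / blocked_attempts

# Hypothetical projection: assumes the same rate holds for a 30-day month.
attempts_per_day = blocked_attempts / days
monthly_mb = bandwidth_saved_mb / days * 30

print(f"implied page size: {implied_page_kb:.1f} KB")
print(f"blocked per day:   {attempts_per_day:.1f}")
print(f"projected monthly: {monthly_mb:.0f} MB")
```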
In terms of false positives there have been surprisingly few. Of those, most were from domain names that are impossible to tell from “good” domain names (note to bloggers: to avoid being blacklisted it’s sometimes advisable to not include the term “facial” in your domain name). Those who do get unfairly blacklisted have used the email I put on the page to let me know and I have them back in “good graces” in short order. I don’t consider it onerous and as far as I know the people that contacted me don’t either.
In terms of how many spammers get through, it’s still a game of whack-a-mole. On average I have to use my script and add an asshole to the list about once a day. I can live with that. It’s actually gotten enjoyable knowing that when I click that “add” button I won’t be seeing that particular turd again. So long, loser!
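For anyone curious what this kind of whack-a-mole list looks like in practice, here is a minimal sketch. It is not the actual script, which isn't shown here: the function names, the sample terms other than “facial”, and the `.example` domains are all hypothetical, and it assumes the list matches keywords against the referring hostname, with a whitelist for sites let back into “good graces”.

```python
from urllib.parse import urlparse

# Hypothetical entries; the real list is not shown in the post.
BLACKLIST_TERMS = {"facial", "casino"}
# Hypothetical domain restored after a false positive.
WHITELIST_DOMAINS = {"goodfacialblog.example"}


def referrer_host(referrer: str) -> str:
    """Return the lowercase hostname of a Referer value, or '' if none."""
    return (urlparse(referrer).hostname or "").lower()


def is_referral_spam(referrer: str) -> bool:
    """True if the referring host contains a blacklisted term."""
    host = referrer_host(referrer)
    # Empty referrers and whitelisted hosts are always allowed through.
    if not host or host in WHITELIST_DOMAINS:
        return False
    return any(term in host for term in BLACKLIST_TERMS)


def add_to_blacklist(term: str) -> None:
    """The 'add' button: block any future referrer containing this term."""
    BLACKLIST_TERMS.add(term.lower())
```

A request handler would call `is_referral_spam()` on the incoming `Referer` header and refuse the request before serving the page, which is where the bandwidth saving comes from. Checking the whitelist first is what keeps an innocent domain from being caught again by the same keyword.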
A lot of people suggest that I switch over to Referrer Karma. I’m sure it’s great. However, it’s a bit of a case of “not invented here” syndrome in that generally I can’t tell what it’s doing or why it’s doing it. I’ve read accounts of some overzealous protection, and in cases like that it can be difficult to determine what went wrong and how to let people back in. I know that was one of my beefs with Spam Karma when I used it. Sometimes it would latch on to someone (generally a good friend of mine in real life), consider him to be the human avatar of ultimate evil, and not allow anything through. Reversing that process took longer than I felt comfortable with. So, for now, I’ll stick with my game of whack-a-mole.

3 Responses to “Referrer spam update”
Hey, thanks for the hat tip, but my name is Ruby (with a B).
Keep up the good fight!
Indeed it is, my apologies. It has been corrected.
We report ’em to Google, get ’em banned, and when they figure it out, they stop spamming.
Well, except the worst of the bunch, who seem to keep trying – for now.