Killing Referral Spam

UPDATE: See an update to the rules here.

It all started with Paris Hilton. Though, really, what doesn’t?

Referral spam is silly to me. For those unaware, “referrer/referral spam” refers to the practice of sending a request for a web page (like my weblog here at http://www.coldforged.org) and forging the Referer request header so that the request appears to come from a link on some other site. At first glance, this doesn’t seem overly effective, right? What these cretins are trying to do is get you, the person who owns the web page, to see that referrer link in your logs and think to yourself, “Self, wow, someone is linking to me! I should visit and see who it is!” You therefore theoretically visit the site and see whatever it is that the spammer wishes you to see, be it porn, gambling, mortgages, or the aforementioned Ms. Hilton. That’s it. That’s their entire raison d’être. It doesn’t help with Google-juice, it doesn’t reach your readership, it’s solely targeting the blog/web site author.

As such it’s generally useless shit. I mean really, if you run a web site and you have the ability and knowledge of how to even see your access log, what are the chances that you are going to be gullible enough to think that the monkeys at http://free-blowjobs-while-playing-texas-holdem.biz are really linking to your site and you therefore click on the link? Slim.
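To make the mechanism concrete: the "referrer" a server logs is just a request header named Referer, and the client can put anything it likes there. The following is a minimal Python sketch that builds such a request without sending it. The function name and the spam-site.example address are made up for illustration.

```python
from urllib.request import Request


def build_spoofed_request(target_url: str, fake_referrer: str) -> Request:
    """Build (but do not send) a GET request that claims to come from fake_referrer.

    Hypothetical helper for illustration only; nothing here touches the network.
    """
    # The Referer header is supplied by the client, so the server has no way
    # to verify that the visitor really followed a link from that page.
    return Request(target_url, headers={"Referer": fake_referrer})


request = build_spoofed_request(
    "http://www.coldforged.org/",
    "http://spam-site.example/",  # placeholder spam URL, an assumption
)
print(request.get_header("Referer"))
```

Whatever string lands in that header is what shows up in the access log, which is why the rules further down match on it.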

UPDATE: Many people, including myself, have come up with far more legitimate reasons why spammers do target referral logs. Among them are:

* Many sites include the “most recent referrers” on their pages, so any referrer spammers will get links and corresponding Google Juice.
* Many sites have public links to their referral pages which can be spidered.
* Site stats can be bloated by referral spammers so that getting an accurate record of your readership is all but impossible (hat-tip to Rudy).
* Obviously the more referral spam hits you get the more bandwidth that gets used. For those on limited hosting plans or for hosts themselves, this can be costly and burdensome (hat-tip to glo).
* Some blogs have scripts to pick up the most common referral addresses and link back to them (hat-tip to Shadow).

Just wanted to set the record straight… it’s not as useless as I originally thought.

So, if it’s so useless why do I even blather on about it? Because it irritates me, that’s why. Whatever I can do to make certain that these conniving turds’ links never make it in front of my face is time well-spent. As such, I found this guy’s approach and gave it a shot. To be honest, I already had some .htaccess rules in place and had haphazardly kept them up to date, but this seemed cleaner. And it would be, too, if my hosting provider had an Apache version that supported the SetEnvIfNoCase directive in .htaccess files. But they don’t.

So I rolled my own. If you’re interested in battling the evil forces of referral spam somewhat more easily, read within.

Implementation

First, you must have access to your .htaccess file. Generally, this file will be at the root of your web server. Now, make certain that the following lines are in there somewhere, preferably toward the top.

RewriteEngine On
RewriteBase /

This gets the old Apache server ready to bust some heads. Now, I’m going to start you out with a hefty list of spamming asses. Copy this entire list verbatim (except for the very first line which should be modified to reflect your domain) and dump it in your .htaccess somewhere under the previous stuff.

UPDATE: Had to update the .htaccess rules a bit when I realized that people who come in via a search engine query on any of the terms will get forbidden. (I get a startling number of search requests for “sex toy”, which matches one of the rules and which, much to the disappointment of would-be searchers, doesn’t actually lead to a sex toy.) In addition, if by some chance you have one of the terms as a title in your posts, we need to let those through as well. As such, we have to explicitly allow these, like so:

RewriteCond %{HTTP_REFERER} !.*coldforged\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !.*del\.icio\.us.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)google\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)yahoo\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)msn\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)a9\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)lycos\..*$ [NC]
RewriteCond %{REQUEST_URI} !/403.shtml$ [NC]

(Note: The first period following the search engine name should be escaped with a backslash. However, my code prettifier obviously hates me.) Obviously, change the “coldforged.org” to be your own site’s domain name. The rest of the lines, with few exceptions, are made up like so:

RewriteCond %{HTTP_REFERER} ^.*cm3.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bobbemer.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adultcam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*web-cam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adulthost.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bostonticket.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*wlten.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*mp3search.*$ [OR]

Each of these is an htaccess condition that is true if the referrer of the request contains the given word anywhere in it; a condition with the pattern ^.*loans.*$, for example, matches any referrer containing “loans”. The [OR] at the end tells the web server to treat the condition as a logical OR with the next condition. So, read as English it would read “if the referrer contains the word ‘loans’ or…” and we keep going down the list of conditions. The very last condition in the list must leave off the [OR], since nothing follows it to OR with. If the conditions match, which in our case means that the referrer contained any of the words in the many conditions in that file (and none of the exceptions above applied), we’ll eventually execute the rule specified.

RewriteRule .* - [F,L]

Which tells Apache that we’re going to Forbid the requester from seeing the requested content and that’s the Last thing I’m going to say about that request. The requester will get a 403 (forbidden error) returned and we’ll display our special 403 web page for him where we taunt him and call him a dimwit.
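If the way the conditions combine is hard to picture, here is a rough Python simulation of the decision, not Apache itself. It assumes, as mod_rewrite is generally understood to behave, that the negated exception conditions are ANDed together and that the [OR] chain of spam words is then evaluated as one group. Only a handful of the patterns from the lists above are included, with the del.icio.us dots escaped, and the function name is invented for this sketch.

```python
import re

# Exceptions: each corresponds to a negated, case-insensitive ([NC]) RewriteCond.
ALLOW_PATTERNS = [
    r".*coldforged\.org/.*$",
    r".*del\.icio\.us.*$",
    r"^http://([^/]+)google\..*$",
    r"^http://([^/]+)yahoo\..*$",
]

# Spam words: each corresponds to one RewriteCond in the [OR] chain.
BLOCK_PATTERNS = [
    r"^.*adultcam.*$",
    r"^.*web-cam.*$",
    r"^.*mp3search.*$",
]


def is_forbidden(referrer: str, request_uri: str) -> bool:
    """Return True if the simulated rule set would answer with a 403."""
    # All negated conditions must hold, so one exception match lets the request in.
    if any(re.search(p, referrer, re.IGNORECASE) for p in ALLOW_PATTERNS):
        return False
    # Never forbid the error page itself (RewriteCond %{REQUEST_URI} !/403.shtml$).
    if re.search(r"/403.shtml$", request_uri, re.IGNORECASE):
        return False
    # The [OR] chain: any single spam word in the referrer is enough.
    return any(re.search(p, referrer) for p in BLOCK_PATTERNS)


print(is_forbidden("http://www.adultcam-example.test/", "/"))        # True
print(is_forbidden("http://www.google.com/search?q=web-cam", "/"))   # False
print(is_forbidden("", "/"))                                         # False
```

The second call shows why the exceptions matter: the search query contains a blocked word, but the request comes from a search engine and is let through.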

Make certain that you have a 403.shtml file sitting there as well. This is the file that will be displayed to the would-be referrer spammer when they try to get in (or, as happened to me, to the poor, innocent lad that tried to enter your site as you were experimenting with all of this). This is the source of the 403.shtml page that I use (you can see it in action here).

What’s that ugly hunk of code?

Note that you’ll want to provide your own email address in that 403 page to cover the case where you accidentally block someone you want to let in. That big lump of code in the 403 page that mentions something about the “hiveware_enkoder” is a snazzy bit of code that obfuscates my email address. If you’re creating a page that will mostly be seen by spammers, it wouldn’t be overly smart to have an email address that is easily harvested. That hunk of code actually generates a clickable link to send me an email. To get your own code to paste in there, visit the Hiveware page.

(This is as good a time as any to visit the topic of who will see your 403 page. Chances are nobody will usually. There are programs out there specifically for spamming referrer logs. By the thousands. They don’t care what kind of return code they get from the web server. However, on the off chance that a relatively naive spammer just spoofs the referrer from his browser software, they’ll get the pleasure of being called a dimwit.)

Can’t you automate this a bit?

All that’s dandy. Now you’ve denied a whole load of folks from accessing your site and sticking their links in your face. But what if you get more? I created a small script that makes adding additional jerks to the .htaccess file a bit easier. Bear in mind that this script is only intended for WordPress installations, in that I use the WordPress user information to determine if the script is allowed to run. If you know your way around PHP, this is easily remedied, but I just wanted to warn you. You can see the source here. Copy that source into a file called “addforbidden.php” in your weblog home. Now, open that guy up (http://your-blog-url/addforbidden.php). You should be presented with a very simple form. Type in the word that you want to add to the list of denied words, then submit. A new condition will be added to the list you already added. Just in case anything goes wrong, a copy of your old .htaccess is saved in the same location with a date and time stamp on it so you can replace it if things go screwy.
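For anyone not on WordPress, the core of such a script is small. The sketch below is in Python, is not the addforbidden.php script linked above, and does no user authentication at all. It assumes the .htaccess contains the exact line `RewriteRule .* - [F,L]` preceded directly by the final RewriteCond of the list (the one without [OR]). The function name and the backup file naming are invented here.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path

RULE_LINE = "RewriteRule .* - [F,L]"  # assumed to appear verbatim in the file


def add_forbidden_word(htaccess_path: Path, word: str) -> Path:
    """Add a referrer condition for `word` and return the path of the backup copy."""
    # Only allow plain words so nothing needs escaping in the pattern.
    if not re.fullmatch(r"[A-Za-z0-9-]+", word):
        raise ValueError("word may contain only letters, digits and hyphens")

    lines = htaccess_path.read_text().splitlines()
    rule_indexes = [i for i, line in enumerate(lines) if line.strip() == RULE_LINE]
    if not rule_indexes:
        raise ValueError("forbid rule not found in .htaccess")
    rule_index = rule_indexes[0]
    if rule_index == 0 or not lines[rule_index - 1].startswith("RewriteCond"):
        raise ValueError("expected a RewriteCond directly before the forbid rule")

    # Keep a timestamped copy of the old file in the same directory.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_path = htaccess_path.with_name(f"{htaccess_path.name}.{stamp}.bak")
    shutil.copy2(htaccess_path, backup_path)

    # Insert before the final condition so that the last one still has no [OR].
    new_condition = f"RewriteCond %{{HTTP_REFERER}} ^.*{word}.*$ [OR]"
    lines.insert(rule_index - 1, new_condition)
    htaccess_path.write_text("\n".join(lines) + "\n")
    return backup_path
```

Calling `add_forbidden_word(Path(".htaccess"), "loans")` would add one condition and leave the previous file next to it for easy restoring. Inserting above the last condition rather than appending is a deliberate choice, since a trailing [OR] on the final condition would break the chain.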

Teach those referrer spamming jerks a lesson! Yeah, right… like they’re going to even know. But at least you can have the satisfaction of having a reduced number of them in front of you, as well as the rare opportunity to call them a dimwit. Joy!

37 Responses to “Killing Referral Spam”

  1. 20

    Kevin Leitch Says:

    Thanks very much for this - just implemented it so we’ll see how it goes!

  2. 19

    ColdForged Says:

    “While it’s true that most will never know they are being spammed, they will pay for it in bandwidth cost as more spammers get on the referral bandwagon.”

    Too true and a problem that I admittedly didn’t think about.

    “I’m going to link to this information once I get my blog moved to new servers. The more bloggers that can pass on this info the better in my opinion.”

    Thanks for the kind words, and I certainly agree with you.

    “BTW, I like the comment preview feature on your blog. I haven’t looked at the rest of your blog so I’m hoping that there is info on this feature here.”

    I would never do such a thing. ;) Though, in all honesty, it doesn’t detail all of it, like the live updating of the name and URL. But that’s relatively straightforward and I’d happily detail the technique if there was interest.

  3. 18

    glo Says:

    This looks promising and while I’ve stopped referral spam on my blog with some php code, this might be a better way to go simply because it will not use up as many server resources. The php code I use actually checks the referral’s domain for its IP and then sends the referral back to their logs. I love the idea and it has stopped them from even trying to spam my logs but as the number of spammers grows it will slow down my site’s load time.

    I just wanted to comment on the cost of this kind of spam to the blog owners. While it’s true that most will never know they are being spammed, they will pay for it in bandwidth cost as more spammers get on the referral bandwagon. I run a small hosting company (a reseller plan) and can see the bandwidth increase with all the referral spam hitting my own and my clients’ blogs. While none of my clients have gone over their allowed bandwidth usage, a few have come close all because of referral spam with a small amount of legit referrals. That’s a non-issue for now but I can see it becoming one if they aren’t stopped.

    I’m going to link to this information once I get my blog moved to new servers. The more bloggers that can pass on this info the better in my opinion. If you haven’t posted this on the WP message boards, you should.

    BTW, I like the comment preview feature on your blog. I haven’t looked at the rest of your blog so I’m hoping that there is info on this feature here. :)

  4. 17

    ColdForged Says:

    Thanks for the notice, Jinjiru :) .

  5. 16

    Jinjiru Says:

    Oops, sorry for these duplicated trackbacks, please remove one of them!

  6. 15

    Jinjiru's Retrobox Says:

    Trackbacks and Referrers Spam

    Expression Engine (which I use now with all my projects) has several spam-prevention features which one can use to reduce the amount of spam on his site. They include: Comments Spam Prevention Tools CAPTCHA - helps to check whether the person w…

  7. 14

    sensory output » Much Better Says:

    ng nonetheless when I look at my shortstats. Here are some solutions I will be looking at: one, two, and three. With time permitting later today, I will post Day 2 o […]

  8. 13

    ryan Says:

    Thank you. I hate spam.

  9. 12

    Dorothea Salo Says:

    Brilliantness! Love it. Will link to it.

    And yes, plugin, please! I so hate doing this by hand; you have NO idea. Even better — plugin that can hook into multiple WP blogs. Though that might be impossible.

  10. 11

    ColdForged Says:

    Ruby, I thought of making it a plugin, I really did and maybe I still will. It just doesn’t fit the format of a plugin in my brain — it does nothing with posts, nothing with comments, really provides no web page functionality whatsoever… aside from denying people. The only thing you’d have is an interface, which is what I tried to provide in an extremely rudimentary form.
