Killing Referral Spam

UPDATE: See an update to the rules here.

It all started with Paris Hilton. Though, really, what doesn’t?

Referral spam is silly to me. For those unaware, “referrer/referral spam” refers to the practice of sending a request for a web page — like my weblog here at http://www.coldforged.org — and changing the request headers such that the request appears to be coming from some site. At first glance, this doesn’t seem overly effective, right? What these cretins are trying to do is get you, the person who owns the web page, to see that referrer link and think to yourself, “Self, wow, someone is linking to me! I should visit and see who it is!” You therefore theoretically visit the site and see whatever it is that the spammer wishes you to see, be it porn, gambling, mortgages, or the aforementioned Ms. Hilton. That’s it. That’s their entire raison d’etre. It doesn’t help with Google-juice, it doesn’t reach your readership, it’s solely targeting the blog/web site author.

As such it’s generally useless shit. I mean really, if you run a web site and you have the ability and knowledge of how to even see your access log, what are the chances that you are going to be gullible enough to think that the monkeys at http://free-blowjobs-while-playing-texas-holdem.biz are really linking to your site and you therefore click on the link? Slim.

UPDATE: Many people, including myself, have come up with far more legitimate reasons why spammers do target referral logs. Among them are:

* Many sites include the “most recent referrers” on their pages, so any referrer spammers will get links and corresponding Google Juice.
* Many sites have public links to their referral pages which can be spidered.
* Site stats can be bloated by referral spammers so that getting an accurate record of your readership is all but impossible (hat-tip to Rudy).
* Obviously the more referral spam hits you get the more bandwidth that gets used. For those on limited hosting plans or for hosts themselves, this can be costly and burdensome (hat-tip to glo).
* Some blogs have scripts to pick up the most common referral addresses and link back to them (hat-tip to Shadow).

Just wanted to set the record straight… it’s not as useless as I originally thought.

So, if it’s so useless why do I even blather on about it? Because it irritates me, that’s why. Whatever I can do to make certain that these conniving turds’ links never make it in front of my face is time well-spent. As such, I found this guy’s approach and gave it a shot. To be honest, I already had some .htaccess rules in place and had haphazardly kept them up to date, but this seemed cleaner. And it would be, too, if my hosting provider had an Apache version that supported the SetEnvIfNoCase directive in .htaccess files. But they don’t.
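For the curious, rules in that style generally look something like the sketch below. This is only my guess at the general shape, not the linked author’s actual rules: the variable name spam_ref and the two keywords are made up for illustration, and the Order/Allow/Deny lines assume an Apache 2.2-era server that permits these directives in .htaccess.

```apache
# Flag the request if the Referer header contains a spammy word
# (case-insensitive). "spam_ref" is a hypothetical variable name.
SetEnvIfNoCase Referer "loans" spam_ref
SetEnvIfNoCase Referer "adultcam" spam_ref

# Apache 2.2-style access control: let everyone in except flagged requests.
Order Allow,Deny
Allow from all
Deny from env=spam_ref
```

The appeal is one short line per keyword, with no rewrite engine involved.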

So I rolled my own. If you’re interested in battling the evil forces of referral spam somewhat more easily, read within.

Implementation

First, you must have access to your .htaccess file. Generally, this file will be at the root of your web server. Now, make certain that the following lines are in there somewhere, preferably toward the top.

RewriteEngine On
RewriteBase /

This gets the old Apache server ready to bust some heads. Now, I’m going to start you out with a hefty list of spamming asses. Copy this entire list verbatim (except for the very first line which should be modified to reflect your domain) and dump it in your .htaccess somewhere under the previous stuff.

UPDATE: I had to update the .htaccess rules a bit when I realized that people who come in via a search engine query on any of the terms would get forbidden. (I get a startling number of search requests for “sex toy”, which matches one of the rules and which, much to the disappointment of would-be searchers, doesn’t actually lead to a sex toy.) In addition, if by some chance you have one of the terms in the title of one of your posts, we need to let those requests through as well. As such, we have to explicitly allow these, like so:

RewriteCond %{HTTP_REFERER} !.*coldforged\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !.*del\.icio\.us.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)google\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)yahoo\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)msn\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)a9\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)lycos\..*$ [NC]
RewriteCond %{REQUEST_URI} !/403.shtml$ [NC]

(Note: The first period following the search engine name should be escaped with a backslash. However, my code prettifier obviously hates me.) Obviously, change the “coldforged.org” to be your own site’s domain name. The rest of the lines, with few exceptions, are made up like so:

RewriteCond %{HTTP_REFERER} ^.*cm3.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bobbemer.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adultcam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*web-cam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adulthost.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bostonticket.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*wlten.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*mp3search.*$ [OR]

Each of these is an htaccess condition that is true if the referrer of the request contains the given word anywhere in it; a condition for “loans”, for example, is true for any referrer containing the word “loans”. The [OR] at the end tells the web server to treat the condition as a logical OR with the next condition. So, read as English it would read “if the referrer contains the word ‘loans’ or…” and we keep going down the list of conditions. If the conditions match, which in our case means that the referrer contained any of the words in the many conditions in that file, we’ll eventually execute the rule specified.

RewriteRule .* - [F,L]

Which tells Apache that we’re going to Forbid the requester from seeing the requested content and that’s the Last thing I’m going to say about that request. The requester will get a 403 (forbidden error) returned and we’ll display our special 403 web page for him where we taunt him and call him a dimwit.
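Put together, a cut-down version of the whole thing might look like the sketch below. It uses only a handful of the conditions shown above, so treat it as an illustration of the structure rather than the full list. The comments describe how I understand mod_rewrite to combine the conditions: the exemptions are ANDed together, and the [OR] chain of spam words behaves as one group. Note that the final spam condition in this sketch carries no [OR], since nothing follows it but the rule.

```apache
RewriteEngine On
RewriteBase /

# Exemptions: ALL of these must hold (conditions are ANDed by default).
# Change coldforged.org to your own domain.
RewriteCond %{HTTP_REFERER} !.*coldforged\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)google\..*$ [NC]
RewriteCond %{REQUEST_URI} !/403.shtml$ [NC]

# Spam words: ANY one of these may hold ([OR] chains them together).
RewriteCond %{HTTP_REFERER} ^.*adultcam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*web-cam.*$ [OR]
# No [OR] on the final condition of the chain.
RewriteCond %{HTTP_REFERER} ^.*mp3search.*$

# Exempt-checks passed and a spam word matched: answer 403 and stop.
RewriteRule .* - [F,L]
```

Read aloud: “if the referrer is not my own site, and not Google, and the request is not for the 403 page itself, and the referrer contains any of the spam words, forbid it.”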

Make certain that you have a 403.shtml file sitting there as well. This is the file that will be displayed to the would-be referrer spammer when they try to get in (or, as happened to me, to the poor, innocent lad who tried to enter your site as you were experimenting with all of this). This is the source of the 403.shtml page that I use (you can see it in action here).
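Some hosts serve a 403.shtml from your web root automatically. If yours does not (an assumption you will have to check for yourself), a line like the following in .htaccess should point Apache at the page, provided your host allows ErrorDocument there. The path assumes the file sits at the root of the site.

```apache
# Serve our custom page whenever a request is answered with 403 Forbidden.
ErrorDocument 403 /403.shtml
```

This is also why the exemptions above include a condition on REQUEST_URI for the 403 page: without it, the request for the error page itself would carry the same spammy referrer and get forbidden too.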

What’s that ugly hunk of code?

Note that you’ll want to provide your own email address in that 403 page to cover the case where you accidentally block someone you want to let in. That big lump of code in the 403 page that mentions something about the “hiveware_enkoder” is a snazzy bit of code that obfuscates my email address. If you’re creating a page that will mostly be seen by spammers, it wouldn’t be overly smart to have an email address that is easily harvested. That hunk of code actually generates a clickable link to send me an email. To get your own code to paste in there, visit the Hiveware page.

(This is as good a time as any to visit the topic of who will see your 403 page. Chances are nobody will usually. There are programs out there specifically for spamming referrer logs. By the thousands. They don’t care what kind of return code they get from the web server. However, on the off chance that a relatively naive spammer just spoofs the referrer from his browser software, they’ll get the pleasure of being called a dimwit.)

Can’t you automate this a bit?

All that’s dandy. Now you’ve denied a whole load of folks from accessing your site and sticking their links in your face. But what if you get more? I created a small script that makes adding additional jerks to the .htaccess file a bit easier. Bear in mind that this script is only intended for WordPress installations, in that I use the WordPress user information to determine if the script is allowed to run. If you know your way around PHP, this is easily remedied, but I just wanted to warn you. You can see the source here. Copy that source into a file called “addforbidden.php” in your weblog home. Now, open that guy up (http://your-blog-url/addforbidden.php). You should be presented with a very simple form. Type in the word that you want to add to the list of denied words, then submit. A new condition will be added to the list you already added. Just in case anything goes wrong, a copy of your old .htaccess is saved in the same location with a date and time stamp on it so you can replace it if things go screwy.

Teach those referrer spamming jerks a lesson! Yeah, right… like they’re going to even know. But at least you can have the satisfaction of having a reduced number of them in front of you, as well as the rare opportunity to call them a dimwit. Joy!

37 Responses to “Killing Referral Spam”


  1. 37

    John Says:

    Hi there
    Excellent data. I was wondering, with respect to this sample line of code that you provided where and how would one type the country identifier for a domain such as mine, “larkin.net.au”?

    RewriteCond %{HTTP_REFERER} !.coldforged.org/.$ [NC]

    Regards

    John

  2. 36

    Brent Says:

    I’m getting the following message when I click on the “this entire list” link:

    Warning: main() [function.main]: openbasedir restriction in effect. File(/home2/coldfor/publichtml/scgi-bin/refer/refer.php) is not within the allowed path(s): (/home/coldforg/:/usr/lib/php:/usr/local/lib/php:/tmp) in /home/coldforg/publichtml/source/viewsource.php on line 2

    Warning: main(/home2/coldfor/publichtml/scgi-bin/refer/refer.php) [function.main]: failed to open stream: Operation not permitted in /home/coldforg/publichtml/source/view_source.php on line 2

    Warning: main() [function.main]: openbasedir restriction in effect. File(/home2/coldfor/publichtml/scgi-bin/refer/refer.php) is not within the allowed path(s): (/home/coldforg/:/usr/lib/php:/usr/local/lib/php:/tmp) in /home/coldforg/publichtml/source/viewsource.php on line 2

    Warning: main(/home2/coldfor/publichtml/scgi-bin/refer/refer.php) [function.main]: failed to open stream: Operation not permitted in /home/coldforg/publichtml/source/view_source.php on line 2

    Warning: main() [function.include]: Failed opening ‘/home2/coldfor/publichtml/scgi-bin/refer/refer.php’ for inclusion (includepath=’.:/usr/lib/php:/usr/local/lib/php’) in /home/coldforg/publichtml/source/viewsource.php on line 2
    Unauthorized

  3. 35

    Meryl.net Says:

    Poker Sites Screws up Web Stats

    Jason talked about Disney showing up in his blog stats. The same thing happened to me around the time of his posting. Now, instead I’m seeing a flood of poker this, poker that, poker all the time showing up all over my stats. They showed up befor…

  4. 34

    ddhr.org Says:

    […] I tried using WP-ShortStat in the past, but stopped because I was getting tons and tons of referrer/referral spam, and all this spam was filling up my database. I looked into it and found a way to reject referrer spam by using .htaccess. I tried a few things out and it seems to be working great. I’ll be continually updating the keywords it uses to reject spam, but that’s something that I’ll probably be doing less and less over time. Tue, 07 Mar 2006 @ 11:15:54 in Updates Leave a comment Name: […]

  5. 33

    William Says:

    Thanks For your work.

    I think it’s useful to many people.

  6. 32

    The Life of Me. » Archives » Spaghetti & Spam Says:

    […] them as their own, I called upon my good buddy ColdForged and his awesome tutourial “Killing Referral Spam” using creative .htaccess rules to stop the Refe […]

  7. 31

    All Day Permanent Red » Blog Archive » Killing referral spam with htaccess Says:

    […] great majority of referral spam. Check out this post at […]
