UPDATE: See an update to the rules here.
It all started with Paris Hilton. Though, really, what doesn’t?
Referral spam is silly to me. For those unaware, “referrer/referral spam” refers to the practice of sending a request for a web page — like my weblog here at http://www.coldforged.org — and changing the request headers such that the request appears to be coming from some site. At first glance, this doesn’t seem overly effective, right? What these cretins are trying to do is get you, the person who owns the web page, to see that referrer link and think to yourself, “Self, wow, someone is linking to me! I should visit and see who it is!” You therefore theoretically visit the site and see whatever it is that the spammer wishes you to see, be it porn, gambling, mortgages, or the aforementioned Ms. Hilton. That’s it. That’s their entire raison d’etre. It doesn’t help with Google-juice, it doesn’t reach your readership, it’s solely targeting the blog/web site author.
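To make the mechanics concrete, here is a minimal Python sketch of a request carrying a forged referrer. The spammer domain is made up for illustration, and the request is only built, never sent. Note that the header really is spelled "Referer" in HTTP.

```python
from urllib.request import Request

# Hypothetical spammer domain, used only for illustration.
FORGED_REFERRER = "http://spam-site.example/"

# Build (but do not send) a request whose Referer header is forged.
request = Request(
    "http://www.coldforged.org/",
    headers={"Referer": FORGED_REFERRER},
)

# This is the value that would land in the site's access log.
logged_referrer = request.get_header("Referer")
print(logged_referrer)
```

Nothing checks that the named site actually links to you; the server simply records whatever the client claims.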
As such it’s generally useless shit. I mean really, if you run a web site and you have the ability and knowledge of how to even see your access log, what are the chances that you are going to be gullible enough to think that the monkeys at http://free-blowjobs-while-playing-texas-holdem.biz are really linking to your site and you therefore click on the link? Slim.
UPDATE: Many people, including myself, have come up with far more legitimate reasons why spammers do target referral logs. Among them are:
* Many sites include the “most recent referrers” on their pages, so any referrer spammers will get links and corresponding Google Juice.
* Many sites have public links to their referral pages which can be spidered.
* Site stats can be bloated by referral spammers so that getting an accurate record of your readership is all but impossible (hat-tip to Rudy).
* Obviously the more referral spam hits you get the more bandwidth that gets used. For those on limited hosting plans or for hosts themselves, this can be costly and burdensome (hat-tip to glo).
* Some blogs have scripts to pick up the most common referral addresses and link back to them (hat-tip to Shadow).
Just wanted to set the record straight… it’s not as useless as I originally thought.
So, if it’s so useless why do I even blather on about it? Because it irritates me, that’s why. Whatever I can do to make certain that these conniving turds’ links never make it in front of my face is time well-spent. As such, I found this guy’s approach and gave it a shot. To be honest, I already had some .htaccess rules in place and had haphazardly kept them up to date, but this seemed cleaner. And it would be, too, if my hosting provider had an Apache version that supported the SetEnvIfNoCase directive in .htaccess files. But they don’t.
So I rolled my own. If you’re interested in battling the evil forces of referral spam somewhat more easily, read within.
Implementation
First, you must have access to your .htaccess file. Generally, this file will be at the root of your web server. Now, make certain that the following lines are in there somewhere, preferably toward the top.
RewriteEngine On
RewriteBase /
This gets the old Apache server ready to bust some heads. Now, I’m going to start you out with a hefty list of spamming asses. Copy this entire list verbatim (except for the very first line which should be modified to reflect your domain) and dump it in your .htaccess somewhere under the previous stuff.
UPDATE: I had to update the .htaccess rules a bit when I realized that people who come in via a search engine query on any of the terms will get forbidden. (I get a startling number of search requests for “sex toy”, which matches one of the rules and which, much to the disappointment of would-be searchers, doesn’t actually lead to a sex toy.) In addition, if by some chance you have one of the terms as a title in your posts, we need to let those through as well. As such, we have to explicitly allow these, like so:
RewriteCond %{HTTP_REFERER} !.*coldforged\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !.*del\.icio\.us.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)google\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)yahoo\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)msn\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)a9\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)lycos\..*$ [NC]
RewriteCond %{REQUEST_URI} !/403.shtml$ [NC]
(Note: The first period following the search engine name should be escaped with a backslash. However, my code prettifier obviously hates me.) Obviously, change the “coldforged.org” to be your own site’s domain name. The rest of the lines, with few exceptions, are made up like so:
RewriteCond %{HTTP_REFERER} ^.*cm3.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bobbemer.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adultcam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*web-cam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adulthost.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bostonticket.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*wlten.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*mp3search.*$ [OR]
Each of these is an htaccess condition that is true if the referrer of the request contains the given word (“cm3”, “bobbemer”, and so on) anywhere in it. The [OR] at the end tells the web server to treat the condition as a logical OR with the next condition. So, read as English it would read “if the referrer contains the word ‘cm3’ or…” and we keep going down the list of conditions. If the conditions match, which in our case means that the referrer contained any of the words in the many conditions in that file, we’ll eventually execute the rule specified.
RewriteRule .* - [F,L]
Which tells Apache that we’re going to Forbid the requester from seeing the requested content and that’s the Last thing I’m going to say about that request. The requester will get a 403 (forbidden error) returned and we’ll display our special 403 web page for him where we taunt him and call him a dimwit.
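If the way the conditions combine is hard to picture, here is a rough Python sketch of my understanding of it: the negated "allow" conditions must all hold, and after that any one of the [OR]-chained words is enough to trigger the rule. This only models the logic and is not how Apache itself is implemented. The function name and the shortened pattern lists are mine, picked from the rules above for illustration.

```python
import re

# Referrers that are always let through (mirrors the negated conditions).
ALLOW_PATTERNS = [
    r".*coldforged\.org/.*$",
    r".*del\.icio\.us.*$",
    r"^http://([^/]+)google\..*$",
    r"^http://([^/]+)yahoo\..*$",
]

# A few of the blocked words from the list above.
BLOCKED_WORDS = ["cm3", "adultcam", "web-cam", "mp3search"]


def is_forbidden(referrer: str, request_uri: str = "/") -> bool:
    """Return True if the request would be answered with a 403."""
    # The 403 page itself must stay reachable.
    if re.search(r"/403\.shtml$", request_uri, re.IGNORECASE):
        return False
    # Negated [NC] conditions: any allow-listed referrer passes.
    if any(re.search(p, referrer, re.IGNORECASE) for p in ALLOW_PATTERNS):
        return False
    # [OR]-chained conditions: one matching word is enough.
    return any(word in referrer for word in BLOCKED_WORDS)


print(is_forbidden("http://adultcam-site.example/"))
print(is_forbidden("http://www.google.com/search?q=web-cam"))
```

The first call models a spammer's referrer and the second a search engine visitor whose query happens to contain a blocked word.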
Make certain that you have a 403.shtml file sitting there as well. This is the file that will be displayed to the would-be referrer spammer when they try to get in (or, as happened to me, to the poor, innocent lad that tried to enter your site as you were experimenting with all of this). This is the source of the 403.shtml page that I use (you can see it in action here).
What’s that ugly hunk of code?
Note that you’ll want to provide your own email address in that 403 page to cover the case where you accidentally block someone you want to let in. That big lump of code in the 403 page that mentions something about the “hiveware_enkoder” is a snazzy bit of code that obfuscates my email address. If you’re creating a page that will mostly be seen by spammers, it wouldn’t be overly smart to have an email address that is easily harvested. That hunk of code actually generates a clickable link to send me an email. To get your own code to paste in there, visit the Hiveware page.
(This is as good a time as any to visit the topic of who will see your 403 page. Chances are nobody will usually. There are programs out there specifically for spamming referrer logs. By the thousands. They don’t care what kind of return code they get from the web server. However, on the off chance that a relatively naive spammer just spoofs the referrer from his browser software, they’ll get the pleasure of being called a dimwit.)
Can’t you automate this a bit?
All that’s dandy. Now you’ve denied a whole load of folks from accessing your site and sticking their links in your face. But what if you get more? I created a small script that makes adding additional jerks to the .htaccess file a bit easier. Bear in mind that this script is only intended for WordPress installations, in that I use the WordPress user information to determine if the script is allowed to run. If you know your way around PHP, this is easily remedied, but I just wanted to warn you. You can see the source here. Copy that source into a file called “addforbidden.php” in your weblog home. Now, open that guy up (http://your-blog-url/addforbidden.php). You should be presented with a very simple form. Type in the word that you want to add to the list of denied words, then submit. A new condition will be added to the list you already added. Just in case anything goes wrong, a copy of your old .htaccess is saved in the same location with a date and time stamp on it so you can replace it if things go screwy.
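For those not on WordPress, the core of the idea fits in a few lines. What follows is a Python sketch of the same approach, not the PHP script itself: the function name is made up, there is no login check, and it assumes the block list ends with the exact rule line shown earlier, preceded by a final condition that carries no [OR].

```python
import re
import shutil
import time
from pathlib import Path

# The rule line that closes the block list (assumed to appear verbatim).
RULE_LINE = "RewriteRule .* - [F,L]"


def add_forbidden_word(htaccess: Path, word: str) -> Path:
    """Add a condition for `word`; return the path of the backup copy."""
    # Only accept simple words so nothing odd ends up in the regex.
    if not re.fullmatch(r"[A-Za-z0-9-]+", word):
        raise ValueError(f"unsupported word: {word!r}")

    # Keep a timestamped copy next to the original, just in case.
    stamp = time.strftime("%Y%m%d%H%M%S")
    backup = htaccess.with_name(f"{htaccess.name}.{stamp}")
    shutil.copy2(htaccess, backup)

    lines = htaccess.read_text().splitlines()
    rule_index = lines.index(RULE_LINE)  # ValueError if the rule is missing
    # Slot the new [OR] condition in just before the final condition,
    # so the last condition ahead of the rule still has no [OR].
    new_line = f"RewriteCond %{{HTTP_REFERER}} ^.*{word}.*$ [OR]"
    lines.insert(rule_index - 1, new_line)
    htaccess.write_text("\n".join(lines) + "\n")
    return backup
```

Calling add_forbidden_word(Path(".htaccess"), "poker") from the directory holding the file would add one condition and leave a dated backup beside it.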
Teach those referrer spamming jerks a lesson! Yeah, right… like they’re going to even know. But at least you can have the satisfaction of having a reduced number of them in front of you, as well as the rare opportunity to call them a dimwit. Joy!



AdamStac Says: March 31st, 2005 at 8:25 pm
CF, I finally got around to adding this solution since I have been inundated with spam comments, and spam referrers. Luckily I recently installed Spam Karma or I would be moderating over 114 spam comments in the last few days.
Two questions for you:
addforbidden.php can’t make a backup of the original .htaccess file. I get 2 errors, can you help with this? (CHMOD=666)
Would it be possible when accessing addforbidden.php that it would query the current .htaccess file to pull the current jerks already being blocked?
I wanted to get this installed before the month turns so I can see the difference in the stats of my web server. Very cool of you to share this with everyone.
Also, how did you add the “view source” feature?

kristy Says: March 24th, 2005 at 1:11 am
wowo nice work, i think i may use this!!

Shadow Says: March 8th, 2005 at 8:26 pm
Another reason people use referral spam is because some blogs have scripts to pick up the most common referral addresses and link back to them. Just thought you might want to mention this in your article, because it creates a much greater incentive for spammers to use this technique.

ColdForged Says: March 2nd, 2005 at 11:34 am
Clint, my apologies, your comment got marked as spam. Teach you to use those bad words in a comment, eh?
It sounds as though the .htaccess file isn’t being used by your webserver. Make certain that it is, in fact, called .htaccess and that everyone has read permission on that file. Yes, those files will go in your WordPress folder. Make sure that your rewrite base is set correctly for your WordPress installation. WordPress should modify your .htaccess when you have it set up your permalinks… just put the stuff that I provide under that and the rewrite base should be dandy.
Those are there in case you diddle your .htaccess file completely somehow. You are more than welcome to delete them, though. They’re essentially a copy of the file before each change.

clint Says: February 27th, 2005 at 3:18 pm
excellent tutorial, have 2 questions I hope you can answer for me.
1) the same dirt merchant still appears on my referrers list, the- isacommie, buy-cialis, buy-ambien, musicbox1, and all of those are in the list.
2) it seems adding a url using the script- that it makes a copy of my htaccess file each time with string of numbers after it, is this some sort of temp cache file and can I delete them? after looking at my main htaccess file after adding a name the main file does have the name I just added.
and to be sure, all of these files, the htaccess, the 403 and the script are in my wordpress directory is that correct?

clint Says: February 26th, 2005 at 3:16 pm
thanks alot for this tutorial, ive been searching about alot lately as my referrals have been flooded by this creep. I followed your steps but have a couple questions.
1) looking at my referrals I still see him getting thru after adding to the forbidden list. mainly “musicbox1” and the numerous “buy-bontril,buy-levitra, and isacommie.com’s” keep getting thru.
2) I notice that after adding a forbidden I get a copy of the htaccess added with a long string of numbers after it, are these temp files? should they be there and can I delete them?
just to confirm, i’m using wordpress and all of the following files reside in my wordpress directory, the htaccess, the 403.shtml, and the addforbidden.php
thanks for the help!

loid Says: February 24th, 2005 at 12:25 pm
Nice stuff with the .htaccess
I thought as someone else not fond of referral-log SpamLosers, you might be interested in a little study I just did on the biggest one hammering my blog month after month. Turns out he’s an affiliate marketer for 13 online casinos. The casinos belong to an association that says they’re all dead-set against spam. Read the post if you’re interested and have a minute. I think it gives bloggers a tool to hit this guy where it hurts - in his wallet. Here’s the link
loid

ColdForged Says: February 23rd, 2005 at 9:35 pm
Hi Ken. Unfortunately, I’m going to have to say I have a thought and that thought is “you’ve failed to properly follow my instructions”. You need to make certain that all of this is in your .htaccess to start with. Then the script can be used to add additional losers to the denial list.
I should probably change it so that if that information isn’t in the .htaccess it is automatically added, but I just haven’t gotten that fancy with it.
Good luck in your battle against the insidious hordes.

Ken Says: February 23rd, 2005 at 7:33 pm
Coldforged, I love the idea of your automation script, but can’t seem to get it working properly! I keep getting “Couldn’t add the word.” Through FTP, I’ve verified that a new .htaccess is being written, and that the backup of it with date added are created, the only problem is that there are no new lines in the replacement .htaccess file written. I made sure to add the 403.shtml page to the blog directory, the blog is in a sub-directory, the main URL hosts static pages.
Do you have any thoughts as to where the problem may lie? (even if your thought is, “I’ve failed to properly follow your instructions”:-) We do not host the site on our own computer, rather we buy space on a shared server. Another clue we have is that if we break the .htaccess file in such a way that we deny ourselves access, our 403.shtml page is not returned, but rather some other default 403 error page. CPanel reports that .shtml is defined as a “system wide” server-parsed Apache handler.

MassiTwoSteps.net » L’eterna lotta tra il blogger e lo spammer (reprise) Says: February 22nd, 2005 at 11:35 am
[…] eate or modify the .htaccess file: you can ban domains, IPs, and keywords, or redirect spammers to a 403 forbidden err […]