UPDATE: See an update to the rules here.
It all started with Paris Hilton. Though, really, what doesn’t?
Referral spam is silly to me. For those unaware, “referrer/referral spam” refers to the practice of sending a request for a web page — like my weblog here at http://www.coldforged.org — and changing the request headers such that the request appears to be coming from some site. At first glance, this doesn’t seem overly effective, right? What these cretins are trying to do is get you, the person who owns the web page, to see that referrer link and think to yourself, “Self, wow, someone is linking to me! I should visit and see who it is!” You therefore theoretically visit the site and see whatever it is that the spammer wishes you to see, be it porn, gambling, mortgages, or the aforementioned Ms. Hilton. That’s it. That’s their entire raison d’etre. It doesn’t help with Google-juice, it doesn’t reach your readership, it’s solely targeting the blog/web site author.
As such it’s generally useless shit. I mean really, if you run a web site and you have the ability and knowledge of how to even see your access log, what are the chances that you are going to be gullible enough to think that the monkeys at http://free-blowjobs-while-playing-texas-holdem.biz are really linking to your site and you therefore click on the link? Slim.
UPDATE: Many people, including myself, have come up with far more legitimate reasons why spammers do target referral logs. Among them are:
* Many sites include the “most recent referrers” on their pages, so any referrer spammers will get links and corresponding Google Juice.
* Many sites have public links to their referral pages which can be spidered.
* Site stats can be bloated by referral spammers so that getting an accurate record of your readership is all but impossible (hat-tip to Ruby).
* Obviously the more referral spam hits you get, the more bandwidth gets used. For those on limited hosting plans or for hosts themselves, this can be costly and burdensome (hat-tip to glo).
* Some blogs have scripts to pick up the most common referral addresses and link back to them (hat-tip to Shadow).
Just wanted to set the record straight… it’s not as useless as I originally thought.
So, if it’s so useless why do I even blather on about it? Because it irritates me, that’s why. Whatever I can do to make certain that these conniving turds’ links never make it in front of my face is time well-spent. As such, I found this guy’s approach and gave it a shot. To be honest, I already had some .htaccess rules in place and had haphazardly kept them up to date, but this seemed cleaner. And it would be, too, if my hosting provider had an Apache version that supported the SetEnvIfNoCase directive in .htaccess files. But they don’t.
So I rolled my own. If you’re interested in battling the evil forces of referral spam somewhat more easily, read within.
Implementation
First, you must have access to your .htaccess file. Generally, this file will be at the root of your web server. Now, make certain that the following lines are in there somewhere, preferably toward the top.
RewriteEngine On
RewriteBase /
This gets the old Apache server ready to bust some heads. Now, I’m going to start you out with a hefty list of spamming asses. Copy this entire list verbatim (except for the very first line which should be modified to reflect your domain) and dump it in your .htaccess somewhere under the previous stuff.
UPDATE: I had to update the .htaccess rules a bit when I realized that people who come in via a search engine query on any of the terms will get forbidden. (I get a startling number of search requests for “sex toy”, which matches one of the rules and which, much to the disappointment of would-be searchers, doesn’t actually lead to a sex toy.) In addition, if by some chance you have one of the terms as a title in your posts, we need to let those through as well. As such, we have to explicitly allow these, like so:
RewriteCond %{HTTP_REFERER} !.*coldforged\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !.*del\.icio\.us.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)google\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)yahoo\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)msn\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)a9\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)lycos\..*$ [NC]
RewriteCond %{REQUEST_URI} !/403.shtml$ [NC]
(Note: The first period following the search engine name should be escaped with a backslash. However, my code prettifier obviously hates me.) Obviously, change the “coldforged.org” to be your own site’s domain name. The rest of the lines, with few exceptions, are made up like so:
RewriteCond %{HTTP_REFERER} ^.*cm3.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bobbemer.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adultcam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*web-cam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adulthost.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bostonticket.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*wlten.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*mp3search.*$ [OR]
Each of these is an htaccess condition that is true if the referrer of the request contains the given word (“loans”, say) anywhere in it. The [OR] at the end tells the web server to treat the condition as a logical OR with the next condition. So, read as English it would read “if the referrer contains the word ‘loans’ or…” and we keep going down the list of conditions. If the conditions match, which in our case means that the referrer contained any of the words in the many conditions in that file, we’ll eventually execute the rule specified.
RewriteRule .* - [F,L]
Which tells Apache that we’re going to Forbid the requester from seeing the requested content and that’s the Last thing I’m going to say about that request. The requester will get a 403 (forbidden error) returned and we’ll display our special 403 web page for him where we taunt him and call him a dimwit.
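If the way those conditions combine is hard to picture, here is a rough Python sketch of the decision they add up to. It is not how Apache works internally, just my reading of the rules above: the negated conditions act as an allow-list that must all pass, and the [OR] chain then needs only one hit. The function name `is_forbidden` and the two list names are made up for illustration, and the word list is only the handful of entries shown above.

```python
import re

# Allow-list patterns copied from the negated conditions above.
# "coldforged.org" stands in for your own domain.
ALLOWED_REFERRERS = [
    r".*coldforged\.org/.*$",
    r".*del\.icio\.us.*$",
    r"^http://([^/]+)google\..*$",
    r"^http://([^/]+)altavista\..*$",
    r"^http://([^/]+)yahoo\..*$",
    r"^http://([^/]+)msn\..*$",
    r"^http://([^/]+)a9\..*$",
    r"^http://([^/]+)lycos\..*$",
]

# Only the blocked words shown above; the real list is much longer.
BLOCKED_WORDS = [
    "cm3", "bobbemer", "adultcam", "web-cam",
    "adulthost", "bostonticket", "wlten", "mp3search",
]


def is_forbidden(referrer: str, request_uri: str) -> bool:
    """Return True if the rules above would answer this request with a 403."""
    # The "!" conditions are ANDed together: one allow-list match
    # (they carry [NC], so ignore case) lets the request through.
    for pattern in ALLOWED_REFERRERS:
        if re.search(pattern, referrer, re.IGNORECASE):
            return False
    # Never forbid the 403 page itself.
    if re.search(r"/403.shtml$", request_uri, re.IGNORECASE):
        return False
    # The [OR] chain: a single blocked word anywhere in the referrer is enough.
    # The conditions as shown have no [NC], so this match is case-sensitive.
    return any(
        re.search(f"^.*{re.escape(word)}.*$", referrer)
        for word in BLOCKED_WORDS
    )
```

So a referrer like `http://www.adultcam-example.biz/` (a made-up address) gets forbidden, while a Google search for the very same word sails through because the allow-list is checked first.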
Make certain that you have a 403.shtml file sitting there as well. This is the file that will be displayed to the would-be referrer spammer when they try to get in (or, as happened to me, to the poor, innocent lad that tried to enter your site as you were experimenting with all of this). This is the source of the 403.shtml page that I use (you can see it in action here).
What’s that ugly hunk of code?
Note that you’ll want to provide your own email address in that 403 page to cover the case where you accidentally block someone you want to let in. That big lump of code in the 403 page that mentions something about the “hiveware_enkoder” is a snazzy bit of code that obfuscates my email address. If you’re creating a page that will mostly be seen by spammers, it wouldn’t be overly smart to have an email address that is easily harvested. That hunk of code actually generates a clickable link to send me an email. To get your own code to paste in there, visit the Hiveware page.
(This is as good a time as any to visit the topic of who will see your 403 page. Chances are nobody will usually. There are programs out there specifically for spamming referrer logs. By the thousands. They don’t care what kind of return code they get from the web server. However, on the off chance that a relatively naive spammer just spoofs the referrer from his browser software, they’ll get the pleasure of being called a dimwit.)
Can’t you automate this a bit?
All that’s dandy. Now you’ve denied a whole load of folks from accessing your site and sticking their links in your face. But what if you get more? I created a small script that makes adding additional jerks to the .htaccess file a bit easier. Bear in mind that this script is only intended for WordPress installations, in that I use the WordPress user information to determine if the script is allowed to run. If you know your way around PHP, this is easily remedied, but I just wanted to warn you. You can see the source here. Copy that source into a file called “addforbidden.php” in your weblog home. Now, open that guy up (http://your-blog-url/addforbidden.php). You should be presented with a very simple form. Type in the word that you want to add to the list of denied words, then submit. A new condition will be added to the list you already added. Just in case anything goes wrong, a copy of your old .htaccess is saved in the same location with a date and time stamp on it so you can replace it if things go screwy.
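For the non-WordPress crowd, the core of that idea (back up the file, then slot in one more condition) can be sketched in a few lines of Python. To be clear, this is not the addforbidden.php source, which is PHP and does the WordPress user check; it is a minimal stand-in under some assumptions: the function name `add_forbidden_word` is hypothetical, the blocking rule is assumed to appear as the exact line `RewriteRule .* - [F,L]`, and at least one condition is assumed to sit above it.

```python
import re
import shutil
import time
from pathlib import Path


def add_forbidden_word(htaccess: Path, word: str) -> Path:
    """Back up the .htaccess file, add a condition for `word`, return the backup path."""
    # Only allow plain words so nothing can break the RewriteCond syntax.
    if not re.fullmatch(r"[A-Za-z0-9-]+", word):
        raise ValueError("word may only contain letters, digits and hyphens")

    lines = htaccess.read_text().splitlines()

    # Assumption: the blocking rule is written exactly like this.
    rule_index = None
    for i, line in enumerate(lines):
        if line.strip() == "RewriteRule .* - [F,L]":
            rule_index = i
            break
    if rule_index is None or rule_index == 0:
        raise ValueError("could not find the blocking rule with a condition above it")

    # Save a date-and-time-stamped copy before touching anything.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = htaccess.with_name(f"{htaccess.name}.{stamp}")
    shutil.copy2(htaccess, backup)

    # Insert above the final condition so that line stays last in the chain.
    new_condition = f"RewriteCond %{{HTTP_REFERER}} ^.*{word}.*$ [OR]"
    lines.insert(rule_index - 1, new_condition)
    htaccess.write_text("\n".join(lines) + "\n")
    return backup
```

Calling `add_forbidden_word(Path(".htaccess"), "someword")` would add one new [OR] line and leave a stamped copy of the old file next to it. Inserting above the last condition is a choice I made for the sketch, so that whatever ends the chain keeps ending it.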
Teach those referrer spamming jerks a lesson! Yeah, right… like they’re going to even know. But at least you can have the satisfaction of having a reduced number of them in front of you, as well as the rare opportunity to call them a dimwit. Joy!



Ruby Sinreich Says: January 28th, 2005 at 9:42 am
Referrer spam isn’t harmless - it can completely trash your site stats. For example it makes it appear as if I am getting thousands of visitors daily to my WordPress blog, which is (sadly) not the case. I think it’s more like hundreds, but I have no reliable stats now.
I will try messing with this solution. Any chance it could be packaged as a plug-in or something nice-n-easy in the future?

ColdForged Says: January 27th, 2005 at 4:18 pm
Yes, Adam was the “poor, innocent lad” referenced in the story. Though I do have to share that I have received one additional request from someone, bless his soul.
He’s actually the reason I discovered the search engine problem. He sent me a message that said, and I quote:
Like I said, I get a lot of queries for “sex toy” and “sex chair” from Google Images because of this post. That picture just happens to have an alt text of “Not A Sex Toy”, which is enough to score me scads of prurient searchers that go away frustrated.
This guy just wanted his sex chair and was denied because of my inadvertent search engine blocking.

AdamStac Says: January 27th, 2005 at 4:08 pm
The bad thing about doing it that way is a case of a person like me browsing your blog, and somehow getting denied access through a “fluke”. Which is what happened here at CF. I was just navigating as normal, and BAM I get hit with the custom 403 page! CF would have been oblivious to this if it hadn’t been for the email link in that page. Nice touch BTW.
So you see, you may be alienating some of your visitors for no reason, and you would never know if your site continues to ban them.

marlyse.com » coldforged.org » Killing Referral Spam Says: January 27th, 2005 at 10:50 am
a script that works with WP to AUTOMATE the entry of new spammers into the .htaccess file: coldforged.org » Killing Referral Spam - ClickToPrint […]

marlyse Says: January 27th, 2005 at 10:48 am
I found this after you commented on BloggingPro on a comment I had made. The code you show here - and motivation why to use it in the first place, even if those spammers usually won’t even know about it - is the same as I have been using successfully (though I don’t redirect to a semi-polite 403 page but to another site, “youasshole.com”; that gives ME some satisfaction, thinking that maybe, maybe at least ONE person will get my point when being redirected).
My list has meanwhile gotten really long and your script will come in VERY handy. Also, thanks for explaining the code this well, finally I KNOW what I am actually putting into my .htaccess file.
This is great. Thanks.

Mike Says: January 26th, 2005 at 3:26 pm
It’s not as useless as you think. A lot of this referrer spamming is due to the fact that many websites list referrers or list their web stats publicly, which is where the spammers also benefit because more links equals higher search engine ranking.

coldforged.org » Another adjustment to the referrer spam killing Says: January 26th, 2005 at 2:46 pm
Another adjustment to the referrer spam killing

ColdForged Says: January 26th, 2005 at 10:00 am
My pleasure, I hope you find it useful!
And no, it doesn’t matter one bit really. (Actually, I’ve been thinking about it and the only other possible benefit they could be going after is the people that list the “Last XX Referrers” on their page. If they’re not filtered they get some exposure and possible Google-juice.) If you never look at your referrers and they are never displayed on your site to be crawled by search engines, they don’t really hurt anyone. They are, however, irritating to me which is why I try to kill them.
Thanks, man! Just for you I’m adding “pwnz” and “btw” into the dictionary.

agnOstos Says: January 26th, 2005 at 9:10 am
I’m going to be creating a page for a group of friends. Although I don’t think it’s going to be a blog format. Does it really matter if the referrals are false? Especially if you don’t look at them?
Spellcheck pwnz btw.

AdamStac Says: January 26th, 2005 at 12:59 am
I have access again…I thought that you had banned me for good (…kidding).
I will have to give this a far better read tomorrow…I need to rid myself of those bastards! But like you said they are pretty harmless, or are they? Hmm…
Awesome of you to hand out your preventative measures too CF.