Killing Referral Spam

UPDATE: See an update to the rules here.

It all started with Paris Hilton. Though, really, what doesn’t?

Referral spam is silly to me. For those unaware, “referrer/referral spam” refers to the practice of sending a request for a web page (like my weblog here at http://www.coldforged.org) and setting the request’s Referer header so that the request appears to come from a link on some other site. At first glance, this doesn’t seem overly effective, right? What these cretins are trying to do is get you, the person who owns the web page, to see that referrer link and think to yourself, “Self, wow, someone is linking to me! I should visit and see who it is!” You therefore theoretically visit the site and see whatever it is that the spammer wishes you to see, be it porn, gambling, mortgages, or the aforementioned Ms. Hilton. That’s it. That’s their entire raison d’être. It doesn’t help with Google-juice, it doesn’t reach your readership, it’s solely targeting the blog/web site author.

As such it’s generally useless shit. I mean really, if you run a web site and you have the ability and knowledge of how to even see your access log, what are the chances that you are going to be gullible enough to think that the monkeys at http://free-blowjobs-while-playing-texas-holdem.biz are really linking to your site and you therefore click on the link? Slim.

UPDATE: Many people, including myself, have come up with far more legitimate reasons why spammers do target referral logs. Among them are:

* Many sites include the “most recent referrers” on their pages, so any referrer spammers will get links and corresponding Google Juice.
* Many sites have public links to their referral pages which can be spidered.
* Site stats can be bloated by referral spammers so that getting an accurate record of your readership is all but impossible (hat-tip to Ruby).
* Obviously the more referral spam hits you get the more bandwidth that gets used. For those on limited hosting plans or for hosts themselves, this can be costly and burdensome (hat-tip to glo).
* Some blogs have scripts to pick up the most common referral addresses and link back to them (hat-tip to Shadow).

Just wanted to set the record straight… it’s not as useless as I originally thought.

So, if it’s so useless why do I even blather on about it? Because it irritates me, that’s why. Whatever I can do to make certain that these conniving turds’ links never make it in front of my face is time well-spent. As such, I found this guy’s approach and gave it a shot. To be honest, I already had some .htaccess rules in place and had haphazardly kept them up to date, but this seemed cleaner. And it would be, too, if my hosting provider had an Apache version that supported the SetEnvIfNoCase directive in .htaccess files. But they don’t.
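For the curious, that style of rule generally looks something like the following. This is a sketch of the general SetEnvIfNoCase technique rather than a copy of his rules; the variable name spam_ref is made up for illustration, and the two keywords are borrowed from my own list.

```apache
# Flag any request whose Referer header contains a blocked keyword
# (SetEnvIfNoCase matches case-insensitively).
# "spam_ref" is a hypothetical variable name.
SetEnvIfNoCase Referer "adultcam" spam_ref
SetEnvIfNoCase Referer "mp3search" spam_ref

# Let everyone in except requests carrying the flag
# (older Order/Allow/Deny access-control syntax).
Order Allow,Deny
Allow from all
Deny from env=spam_ref
```

Each keyword gets its own line and the deny logic stays in one place, which is why it seemed cleaner.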

So I rolled my own. If you’re interested in battling the evil forces of referral spam somewhat more easily, read within.

Implementation

First, you must have access to your .htaccess file. Generally, this file will be at the root of your web server. Now, make certain that the following lines are in there somewhere, preferably toward the top.

RewriteEngine On
RewriteBase /

This gets the old Apache server ready to bust some heads. Now, I’m going to start you out with a hefty list of spamming asses. Copy this entire list verbatim (except for the very first line which should be modified to reflect your domain) and dump it in your .htaccess somewhere under the previous stuff.

UPDATE: I had to update the .htaccess rules a bit when I realized that people who come in via a search engine query on any of the terms will get forbidden. (I get a startling number of search requests for “sex toy”, which matches one of the rules and which, much to the disappointment of would-be searchers, doesn’t actually lead to a sex toy.) In addition, if by some chance you have one of the terms in the title of one of your posts, we need to let those requests through as well. As such, we have to explicitly allow these, like so:

# Never block requests referred from your own site (or from del.icio.us).
RewriteCond %{HTTP_REFERER} !.*coldforged\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !.*del\.icio\.us.*$ [NC]
# Never block visitors arriving from the search engines.
RewriteCond %{HTTP_REFERER} !^http://([^/]+)google\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)altavista\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)yahoo\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)msn\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)a9\..*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)lycos\..*$ [NC]
# Never block the 403 page itself.
RewriteCond %{REQUEST_URI} !/403\.shtml$ [NC]

(Note: The first period following each search engine name should be escaped with a backslash so that it matches a literal dot. If your copy shows it unescaped, blame my code prettifier, which obviously hates me.) Obviously, change the “coldforged.org” to be your own site’s domain name. The rest of the lines, with few exceptions, are made up like so:

RewriteCond %{HTTP_REFERER} ^.*cm3.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bobbemer.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adultcam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*web-cam.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*adulthost.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*bostonticket.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*wlten.*$ [OR]
RewriteCond %{HTTP_REFERER} ^.*mp3search.*$ [OR]

Each of these is an htaccess condition that is true if the referrer of the request contains the given word (“adultcam”, for example) anywhere in it. The [OR] at the end tells the web server to treat the condition as a logical OR with the next condition. So, read as English it would read “if the referrer contains the word ‘adultcam’ or…” and we keep going down the list of conditions. If the conditions match, which in our case means that the referrer contained any of the words in the many conditions in that file, we’ll eventually execute the rule specified.

RewriteRule .* - [F,L]

Which tells Apache that we’re going to Forbid the requester from seeing the requested content and that’s the Last thing I’m going to say about that request. The requester will get a 403 (forbidden error) returned and we’ll display our special 403 web page for him where we taunt him and call him a dimwit.
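Put together, the whole block has roughly the following shape. This is a trimmed sketch rather than the full list: example.org is a placeholder for your own domain, and only two blocked words are shown. Conditions without a flag are ANDed together, so every allow-list line has to pass before the block-list is even considered. Note also that the final condition carries no [OR], since there is nothing after it to chain to.

```apache
RewriteEngine On
RewriteBase /

# Allow-list: every one of these must hold (implicit AND).
# example.org is a placeholder; use your own domain.
RewriteCond %{HTTP_REFERER} !.*example\.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://([^/]+)google\..*$ [NC]
RewriteCond %{REQUEST_URI} !/403\.shtml$ [NC]

# Block-list: a single match is enough (chained with OR).
RewriteCond %{HTTP_REFERER} ^.*adultcam.*$ [OR]
# The last condition in the chain has no [OR].
RewriteCond %{HTTP_REFERER} ^.*mp3search.*$

# Answer 403 Forbidden and stop processing rewrite rules.
RewriteRule .* - [F,L]
```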

Make certain that you have a 403.shtml file sitting there as well. This is the file that will be displayed to the would-be referrer spammer when they try to get in (or, as happened to me, to the poor, innocent lad who tried to enter your site as you were experimenting with all of this). This is the source of the 403.shtml page that I use (you can see it in action here).
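If your server doesn’t serve 403.shtml for forbidden requests on its own, Apache’s ErrorDocument directive can point it there explicitly. A minimal sketch, assuming the page sits at the root of your site and that your host allows this directive in .htaccess:

```apache
# Use /403.shtml as the body of every 403 Forbidden response.
# The path is an assumption; adjust it if your page lives elsewhere.
ErrorDocument 403 /403.shtml
```

This is also why the allow-list includes a condition on 403.shtml: without it, the request for the error page itself would be forbidden too.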

What’s that ugly hunk of code?

Note that you’ll want to provide your own email address in that 403 page to cover the case where you accidentally block someone you want to let in. That big lump of code in the 403 page that mentions something about the “hiveware_enkoder” is a snazzy bit of code that obfuscates my email address. If you’re creating a page that will mostly be seen by spammers, it wouldn’t be overly smart to have an email address that is easily harvested. That hunk of code actually generates a clickable link to send me an email. To get your own code to paste in there, visit the Hiveware page.

(This is as good a time as any to visit the topic of who will see your 403 page. Chances are nobody will usually. There are programs out there specifically for spamming referrer logs. By the thousands. They don’t care what kind of return code they get from the web server. However, on the off chance that a relatively naive spammer just spoofs the referrer from his browser software, they’ll get the pleasure of being called a dimwit.)

Can’t you automate this a bit?

All that’s dandy. Now you’ve denied a whole load of folks from accessing your site and sticking their links in your face. But what if you get more? I created a small script that makes adding additional jerks to the .htaccess file a bit easier. Bear in mind that this script is only intended for WordPress installations, in that I use the WordPress user information to determine if the script is allowed to run. If you know your way around PHP, this is easily remedied, but I just wanted to warn you. You can see the source here. Copy that source into a file called “addforbidden.php” in your weblog home. Now, open that guy up (http://your-blog-url/addforbidden.php). You should be presented with a very simple form. Type in the word that you want to add to the list of denied words, then submit. A new condition will be added to the list you already added. Just in case anything goes wrong, a copy of your old .htaccess is saved in the same location with a date and time stamp on it so you can replace it if things go screwy.
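For illustration, submitting a word such as “texas-holdem” (a made-up example) should leave the list with one more condition shaped like the existing ones. This is only a sketch of the expected result, assuming the script follows the same pattern as the hand-written list; exactly where in the file the line lands is up to the script.

```apache
# Hypothetical condition added for the submitted word "texas-holdem".
RewriteCond %{HTTP_REFERER} ^.*texas-holdem.*$ [OR]
```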

Teach those referrer spamming jerks a lesson! Yeah, right… like they’re going to even know. But at least you can have the satisfaction of having a reduced number of them in front of you, as well as the rare opportunity to call them a dimwit. Joy!

37 Responses to “Killing Referral Spam”

  1. 1

    AdamStac Says:

    I have access again…I thought that you had banned me for good (…kidding).

    I will have to give this a far better read tomorrow…I need to rid myself of those bastards! But like you said they are pretty harmless, or are they? Hmm…

    Awesome of you to hand out your preventative measures too CF.

  2. 2

    agnOstos Says:

    I’m going to be creating a page for a group of friends. Although I don’t think it’s going to be a blog format. Does it really matter if the referrals are false? Especially if you don’t look at them?

    Spellcheck pwnz btw.

  3. 3

    ColdForged Says:

    Awesome of you to hand out your preventative measures too CF.

    My pleasure, I hope you find it useful!

    But like you said they are pretty harmless, or are they? Hmm…

    and

    Does it really matter if the referrals are false? Especially if you don’t look at them?

    No, it doesn’t matter one bit really. (Actually, I’ve been thinking about it and the only other possible benefit they could be going after is the people that list the “Last XX Referrers” on their page. If they’re not filtered they get some exposure and possible Google-juice.) If you never look at your referrers and they are never displayed on your site to be crawled by search engines, they don’t really hurt anyone. They are, however, irritating to me which is why I try to kill them :) .

    Spellcheck pwnz btw.

    Thanks, man! Just for you I’m adding “pwnz” and “btw” into the dictionary ;) .

  4. 4

    coldforged.org » Another adjustment to the referrer spam killing Says:

    Another adjustment to the referrer spam killing

    So far so good on the referrer spam killer. Since I implemented it yesterday it’s killed 33 [...]
    
  5. 5

    Mike Says:

    It’s not as useless as you think. A lot of this referrer spamming is due to the fact that many websites list referrers or list their web stats publicly, which is where the spammers also benefit because more links equals higher search engine ranking.

  6. 6

    marlyse Says:

    I found this after you commented on BloggingPro on a comment I had made. The code you show here - and motivation why to use it in the first place, even if those spammers usually won’t even know about it - is the same as I have been using successfully (though I don’t redirect to a semi-polite 403 page but to another site, “youasshole.com”; that gives ME some satisfaction, thinking that maybe, maybe at least ONE person will get my point when being redirected).

    My list has meanwhile gotten really long and your script will come in VERY handy. Also, thanks for explaining the code this well, finally I KNOW what I am actually putting into my .htaccess file.

    This is great. Thanks.

  7. 7

    marlyse.com » coldforged.org » Killing Referral Spam Says:

    a script that works with WP to AUTOMATE the entry of new spammers into the .htaccess file: coldforged.org » Killing Referral Spam - ClickToPrint […]

  8. 8

    AdamStac Says:

    (though I don’t redirect to a semi-polite 403 page but to another site, “youasshole.com”; that gives ME some satisfaction, thinking that maybe, maybe at least ONE person will get my point when being redirected)

    The bad thing about doing it that way is a case of a person like me browsing your blog, and somehow getting denied access through a “fluke”. Which is what happened here at CF. I was just navigating as normal, and BAM I get hit with the custom 403 page! CF would have been oblivious to this if it hadn’t been for the email link in that page. Nice touch BTW.

    So you see, you may be alienating some of your visitors for no reason, and you would never know if your site continues to ban them.

  9. 9

    ColdForged Says:

    Yes, Adam was the “poor, innocent lad” referenced in the story. Though I do have to share that I have received one additional request from someone, bless his soul.

    He’s actually the reason I discovered the search engine problem. He sent me a message that said, and I quote:

    Just looking for the sex chair image.

    Like I said, I get a lot of queries for “sex toy” and “sex chair” from Google Images because of this post. That picture just happens to have an alt text of “Not A Sex Toy”, which is enough to score me scads of prurient searchers that go away frustrated.

    This guy just wanted his sex chair and was denied because of my inadvertent search engine blocking.

  10. 10

    Ruby Sinreich Says:

    Referrer spam isn’t harmless - it can completely trash your site stats. For example it makes it appear as if I am getting thousands of visitors daily to my WordPress blog, which is (sadly) not the case. I think it’s more like hundreds, but I have no reliable stats now.

    I will try messing with this solution. Any chance it could be packaged as a plug-in or something nice-n-easy in the future?

  11. 11

    ColdForged Says:

    Ruby, I thought of making it a plugin, I really did and maybe I still will. It just doesn’t fit the format of a plugin in my brain — it does nothing with posts, nothing with comments, really provides no web page functionality whatsoever… aside from denying people. The only thing you’d have is an interface, which is what I tried to provide in an extremely rudimentary form.

  12. 12

    Dorothea Salo Says:

    Brilliantness! Love it. Will link to it.

    And yes, plugin, please! I so hate doing this by hand; you have NO idea. Even better — plugin that can hook into multiple WP blogs. Though that might be impossible.

  13. 13

    ryan Says:

    Thank you. I hate spam.

  14. 14

    sensory output » Much Better Says:

    ng nonetheless when I look at my shortstats. Here are some solutions I will be looking at: one, two, and three. With time permitting later today, I will post Day 2 o […]

  15. 15

    Jinjiru's Retrobox Says:

    Trackbacks and Referrers Spam

    Expression Engine (which I use now with all my projects) has several spam-prevention features which one can use to reduce the amount of spam on his site. They include: Comments Spam Prevention Tools CAPTCHA - helps to check whether the person w…

  16. 16

    Jinjiru Says:

    Oops, sorry for these duplicated trackbacks, please remove one of them!

  17. 17

    ColdForged Says:

    Thanks for the notice, Jinjiru :) .

  18. 18

    glo Says:

    This looks promising and while I’ve stopped referral spam on my blog with some php code, this might be a better way to go simply because it will not use up as much server resources. The php code I use actually checks the referral’s domain for its IP and then sends the referral back to their logs. I love the idea and it has stopped them from even trying to spam my logs but as the number of spammers grow it will slow down my sites load time.

    I just wanted to comment on the cost of this kind of spam to the blog owners. While it’s true that most will never know they are being spammed, they will pay for it in bandwidth cost as more spammers get on the referral bandwagon. I run a small hosting company (a reseller plan) and can see the bandwidth increase with all the referral spam hitting mine and my clients blogs. While none of my clients have gone over their allowed bandwidth usage, a few have come close all because of referral spam with a small amount of legit referrals. That’s a non issue for now but I can see it becoming one if they aren’t stopped.

    I’m going to link to this information once I get my blog moved to new servers. The more bloggers that can pass on this info the better in my opinion. If you haven’t posted this on the WP message boards, you should.

    BTW, I like the comment preview feature on your blog. I haven’t looked at the rest of your blog so I’m hoping that there is info on this feature here. :)

  19. 19

    ColdForged Says:

    While it’s true that most will never know they are being spammed, they will pay for it in bandwidth cost as more spammers get on the referral bandwagon.

    Too true and a problem that I admittedly didn’t think about.

    I’m going to link to this information once I get my blog moved to new servers. The more bloggers that can pass on this info the better in my opinion.

    Thanks for the kind words, and I certainly agree with you.

    BTW, I like the comment preview feature on your blog. I haven’t looked at the rest of your blog so I’m hoping that there is info on this feature here.

    I would never do such a thing. ;) Though, in all honesty, it doesn’t detail all of it, like the live updating of the name and URL. But that’s relatively straightforward and I’d happily detail the technique if there was interest.

  20. 20

    Kevin Leitch Says:

    Thanks very much for this - just implemented it so we’ll see how it goes!

  21. 21

    MassiTwoSteps.net » Leterna lotta tra il blogger e lo spammer (reprise) Says:

    […] or modify the .htaccess file: you can ban domains, IPs, and keywords, or redirect spammers to a 403 forbidden err […]

  22. 22

    Ken Says:

    Coldforged, I love the idea of your automation script, but can’t seem to get it working properly! I keep getting “Couldn’t add the word.” Through FTP, I’ve verified that a new .htaccess is being written, and that the backup of it with date added are created, the only problem is that there are no new lines in the replacement .htaccess file written. I made sure to add the 403.shtml page to the blog directory, the blog is in a sub-directory, the main URL hosts static pages.

    Do you have any thoughts as to where the problem may lie? (even if your thought is, “I’ve failed to properly follow your instructions”:-) We do not host the site on our own computer, rather we buy space on a shared server. Another clue we have is that if we break the .htaccess file in such a way that we deny ourselves access, our 403.shtml page is not returned, but rather some other default 403 error page. CPanel reports that .shtml is defined as a “system wide” server-parsed Apache handler.

  23. 23

    ColdForged Says:

    Coldforged, I love the idea of your automation script, but can’t seem to get it working properly!

    Hi Ken. Unfortunately, I’m going to have to say I have a thought and that thought is “you’ve failed to properly follow my instructions” :) . You need to make certain that all of this is in your .htaccess to start with. Then the script can be used to add additional losers to the denial list.

    I should probably change it so that if that information isn’t in the .htaccess it is automatically added, but I just haven’t gotten that fancy with it.

    Good luck in your battle against the insidious hordes.

  24. 24

    loid Says:

    Nice stuff with the .htaccess

    I thought as someone else not fond of referral-log SpamLosers, you might be interested in a little study I just did on the biggest one hammering my blog month after month. Turns out he’s an affiliate marketer for 13 online casinos. The casinos belong to an association that says they’re all dead-set against spam. Read the post if you’re interested and have a minute. I think it gives bloggers a tool to hit this guy where it hurts - in his wallet. Here’s the link

    loid

  25. 25

    clint Says:

    thanks alot for this tutorial, ive been searching about alot lately as my referrals have been flooded by this creep. I followed your steps but have a couple questions.

    1) looking at my referrals I still see him getting thru after adding to the forbidden list. mainly “musicbox1” and the numerous “buy-bontril,buy-levitra, and isacommie.com’s” keep getting thru.

    2) I notice that after adding a forbidden I get a copy of the htaccess added with a long string of numbers after it, are these temp files? should they be there and can I delete them?

    just to confirm, i’m using wordpress and all of the following files reside in my wordpress directory, the htaccess, the 403.shtml, and the addforbidden.php

    thanks for the help!

  26. 26

    clint Says:

    excellent tutorial, have 2 questions I hope you can answer for me.

    1) the same dirt merchant still appears on my referrers list, the- isacommie, buy-cialis, buy-ambien, musicbox1, and all of those are in the list.

    2) it seems adding a url using the script- that it makes a copy of my htaccess file each time with string of numbers after it, is this some sort of temp cache file and can I delete them? after looking at my main htaccess file after adding a name the main file does have the name I just added.

    and to be sure, all of these files, the htaccess, the 403 and the script are in my wordpress directory is that correct?

  27. 27

    ColdForged Says:

    Clint, my apologies, your comment got marked as spam :lol: . Teach you to use those bad words in a comment, eh?

    1) the same dirt merchant still appears on my referrers list

    It sounds as though the .htaccess file isn’t being used by your webserver. Make certain that it is, in fact, called .htaccess and that everyone has read permission on that file. Yes, those files will go in your WordPress folder. Make sure that your rewrite base is set correctly for your WordPress installation. WordPress should modify your .htaccess when you have it set up your permalinks… just put the stuff that I provide under that and the rewrite base should be dandy.

    is this some sort of temp cache file and can I delete them?

    Those are there in case you diddle your .htaccess file completely somehow. You are more than welcome to delete them, though. They’re essentially a copy of the file before each change.

  28. 28

    Shadow Says:

    Another reason people use referral spam is because some blogs have scripts to pick up the most common referral addresses and link back to them. Just thought you might want to mention this in your article, because it creates a much greater incentive for spammers to use this technique.

  29. 29

    kristy Says:

    wowo nice work, i think i may use this!!

  30. 30

    AdamStac Says:

    CF, I finally got around to adding this solution since I have been inundated with spam comments, and spam referrers. Luckily I recently installed Spam Karma or I would be moderating over 114 spam comments in the last few days.

    Two questions for you:

    addforbidden.php can’t make a backup of the original .htaccess file. I get 2 errors, can you help with this? (CHMOD=666)
    Would it be possible when accessing addforbidden.php that it would query the current .htaccess file to pull the current jerks already being blocked?

    I wanted to get this installed before the month turns so I can see the difference in the stats of my web server. Very cool of you to share this with everyone.

    Also, how did you add the “view source” feature?

  31. 31

    All Day Permanent Red » Blog Archive » Killing referral spam with htaccess Says:

    […] great majority of referral spam. Check out this post at […]

  32. 32

    The Life of Me. » Archives » Spaghetti & Spam Says:

    […] them as their own, I called upon my good buddy ColdForged and his awesome tutourial “Killing Referral Spam” using creative .htaccess rules to stop the Refe […]

  33. 33

    William Says:

    Thanks For your work.

    I think it’s useful to many people.

  34. 34

    ddhr.org Says:

    […] I tried using WP-ShortStat in the past, but stopped because I was getting tons and tons of referrer/referral spam, and all this spam was filling up my database. I looked into it and found a way to reject referrer spam by using .htaccess. I tried a few things out and it seems to be working great. I’ll be continually updating the keywords it uses to reject spam, but that’s something that I’ll probably be doing less and less over time. Tue, 07 Mar 2006 @ 11:15:54 in Updates Leave a comment Name: […]

  35. 35

    Meryl.net Says:

    Poker Sites Screws up Web Stats

    Jason talked about Disney showing up in his blog stats. The same thing happened to me around the time of his posting. Now, instead I’m seeing a flood of poker this, poker that, poker all the time showing up all over my stats. They showed up befor…

  36. 36

    Brent Says:

    I’m getting the following message when I click on the “this entire list” link:

    Warning: main() [function.main]: open_basedir restriction in effect. File(/home2/coldfor/public_html/scgi-bin/refer/refer.php) is not within the allowed path(s): (/home/coldforg/:/usr/lib/php:/usr/local/lib/php:/tmp) in /home/coldforg/public_html/source/view_source.php on line 2

    Warning: main(/home2/coldfor/public_html/scgi-bin/refer/refer.php) [function.main]: failed to open stream: Operation not permitted in /home/coldforg/public_html/source/view_source.php on line 2

    Warning: main() [function.main]: open_basedir restriction in effect. File(/home2/coldfor/public_html/scgi-bin/refer/refer.php) is not within the allowed path(s): (/home/coldforg/:/usr/lib/php:/usr/local/lib/php:/tmp) in /home/coldforg/public_html/source/view_source.php on line 2

    Warning: main(/home2/coldfor/public_html/scgi-bin/refer/refer.php) [function.main]: failed to open stream: Operation not permitted in /home/coldforg/public_html/source/view_source.php on line 2

    Warning: main() [function.include]: Failed opening '/home2/coldfor/public_html/scgi-bin/refer/refer.php' for inclusion (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/coldforg/public_html/source/view_source.php on line 2
    Unauthorized

  37. 37

    John Says:

    Hi there
    Excellent data. I was wondering, with respect to this sample line of code that you provided where and how would one type the country identifier for a domain such as mine, “larkin.net.au”?

    RewriteCond %{HTTP_REFERER} !.*coldforged\.org/.*$ [NC]

    Regards

    John
