You may have noticed that I’ve been “closed for business”, so to speak, the past couple of days. Let me explain and beseech the community to help me get to the bottom of this.
The day before yesterday I found my hosting account had been suspended for using too many server resources. After pleading and promising to research what happened, I got my account unsuspended and pulled up my logs. What I noticed was… odd. I initially thought it was some kind of DoS attack, but it really didn’t look like one. I saw someone visit a page (unfortunately my logs are currently gone), then start requesting all of my monthly archive pages at a fast rate. Apache was returning a 200 success code for each request, but apparently they weren’t getting their data fast enough, as they kept re-requesting the same pages. That was when the account was suspended.
Yesterday at about the same time my account was suspended again. The pattern was about the same. Someone (from General Electric, apparently) visited me from Google. The requests were perfectly innocuous. About 12 seconds after the final ordinary request, the same IP initiated a flurry of requests for all of my monthly archives again. What occurred to me this time was that all of these archives were only linked via <link rel="archives" ...> tags in the header of the page. So, something was fetching/prefetching all of those links programmatically at a high rate of speed.
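For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch. It assumes Apache's common/combined log format, and the function name and thresholds (more than 10 requests from one IP within 5 seconds) are made up for illustration; tune them to your own traffic.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Matches the start of an Apache common/combined log line (an assumption
# about the log format): host ident user [timestamp] "METHOD path ..." status
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def find_bursts(lines, max_requests=10, window_seconds=5):
    """Return the set of IPs that made more than max_requests
    requests within any window_seconds-long span.

    Assumes the lines are in chronological order, as in a normal log file.
    """
    recent = defaultdict(deque)   # ip -> timestamps still inside the window
    offenders = set()
    window = timedelta(seconds=window_seconds)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        stamps = recent[match.group("ip")]
        stamps.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while ts - stamps[0] > window:
            stamps.popleft()
        if len(stamps) > max_requests:
            offenders.add(match.group("ip"))
    return offenders
```

Calling `find_bursts(open("access.log"))` (the file name is a placeholder) would list the addresses worth a closer look. It only flags request rate, so a legitimate visitor loading a page with many images could trip it too.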
I’ve heard of Google prefetching under certain circumstances, but only for <link rel="prefetch" ...> or <link rel="next" ...> tags. This makes me think that someone is trying to one-up Google and accelerate browsing even more, but they’re doing it in a particularly hamfisted way. I’ve seen a couple of add-ons for IE6 — Browster and something called KybIE GetEmAll — that will prefetch pages for you but I have yet to catch them pulling the links from this type of link.
That’s where you come in. Do you know of an Internet Explorer add-on or some wacky version of IE6 that performs prefetching to speed up access? If so, please post in the comments about your experiences so I can investigate. I’ve since removed those (useless) link tags from my header, so I think I’ll be okay. I’ve also removed some features (you’ll no longer get “welcomed back” to the site with an indication of how many and which posts and comments are new), and I’m trying a static caching plugin to reduce my overall burden on the server.
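If you want to see which URLs your own pages expose this way before deciding whether to remove them, a small sketch along these lines might help. It uses Python's standard `html.parser`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class LinkTagCollector(HTMLParser):
    """Collects (rel, href) pairs from every <link> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        href = attributes.get("href")
        if href:
            self.links.append((rel, href))

def links_with_rel(html, rel):
    """Return the hrefs of all <link> tags whose rel contains the given value."""
    collector = LinkTagCollector()
    collector.feed(html)
    return [href for found_rel, href in collector.links
            if rel in found_rel.split()]
```

Feeding a saved copy of a page to `links_with_rel(page_html, "archives")` lists every archive URL an over-eager prefetcher could pick up from the header.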
However, this is literally my last chance. My hosting provider has assured me that if I get suspended again because of this behavior, I’m gone. And I don’t have the desire or the cash to pay for a more robust hosting solution. Those of you that may recommend other hosting solutions — and I would love to hear your suggestions, believe me — please bear in mind that my requirements are 1G storage, at least 10G of traffic to allow for growth, multiple MySQL databases and a cost that’s not too much more than $5 a month… if it ain’t got all that I ain’t interested.
Thanks for your assistance and patience.
Addition: If Else has a wonderful writeup on the perils of prefetching. Well worth the read.




rohn Says:July 13th, 2005 at 1:19 pm
Where are you hosted now that you have such a deal? I’ll switch
I currently use phpwebhosting.com and have been mostly very pleased with them… no limits on storage or bandwidth, and you only need to request to get more MySQL databases, but they are more expensive than what you’re paying. $10 a month.

ColdForged Says:July 13th, 2005 at 1:21 pm
I’m at Ace-Host and I’ve been relatively happy with them as well. I actually don’t blame them for smacking me down, really, as I’m a cheap customer that was taking up most of the (4) CPUs.

If Else Says:July 13th, 2005 at 1:33 pm
I saw the “We’re closed” placeholder and am glad you’re back. Do you have any more information on the “bad bot”? I’ll assume that you’ve banned that particular IP.
“Those of you that may recommend other hosting solutions — and I would love to hear your suggestions, believe me”
How about 120 GB bandwidth at 9 bucks for the whole year?

If Else Says:July 13th, 2005 at 1:38 pm
“Do you know of an Internet Explorer add-on or some wacky version of IE6 that performs prefetching to speed up access?”
What’ll worry me is that this could all be down to a badly coded, self-written spider i.e. someone with just enough programming knowledge to be dangerous but not enough to be smart may have written this (probably as a personal offline archiver).

graphix Says:July 13th, 2005 at 3:10 pm
CF,
I’ve been using these guys and have been very happy:
http://www.darkstarllc.com/services/webhosting/
There’s been a little bit of growing pains for DarkStar (mainly with TS2 servers), but they have smart tech support guys and their heart is in the right place. That’s been good enough for me.
Check out that link and see what you think. Send me an E-mail if you want more details on the company. It has good bandwidth and connectivity to all of the locations I care about.

graphix Says:July 13th, 2005 at 3:14 pm
The Bronze package almost meets your requirements. 1GB with multiple mySQL databases. I pay $25 a quarter, which is a little over $8 a month. They may have a 6- or 12-month option that drops it down to the $7 range.
I like their server management stuff. You can log into my account and check it out if you’re interested. I just added the web hosting service to my existing TS2 server and haven’t finished setting it up yet.

ColdForged Says:July 13th, 2005 at 3:33 pm
Unfortunately you now know what I know. Equally unfortunately, banning by IP isn’t effective. I banned the first guy after my first occurrence but the second guy was a different IP. Nothing about the agent strings can be keyed on; they’re simply versions of legitimate IE6 agent strings. They don’t even have the “signature” of a bot, since the initial requests for where they came in (the Google search in the second instance; I don’t recall the first one) were solid and they pulled all the files they were supposed to (i.e. CSS, images, javascripts). But then they went apeshit.
Thank you both for the hosting suggestions, I’ll take a gander.
That’s a possibility, but it seems to be at least slightly “in the wild” since two people in a row were using it. I suspect that this is a new version of some kind of add-on (or personal offline archiver) that just sprang up in the past few days. It’ll be interesting to see if there’s a new rash of “my account was suspended” questions on WordPress forums, as those <link> elements are standard in the default theme of WP.

Chris Bunting Says:July 13th, 2005 at 6:20 pm
I wish I could help with your rogue site accessor problem, but I haven’t the slightest idea what that could be. As for a new hosting provider, if it comes down to that, try iPowerWeb. The cost (for 2 years) would be $6.95/month, and you get 3 GB of space, 50 GB of bandwidth, multiple databases, and much more. Good luck with finding the culprit!

erik Says:July 14th, 2005 at 10:18 am
great you’re back, hoping you can ‘kill’ the bad behaving!

Zach Says:July 14th, 2005 at 11:15 am
I don’t know what date and time specifically that you were first hit, but Browster went to version 1.0 on Wednesday: http://news.com.com/New+IE+plug-in+aims+to+speed+Web+searches/2100-1032_3-5785825.html?tag=cd.top
It is designed to prefetch search results (which would explain why you were hit immediately following a referral from Google). It could lead to some real hassle if this becomes a popular thing to do.

Dave Says:July 14th, 2005 at 2:57 pm
It could be the Google web accelerator. I think that visits all URLs it finds on a page and then caches them.

Matthew Says:July 20th, 2005 at 9:59 pm
http://www.site5.com/affiliates/idevaffiliate.php?id=5
6.95 a month
3 gigs of space
50 gigs of bandwidth a month
I’ve been using them for almost 5 years. I wouldn’t use any other company.
They are primo!
Good luck.
Matthew