Thursday, January 17th, 2008
Bill Harris posted a hilarious and surprisingly effective method of generating band names for EA’s game Rock Band among other things. I decided to slap together a script that automates the process. Go try it out!

Posted in Games, Development | 7 Comments »
Wednesday, December 6th, 2006
Things were just going too smoothly. I thought my SK2 plugin was doing dandy, catching everything it was supposed to. And it was, though my average daily take has been hovering about 5 or 10 now. What should have been obvious was that I actually haven’t received any comments recently. I finally discovered why: there was a domain blacklist that had a domain of, simply, “2006”. Due to the way the plugin handles domain blacklist denials, any attempted comment on any post that happened to be posted in the year 2006 would get denied. That’s what we refer to in the software industry as a “head slapper.” Well, that’s the most censored possibility of many.
I’ll have to redesign some things. mod_security should be able to squirrel me out of this, as it allows you to check particular fields within query strings, so I should be able to look at just the URL and actual comment text instead of the entire argument list.
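To make the head slapper concrete, here is a minimal Python sketch of the difference between scanning the whole argument list and scanning only selected fields. It is not the plugin's code. The field names (`url`, `comment`, `redirect_to`), the blacklist contents, and the function names are all assumptions made up for illustration.

```python
from urllib.parse import parse_qs

# Hypothetical blacklist; "2006" is the rogue entry described in the post.
DOMAIN_BLACKLIST = ["2006", "cheap-pills.example"]


def hits_blacklist(text, blacklist=DOMAIN_BLACKLIST):
    """Plain substring test, which is why a short entry like "2006" is dangerous."""
    return any(entry in text for entry in blacklist)


def deny_scanning_everything(post_body):
    """Scan every submitted argument, including any that carry the permalink."""
    args = parse_qs(post_body)
    return hits_blacklist(" ".join(v for values in args.values() for v in values))


def deny_scanning_fields(post_body, fields=("url", "comment")):
    """Scan only the commenter's URL and the comment text."""
    args = parse_qs(post_body)
    return hits_blacklist(" ".join(v for f in fields for v in args.get(f, [])))


# An innocent comment on a post published in 2006 (field names are assumed).
innocent = (
    "author=ava11&url=http%3A%2F%2Fexample.org&comment=Nice+post"
    "&redirect_to=%2Farchives%2F2006%2F11%2F10%2Fsome-post%2F"
)
spam = "author=x&url=http%3A%2F%2Fcheap-pills.example&comment=buy+now"

print(deny_scanning_everything(innocent))  # True: the permalink contains "2006"
print(deny_scanning_fields(innocent))      # False
print(deny_scanning_fields(spam))          # True
```

Narrowing the fields only limits the damage. A comment that mentions the year in its text would still be denied until the bogus entry is removed from the blacklist.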
The worst part of this is that the only way I found out was following a referral back to a forum post where some poor fellow mentioned that he was unable to post a comment. “Horseshit,” said I and decided to post a test comment. I got denied. Head slap. Sorry ava11 and whoever else.
Posted in General, Development, Plugins | 6 Comments »
Thursday, November 23rd, 2006
Happy Thanksgiving to all my readers! I hope you are amongst friends and family and that it’s a day of rest and peace for you all. We’re spending a quiet day with my Mom, cooking turkey and just relaxing. We’re not the kind to spend the entire day in the kitchen — it’s happened in the past but that’s not the vibe we wanted today — so it should be nice. It’s our first Thanksgiving since my father died but I think we’re all in a relatively good place.
I survived a reduction in force at work yesterday — though several of my favorite people didn’t — so I can use a bit of peace and quiet (he says as his daughter watches Cars in the other room).
An update on the plugin
I’ve gotten some good feedback on the proposed Spam Karma 2 mod_security plugin. The best feedback was from SK2’s creator, DrDave, who gave me some insight into his architecture and suggested some improvements and direction. Many thanks, DrDave! I’ve since modified the plugin to do the following:
- Block outright any IP address in the IP blacklist with a score greater than 90. Any request from them will get a 412 Precondition Failed.
- Block any domain in the domain blacklist with a score greater than 90 from appearing in a POST request. In other words, if someone tries to post a comment to the blog with one of the domains, they’ll get a 412 as well.
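The two rules above boil down to a small decision function. The following is only a sketch in Python under stated assumptions: the blacklist contents, the `Request` type, and `status_for` are hypothetical, and the real blocking happens in mod_security rather than in code like this.

```python
from dataclasses import dataclass

BLOCK_THRESHOLD = 90  # "score greater than 90", as described in the post

# Hypothetical blacklist contents: value -> score.
IP_BLACKLIST = {"192.0.2.10": 97, "192.0.2.44": 60}
DOMAIN_BLACKLIST = {"cheap-pills.example": 95, "meh.example": 40}


@dataclass
class Request:
    method: str
    remote_ip: str
    body: str = ""


def status_for(req):
    """Return 412 if the request should be blocked, otherwise None."""
    # Rule 1: any request at all from a high-scoring blacklisted IP.
    if IP_BLACKLIST.get(req.remote_ip, 0) > BLOCK_THRESHOLD:
        return 412
    # Rule 2: POST requests whose payload mentions a high-scoring domain.
    if req.method == "POST":
        for domain, score in DOMAIN_BLACKLIST.items():
            if score > BLOCK_THRESHOLD and domain in req.body:
                return 412
    return None
```

Entries scoring 90 or below stay in the blacklist for SK2 to judge as usual; they just never produce an outright denial.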
With these in place, my daily comment spam take has dropped from an average of 400 per day to an average of 3… two whole orders of magnitude. I like that. There are still a few things to do with it before it’s ready for external testing.
- Provide adjustable strength like the rest of the SK2 plugins for people not quite as nasty as me. Strength will affect the minimum score a blacklisted item must have before it’s blocked, and possibly whether domains are blocked at all. For instance, on “weak” strength we’d only block IPs with a score of 99 or higher and not domains. On “fearsome” we’d block what I currently block.
- Add in aging of blocked entries, reducing the score of items that are currently blocked so they can eventually be discarded from the blocked list. SK2’s scoring can’t account for this as it’s dependent on still getting the requests or spams… if I block them it can’t do its scoring adjustments. I just need to give it some help with that.
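Neither of these to-do items exists yet, so the following Python is purely a sketch of one way they could work. The “weak” and “fearsome” thresholds come from the description above; the function names and the decay of 5 points per pass are assumptions of mine.

```python
def ip_blocked(score, strength):
    """Whether an IP with this blacklist score is blocked at a given strength."""
    if strength == "weak":
        return score >= 99  # "a score of 99 or higher"
    if strength == "fearsome":
        return score > 90   # what the plugin currently blocks
    raise ValueError(f"unknown strength: {strength!r}")


def domain_blocked(score, strength):
    """Whether a domain with this blacklist score is blocked at a given strength."""
    if strength == "weak":
        return False        # weak strength leaves domains alone
    if strength == "fearsome":
        return score > 90
    raise ValueError(f"unknown strength: {strength!r}")


def age_blocked(scores, strength, decay=5):
    """Lower the score of every currently blocked IP by `decay` points.

    Entries that are not blocked are left untouched, since their requests
    still arrive and can still be scored normally.
    """
    aged = {}
    for ip, score in scores.items():
        if ip_blocked(score, strength):
            score = max(0, score - decay)
        aged[ip] = score
    return aged
```

Run periodically, an entry at 97 would drop to 92 and then 87 on “fearsome”, at which point it falls off the blocked list and normal scoring can pick it up again if it is still misbehaving.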
I’m really pleased so far. I’m going to be doing some access log mining to get some better statistics.
Posted in Blogging, Development, My Life | No Comments »
Friday, November 17th, 2006
I’ve used Spam Karma 2 on this blog forever, as it’s an effective and glorious piece of software that does its job well, as evidenced by the ~90,000 comment spams eaten. It’s elegantly designed, unlike most of my plugins, and actually built to be extended by others with its plugin architecture.
I’ve been playing around with mod_security recently (though I’ve pined over it for quite a bit longer), as it provides some rather hefty and glorious functionality for smacking spammers. I didn’t do this before because Dreamhost didn’t provide mod_security access. A Small Orange does, so I’ve converted most of my referral spam handling over to mod_security. That’s nice, but doesn’t do much over what mod_rewrite offers.
So I set about experimenting with fighting comment spam with mod_security since it’s capable of scanning POST payloads. It should be faster than Spam Karma 2, as it’s a compiled and linked module running in Apache rather than an interpreted script (even though PHP and Zend do happy things with byte code when compiling PHP code). I had a whole post written up similar to my original article based on mod_rewrite discussing what mod_security does, how to use it and how to keep up with the spammers.
Sharing resources
But then I had a sudden flash… why should I manually keep up with spammers? I’ve got some hot software that does it for me in Spam Karma. If I could leverage SK2’s blacklisting and moderation handling to automatically generate mod_security rules for me, wouldn’t that be much easier? I mean, SK2 has a ready list of over 4,800 domains that it has quite aptly determined are used for no good. Wouldn’t it make sense to scan all POST requests to my blog and screen out all of them that contain those domains? It does to me.
Thanks to a truly wonderful plugin architecture, it was a relatively painless endeavor. I have a working plugin in place now that keeps my mod_security rules in sync with my SK2 domain blacklist. It is relatively naive right now as, though he had great foresight in most of his plugin architecture, DrDave provided no hook into the blacklist insertion triggers. To his credit, I’m sure there was no evidence of need. But it would be helpful in this case.
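For the curious, here is roughly what “keeping the rules in sync” could look like, sketched in Python rather than the plugin's actual PHP. Everything named here is an assumption: the marker comments, the function names, and above all the rule template, which assumes a mod_security 1.x-style `SecFilterSelective` directive and may need changing for other versions.

```python
import re

# Hypothetical markers fencing off the generated part of .htaccess.
BEGIN = "# BEGIN sk2-modsec"
END = "# END sk2-modsec"

# Assumed mod_security 1.x-style directive; not the plugin's confirmed output.
RULE_TEMPLATE = 'SecFilterSelective POST_PAYLOAD "{pattern}" "deny,status:412"'


def domain_pattern(domains):
    """One alternation of regex-escaped domains, sorted for stable output."""
    return "(" + "|".join(re.escape(d) for d in sorted(set(domains))) + ")"


def render_block(domains, template=RULE_TEMPLATE):
    """Build the generated block, markers included."""
    lines = [BEGIN]
    if domains:
        lines.append(template.format(pattern=domain_pattern(domains)))
    lines.append(END)
    return "\n".join(lines)


def sync_htaccess(text, domains):
    """Replace the generated block if present, else append it; leave the rest alone."""
    block = render_block(domains)
    existing = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if existing.search(text):
        # A function replacement keeps the backslashes in the block literal.
        return existing.sub(lambda _m: block, text, count=1)
    return text.rstrip("\n") + "\n\n" + block + "\n"
```

Without a hook into blacklist insertions, something like `sync_htaccess` has to be re-run on a schedule or on each plugin invocation, which is the naive part. Rewriting only the fenced block keeps the WordPress permalink rules and anything hand-written out of harm's way.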
Concerns
I do have some concerns and they’re all security-related. In order to allow the plugin to do its work the .htaccess file has to be writable by Apache. Most people do that anyway so WordPress can install permalink rewrites, but I typically don’t. I’m truly interested to hear what others think of the idea and the security implications. I don’t much care about false-positives… at all. I’m more interested in the merit of the idea and any possible downsides, before I put it out for even limited release. I don’t mind being a guinea pig on my own site, but it’s a whole separate thing when it’s in public release.
Any comments welcome.
Posted in Blogging, Development | 3 Comments »
Friday, November 10th, 2006
I’ve talked before about .htaccess and the hotness that is mod_rewrite for various purposes.
I ran across another one today that I hadn’t dealt with and figured it was worth discussing. I updated an old post with some new material. That in itself wasn’t overly odd. But I decided that the post was different enough and the demand still fresh enough — the damned thing accounts for 10% of the total traffic to my site, spam included — to warrant a new timestamp. So I bumped the timestamp to today. Lovely.
Almost lovely. So now the content has a different URL. I use dates in my URLs as Al Gore intended. What used to exist at
/archives/2005/10/26/xbox-360-high-definition-faq/
Now exists at
/archives/2006/11/10/xbox-360-high-definition-and-hd-dvd-faq/
That’s not necessarily a bad thing, just different. But guess what all the search engines have in them. Yes, the old one. If they go there now they’ll get a 404! That’s something I’ll have to address with an .htaccess redirect.
Redirecting from the old page to the new page
This is actually a fairly painless procedure in .htaccess. Let’s just take a look.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^archives/2005/10/26/xbox-360-high-definition-faq/?(.*) /archives/2006/11/10/xbox-360-high-definition-and-hd-dvd-faq/$1 [R=301,N]
</IfModule>
Most of this is a bit of work just to get ready to do the one line that counts. I’ve explained the other stuff previously, so let’s look at the line that does the actual work, starting with RewriteRule. Our RewriteRule does everything we need in one shot. It looks for matches to the first little clause in the URL (in this case, the part starting at ^ and ending in the closing parenthesis) and, if it matches, “rewrites” the URL to match the second clause. In this case, we’re looking for:
^ - starts us at the beginning of the URL after the domain.
archives/2005/10/26/xbox-360-high-definition-faq - should be relatively self-explanatory. It’s the “old” URL that we’re looking to change.
/? - that looks for 0 or 1 occurrence of a / character. Just in case someone forgets to include it.
(.*) - grabs everything after the / and stores it off. It’s what’s termed a parenthetical expression that allows us to later create a back-reference to the text. For instance, I support paging of my comments. If someone links to a comment, the URL might look something like (I’m cutting off the beginning for word wrapping reasons) ...-high-definition-faq/comment-page-18/#comment-8789. In this case, I store comment-page-18/ in the back-reference. The #comment-8789 fragment never reaches the server; the browser hangs on to it and reapplies it after the redirect.
/archives/2006/11/10/xbox-360-high-definition-and-hd-dvd-faq/ - this is simply appended directly to the domain when Apache rewrites the URL. So, we’re applying the new beginning of the URL we want to redirect to.
$1 - Remember the “back-reference” listed above? This $1 references that. It’s the first of the back-references we created — as it’s the only parenthetical expression in the matching regular expression — so we use the number 1. We can have many parenthetical expressions in our matching regular expression and hence many back-references but we just need one for this. Whatever was stored in our back-reference, like the comment-page-18 example above, will get plunked onto the back of our URL. If there was nothing in that back-reference, it doesn’t matter as we’ll just append “nothing” to the end of the URL.
[R=301, - this tells Apache to return a 301 redirect status code back to the browser that’s requesting the page when this rule is executed and the matching expression matches. A 301 tells the requester that the content has moved permanently, giving spiders and hence search engines the opportunity to correct their links when they spider.
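N] - this tells Apache to start over from the beginning of the rewrite rules, but this time use the “new” URL. Strictly speaking, the redirect doesn’t need it: once the browser gets the 301 it makes a brand new request for the new URL, and that second request is where WordPress gets its shot at the URL in order to construct the things it needs to construct. L (for “last”) is the flag more commonly paired with R.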
At the end of all this meandering, the user is redirected to the new content and she typically doesn’t even have to know. What shows up in the location bar at the top of the screen changes to match the updated location. It doesn’t get much simpler.
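If you’d like to poke at the matching logic without touching a live server, here is a small Python sketch that mimics what the rule does. The pattern and target are copied from the RewriteRule above; `redirect_for` is a made-up helper, and this is only an approximation of mod_rewrite, not a substitute for testing in Apache.

```python
import re

# Pattern and substitution copied from the RewriteRule above.
PATTERN = re.compile(r"^archives/2005/10/26/xbox-360-high-definition-faq/?(.*)")
TARGET = "/archives/2006/11/10/xbox-360-high-definition-and-hd-dvd-faq/"


def redirect_for(path):
    """Return (301, new_path) if the old-URL rule matches, else None."""
    # In .htaccess context the pattern is matched without the leading slash.
    match = PATTERN.match(path.lstrip("/"))
    if match is None:
        return None
    # group(1) plays the role of $1: whatever followed the old post URL.
    return 301, TARGET + match.group(1)


print(redirect_for("/archives/2005/10/26/xbox-360-high-definition-faq/comment-page-18/"))
print(redirect_for("/archives/2005/10/26/xbox-360-high-definition-faq"))
print(redirect_for("/archives/2006/11/17/some-other-post/"))
```

The first call carries comment-page-18/ over to the new URL, the second shows the missing trailing slash being tolerated, and the third falls through untouched.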
Here’s what my access log shows for these cases.

You can see the initial request come in from Google, which gets the 301 redirect response along with the new URL. Then comes the second request for the new URL and we respond with the 200 OK response and the actual content. Huzzah!
The spiders are happy. The users are happy. I’m happy. Let’s have pie.
Posted in Blogging, Hacks | No Comments »