Ever since I rebuilt my website with WordPress and imported my blog from Blogspot, I’ve kept an eye on my server error logs. I like to make sure people who arrive here from the old blog or from other links end up where they were trying to go. I’ve set up redirects that catch almost all of the possible routes in, but I want to be sure, and checking the error logs gives me a way to spot any I’ve missed.
I just went through last month’s logs. Of the roughly 250 error pages served, 5 hits were for one rather anonymous blog page that still needed a redirect setting up. 10 were typos or calls to deleted pages. 60 were bots attempting crude hack attacks on the site, probing for known vulnerabilities in a variety of software. That left around 170 that I couldn’t explain. Every one was a call to a comment page that, when I checked, didn’t exist. Each address was a unique combination of a comment number and the page the comment had been posted on.
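For anyone who wants to do the same kind of tally with a script rather than by eye, here is a minimal sketch. It assumes the server writes the common Apache/Nginx "combined" log format. The `COMMENT_URL` pattern and the `PROBE_HINTS` list are hypothetical placeholders, not the real addresses from my logs, so they would need adjusting to match whatever your own 404s look like.

```python
import re
from collections import Counter

# Assumed "combined" log format: ip, identity, user, [time], "request", status, ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

# Hypothetical URL shape for a comment page: /<post>/comment-<id>/
COMMENT_URL = re.compile(r'^/(?P<post>[^?#]+)/comment-(?P<comment_id>\d+)/?$')

# Hypothetical substrings that suggest a bot probing for vulnerable software.
PROBE_HINTS = ("phpmyadmin", ".env", "xmlrpc.php", "/admin")


def classify(path):
    """Put a requested path into one of three rough buckets."""
    if COMMENT_URL.match(path):
        return "comment"
    lowered = path.lower()
    if any(hint in lowered for hint in PROBE_HINTS):
        return "probe"
    return "other"


def tally_404s(lines):
    """Count 404 responses per bucket; lines that don't parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            counts[classify(match.group("path"))] += 1
    return counts
```

Feeding it the lines of a log file, for example `tally_404s(open("access.log"))`, gives a count per bucket. Anything that lands in "other" still needs looking at by hand.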
I’ve seen 404s like this for months and couldn’t work out why they were appearing.
Then it struck me what they were. Those comment pages were for comments that Akismet had caught as spam. The only way (short of brute force, which would show up in the logs) that someone could know the combination of a comment’s unique ID and the page it had been posted on was to be the original poster, or the spam system that posted it. Those 404s must be either the spam bot coming back later to see if the comment went live, or some other system running quality control before paying out for links it believes have been placed on a site.
I’ve not had a chance to match up spam messages to 404s because I regularly clear out my spam queue, but I’m going to keep an eye on it and see if the logs support the idea. I’m intrigued to see how often they check a comment, whether the checks come from the same IP as the spam message, and how long after posting they check.
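Once there is some spam left to compare against, the matching itself could be as simple as the sketch below. It is only an illustration under assumptions: `match_checks` is a made-up helper, the spam records are assumed to be available as a dictionary keyed by comment ID, and the timestamps are assumed to be in the usual access-log format.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout used in Apache/Nginx access logs, e.g. "10/Oct/2010:13:55:36 +0000"
LOG_TIME = "%d/%b/%Y:%H:%M:%S %z"


def match_checks(hits, spam):
    """Pair 404 hits with the spam comments they appear to be checking.

    hits: iterable of (comment_id, ip, time_string), one per 404 request
    spam: dict of comment_id -> (ip, time_string) for comments caught as spam
    Returns (report, checks): one report row per matched hit, and a Counter
    of how many times each spam comment was checked.
    """
    report = []
    checks = Counter()
    for comment_id, hit_ip, hit_time in hits:
        if comment_id not in spam:
            continue  # a 404 for a comment we have no spam record of
        spam_ip, spam_time = spam[comment_id]
        delay = (datetime.strptime(hit_time, LOG_TIME)
                 - datetime.strptime(spam_time, LOG_TIME))
        report.append({
            "comment_id": comment_id,
            "same_ip": hit_ip == spam_ip,  # did the check come from the poster's IP?
            "delay": delay,                # how long after posting the check arrived
        })
        checks[comment_id] += 1
    return report, checks
```

The three things I want to know fall straight out of the result: `checks` says how often each comment was revisited, `same_ip` says whether the visitor was the original poster, and `delay` says how long they waited.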