Thanks go out to the person from Paris who yesterday pointed his/her screen-scraping software (which will remain unnamed so as not to add to its infamy) at PHPkitchen.com and drained 78 MEGABYTES!!! off my server. Yes, your IP, 212.234.213.250, has been banned, you have been reported to your ISP, and hopefully you will be blacklisted.
It looks like this software now has an option to override the Disallow directives in robots.txt. Do any other webmasters know how to get around this kind of nuisance?
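Honouring Disallow is entirely voluntary: the crawler itself reads robots.txt and decides whether to skip a URL. A minimal sketch of the check a compliant crawler performs, using Python's standard urllib.robotparser; the robots.txt contents and the "ExampleScraper" agent name are hypothetical. Software with an "ignore robots.txt" option simply never makes this call, which is why robots.txt alone cannot stop it.

```python
from urllib import robotparser

# Hypothetical robots.txt: shut out one named scraper, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow:
"""


def polite_fetch_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True only if robots.txt permits user_agent to fetch url.

    A well-behaved crawler makes this check before each request; a scraper
    with an "ignore robots.txt" switch simply skips it.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(polite_fetch_allowed("ExampleScraper/1.0", "http://www.phpkitchen.com/"))  # False
    print(polite_fetch_allowed("Mozilla/4.0", "http://www.phpkitchen.com/"))  # True
```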
Happy that I upgraded to a 40GB account,
Demian
April 4th, 2003 at 2:25 pm
Ick. =\ *adds another IP to her ban list*
hehe
—
http://php-princess.net
April 4th, 2003 at 3:12 pm
Seriously, a Disallow in robots.txt *usually* works a lot better, because even if you ban a bot, you can end up sending out six hours' worth of 403 responses, and those still cost bandwidth.
But what can you do when bots don't respect Disallow rules? Firewall changes at my colo are not an option, so I would be very grateful for any advice on this 🙂
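To see how much bandwidth the 403s themselves are costing, you can total them per client from the access log. A minimal Python sketch, assuming Apache's Common Log Format (the Combined format begins with the same fields); the sample lines and IP addresses are made up. The size field (%b) excludes HTTP headers, so the real figure is somewhat higher.

```python
import re
from collections import defaultdict

# Matches the leading fields of an Apache Common/Combined Log Format line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def forbidden_bytes_by_ip(lines):
    """Sum the body bytes sent with 403 responses, keyed by client IP."""
    totals = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("status") != "403":
            continue  # unparseable line or not a 403
        size = m.group("size")
        # Apache logs "-" instead of 0 when no body was sent.
        totals[m.group("ip")] += 0 if size == "-" else int(size)
    return dict(totals)


if __name__ == "__main__":
    sample = [
        '192.0.2.10 - - [04/Apr/2003:10:00:00 +0200] "GET /index.php HTTP/1.0" 403 287',
        '192.0.2.10 - - [04/Apr/2003:10:00:01 +0200] "GET /about.php HTTP/1.0" 403 287',
        '198.51.100.7 - - [04/Apr/2003:10:00:02 +0200] "GET / HTTP/1.1" 200 5120',
    ]
    print(forbidden_bytes_by_ip(sample))  # {'192.0.2.10': 574}
```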
April 8th, 2003 at 5:15 am
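The User-Agent header is trivial to spoof. Use a .htaccess file that denies by IP address instead:
# Ban one client by IP. With "Order allow,deny", all Allow lines are checked
# first, then any matching Deny overrides them, so <address> is refused.
# Only GET and POST are restricted here (restricting GET also covers HEAD).
<Limit GET POST>
order allow,deny
allow from all
deny from <address>
</Limit>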
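To illustrate why matching on User-Agent is weak: the header is simply whatever string the client chooses to send. A minimal Python sketch using the standard urllib.request.Request; it only builds the request and sends nothing, and the browser string is an arbitrary example.

```python
from urllib.request import Request


def build_request(url: str, user_agent: str) -> Request:
    """Build an HTTP request that claims to come from any client we like."""
    # User-Agent is an ordinary client-supplied header; nothing stops a
    # scraper from copying a real browser's value.
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request(
        "http://www.phpkitchen.com/",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)",
    )
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
```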
April 9th, 2003 at 2:31 am
jhherren, thanks for the suggestion. What I was looking for was a generic way to block certain nuisance bots.
Banning by IP is fine, but you usually only discover that your site has been siphoned *after* the fact, so it's not much use.
The bot I'm talking about (again, I think it's best not to name it here) is a crappy little 2 MB program that any Windows user (always on Win98, in my experience) can download for free and fire up.
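One generic way to catch a scraper while it is still running, rather than after the fact, is to ban any client that requests pages faster than a person plausibly would. The Python sketch below is a hypothetical sliding-window throttle, not something proposed in this thread: the class name, the limits, and the in-memory ban set are all assumptions, and a real site would have to hook the decision into the web server or application and persist the bans.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class RequestThrottle:
    """Ban any client IP that makes more than max_requests in window seconds.

    Hypothetical helper: the default limits are illustrative, not tuned values.
    """

    def __init__(self, max_requests: int = 60, window: float = 60.0):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.banned = set()  # once banned, an IP stays banned (in memory only)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record one request from ip; return False if ip is (now) banned."""
        if ip in self.banned:
            return False
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.banned.add(ip)
            return False
        return True
```

Set the limit well above what someone browsing quickly would reach, since several people behind one proxy can share a single IP.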
April 10th, 2003 at 1:08 pm
Feel free to email me with the specifics, and I'll be glad to TRY to help out 🙂