Most people with websites want their sites indexed so that others can find them.
However, if the bot is really, really stupid, then maybe that isn’t in your best interest.
My experiences with msnbot have been pretty much negative.
Microsoft Live Search supplies 15% of the search traffic to this site. Not an insignificant amount.
However, at this moment, msnbot and its entire class C networks are being blocked from my server, because otherwise they’re going to DoS my poor box into oblivion.
In case you want this rule for yourself, here you go:
iptables -A INPUT -s 220.127.116.11/24 -j DROP
I checked on gerf.org and discovered that it was having trouble. A crew of at least 20 bots (IP addresses 18.104.22.168 through .199, inclusive) was requesting URLs at about one per second.
That’s not a lot of traffic, until you notice that the URLs they were requesting were all Trac source and changeset URLs. These are CPU-, disk- and I/O-intensive pages, and I list them in my robots.txt as URLs that no bot should crawl.
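For anyone who hasn’t set this up, a robots.txt entry along the following lines asks every crawler to stay out of Trac’s repository browser and changeset pages. Treat it as a sketch: the /trac prefix is an assumption about where Trac is mounted (Trac serves these pages under /browser and /changeset relative to its own root), so adjust the paths to match your install. Disallow values are plain prefixes, so each line covers every URL that starts with it.

```text
# Keep crawlers away from the expensive Trac pages.
# "/trac" is a placeholder for wherever Trac is mounted.
User-agent: *
Disallow: /trac/browser
Disallow: /trac/changeset
```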
I immediately went to the webmaster forums to see if others have had this problem (they have) and to complain.
Apparently, bots aren’t all they have trouble with:
“We apologize, but an unknown error has occurred in the forums. This error has been logged.”
Reading through the forums, it looks like msnbot has a really hard time parsing robots.txt files and honoring them. Really odd, because I wrote a robots.txt parser just for fun way back in 1990-something and it’s really not that hard.
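To give a sense of how little there is to it, here is a minimal sketch of robots.txt matching in POSIX sh and awk. It is not the parser mentioned above and says nothing about how msnbot works internally. It follows the original 1994 exclusion rules (User-agent records, prefix-matched Disallow lines, an empty Disallow meaning everything is allowed, a record naming the bot taking precedence over the * record) and ignores later extensions such as Allow, wildcards and Crawl-delay. The function name robots_allowed is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: minimal robots.txt check using the original 1994
# rules (prefix matching on Disallow lines only, no Allow, no wildcards).
# Usage: robots_allowed ROBOTS_FILE AGENT PATH
# Exits 0 if AGENT may fetch PATH, 1 if a Disallow line forbids it.
robots_allowed() {
    awk -v agent="$2" -v path="$3" '
    BEGIN { agent = tolower(agent) }
    {
        line = $0
        sub(/\r$/, "", line)             # tolerate CRLF files
        sub(/#.*/, "", line)             # strip comments
        n = index(line, ":")
        if (n == 0) next                 # blank or malformed line
        field = tolower(substr(line, 1, n - 1))
        value = substr(line, n + 1)
        sub(/^[ \t]+/, "", field); sub(/[ \t]+$/, "", field)
        sub(/^[ \t]+/, "", value); sub(/[ \t]+$/, "", value)
        if (field == "user-agent") {
            # A User-agent line after any other field starts a new record.
            if (!in_ua) { cur_spec = 0; cur_star = 0 }
            in_ua = 1
            if (value == "*") cur_star = 1
            else if (value != "" && index(agent, tolower(value)) > 0) {
                cur_spec = 1; spec_found = 1
            }
        } else {
            in_ua = 0
            # An empty Disallow value allows everything, so it never blocks.
            if (field == "disallow" && value != "" && index(path, value) == 1) {
                if (cur_spec) spec_block = 1
                if (cur_star) star_block = 1
            }
        }
    }
    END {
        # A record naming this agent takes precedence over the * record.
        blocked = spec_found ? spec_block : star_block
        exit (blocked ? 1 : 0)
    }' "$1"
}
```

Against a robots.txt whose "User-agent: *" record contains "Disallow: /trac/changeset", running robots_allowed robots.txt msnbot /trac/changeset/42 exits with status 1, meaning a well-behaved bot should skip that page, while robots_allowed robots.txt msnbot /blog/ exits 0.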
They’re also coming from 22.214.171.124.