Misfit wrote:
I have wondered about this, and I've tried it on a few random sites. I type
the name of a site, e.g. www.somesite.com, and follow it with /robots.txt.
This file can tell the robots not to bother indexing the /images/ directory
or something, but it can also tell script kiddies where to look for
stuff. For example, a Disallow line may read "Disallow: /AdminPages/".
So isn't that a simple way of telling someone to type that into their
browser and see whether the directory is open?
Just wondering: is there an encryption method, or some way the
spiders can read the file while leaving it unreadable by script kiddies?
Misfit
robots.txt is open for all to read. Otherwise, bots and crawlers could
not get it.
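To illustrate the point: any client can read robots.txt exactly the way a
well-behaved crawler does, Disallow lines and all. Here is a minimal sketch
using Python's standard urllib.robotparser; the robots.txt content is the
hypothetical example from the question above, not from any real site.

```python
# Any client -- crawler or curious human -- parses robots.txt the same way.
# The Disallow entries are plainly visible to everyone.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, echoing the example in the question.
robots_txt = """\
User-agent: *
Disallow: /AdminPages/
Disallow: /images/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A polite crawler checks before fetching; a script kiddie just reads
# the Disallow lines directly and tries those paths in a browser.
print(rp.can_fetch("*", "/AdminPages/secret.html"))  # False
print(rp.can_fetch("*", "/public/page.html"))        # True
```

The file is only useful if it is readable by arbitrary clients, which is
exactly why it cannot also be hidden from the bad ones.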
I have logged hits to some of my pages, including information about the
user agents accessing the pages. It takes some detective work on my
part to distinguish some bots from browsers. It's as if some bot
operators don't want anyone to know that their visits are examining Web
sites.
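The detective work usually starts with the User-Agent field in the server
logs. A rough sketch of extracting it from an Apache combined-format log
line (the log line and bot name below are invented for illustration):

```python
# Pull the User-Agent out of an Apache combined-format log line.
# In that format, the User-Agent is the last double-quoted field.
import re

# Invented example log line, not from a real server.
LOG_LINE = ('203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] '
            '"GET /robots.txt HTTP/1.1" 200 123 "-" '
            '"ExampleBot/1.0 (+http://example.com/bot)"')

match = re.search(r'"([^"]*)"\s*$', LOG_LINE)
user_agent = match.group(1) if match else ""
print(user_agent)  # ExampleBot/1.0 (+http://example.com/bot)
```

Honest bots identify themselves this way; the troublesome ones send a
browser-like string, which is what makes telling them apart real work.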
Rogue bots are a problem. They ignore robots.txt and also the
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
tag. See, for example, <http://www.kloth.net/internet/badbots.php>.
If I have content I don't want others to see, I don't put it on the Web.
--
David E. Ross
<http://www.rossde.com/>
I use SeaMonkey as my Web browser because I want
a browser that complies with Web standards. See
<http://www.mozilla.org/projects/seamonkey/>.