Bytes IT Community

robots.txt being read by people

I have wondered about this, and I've tried it on a few random sites. I type
the name of a site, such as www.somesite.com, and follow it with /robots.txt.
This can tell the robots not to bother indexing the /images/ directory
or something, but it can also tell script kiddies where to look for
stuff. For example, a line may read Disallow: /AdminPages/.

So, isn't that a simple way to tell someone to type that into their
browser and see if it is open?

Just wondering: is there an encryption method, or some way the
spiders can read the text while leaving it unreadable by script kiddies?
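The concern can be made concrete: robots.txt is plain text, so the same lines a polite crawler obeys can be listed by anyone who opens the file. This is a minimal sketch assuming a hypothetical robots.txt modeled on the example above and a made-up user-agent name, `ExampleBot`; the helper `disallowed_paths` is also hypothetical, while the crawler side uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, modeled on the example in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /AdminPages/
"""


def disallowed_paths(text: str) -> list[str]:
    """List every Disallow path; no crawler logic needed, just reading the text."""
    paths = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# A polite crawler parses the same file and checks it before each fetch.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(disallowed_paths(ROBOTS_TXT))  # ['/images/', '/AdminPages/']
print(parser.can_fetch("ExampleBot", "http://www.somesite.com/AdminPages/login"))  # False
print(parser.can_fetch("ExampleBot", "http://www.somesite.com/index.html"))  # True
```

Note that `can_fetch` only answers the question for clients that choose to ask it; nothing in the file stops a person from typing /AdminPages/ into a browser.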

Misfit

Feb 1 '07 #1

No, robots.txt is only a suggestion. For example, if you had a directory of some
file type that a robot could not crawl, you would disallow that
directory to keep the bot from wasting its time going there.

If you want to keep people/bots out of a directory, you have to do it
server side. Again, if the pages are password protected, you would
not want the bot wasting time with those either, so you would disallow them.
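Doing it "server side" means the server itself refuses the request unless valid credentials arrive. The following is a minimal, framework-free sketch assuming HTTP Basic authentication; the prefix, username, password, and the `check_request` function are all hypothetical. A real site would normally rely on its web server's built-in authentication, served over HTTPS, since Basic credentials are only base64-encoded, not encrypted.

```python
import base64
import hmac

# Hypothetical values for illustration only.
PROTECTED_PREFIX = "/AdminPages/"
USERNAME = "admin"
PASSWORD = "s3cret"


def check_request(path: str, headers: dict[str, str]) -> int:
    """Return 200 if the request may proceed, 401 if it needs valid credentials."""
    if not path.startswith(PROTECTED_PREFIX):
        return 200  # public page: no check needed

    auth = headers.get("Authorization", "")
    if not auth.startswith("Basic "):
        return 401  # no credentials sent

    try:
        decoded = base64.b64decode(auth[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return 401

    user, _, pwd = decoded.partition(":")
    # compare_digest avoids leaking how many leading characters matched
    user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
    pwd_ok = hmac.compare_digest(pwd.encode(), PASSWORD.encode())
    return 200 if user_ok and pwd_ok else 401


token = base64.b64encode(b"admin:s3cret").decode("ascii")
print(check_request("/index.html", {}))  # 200
print(check_request("/AdminPages/", {}))  # 401
print(check_request("/AdminPages/", {"Authorization": "Basic " + token}))  # 200
```

Listing /AdminPages/ in robots.txt can still save crawlers a wasted trip, but the 401 is what actually keeps people out. A real server would also send a WWW-Authenticate header with the 401 so browsers know to prompt for a password.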
--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Feb 1 '07 #2

Misfit wrote:
> Just wondering? Is there an encryption method, or some way the
> spiders can read the text and leave it unreadable by script kiddies?

No.

But if you have pages on the Web that the general public mustn't see,
then you should have them password-protected anyway, instead of relying
on people not finding them.
Feb 1 '07 #3

robots.txt is open for all to read. Otherwise, bots and crawlers could
not get it.
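Part of the reason robots.txt cannot be hidden is that its location is fixed: crawlers look for it at the root of each host and request it like any other public file. A small sketch, with a hypothetical helper name, that maps any page URL to the robots.txt a crawler would request:

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Map any page URL to the robots.txt at the root of the same host."""
    # An absolute path replaces the page's path, query, and fragment;
    # the scheme and host (including any port) are kept.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://www.somesite.com/AdminPages/index.html"))
# http://www.somesite.com/robots.txt
```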

I have logged hits to some of my pages, including information about the
user agents accessing the pages. It takes some detective work on my
part to distinguish some bots from browsers. It's as if some bot
operators don't want anyone to know that their bots are examining Web
sites.
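That detective work often starts with a crude check of the logged User-Agent strings. The sketch below is a hypothetical heuristic, not any standard: it flags strings containing common crawler words and treats an empty User-Agent as suspicious. The pattern, the function name `looks_like_bot`, and the sample strings are assumptions for illustration. Because the header is supplied by the client, a bot that wants to hide can simply send a browser's string, which is exactly why this takes detective work.

```python
import re

# Hypothetical heuristic: words that self-identifying crawlers often include
# in their User-Agent header. The header is chosen by the client, so a bot
# that wants to hide can send a browser's string instead.
BOT_HINTS = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Guess whether a logged User-Agent string came from a crawler."""
    if not user_agent.strip():
        return True  # browsers send a User-Agent; an empty one is suspicious
    return bool(BOT_HINTS.search(user_agent))


# Sample strings as they might appear in an access log.
samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "",
]
for ua in samples:
    print(looks_like_bot(ua), repr(ua))
```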

Rogue bots are a problem. They ignore robots.txt and also the
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
tag. See, for example, <http://www.kloth.net/internet/badbots.php>.
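Rogue bots can ignore the META tag because honoring it is entirely up to the client: a crawler has to parse the page, find the tag, and then choose to obey it. A minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"></head><body></body></html>'
directives = robots_directives(page)
print(sorted(directives))  # ['nofollow', 'noindex']

# A polite crawler checks the directives; a rogue bot simply skips this step.
if "noindex" in directives:
    print("polite crawler: do not index this page")
```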

If I have content I don't want others to see, I don't put it on the Web.

--

David E. Ross
<http://www.rossde.com/>

I use SeaMonkey as my Web browser because I want
a browser that complies with Web standards. See
<http://www.mozilla.org/projects/seamonkey/>.
Feb 2 '07 #4

"Misfit" <Mi**********@gmail.com> writes:
> Just wondering? Is there an encryption method, or some way the
> spiders can read the text and leave it unreadable by script kiddies?

No encryption required. I just put /unlisted in robots.txt and keep all
my top-secret files in http://ourdoings.com/unlisted/topsecret/

They can't get "topsecret" from looking at robots.txt.
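The trick relies on Disallow values being path prefixes: listing only /unlisted keeps polite crawlers out of everything beneath it without naming the subdirectory. A quick check with Python's `urllib.robotparser`, using the paths from the reply; the user-agent name and the /unlisted-photos/ and /photos/ paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as described in the reply: only the top-level prefix is listed.
ROBOTS_TXT = "User-agent: *\nDisallow: /unlisted\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/unlisted/topsecret/", "/unlisted-photos/", "/photos/"]:
    url = "http://ourdoings.com" + path
    print(path, parser.can_fetch("ExampleBot", url))
# /unlisted/topsecret/ False
# /unlisted-photos/ False   (Disallow values are prefixes, so this matches too)
# /photos/ True
```

The name /unlisted itself is still visible to anyone reading robots.txt; the subdirectory stays hidden only as long as nobody guesses it or links to it.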

--

http://ourdoings.com/ Easily organize and disseminate news and
photos for your family or group.
Feb 2 '07 #5
