robots.txt being read by people

Misfit
Guest
 
Posts: n/a
#1: Feb 1 '07
I have wondered about this, and I've tried it on a few random sites: I
type the name of a site, www.somesite.com, and follow it with /robots.txt.
This can tell the robots not to bother indexing the /images/ directory
or something, but it can also tell script kiddies where to look for
stuff. For example the Disallow may read Disallow: /AdminPages/.

So, isn't that a simple way to tell someone to type that into their
browser and see if it is open?

Just wondering: is there an encryption method, or some way the
spiders can read the text while leaving it unreadable by script kiddies?
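
To make the concern concrete, here is a minimal sketch of what anyone can pull out of a robots.txt file once they have fetched it. The function name `disallowed_paths` and the `SAMPLE` text are made up for illustration; the two paths are the ones mentioned in the post.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named on a Disallow line, in file order."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content, using the directories named in the post.
SAMPLE = """\
User-agent: *
Disallow: /images/
Disallow: /AdminPages/
"""

print(disallowed_paths(SAMPLE))
```

The file is plain text served to every client, so a person reads exactly the same lines a spider does.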

Misfit


Adrienne Boswell
Guest
 
Posts: n/a
#2: Feb 1 '07

re: robots.txt being read by people


Gazing into my crystal ball I observed "Misfit" <MisfitRetard@gmail.com>
writing in news:1170289535.460955.320240@l53g2000cwa.googlegroups.com:
Quote:
I have wondered, and I've tried this on a few random sites. I type
the name of a site. www.somesite.com and follow it with /robots.txt.
This can tell the robots not to bother indexing the /images/ directory
or something, but it can also tell script kiddies where to look for
stuff. For example the Disallow may read Disallow: /AdminPages/.
>
So, isn't that a simple way to tell someone to type that into their
browser and see if it is open?
>
Just wondering? Is there an encryption method, or some way the
spiders can read the text and leave it unreadable by script kiddies?
>
Misfit
>
>
No, it is a suggestion. For example, if you had a directory of some
file type that a robot could not crawl, you would disallow that
directory to keep the bot from wasting its time going there.

If you want to keep people/bots out of a directory, you have to do it
server side. Again, if the pages are password protected, you would
not want the bot wasting time with those either, so you would disallow.
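
The "suggestion" point can be sketched with Python's standard `urllib.robotparser`: a well-behaved crawler asks the rules before fetching, but nothing on the server enforces the answer. The agent name `ExampleBot` and the page names are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Parse rules from memory instead of fetching them from a live site.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /AdminPages/",
])

# A polite bot checks first and skips what it is asked to skip.
admin_ok = rules.can_fetch("ExampleBot", "http://www.somesite.com/AdminPages/login.html")
index_ok = rules.can_fetch("ExampleBot", "http://www.somesite.com/index.html")

print(admin_ok, index_ok)
```

A client that never calls `can_fetch` can still request the admin URL, which is why the real barrier has to be on the server.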
--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Harlan Messinger
Guest
 
Posts: n/a
#3: Feb 1 '07

re: robots.txt being read by people


Misfit wrote:
Quote:
I have wondered, and I've tried this on a few random sites. I type
the name of a site. www.somesite.com and follow it with /robots.txt.
This can tell the robots not to bother indexing the /images/ directory
or something, but it can also tell script kiddies where to look for
stuff. For example the Disallow may read Disallow: /AdminPages/.
>
So, isn't that a simple way to tell someone to type that into their
browser and see if it is open?
>
Just wondering? Is there an encryption method, or some way the
spiders can read the text and leave it unreadable by script kiddies?
No.

But if you have pages on the Web that the general public mustn't see,
then you should have them password-protected anyway, instead of relying
on people not finding them.
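
As a rough sketch of what password protection means at the HTTP level, the function below checks a Basic `Authorization` header against one credential pair. The name `is_authorized` and the sample credentials are hypothetical; in practice the web server's own access-control configuration would normally do this job.

```python
import base64
import hmac


def is_authorized(auth_header, user, password):
    """Check an HTTP Basic Authorization header value against one credential pair."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    supplied_user, sep, supplied_password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    user_ok = hmac.compare_digest(supplied_user.encode(), user.encode())
    password_ok = hmac.compare_digest(supplied_password.encode(), password.encode())
    return user_ok and password_ok


# Hypothetical credentials, for illustration only.
header = "Basic " + base64.b64encode(b"admin:s3cret").decode("ascii")
print(is_authorized(header, "admin", "s3cret"))
```

With a check like this in front of the pages, it no longer matters whether robots.txt names the directory.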

David E. Ross
Guest
 
Posts: n/a
#4: Feb 2 '07

re: robots.txt being read by people


Misfit wrote:
Quote:
I have wondered, and I've tried this on a few random sites. I type
the name of a site. www.somesite.com and follow it with /robots.txt.
This can tell the robots not to bother indexing the /images/ directory
or something, but it can also tell script kiddies where to look for
stuff. For example the Disallow may read Disallow: /AdminPages/.
>
So, isn't that a simple way to tell someone to type that into their
browser and see if it is open?
>
Just wondering? Is there an encryption method, or some way the
spiders can read the text and leave it unreadable by script kiddies?
>
Misfit
>
robots.txt is open for all to read. Otherwise, bots and crawlers could
not get it.

I have logged hits to some of my pages, including information about the
user agents accessing the pages. It takes some detective work on my
part to distinguish some bots from browsers. It's as if some bot
operators don't want anyone to know that their visits are examining Web
sites.
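
A first pass at that detective work might look like the sketch below, which flags user-agent strings containing common crawler keywords. The keyword list and the name `looks_like_bot` are assumptions, not a complete or authoritative test; a bot that sends a browser's user-agent string passes straight through, which is exactly the difficulty described above.

```python
import re

# Assumed keyword list; real logs will need a longer, site-specific one.
BOT_HINTS = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Heuristic: True when the user-agent is empty or names itself as a crawler."""
    ua = user_agent.strip()
    if not ua or ua == "-":
        return True  # browsers normally send a user-agent string
    return bool(BOT_HINTS.search(ua))


crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"

print(looks_like_bot(crawler), looks_like_bot(browser))
```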

Rogue bots are a problem. They ignore robots.txt and also the
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
tag. See, for example, <http://www.kloth.net/internet/badbots.php>.
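
For reference, this is roughly how a compliant crawler could read that tag, sketched with the standard `html.parser` module. The class name `RobotsMetaParser` is made up for this example; a rogue bot simply never runs a check like this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


parser = RobotsMetaParser()
parser.feed('<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"></head></html>')
print(sorted(parser.directives))
```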

If I have content I don't want others to see, I don't put it on the Web.

--

David E. Ross
<http://www.rossde.com/>

I use SeaMonkey as my Web browser because I want
a browser that complies with Web standards. See
<http://www.mozilla.org/projects/seamonkey/>.

Bruce Lewis
Guest
 
Posts: n/a
#5: Feb 2 '07

re: robots.txt being read by people


"Misfit" <MisfitRetard@gmail.com> writes:
Quote:
Just wondering? Is there an encryption method, or some way the
spiders can read the text and leave it unreadable by script kiddies?
No encryption required. I just put /unlisted in robots.txt and keep all
my top-secret files in http://ourdoings.com/unlisted/topsecret/

They can't get "topsecret" from looking at robots.txt.
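
This works because Disallow values are matched as path prefixes. A small sketch with the standard `urllib.robotparser`, assuming the robots.txt contains only the single rule described above and using a made-up agent name `ExampleBot`:

```python
from urllib.robotparser import RobotFileParser

# Assumed file content: one rule that names only the parent directory.
ROBOTS_TXT = "User-agent: *\nDisallow: /unlisted\n"

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# The prefix rule covers everything beneath /unlisted ...
crawlable = rules.can_fetch("ExampleBot", "http://ourdoings.com/unlisted/topsecret/")
# ... while the file itself never mentions the subdirectory name.
name_leaked = "topsecret" in ROBOTS_TXT

print(crawlable, name_leaked)
```

The subdirectory stays out of robots.txt, but it is hidden only as long as nothing else links to it or guesses its name.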

--

http://ourdoings.com/ Easily organize and disseminate news and
photos for your family or group.