robots.txt being read by people

I have wondered, and I've tried this on a few random sites. I type
the name of a site. www.somesite.com and follow it with /robots.txt.
This can tell the robots not to bother indexing the /images/ directory
or something, but it can also tell script kiddies where to look for
stuff. For example the Disallow may read Disallow: /AdminPages/.

So, isn't that a simple way to tell someone to type that into their
browser and see if it is open?

Just wondering? Is there an encryption method, or some way the
spiders can read the text and leave it unreadable by script kiddies?

Misfit

Feb 1 '07 #1
Gazing into my crystal ball I observed "Misfit" <Mi**********@gmail.com>
writing in news:11**********************@l53g2000cwa.googlegroups.com:
> I have wondered, and I've tried this on a few random sites. I type
> the name of a site. www.somesite.com and follow it with /robots.txt.
> This can tell the robots not to bother indexing the /images/ directory
> or something, but it can also tell script kiddies where to look for
> stuff. For example the Disallow may read Disallow: /AdminPages/.
>
> So, isn't that a simple way to tell someone to type that into their
> browser and see if it is open?
>
> Just wondering? Is there an encryption method, or some way the
> spiders can read the text and leave it unreadable by script kiddies?
>
> Misfit

No, it is a suggestion. For example, if you had a directory of some
file type that a robot could not crawl, you would disallow that
directory to keep the bot from wasting its time going there.
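
To illustrate the "suggestion" point, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot" and the file contents are assumptions made up for this example, built from the directories mentioned in the question. A polite crawler runs a check like this before each request; nothing on the server enforces the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, based on the directories named in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /AdminPages/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved bot asks before fetching; the check is purely voluntary.
print(parser.can_fetch("ExampleBot", "http://www.somesite.com/index.html"))             # True
print(parser.can_fetch("ExampleBot", "http://www.somesite.com/AdminPages/login.html"))  # False
```

A client that never runs this check can still request /AdminPages/ directly, which is why the file offers no protection by itself.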

If you want to keep people/bots out of a directory, you have to do it
server side. Again, if the pages are password protected, you would
not want the bot wasting time with those either, so you would disallow them.
--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Feb 1 '07 #2
Misfit wrote:
> I have wondered, and I've tried this on a few random sites. I type
> the name of a site. www.somesite.com and follow it with /robots.txt.
> This can tell the robots not to bother indexing the /images/ directory
> or something, but it can also tell script kiddies where to look for
> stuff. For example the Disallow may read Disallow: /AdminPages/.
>
> So, isn't that a simple way to tell someone to type that into their
> browser and see if it is open?
>
> Just wondering? Is there an encryption method, or some way the
> spiders can read the text and leave it unreadable by script kiddies?

No.

But if you have pages on the Web that the general public mustn't see,
then you should have them password-protected anyway, instead of relying
on people not finding them.
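
As a rough sketch of what a server-side check looks like, the following Python function validates an HTTP Basic "Authorization" header before a protected path is served. The user name, password, path prefix, and function names are all hypothetical, chosen only for illustration; in practice most sites would use the web server's own authentication settings and stored password hashes rather than hand-written code.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "admin"
EXPECTED_PASSWORD = "change-me"

def is_authorized(authorization_header):
    """Return True only if the header carries the expected Basic credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and pass_ok

def status_for(path, authorization_header=None):
    """Return the HTTP status a server would send: 401 unless credentials match."""
    if path.startswith("/AdminPages/") and not is_authorized(authorization_header):
        return 401
    return 200
```

With a check like this in place, knowing the directory name from robots.txt gains a visitor nothing: the request is refused regardless of how the URL was discovered.
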
Feb 1 '07 #3
Misfit wrote:
> I have wondered, and I've tried this on a few random sites. I type
> the name of a site. www.somesite.com and follow it with /robots.txt.
> This can tell the robots not to bother indexing the /images/ directory
> or something, but it can also tell script kiddies where to look for
> stuff. For example the Disallow may read Disallow: /AdminPages/.
>
> So, isn't that a simple way to tell someone to type that into their
> browser and see if it is open?
>
> Just wondering? Is there an encryption method, or some way the
> spiders can read the text and leave it unreadable by script kiddies?
>
> Misfit

robots.txt is open for all to read. Otherwise, bots and crawlers could
not get it.

I have logged hits to some of my pages, including information about the
user agents accessing the pages. It takes some detective work on my
part to distinguish some bots from browsers. It's as if some bot
operators don't want anyone to know that their visits are examining Web
sites.
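
One possible form of that detective work, sketched in Python: classify each logged user-agent string by common crawler keywords, then flag clients that fetched /robots.txt while claiming to be an ordinary browser. The keyword list, the log format (tuples of IP, path, user agent), and the function names are assumptions for this example, not the method actually used above, and a heuristic like this will miss bots that never request robots.txt.

```python
import re

# Keywords that self-identifying crawlers commonly include (an assumed, incomplete list).
BOT_HINTS = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

def classify(user_agent):
    """Label a user-agent string as 'unknown', 'declared-bot', or 'browser-like'."""
    if not user_agent or user_agent.strip() in ("", "-"):
        return "unknown"
    if BOT_HINTS.search(user_agent):
        return "declared-bot"
    return "browser-like"

def suspects(hits):
    """Return IPs that fetched /robots.txt while presenting a browser-like user agent.

    `hits` is an iterable of (ip, path, user_agent) tuples taken from a hit log.
    """
    return sorted({ip for ip, path, ua in hits
                   if path == "/robots.txt" and classify(ua) == "browser-like"})

# Made-up log entries for illustration.
hits = [
    ("10.0.0.1", "/robots.txt", "Googlebot/2.1"),
    ("10.0.0.2", "/robots.txt", "Mozilla/4.0 (compatible; MSIE 6.0)"),
    ("10.0.0.3", "/index.html", "Mozilla/4.0 (compatible; MSIE 6.0)"),
]
print(suspects(hits))  # ['10.0.0.2']
```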

Rogue bots are a problem. They ignore robots.txt and also the
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
tag. See, for example, <http://www.kloth.net/internet/badbots.php>.

If I have content I don't want others to see, I don't put it on the Web.

--

David E. Ross
<http://www.rossde.com/>

I use SeaMonkey as my Web browser because I want
a browser that complies with Web standards. See
<http://www.mozilla.org/projects/seamonkey/>.
Feb 2 '07 #4
"Misfit" <Mi**********@gmail.com> writes:
> Just wondering? Is there an encryption method, or some way the
> spiders can read the text and leave it unreadable by script kiddies?

No encryption required. I just put /unlisted in robots.txt and keep all
my top-secret files in http://ourdoings.com/unlisted/topsecret/

They can't get "topsecret" from looking at robots.txt.
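
This works because Disallow rules match by path prefix. A small Python sketch with the standard urllib.robotparser module shows a compliant bot skipping the nested directory even though only /unlisted is named; the bot name "ExampleBot" is made up for the example. As the earlier replies point out, this only hides the name; it does not stop anyone who learns or guesses the full URL.

```python
from urllib.robotparser import RobotFileParser

# Only the parent directory is listed; the nested directory name never appears.
unlisted_rules = RobotFileParser()
unlisted_rules.parse(["User-agent: *", "Disallow: /unlisted"])

# Prefix matching covers everything beneath /unlisted.
print(unlisted_rules.can_fetch("ExampleBot", "http://ourdoings.com/unlisted/topsecret/"))  # False
print(unlisted_rules.can_fetch("ExampleBot", "http://ourdoings.com/"))                     # True
```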

--

http://ourdoings.com/ Easily organize and disseminate news and
photos for your family or group.
Feb 2 '07 #5
