What is going on with the Search Engines?

I notice that search engines are now finding robots.txt files and cataloguing
their contents. Is this wise, I wonder? Is it a possible security risk?

I even found the White House robots.txt file on Google. Surely disclosing
details of the directory structure is an open invitation for hackers to 'take a
look'?

Does anyone else feel the same way? Should we be bringing this to the search
engines' attention?

Comments appreciated.


Steve
Jul 23 '05 #1
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?

I even found the White House robots.txt file on Google. Surely
disclosing details of the directory structure is an open invitation for
hackers to 'take a look'?


What do search engines have to do with it? robots.txt objects aren't
magically private, you know; you can view them even without a search
engine's help (e.g. http://whitehouse.gov/robots.txt).
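
As a minimal sketch of that point (not part of the original post): the robots.txt address can be derived from any page URL on the same host, with no search engine involved. The page path used here is made up for illustration.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    # robots.txt lives at the root of the host, so joining with an
    # absolute path replaces whatever path and query the page URL had.
    return urljoin(page_url, "/robots.txt")


# "/some/page.html" is a hypothetical path used only for illustration.
print(robots_url("http://whitehouse.gov/some/page.html"))
```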

Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore a
part of your public Web API.
Jul 23 '05 #2
Leif K-Brooks wrote:
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?

I even found the White House robots.txt file on Google. Surely
disclosing details of the directory structure is an open invitation
for hackers to 'take a look'?

What do search engines have to do with it? robots.txt objects aren't
magically private, you know; you can view them even without a search
engine's help (e.g. http://whitehouse.gov/robots.txt).

Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore a
part of your public Web API.


Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.

Steve
Jul 23 '05 #3
Steve wrote:
Leif K-Brooks wrote:
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?


Also, why would your directory structure be a security risk? If it
needs to be listed in a robots.txt, it presumably has a URI, and is
therefore a part of your public Web API.


Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what
you truly want to be public information.


If the directory structure is listed in robots.txt, it could presumably
be found by crawling your site even without robots.txt being available.
Do you propose creating Web sites without any internal links as a
security precaution?
Jul 23 '05 #4
Tim
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents.

They've nearly always *found* that file, as it's put there as a way to try
and limit what they'll do on websites. I'd suspect that deliberately
showing you the robots text file might be due to search engine programmers
not thinking about excluding it. Why not write directly to a search
engine, and point this out to them?

Leif K-Brooks wrote:
Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore a
part of your public Web API.

Steve <pl***************@ireland.com> posted:
Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.


Because it's about the only way to limit what some robots will look at on a
site. There really is no way to prevent rogue ones from doing what they
like, though.

Anyway, if a search engine can assess a site, so can any hacker using their
own tools.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
Jul 23 '05 #5
On Tue, 15 Mar 2005, Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?


Possibly. It depends on how you write your robots.txt file.

It also depends on how secure you want to be. If you *really* want to
block access to a URL, then you need to use an access control of some
kind, rather than hoping that the existence of the URL will remain a
secret.

But let's suppose, for the moment, that you're trying for the weak
security approach of creating an unpublished URL, and telling only
your friends about it (i.e. not linking to it from any of your public
pages).

If you use robots.txt to "Disallow" explicit paths whose existence was
supposed to be kept hidden like this, then obviously you have now
revealed the existence of those paths to anyone who cares to look.

But if you apply the disallow only in a wildcard fashion, without
revealing explicit URLs, then you haven't given much away. For
example if you "Disallow: /private", then everybody can guess that
you have a hierarchy called "/private", but they still have no idea
what the individual URLs in there are called. As long as you
configure your server to block directory listing or URL guessing (a la
mod_speling) then they'd have a hard time finding anything by chance.

If you "Disallow foo.html", then everybody can suspect that you have
file(s) called foo.html, but they don't know exactly where to look for
them.
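
To make the difference concrete, here is a rough Python sketch (not from the original post) of what anyone reading a robots.txt file learns from it. The two file contents and the paths in them are hypothetical, and the parsing is deliberately simplified: it only collects the values of Disallow lines.

```python
def disclosed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow value, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file that names an unpublished URL explicitly.
EXPLICIT = "User-agent: *\nDisallow: /friends/party-photos.html\n"
# Hypothetical file that names only a prefix.
PREFIX = "User-agent: *\nDisallow: /private\n"

print(disclosed_paths(EXPLICIT))  # the full "hidden" URL is exposed
print(disclosed_paths(PREFIX))    # only the name of the hierarchy is exposed
```

With the explicit form the reader gets a URL that can be requested directly; with the prefix form the reader still has to guess what lies underneath.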

But in all of these cases, keep in mind that just a single incautious
mention of one of these hidden URLs - in a published web page or in a
usenet posting or in an archived mailing list etc. - is all that it
takes. robots.txt addresses itself to properly-behaved robots, but it
in no way prevents rogues from doing whatever they please.
Jul 23 '05 #6
It was somewhere outside Barstow when Steve
<pl***************@ireland.com> wrote:
Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.


robots.txt _is_ public information. So is the content to which it
refers.

The function of robots.txt is to describe publicly visible resources
so as to identify those which are worth indexing as potential entry
points to the site, and those which are available to the public, but
should not be treated as entry points.

/css/, /scripts/ and /photos/ should probably be in there and
forbidden, because you want these to be served to "the public", but
you don't want them treated as independent entry points to the site.
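
As an illustration only, the following Python sketch uses the standard library's urllib.robotparser to show how a well-behaved robot would interpret such a file. The robots.txt content, the example.com URLs and the "ExampleBot" agent name are assumptions made up for this sketch, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt following the suggestion above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /scripts/
Disallow: /photos/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_robot_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """What a well-behaved robot concludes; nothing enforces this answer."""
    return parser.can_fetch(agent, url)


print(polite_robot_may_fetch("http://example.com/index.html"))
print(polite_robot_may_fetch("http://example.com/photos/cat.jpg"))
```

The check happens entirely on the robot's side. The server still serves /photos/cat.jpg to anyone who asks for it, which is why the file is advice about indexing and not a form of protection.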

/intranet/, /extranet/ and /secret_server_config/ can either be in
there or not. If you want to keep these secure, you _must_ have some
independent means to secure them.
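
One possible shape for such an independent check, sketched in Python under stated assumptions: the function name, the credentials and the use of HTTP Basic authentication are all choices made for this illustration, and a real deployment would normally rely on the web server's own access control and store a salted password hash.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "friend"
EXPECTED_PASSWORD = "correct horse"


def is_authorized(path, authorization_header):
    """Allow public paths; require HTTP Basic credentials under /intranet/."""
    if not path.startswith("/intranet/"):
        return True
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        encoded = authorization_header[len("Basic "):]
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes alike.
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparison avoids leaking how much of a value matched.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and password_ok
```

The decision depends only on the request, so it holds whether or not /intranet/ is mentioned in robots.txt.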

It's a basic principle of good security practice that it must not
matter if these "secrets" are identified by robots.txt etc. They must
have their security enforced independently. There's also a slight
recommendation that they shouldn't be listed, because this highlights
their existence for a minor increase in the risk of encouraging attack
(although the flakey _vti_cnf can be assumed to exist anyway, without
needing to be told about it).

I think reading some of Bruce Schneier's work, or Ross Anderson's
book, would be interesting for you.

--
Smert' spamionam
Jul 23 '05 #7
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?

Thanks all. Comments noted re security etc. I still think it's silly, and
unnecessary, for the search engines to catalogue the robots.txt files - all
points considered.
Steve
Jul 23 '05 #8
Tim
On Tue, 15 Mar 2005 11:29:30 +0000,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> posted:
But let's suppose, for the moment, that you're trying for the weak
security approach of creating an unpublished URL, and telling only
your friends about it (i.e. not linking to it from any of your public
pages).

...[snip]...

But in all of these cases, keep in mind that just a single incautious
mention of one of these hidden URLs - in a published web page or in a
usenet posting or in an archived mailing list etc. - is all that it
takes.


All it takes is for someone to access that page with a certain browser for
it to become public knowledge, never mind deliberate disclosure.

For instance, the version of the Opera web browser sponsored by Google text
adverts reports visited URIs back to Google, unless a URI is of a form that
suggests it should be private (something like HTTPS, username and password
authenticated URIs, etc.). And I don't believe for one instant that Google
isn't going to use the information for their own purposes. Databasing is
their business.

And it's not the only thing that works that way. There are various plug-ins
for other browsers which do the same sort of thing, some of which are
installed surreptitiously, or by people who simply have no idea what
they're doing while installing things blindly.

I've argued on the Opera newsgroups that this is an irresponsible thing to
do, but they just like to argue back that it's irresponsible of whoever
used URIs in that manner in the first place. Which it is, but that doesn't
excuse what they're doing, either.

Jul 23 '05 #9
