  #1  
Old July 24th, 2005, 12:49 AM
Steve
Guest
 
Posts: n/a
Default What is going on with the Search Engines?

I notice that search engines are now finding robots.txt files and cataloguing
their contents. Is this wise I wonder? Is it a possible security risk?

I even found the White House robots.txt file on Google. Surely disclosing
details of the directory structure is an open invitation for hackers to 'take a
look'?

Does anyone else feel the same way? Should we be bringing this to the search
engines' attention?

Comments appreciated.




Steve
  #2  
Old July 24th, 2005, 12:49 AM
Leif K-Brooks
Guest
 
Posts: n/a
Default Re: What is going on with the Search Engines?

Steve wrote:
> I notice that search engines are now finding robots.txt files and
> cataloguing their contents. Is this wise I wonder? Is it a possible
> security risk?
>
> I even found the White House robots.txt file on Google. Surely
> disclosing details of the directory structure is an open invitation for
> hackers to 'take a look'?

What do search engines have to do with it? robots.txt objects aren't
magically private, you know; you can view them even without a search
engine's help (e.g. http://whitehouse.gov/robots.txt).

Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore a
part of your public Web API.
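
To illustrate the point that robots.txt sits at a fixed, well-known location on every site, here is a minimal Python sketch. The helper names `robots_url` and `fetch_robots` are made up for this example, and the fetch function is only defined here, not run.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root of the host, whatever page we start from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(page_url, timeout=10):
    """Download the robots.txt text (hypothetical helper; needs network access)."""
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_url("http://whitehouse.gov/some/page.html?x=1"))
```

Anyone who knows a site's host name can build this URL, with or without a search engine.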
  #3  
Old July 24th, 2005, 12:49 AM
Steve
Guest
 
Posts: n/a
Default Re: What is going on with the Search Engines?

Leif K-Brooks wrote:
> Steve wrote:
>
>> I notice that search engines are now finding robots.txt files and
>> cataloguing their contents. Is this wise I wonder? Is it a possible
>> security risk?
>>
>> I even found the White House robots.txt file on Google. Surely
>> disclosing details of the directory structure is an open invitation
>> for hackers to 'take a look'?
>
> What do search engines have to do with it? robots.txt objects aren't
> magically private, you know; you can view them even without a search
> engine's help (e.g. http://whitehouse.gov/robots.txt).
>
> Also, why would your directory structure be a security risk? If it needs
> to be listed in a robots.txt, it presumably has a URI, and is therefore a
> part of your public Web API.

Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.



Steve
  #4  
Old July 24th, 2005, 12:49 AM
Leif K-Brooks
Guest
 
Posts: n/a
Default Re: What is going on with the Search Engines?

Steve wrote:
> Leif K-Brooks wrote:
>
>> Steve wrote:
>>
>>> I notice that search engines are now finding robots.txt files and
>>> cataloguing their contents. Is this wise I wonder? Is it a possible
>>> security risk?
>>
>> Also, why would your directory structure be a security risk? If it
>> needs to be listed in a robots.txt, it presumably has a URI, and is
>> therefore a part of your public Web API.
>
> Fair comment but I still feel that, in this day and age, it is silly to
> advertise anything about your site / server / records other than what
> you truly want to be public information.

If the directory structure is listed in robots.txt, it could presumably
be found by crawling your site even without robots.txt being available.
Do you propose creating Web sites without any internal links as a
security precaution?
  #5  
Old July 24th, 2005, 12:49 AM
Tim
Guest
 
Posts: n/a
Default Re: What is going on with the Search Engines?

Steve wrote:
>>> I notice that search engines are now finding robots.txt files and
>>> cataloguing their contents.

They've nearly always *found* that file, as it's put there as a way to try
and limit what they'll do on websites. I'd suspect that deliberately
showing you the robots text file might be due to search engine programmers
not thinking about excluding it. Why not write directly to a search
engine, and point this out to them?

Leif K-Brooks wrote:
>> Also, why would your directory structure be a security risk? If it needs
>> to be listed in a robots.txt, it presumably has a URI, and is therefore a
>> part of your public Web API.

Steve <pleasedonotspamme@ireland.com> posted:
> Fair comment but I still feel that, in this day and age, it is silly to
> advertise anything about your site / server / records other than what you truly
> want to be public information.

Because it's about the only way to limit what some robots will look at on a
site. There really is no way to prevent rogue ones from doing what they
like, though.

Anyway, if a search engine can assess a site, so can any hacker using their
own tools.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
  #6  
Old July 24th, 2005, 12:49 AM
Alan J. Flavell
Guest
 
Posts: n/a
Default Re: What is going on with the Search Engines?

On Tue, 15 Mar 2005, Steve wrote:
> I notice that search engines are now finding robots.txt files and
> cataloguing their contents. Is this wise I wonder? Is it a possible
> security risk?

Possibly. It depends on how you write your robots.txt file.

It also depends on how secure you want to be. If you *really* want to
block access to a URL, then you need to use an access control of some
kind, rather than hoping that the existence of the URL will remain a
secret.
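
As a rough illustration of "an access control of some kind", the sketch below checks an HTTP Basic `Authorization` header in Python. The function name, user name and password are hypothetical, and in practice one would normally rely on the web server's built-in authentication (over HTTPS) rather than hand-rolled code like this.

```python
import base64
import hmac


def is_authorized(auth_header, expected_user, expected_password):
    """Return True only if auth_header carries the expected Basic credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(auth_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # Malformed base64 or non-UTF-8 payload: treat as unauthenticated.
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and pass_ok


# Example values, assumed for illustration only.
good = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(is_authorized(good, "alice", "s3cret"))
print(is_authorized(None, "alice", "s3cret"))
```

The point is that the server refuses the request unless credentials are presented, so it no longer matters whether the URL itself is known.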

But let's suppose, for the moment, that you're trying for the weak
security approach of creating an unpublished URL, and telling only
your friends about it (i.e. not linking to it from any of your public
pages).

If you use robots.txt to "Disallow" explicit paths whose existence was
supposed to be kept hidden like this, then obviously you have now
revealed the existence of those paths to anyone who cares to look.

But if you apply the disallow only in a wildcard fashion, without
revealing explicit URLs, then you haven't given much away. For
example, if you "Disallow: /private", then everybody can guess that
you have a hierarchy called "/private", but they still have no idea
what the individual URLs in there are called. As long as you
configure your server to block directory listing or URL guessing (a la
mod_speling) then they'd have a hard time finding anything by chance.

If you "Disallow foo.html", then everybody can suspect that you have
file(s) called foo.html, but they don't know exactly where to look for
them.
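
The difference between the two approaches can be seen by listing what a reader of the file learns. This is a small Python sketch; the function name `disallowed_paths` and both sample files (including the file name `friends-only-2005.html`) are invented for illustration.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow value found in a robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Explicit path: the "hidden" URL is handed to anyone who reads the file.
EXPLICIT = """\
User-agent: *
Disallow: /private/friends-only-2005.html
"""

# Prefix only: readers learn that /private exists, but not what is inside it.
PREFIX_ONLY = """\
User-agent: *
Disallow: /private
"""

print(disallowed_paths(EXPLICIT))
print(disallowed_paths(PREFIX_ONLY))
```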

But in all of these cases, keep in mind that just a single incautious
mention of one of these hidden URLs - in a published web page or in a
usenet posting or in an archived mailing list etc. - is all that it
takes. robots.txt addresses itself to properly-behaved robots, but it
in no way prevents rogues from doing whatever they please.
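
For what "properly-behaved" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the agent name `ExampleBot` and the helper `polite_fetch_allowed` are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Sample rules, parsed from text so that no network access is needed.
rules = RobotFileParser()
rules.parse("""\
User-agent: *
Disallow: /private
""".splitlines())


def polite_fetch_allowed(path, agent="ExampleBot"):
    """A polite robot asks this question before requesting a URL."""
    return rules.can_fetch(agent, path)


print(polite_fetch_allowed("/index.html"))
print(polite_fetch_allowed("/private/diary.html"))
```

The check is entirely voluntary: a rogue robot simply never calls it and requests the URL anyway, which is why robots.txt cannot serve as access control.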
  #7  
Old July 24th, 2005, 12:49 AM
Andy Dingley
Guest
 
Posts: n/a
Default Re: What is going on with the Search Engines?

It was somewhere outside Barstow when Steve
<pleasedonotspamme@ireland.com> wrote:
>Fair comment but I still feel that, in this day and age, it is silly to
>advertise anything about your site / server / records other than what you truly
>want to be public information.

robots.txt _is_ public information. So is the content to which it
refers.

The function of robots.txt is to describe publicly visible resources
so as to identify those which are worth indexing as potential entry
points to the site, and those which are available to the public, but
should not be treated as entry points.

/css/, /scripts/ and /photos/ should probably be in there and
forbidden, because you want these to be served to "the public", but
you don't want them treated as independent entry points to the site.

/intranet/, /extranet/ and /secret_server_config/ can either be in
there or not. If you want to keep these secure, you _must_ have some
independent means to secure them.

It's a basic principle of good security practice that it must not
matter if these "secrets" are identified by robots.txt etc. They must
have their security enforced independently. There's also a mild case
for not listing them, because listing highlights their existence and
slightly increases the risk of encouraging attack (although the flaky
_vti_cnf can be assumed to exist anyway, without needing to be told
about it).


I think reading some of Bruce Schneier's work, or Ross Anderson's
book, would be interesting for you.

--
Smert' spamionam
  #8  
Old July 24th, 2005, 12:50 AM
Steve
Guest
 
Posts: n/a
Default Re: What is going on with the Search Engines?

Steve wrote:
> I notice that search engines are now finding robots.txt files and
> cataloguing their contents. Is this wise I wonder? Is it a possible
> security risk?

Thanks all. Comments noted re security etc. I still think it's silly, and
unnecessary, for the search engines to catalogue the robots.txt files - all
points considered.


Steve
  #9  
Old July 24th, 2005, 12:50 AM
Tim
Guest
 
Posts: n/a
Default Re: What is going on with the Search Engines?

On Tue, 15 Mar 2005 11:29:30 +0000,
"Alan J. Flavell" <flavell@ph.gla.ac.uk> posted:
> But let's suppose, for the moment, that you're trying for the weak
> security approach of creating an unpublished URL, and telling only
> your friends about it (i.e. not linking to it from any of your public
> pages).
>
> ...[snip]...
>
> But in all of these cases, keep in mind that just a single incautious
> mention of one of these hidden URLs - in a published web page or in a
> usenet posting or in an archived mailing list etc. - is all that it
> takes.

All it takes is for someone to access that page with a certain browser for
it to become public knowledge, never mind deliberate disclosure.

For instance, the Google text advert sponsored version of the Opera web
browser reports the visited URIs back to Google, unless it's of a form that
appears it should be private (something like HTTPS, username and password
authenticated URIs, etc.). And I don't believe for one instant that Google
isn't going to use the information for their own purposes. Databasing is
their business.

And it's not the only thing that works that way. There are various plug-ins
for other browsers which do the same sort of thing, some of which are
installed surreptitiously, or the person simply has no idea about what
they're doing while installing things blindly.

I've argued on the Opera news groups that this is an irresponsible thing to
do, but they just like to argue back that it's irresponsible of whoever
used URIs in that manner in the first place. Which it is, but that doesn't
excuse what they're doing, either.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
 
