
July 24th, 2005, 12:49 AM
| | | What is going on with the Search Engines?
I notice that search engines are now finding robots.txt files and cataloguing
their contents. Is this wise, I wonder? Is it a possible security risk?
I even found the White House robots.txt file on Google. Surely disclosing
details of the directory structure is an open invitation for hackers to 'take a
look'?
Does anyone else feel the same way? Should we be bringing this to the search
engines' attention?
Comments appreciated.
Steve | 
July 24th, 2005, 12:49 AM
| | | Re: What is going on with the Search Engines?
Steve wrote:[color=blue]
> I notice that search engines are now finding robots.txt files and
> cataloguing their contents. Is this wise I wonder? Is it a possible
> security risk?
>
> I even found the White House robots.txt file on Google. Surely
> disclosing details of the directory structure is an open invitation for
> hackers to 'take a look'?[/color]
What do search engines have to do with it? robots.txt objects aren't
magically private, you know; you can view them even without a search
engine's help (e.g. http://whitehouse.gov/robots.txt).
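To make that concrete: robots.txt always sits at the root of the site, so anyone can work out its address from any page URL. A minimal Python sketch, assuming nothing beyond the standard library; the helper name `robots_txt_url` and the example page path are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; drop the page's path, query and fragment,
    # because robots.txt lives at the root of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page path on the site mentioned above.
print(robots_txt_url("http://whitehouse.gov/some/page.html?x=1"))
```

No search engine is involved at any point; a browser or any HTTP client can request that address directly.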
Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore a
part of your public Web API. | 
July 24th, 2005, 12:49 AM
| | | Re: What is going on with the Search Engines?
Leif K-Brooks wrote:[color=blue]
> Steve wrote:
>[color=green]
>> I notice that search engines are now finding robots.txt files and
>> cataloguing their contents. Is this wise I wonder? Is it a possible
>> security risk?
>>
>> I even found the White House robots.txt file on Google. Surely
>> disclosing details of the directory structure is an open invitation
>> for hackers to 'take a look'?[/color]
>
>
> What do search engines have to do with it? robots.txt objects aren't
> magically private, you know; you can view them even without a search
> engine's help (e.g. http://whitehouse.gov/robots.txt).
>
> Also, why would your directory structure be a security risk? If it needs
> to be listed in a robots.txt, it presumably has a URI, and is therefore a
> part of your public Web API.[/color]
Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.
Steve | 
July 24th, 2005, 12:49 AM
| | | Re: What is going on with the Search Engines?
Steve wrote:[color=blue]
> Leif K-Brooks wrote:
>[color=green]
>> Steve wrote:
>>[color=darkred]
>>> I notice that search engines are now finding robots.txt files and
>>> cataloguing their contents. Is this wise I wonder? Is it a possible
>>> security risk?[/color]
>>
>> Also, why would your directory structure be a security risk? If it
>> needs to be listed in a robots.txt, it presumably has a URI, and is
>> therefore a part of your public Web API.[/color]
>
> Fair comment but I still feel that, in this day and age, it is silly to
> advertise anything about your site / server / records other than what
> you truly want to be public information.[/color]
If the directory structure is listed in robots.txt, it could presumably
be found by crawling your site even without robots.txt being available.
Do you propose creating Web sites without any internal links as a
security precaution? | 
July 24th, 2005, 12:49 AM
| | | Re: What is going on with the Search Engines?
Steve wrote:
[color=blue][color=green][color=darkred]
>>> I notice that search engines are now finding robots.txt files and
>>> cataloguing their contents.[/color][/color][/color]
They've nearly always *found* that file, as it's put there as a way to try
and limit what they'll do on websites. I'd suspect that deliberately
showing you the robots text file might be due to search engine programmers
not thinking about excluding it. Why not write directly to a search
engine, and point this out to them?
Leif K-Brooks wrote:
[color=blue][color=green]
>> Also, why would your directory structure be a security risk? If it needs
>> to be listed in a robots.txt, it presumably has a URI, and is therefore a
>> part of your public Web API.[/color][/color]
Steve <pleasedonotspamme@ireland.com> posted:
[color=blue]
> Fair comment but I still feel that, in this day and age, it is silly to
> advertise anything about your site / server / records other than what you truly
> want to be public information.[/color]
Because it's about the only way to limit what some robots will look at on a
site. There really is no way to prevent rogue ones from doing what they
like, though.
Anyway, if a search engine can assess a site, so can any hacker using their
own tools.
--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.
This message was sent without a virus, please delete some files yourself. | 
July 24th, 2005, 12:49 AM
| | | Re: What is going on with the Search Engines?
On Tue, 15 Mar 2005, Steve wrote:
[color=blue]
> I notice that search engines are now finding robots.txt files and
> cataloguing their contents. Is this wise I wonder? Is it a possible
> security risk?[/color]
Possibly. It depends on how you write your robots.txt file.
It also depends on how secure you want to be. If you *really* want to
block access to a URL, then you need to use an access control of some
kind, rather than hoping that the existence of the URL will remain a
secret.
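As a rough illustration of what "an access control of some kind" means in practice, here is a minimal Python sketch of checking an HTTP Basic `Authorization` header. The function name `is_authorized` and the single hard-coded login are assumptions for the example; a real site would normally rely on the web server's own authentication support, and Basic credentials are only base64-encoded, so they should travel over HTTPS.

```python
import base64
import hmac


def is_authorized(auth_header, user, password):
    """Check an HTTP Basic 'Authorization' header against one expected login."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        # The header carries base64("user:password") after the "Basic " prefix.
        raw = base64.b64decode(auth_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes alike.
        return False
    expected = f"{user}:{password}"
    # Constant-time comparison, so timing does not hint at how much matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


# Hypothetical credentials, for illustration only.
header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(is_authorized(header, "alice", "s3cret"))
```

The point of the sketch is that the request is refused unless the caller proves who they are, whether or not they know the URL.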
But let's suppose, for the moment, that you're trying for the weak
security approach of creating an unpublished URL, and telling only
your friends about it (i.e. not linking to it from any of your public
pages).
If you use robots.txt to "Disallow" explicit paths whose existence was
supposed to be kept hidden like this, then obviously you have now
revealed the existence of those paths to anyone who cares to look.
But if you apply the disallow only in a wildcard fashion, without
revealing explicit URLs, then you haven't given much away. For
example, if you "Disallow: /private", then everybody can guess that
you have a hierarchy called "/private", but they still have no idea
what the individual URLs in there are called. As long as you
configure your server to block directory listing or URL guessing (a la
mod_speling) then they'd have a hard time finding anything by chance.
If you "Disallow foo.html", then everybody can suspect that you have
file(s) called foo.html, but they don't know exactly where to look for
them.
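The "/private" case above can be tried out with Python's standard-library robots.txt parser, which treats a Disallow value as a path prefix. This is only a sketch: the robot name `ExampleBot` and the page names are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that names only the top of the hidden hierarchy.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
"""


def allowed(path, agent="ExampleBot"):
    """Return True if a well-behaved robot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


# Everything under the prefix is off limits to polite robots, yet the
# file itself never mentions any individual page name.
print(allowed("/private/holiday-snaps.html"))
print(allowed("/index.html"))
```

Note that the check happens entirely on the robot's side; a robot that never runs it is not slowed down in the slightest.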
But in all of these cases, keep in mind that just a single incautious
mention of one of these hidden URLs - in a published web page or in a
usenet posting or in an archived mailing list etc. - is all that it
takes. robots.txt addresses itself to properly-behaved robots, but it
in no way prevents rogues from doing whatever they please. | 
July 24th, 2005, 12:49 AM
| | | Re: What is going on with the Search Engines?
It was somewhere outside Barstow when Steve
<pleasedonotspamme@ireland.com> wrote:
[color=blue]
>Fair comment but I still feel that, in this day and age, it is silly to
>advertise anything about your site / server / records other than what you truly
>want to be public information.[/color]
robots.txt _is_ public information. So is the content to which it
refers.
The function of robots.txt is to describe publicly visible resources
so as to identify those which are worth indexing as potential entry
points to the site, and those which are available to the public, but
should not be treated as entry points.
/css/, /scripts/ and /photos/ should probably be in there and
forbidden, because you want these to be served to "the public", but
you don't want them treated as independent entry points to the site.
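For what it's worth, a file along those lines is easy to generate. A small Python sketch, assuming the three directory names above and a hypothetical helper called `build_robots_txt`:

```python
def build_robots_txt(disallowed_prefixes, agent="*"):
    """Build robots.txt text that asks the given robot(s) to skip some paths."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Disallow: {prefix}" for prefix in disallowed_prefixes]
    return "\n".join(lines) + "\n"


# Publicly served, but not wanted as entry points to the site.
print(build_robots_txt(["/css/", "/scripts/", "/photos/"]))
```

Nothing listed here becomes any less public; the file only tells cooperative robots which URLs are not worth indexing.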
/intranet/, /extranet/ and /secret_server_config/ can either be in
there or not. If you want to keep these secure, you _must_ have some
independent means to secure them.
It's a basic principle of good security practice that it must not
matter if these "secrets" are identified by robots.txt etc. They must
have their security enforced independently. There's also a slight
recommendation that they shouldn't be listed, because this highlights
their existence for a minor increase in the risk of encouraging attack
(although the flakey _vti_cnf can be assumed to exist anyway, without
needing to be told about it).
I think reading some of Bruce Schneier's work, or Ross Anderson's
book, would be interesting for you.
--
Smert' spamionam | 
July 24th, 2005, 12:50 AM
| | | Re: What is going on with the Search Engines?
Steve wrote:[color=blue]
> I notice that search engines are now finding robots.txt files and
> cataloguing their contents. Is this wise I wonder? Is it a possible
> security risk?
>[/color]
Thanks all. Comments noted re security etc. I still think it's silly, and
unnecessary, for the search engines to catalogue the robots.txt files - all
points considered.
Steve | 
July 24th, 2005, 12:50 AM
| | | Re: What is going on with the Search Engines?
On Tue, 15 Mar 2005 11:29:30 +0000,
"Alan J. Flavell" <flavell@ph.gla.ac.uk> posted:
[color=blue]
> But let's suppose, for the moment, that you're trying for the weak
> security approach of creating an unpublished URL, and telling only
> your friends about it (i.e. not linking to it from any of your public
> pages).
>
> ...[snip]...
>
> But in all of these cases, keep in mind that just a single incautious
> mention of one of these hidden URLs - in a published web page or in a
> usenet posting or in an archived mailing list etc. - is all that it
> takes.[/color]
All it takes is for someone to access that page with a certain browser for
it to become public knowledge, never mind deliberate disclosure.
For instance, the Google text-advert-sponsored version of the Opera web
browser reports visited URIs back to Google, unless a URI is of a form that
suggests it should be private (something like HTTPS, username and password
authenticated URIs, etc.). And I don't believe for one instant that Google
isn't going to use the information for their own purposes. Databasing is
their business.
And it's not the only thing that works that way. There are various plug-ins
for other browsers which do the same sort of thing, some of which are
installed surreptitiously, or by people who simply have no idea what
they're doing while installing things blindly.
I've argued on the Opera news groups that this is an irresponsible thing to
do, but they just like to argue back that it's irresponsible of whoever
used URIs in that manner in the first place. Which it is, but that doesn't
excuse what they're doing, either.
--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.
This message was sent without a virus, please delete some files yourself. |