What is going on with the Search Engines?

I notice that search engines are now finding robots.txt files and cataloguing
their contents. Is this wise, I wonder? Is it a possible security risk?

I even found the White House robots.txt file on Google. Surely disclosing
details of the directory structure is an open invitation for hackers to 'take a
look'?

Does anyone else feel the same way? Should we be bringing this to the search
engines' attention?

Comments appreciated.


Steve
Jul 23 '05 #1
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?

I even found the White House robots.txt file on Google. Surely
disclosing details of the directory structure is an open invitation for
hackers to 'take a look'?


What do search engines have to do with it? robots.txt objects aren't
magically private, you know; you can view them even without a search
engine's help (e.g. http://whitehouse.gov/robots.txt).

Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore a
part of your public Web API.
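
To illustrate the point about robots.txt not being private: the file sits at a fixed, well-known path on every host, so anyone can work out its address from any page URL. A minimal Python sketch follows; the names `robots_url` and `fetch_robots` are made up for this example, and the fetch helper assumes network access.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the host, whatever the page path is.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(page_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt as text (hypothetical helper; needs network access)."""
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

No search engine is involved at any step; a browser address bar does the same job.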
Jul 23 '05 #2
Leif K-Brooks wrote:
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?

I even found the White House robots.txt file on Google. Surely
disclosing details of the directory structure is an open invitation
for hackers to 'take a look'?

What do search engines have to do with it? robots.txt objects aren't
magically private, you know; you can view them even without a search
engine's help (e.g. http://whitehouse.gov/robots.txt).

Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore a
part of your public Web API.


Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.

Steve
Jul 23 '05 #3
Steve wrote:
Leif K-Brooks wrote:
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?


Also, why would your directory structure be a security risk? If it
needs to be listed in a robots.txt, it presumably has a URI, and is
therefore a part of your public Web API.


Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what
you truly want to be public information.


If the directory structure is listed in robots.txt, it could presumably
be found by crawling your site even without robots.txt being available.
Do you propose creating Web sites without any internal links as a
security precaution?
Jul 23 '05 #4
Tim
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents.

They've nearly always *found* that file, as it's put there as a way to try
and limit what they'll do on websites. I'd suspect that deliberately
showing you the robots text file might be due to search engine programmers
not thinking about excluding it. Why not write directly to a search
engine, and point this out to them?

Leif K-Brooks wrote:
Also, why would your directory structure be a security risk? If it needs
to be listed in a robots.txt, it presumably has a URI, and is therefore a
part of your public Web API.

Steve <pl***************@ireland.com> posted:
Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.


Because it's about the only way to limit what some robots will look at on a
site. There really is no way to prevent rogue ones from doing what they
like, though.

Anyway, if a search engine can assess a site, so can any hacker using their
own tools.
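
A rough Python sketch of why robots.txt only restrains robots that choose to obey it: the check happens entirely on the robot's side. The robots.txt content, the `ExampleBot` name, and the example.com URLs below are assumptions made up for illustration; `urllib.robotparser` is the standard-library parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
"""


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs a well-behaved robot would agree to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "http://example.com/index.html",
    "http://example.com/private/notes.html",
]
# A polite robot filters its own request list before fetching anything.
polite = allowed_urls(ROBOTS_TXT, "ExampleBot", candidates)
# A rogue robot simply skips the check and requests every candidate.
rogue = list(candidates)
```

Nothing on the server enforces the difference between the two lists.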

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
Jul 23 '05 #5
On Tue, 15 Mar 2005, Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?


Possibly. It depends on how you write your robots.txt file.

It also depends on how secure you want to be. If you *really* want to
block access to a URL, then you need to use an access control of some
kind, rather than hoping that the existence of the URL will remain a
secret.
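
As one possible shape for "an access control of some kind", here is a minimal Python sketch of checking an HTTP Basic `Authorization` header. The function name `is_authorized` and the single hard-wired login are assumptions for illustration; in practice the web server's own authentication support would normally do this, over HTTPS.

```python
import base64
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], user: str, password: str) -> bool:
    """Check an HTTP Basic 'Authorization' header against one expected login."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        supplied = base64.b64decode(encoded.strip(), validate=True)
    except ValueError:
        # Malformed base64 is treated as a failed login.
        return False
    expected = f"{user}:{password}".encode("utf-8")
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(supplied, expected)
```

With a check like this in front of the resource, knowing the URL is no longer enough to read it.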

But let's suppose, for the moment, that you're trying for the weak
security approach of creating an unpublished URL, and telling only
your friends about it (i.e. not linking to it from any of your public
pages).

If you use robots.txt to "Disallow" explicit paths whose existence was
supposed to be kept hidden like this, then obviously you have now
revealed the existence of those paths to anyone who cares to look.

But if you apply the disallow only in a wildcard fashion, without
revealing explicit URLs, then you haven't given much away. For
example if you "Disallow /private", then everybody can guess that
you have a hierarchy called "/private", but they still have no idea
what the individual URLs in there are called. As long as you
configure your server to block directory listing or URL guessing (a la
mod_speling) then they'd have a hard time finding anything by chance.
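
The difference between the two approaches can be sketched in Python by listing what a curious reader learns from each file. The helper name `disclosed_paths` and both robots.txt samples are made up for this illustration.

```python
def disclosed_paths(robots_txt: str) -> list[str]:
    """List every path that Disallow lines reveal to anyone reading the file."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file that names each hidden page explicitly.
EXPLICIT = """\
User-agent: *
Disallow: /private/party-2005.html
Disallow: /private/drafts/plan.html
"""

# Hypothetical file that only names the hierarchy.
PREFIX_ONLY = """\
User-agent: *
Disallow: /private
"""

leaked_by_explicit = disclosed_paths(EXPLICIT)
leaked_by_prefix = disclosed_paths(PREFIX_ONLY)
```

The explicit file hands over complete URLs; the prefix-only file gives away nothing beyond the existence of the hierarchy.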

If you "Disallow foo.html", then everybody can suspect that you have
file(s) called foo.html, but they don't know exactly where to look for
them.

But in all of these cases, keep in mind that just a single incautious
mention of one of these hidden URLs - in a published web page or in a
usenet posting or in an archived mailing list etc. - is all that it
takes. robots.txt addresses itself to properly-behaved robots, but it
in no way prevents rogues from doing whatever they please.
Jul 23 '05 #6
It was somewhere outside Barstow when Steve
<pl***************@ireland.com> wrote:
Fair comment but I still feel that, in this day and age, it is silly to
advertise anything about your site / server / records other than what you truly
want to be public information.


robots.txt _is_ public information. So is the content to which it
refers.

The function of robots.txt is to describe publicly visible resources
so as to identify those which are worth indexing as potential entry
points to the site, and those which are available to the public, but
should not be treated as entry points.

/css/, /scripts/ and /photos/ should probably be in there and
forbidden, because you want these to be served to "the public", but
you don't want them treated as independent entry points to the site.
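
A small Python sketch of that kind of file, under the assumption that all robots get the same rules. The helper `build_robots_txt`, the `ExampleBot` name, and the example.com URLs are invented for illustration; the standard-library `urllib.robotparser` is used only to confirm how a compliant robot would read the result.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_prefixes: list[str]) -> str:
    """Render a robots.txt that asks all robots to skip the given prefixes."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in disallowed_prefixes]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(["/css/", "/scripts/", "/photos/"])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
# Supporting files are skipped by compliant robots; ordinary pages are not.
skips_css = not parser.can_fetch("ExampleBot", "http://example.com/css/site.css")
indexes_home = parser.can_fetch("ExampleBot", "http://example.com/index.html")
```

The listed directories stay fully available to visitors; the file only steers indexing.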

/intranet/, /extranet/ and /secret_server_config/ can either be in
there or not. If you want to keep these secure, you _must_ have some
independent means to secure them.

It's a basic principle of good security practice that it must not
matter if these "secrets" are identified by robots.txt etc. They must
have their security enforced independently. There's also a slight
recommendation that they shouldn't be listed, because this highlights
their existence for a minor increase in the risk of encouraging attack
(although the flakey _vti_cnf can be assumed to exist anyway, without
needing to be told about it).
I think reading some of Bruce Schneier's work, or Ross Anderson's
book, would be interesting for you.

--
Smert' spamionam
Jul 23 '05 #7
Steve wrote:
I notice that search engines are now finding robots.txt files and
cataloguing their contents. Is this wise, I wonder? Is it a possible
security risk?

Thanks all. Comments noted re security etc. I still think it's silly, and
unnecessary, for the search engines to catalogue the robots.txt files - all
points considered.
Steve
Jul 23 '05 #8
Tim
On Tue, 15 Mar 2005 11:29:30 +0000,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> posted:
But let's suppose, for the moment, that you're trying for the weak
security approach of creating an unpublished URL, and telling only
your friends about it (i.e. not linking to it from any of your public
pages).

...[snip]...

But in all of these cases, keep in mind that just a single incautious
mention of one of these hidden URLs - in a published web page or in a
usenet posting or in an archived mailing list etc. - is all that it
takes.


All it takes is for someone to access that page with a certain browser for
it to become public knowledge, never mind deliberate disclosure.

For instance, the Google text advert sponsored version of the Opera web
browser reports the visited URIs back to Google, unless the URI is of a form
that suggests it should be private (something like HTTPS, username and password
authenticated URIs, etc.). And I don't believe for one instant that Google
isn't going to use the information for their own purposes. Databasing is
their business.

And it's not the only thing that works that way. There are various plug-ins
for other browsers which do the same sort of thing, some of which are
installed surreptitiously, or the person simply has no idea about what
they're doing while installing things blindly.

I'd argued on the Opera news groups that this is an irresponsible thing to
do, but they just like to argue back that it's irresponsible for whoever
used URIs in that manner in the first place. Which it is, but that doesn't
excuse what they're doing, too.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
Jul 23 '05 #9