  #1  
Old July 24th, 2005, 12:53 AM
techguy_chicago@yahoo.com
Guest
 
Posts: n/a
Default why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

I just noticed on my website, with a link checker, that I have a bunch
of URL's that reference a directory *above* my document root directory,
but IE/Firefox/Opera never let on - they just seem to ignore the '../'
I have in front of my links. Can this behavior be correct?

So, my page is at this URL:

http://www.mydomain.com/links.html

And one of the links on that page, which has no 'base href' tags or
anything else, says:

<a href="../somedir/somepage.html">Link here</a>

My doc root is here:

/www

And my 'somedir' is here:

/www/somedir

But the URL, which I would expect to be broken, is not. Taken literally,
it refers to:

/somedir

yet the browser apparently ignores the '../' directory references once
it reaches document root, and then dives down. In the case above, the
initial page was served from document root, so there's no place left to
go but down.

From quick testing, it also seems I can have a link like the following
that would *still* work:

<a href="../../../../../../../../../somedir/somepage.html">Link
here</a>

It just doesn't seem right. Basically, if the URL references something
higher than document root, then ignore that part of the URL?

I'm all for leniency, but this just doesn't make any sense to me. Do I
have it right? That the browsers just say 'ah, we knew what she meant
anyways'?

  #2  
Old July 24th, 2005, 12:53 AM
Ståle Sæbøe
Guest
 
Posts: n/a
Default Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

techguy_chicago@yahoo.com wrote:

> I'm all for leniency, but this just doesn't make any sense to me. Do I
> have it right? That the browsers just say 'ah, we knew what she meant
> anyways'?

Well ... sort of. The browser knows the URL of the page it is on, so it
knows where it is relative to the root of the site and how high it can
go, and it translates the link into a valid URL. This is essential if
you want to create portable web sites without rewriting all the links
every time you move them. Discarding surplus ../ segments is practical
because your server would probably not allow the user to browse your
entire filesystem anyway. I do not know whether a user agent is required
to do so, or whether it should instead try to browse the server
filesystem for a valid path above the website root. The latter would
probably be a serious security flaw, in my opinion. If you need visitors
to access files and folders outside the website root folder you can use
virtual folders (at least on IIS), but I advise against it. It is much
more practical to keep all web files in your web root.

I am sure many of the participants here can give you much more detailed
and technical information about this question.
  #3  
Old July 24th, 2005, 12:53 AM
John Dunlop
Guest
 
Posts: n/a
Default Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

Somebody wrote:

> So, my page is at this URL:
>
> http://www.mydomain.com/links.html

Please use host names from RFC2606 in example URIs.

http://www.ietf.org/rfc/rfc2606

> And one of the links on that page, which has no 'base href' tags or
> anything else, says:
>
> <a href="../somedir/somepage.html">Link here</a>

With a base URI of

http://host.invalid/

the abnormal relative-path reference ../foo/bar resolves to

http://host.invalid/foo/bar

RFC3986 sec. 5.2 describes an example algorithm for this; in
particular, sec. 5.2.4 offers one way of removing 'dot
segments'. More, sec. 5.4.2 shows abnormal examples, the
first of which might be of interest to you.

http://www.ietf.org/rfc/rfc3986
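
For anyone who wants to see the dot-segment removal spelled out, here is a minimal Python sketch that follows the steps of RFC 3986 sec. 5.2.4. The function name remove_dot_segments and the first sample path are only illustrative (the other two samples are the RFC's own examples). It is one way of doing it, not necessarily what any particular browser does internally.

```python
def remove_dot_segments(path):
    """Remove '.' and '..' segments from a URI path (RFC 3986 sec. 5.2.4)."""
    output = []  # completed segments, each kept with its leading "/"
    while path:
        if path.startswith("../"):        # step A: drop a leading "../"
            path = path[3:]
        elif path.startswith("./"):       # step A: drop a leading "./"
            path = path[2:]
        elif path.startswith("/./"):      # step B: "/./x" becomes "/x"
            path = path[2:]
        elif path == "/.":                # step B: "/." becomes "/"
            path = "/"
        elif path.startswith("/../"):     # step C: "/../x" becomes "/x" ...
            path = path[3:]
            if output:                    # ... and the previous segment goes,
                output.pop()              # if there is one left to remove
        elif path == "/..":               # step C: "/.." becomes "/"
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):         # step D: a lone "." or ".." vanishes
            path = ""
        else:                             # step E: move one segment to output
            start = 1 if path.startswith("/") else 0
            slash = path.find("/", start)
            if slash == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:slash])
                path = path[slash:]
    return "".join(output)


# Base path "/" merged with the reference "../foo/bar" gives "/../foo/bar".
print(remove_dot_segments("/../foo/bar"))         # /foo/bar
print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```

The surplus '..' is dropped simply because there is no earlier segment left in the output to remove. No filesystem is consulted at any point.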

> My doc root

You're confusing URI paths with filesystem paths.

--
Jock
  #4  
Old July 24th, 2005, 12:54 AM
Stan Brown
Guest
 
Posts: n/a
Default Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

"" wrote in comp.infosystems.www.authoring.html:
>So, my page is at this URL:
>
> http://www.mydomain.com/links.html

Not Found

The requested URL /links.html was not found on this server.

--

Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
  #5  
Old July 24th, 2005, 12:54 AM
Stan Brown
Guest
 
Posts: n/a
Default Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

"John Dunlop" wrote in comp.infosystems.www.authoring.html:
>Somebody wrote:
>
>> So, my page is at this URL:
>>
>> http://www.mydomain.com/links.html
>
>Please use host names from RFC2606 in example URIs.

Or better yet, post the actual URL!

--

Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
  #6  
Old July 24th, 2005, 12:54 AM
David Ross
Guest
 
Posts: n/a
Default Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

John Dunlop wrote:
>
> Somebody wrote:
>
> > So, my page is at this URL:
> >
> > http://www.mydomain.com/links.html
>
> Please use host names from RFC2606 in example URIs.
>
> http://www.ietf.org/rfc/rfc2606
>
> > And one of the links on that page, which has no 'base href' tags or
> > anything else, says:
> >
> > <a href="../somedir/somepage.html">Link here</a>
>
> With a base URI of
>
> http://host.invalid/
>
> the abnormal relative-path reference ../foo/bar resolves to
>
> http://host.invalid/foo/bar
>
> RFC3986 sec. 5.2 describes an example algorithm for this; in
> particular, sec. 5.2.4 offers one way of removing 'dot
> segments'. More, sec. 5.4.2 shows abnormal examples, the
> first of which might be of interest to you.
>
> http://www.ietf.org/rfc/rfc3986
>
> > My doc root
>
> You're confusing URI paths with filesystem paths.

However, if the reference is from a page NOT at the base, ../ at
the beginning of a relative path is indeed meaningful. Thus, my
own <URL:http://www.rossde.com/garden/diary/JanFeb05.html> contains
the following references:

<../garden_back.html>, which translates as
<URL:http://www.rossde.com/garden/garden_back.html>

<../../viewing_site.html>, which translates as
<URL:http://www.rossde.com/viewing_site.html>

The ../ is ignored only when it would translate to a path higher
than the base allowed by your server. Thus, there is an implied
base if you do not specify one.
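
The references above can be checked with Python's standard urllib.parse.urljoin, which follows RFC 3986 semantics as of Python 3.5. This is only a small sketch using the URLs quoted in this post. The third reference, with one '../' too many, is a made-up addition to show the surplus segment being dropped.

```python
from urllib.parse import urljoin

# The page that contains the relative references (taken from the post above).
BASE = "http://www.rossde.com/garden/diary/JanFeb05.html"

references = [
    "../garden_back.html",         # up one level from /garden/diary/
    "../../viewing_site.html",     # up two levels, which reaches the root
    "../../../viewing_site.html",  # hypothetical: one level more than exists
]

for ref in references:
    # urljoin resolves each reference against BASE purely as text.
    print(ref, "->", urljoin(BASE, ref))
```

With Python 3.5 or later, the second and third references should both come out as http://www.rossde.com/viewing_site.html, since the extra '../' has no path segment left to remove.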

--

David E. Ross
<URL:http://www.rossde.com/>

I use Mozilla as my Web browser because I want a browser that
complies with Web standards. See <URL:http://www.mozilla.org/>.
  #7  
Old July 24th, 2005, 12:54 AM
Harlan Messinger
Guest
 
Posts: n/a
Default Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

John Dunlop wrote:
> Somebody wrote:
>
>>So, my page is at this URL:
>>
>> http://www.mydomain.com/links.html
>
> Please use host names from RFC2606 in example URIs.
>
> http://www.ietf.org/rfc/rfc2606

This is good to know--I didn't before--but this person isn't creating a
test suite that runs the risk of conflicting eventually with a real host
name on the public internet. It's just a written example.

[snip]
> With a base URI of
>
> http://host.invalid/
>
> the abnormal relative-path reference ../foo/bar resolves to
>
> http://host.invalid/foo/bar
>
> RFC3986 sec. 5.2 describes an example algorithm for this; in
> particular, sec. 5.2.4 offers one way of removing 'dot
> segments'. More, sec. 5.4.2 shows abnormal examples, the
> first of which might be of interest to you.
>
> http://www.ietf.org/rfc/rfc3986
>
>>My doc root
>
> You're confusing URI paths with filesystem paths.

I don't know about other servers, but IIS automatically maps URI path
components to like-named file system path components unless you
explicitly configure the subpaths otherwise. This applies as well to
../, except that IIS can be set either to allow paths to places above
the host root or not.
  #8  
Old July 24th, 2005, 12:54 AM
Tim
Guest
 
Posts: n/a
Default Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

Somebody wrote:

>>> So, my page is at this URL:
>>>
>>> http://www.mydomain.com/links.html

John Dunlop wrote:

>> Please use host names from RFC2606 in example URIs.
>>
>> http://www.ietf.org/rfc/rfc2606

Harlan Messinger <hmessinger.removethis@comcast.net> posted:

> This is good to know--I didn't before--but this person isn't creating a
> test suite that runs the risk of conflicting eventually with a real host
> name on the public internet. It's just a written example.

But what they've done is write an example down somewhere where it'll be
databased.

Should someone actually own the supposedly fake domain name (people often
don't check whether someone else really owns it), the poster can end up
causing unwanted traffic at that website, as robots index the posts and
follow any links, and as people try out the links in the posts while
they're reading them.

The last thing the owner of domain.com wants is a few thousand people
trying some example link to see why it doesn't do what the poster is trying
to do, when the poster's problem is really somewhere else.

>> You're confusing URI paths with filesystem paths.

> I don't know about other servers, but IIS automatically maps URI path
> components to like-named file system path components unless you
> explicitly configure the subpaths otherwise. This applies as well to
> ../, except that IIS can be set either to allow paths to places above
> the host root or not.

Being able to escape from the root is a severe security breach. URIs
should only map to filepaths in a manner that's strictly controlled by the
server configuration. You don't want complete strangers being able to
specify any path that they like on your system, to read any file that they
like, merely by backing out of the server far enough.

Anybody reading this thread and contemplating it needs to spend quite some
time reading about why that's a seriously bad idea until they've been
convinced not to do it. I can't think of a single example of where it'd be
a good idea.
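
To illustrate the kind of strictly controlled mapping meant here, below is a minimal Python sketch of a containment check. The function name map_to_file and the /www document root are assumptions for illustration only. Real servers do this in their own way and with more checks than shown.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote


def map_to_file(doc_root: Path, url_path: str) -> Optional[Path]:
    """Map a URL path to a file under doc_root, or return None if it escapes."""
    root = doc_root.resolve()
    # Decode %2e%2e and friends first, so encoded "../" cannot slip past.
    relative = unquote(url_path).lstrip("/")
    # resolve() collapses ".." and follows symlinks before the comparison.
    candidate = (root / relative).resolve()
    if candidate == root or root in candidate.parents:
        return candidate
    return None  # the request tried to back out of the document root


# Hypothetical document root, as in the original question.
DOC_ROOT = Path("/www")

print(map_to_file(DOC_ROOT, "/somedir/somepage.html"))  # a path under the root
print(map_to_file(DOC_ROOT, "/../etc/passwd"))          # None
print(map_to_file(DOC_ROOT, "/%2e%2e/etc/passwd"))      # None
```

The point of the design is that the comparison happens on the fully resolved path, after decoding, rather than on the text the client sent.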

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
 
