[Sorry, there isn't a newsgroup for discussing URLs as such - this
seemed a reasonably on-topic place to discuss it...?]
The story so far: on a somewhat unrelated newsgroup, my attention
fell upon the URL:
http://www.speedtouchdsl.com/prod706.htm
which contains a link to the purported URL:
http://www.speedtouchdsl.com/pdf\dat...06WL-780WL.pdf
Comparing the latter with other URLs in that area, it appeared that
the "\" was a probable blunder for "/". However, their web server is
IIS, which appears to fix up this blunder silently[1] and deliver the
intended content. My recollection of
RFC1738 was that an unencoded "\" ought not to appear in a URL, so I
was initially inclined to rate this URL as broken...
However, this then led me down the trail of RFC2396, which 'updates
and merges "Uniform Resource Locators" [RFC1738] and "Relative Uniform
Resource Locators" [RFC1808]', and RFC3986, which "obsoletes rfc 1808
and updates rfc 1738".
In RFC2396 section 2.4.3, the backslash is listed under "Excluded
US-ASCII characters", in the subcategory "unwise", with the "must"
requirement:
|Data corresponding to excluded characters must be escaped in order to
|be properly represented within a URI.
So far, so good.
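For concreteness, here is a minimal sketch of what that RFC2396 rule asks for, assuming plain ASCII path data that has not yet been escaped: every control character, space, "delims" character and "unwise" character (including "\") is replaced by its %XX escape. Since "%" and "#" are themselves in the excluded set, this must not be run over an already-escaped URL or one carrying a fragment. The function name escape_excluded and the sample filename are made up for illustration.

```python
# Excluded US-ASCII characters, as listed in RFC 2396 section 2.4.3.
CONTROL = {chr(c) for c in range(0x00, 0x20)} | {chr(0x7F)}
SPACE = {" "}
DELIMS = set('<>#%"')
UNWISE = set('{}|\\^[]`')   # note the backslash

EXCLUDED = CONTROL | SPACE | DELIMS | UNWISE


def escape_excluded(path: str) -> str:
    """Hypothetical helper: %XX-escape every RFC 2396 excluded character
    in raw (not yet escaped) ASCII path data."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in EXCLUDED else ch
        for ch in path
    )


print(escape_excluded("/pdf\\manual.pdf"))  # -> /pdf%5Cmanual.pdf
```

Under RFC2396, then, the only conforming way to put a literal backslash into that path is as "%5C".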
But in RFC3986, this character "\" seems to have been stealthily
dropped from the list of characters needing to be escaped. I find no
mention of this change in Appendix D, "Changes from RFC2396".
The only substantive mention of "\" which I can find is in section 7.3,
under the main heading "7. Security Considerations":
|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]"
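A small illustration of why that warning matters, using Python's pure path classes (which only parse strings and never touch a real disk). It makes no claim about how IIS actually maps URLs to files; it only shows that the same path string means different things to a POSIX file system and to a Windows one. The filename is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical URL path containing a stray backslash (filename made up).
url_path = "pdf\\manual.pdf"

# On a POSIX file system "\" is an ordinary filename character,
# so the whole string is one path component.
print(PurePosixPath(url_path).parts)    # ('pdf\\manual.pdf',)

# On Windows "\" is a directory separator, so the same string
# names a file inside a "pdf" directory.
print(PureWindowsPath(url_path).parts)  # ('pdf', 'manual.pdf')
```

A server that hands the decoded path straight to a Windows file system would therefore find the intended file, which is at least consistent with the behaviour observed above.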
Aside from this potential security exposure, it appears to me that the
cited URL, which I would like to have categorised as defective, would
be rated as OK by this latest RFC. And since the server returns the
desired resource when this misbegotten URL is presented, I can't even
rate it as a blunder - can I?
Any suggestions as to why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without being mentioned in the changes?
In
http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which resolutely ignores the
bottom-quoted question -
| > Shouldn't backslash itself be included in the must-be-escaped
| > list?
Shouldn't it?
regards
[1] Of course, this isn't a situation that I meet in my own
serveradmin-ing using Apache. If the author codes "\" instead of "/"
in a URL, and attempts to follow the link with a www-conforming
browser, the link does not work. If they use IE instead, however, it
appears to fix up the error silently on the *client* side. It
seems from my tests that IE6 makes no attempt to access the cited URL
directly - it replaces the "\" by "/" before even trying (whereas
Mozilla replaces the "\" by "%5C", after which, Apache, he say "no").
So it looks as if MS give themselves two bites at this fuxup: once in
their browser-like object, and once in their web server.
(Another reason why authors are misguided if they use MS software as
their only test of their web pages. But I digress.)
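The two client-side behaviours reported in [1] can be modelled as simple string rewrites. These functions are hypothetical stand-ins for what the tests above observed, not the browsers' actual code, and the example URL uses the reserved example.com domain with a made-up filename.

```python
def ie6_style(url: str) -> str:
    # Observed IE6 behaviour: every "\" becomes "/" before the request is sent,
    # so the server only ever sees the "intended" URL.
    return url.replace("\\", "/")


def mozilla_style(url: str) -> str:
    # Observed Mozilla behaviour: "\" is percent-encoded as "%5C" and sent as-is,
    # leaving the server to decide what "%5C" means (Apache: 404).
    return url.replace("\\", "%5C")


bad = "http://www.example.com/pdf\\manual.pdf"
print(ie6_style(bad))      # http://www.example.com/pdf/manual.pdf
print(mozilla_style(bad))  # http://www.example.com/pdf%5Cmanual.pdf
```

Only the second request actually carries the author's mistake to the server, which is why testing with IE alone hides it.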