RFC3986, backslash in URI/URLs


[Sorry, there isn't a newsgroup for discussing URLs as such - this
seemed a reasonably on-topic place to discuss it...?]

The story so far: on somewhat unrelated newsgroup, my attention
fell upon the URL:
http://www.speedtouchdsl.com/prod706.htm
which contains a link to the purported URL:
http://www.speedtouchdsl.com/pdf\dat...06WL-780WL.pdf

Comparing the latter with other URLs in that area, it appeared that
the "\" was a probable blunder for "/". However, since their web
server is IIS, it appears that their server silently fixes-up this
blunder[1], and delivers the intended content. My recollection of
RFC1738 was that an unencoded "\" ought not to appear in a URL, so I
was initially inclined to rate this URL as broken...

However, this then led me down the trail of RFC2396, which 'updates
and merges "Uniform Resource Locators" [RFC1738] and "Relative Uniform
Resource Locators" [RFC1808]', and RFC3986, which "obsoletes rfc 1808
and updates rfc 1738".

In RFC2396 2.4.3, the backslash is listed under "Excluded US-ASCII
characters", under the subcategory of "unwise", with the "must"
requirement:

|Data corresponding to excluded characters must be escaped in order to
|be properly represented within a URI.

So far, so good.
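
As a rough illustration of what that "must be escaped" rule means in practice, here is a minimal Python sketch using the standard library's urllib.parse.quote. The path name is hypothetical (the real one is elided above), and quote() is just one convenient encoder, not something either RFC prescribes.

```python
from urllib.parse import quote, unquote

# Hypothetical path modelled on the thread's example; the real file name is elided.
raw_path = "/pdf\\datasheet.pdf"

# quote() keeps "/" by default and percent-encodes the backslash as %5C.
encoded = quote(raw_path)
print(encoded)

# Decoding recovers the original data, backslash included.
print(unquote(encoded) == raw_path)
```

Under RFC2396's rule, the encoded form is the only way that backslash could legitimately travel inside a URI.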

But in RFC3986, this character "\" seems to have been stealthily
dropped from the list of characters needing to be escaped. I find no
mention of this change in Appendix D, "Changes from RFC2396".

The only substantive mention of "\" which I can find is in section 7.3
under the main heading of "7. Security Considerations" :

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]"

Aside from this potential security exposure, it appears to me that the
cited URL, which I would like to have categorised as defective, would
be rated as OK by this latest RFC. And since the server returns the
desired resource when this misbegotten URL is presented, I can't even
rate it as a blunder - can I?

Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the changes?

In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

regards

[1] Of course, this isn't a situation that I meet in my own
serveradmin-ing using Apache. If the author codes "\" instead of "/"
in a URL, and attempts to follow the link with a www-conforming
browser, the link does not work. If they use IE instead, however, it
appears that it silently fixes-up the error on the *client* side. It
seems from my tests that IE6 makes no attempt to access the cited URL
directly - it replaces the "\" by "/" before even trying (whereas
Mozilla replaces the "\" by "%5C", after which, Apache, he say "no").
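
To make the difference between the two client-side behaviours concrete, here is a small Python sketch. The function names and the sample path are made up for illustration; they only mimic what the tests above seemed to show and are not taken from either browser's code.

```python
def ie6_style_fixup(url_path: str) -> str:
    """Mimic the observed IE6 behaviour: silently turn "\\" into "/"."""
    return url_path.replace("\\", "/")


def mozilla_style_fixup(url_path: str) -> str:
    """Mimic the observed Mozilla behaviour: percent-encode "\\" as %5C."""
    return url_path.replace("\\", "%5C")


# Hypothetical path; the real one from the cited page is elided.
authored = "/pdf\\datasheet.pdf"

print(ie6_style_fixup(authored))      # requests a different resource name
print(mozilla_style_fixup(authored))  # requests the name as authored, escaped
```

The first request asks for a two-level path, the second for a single segment containing a literal backslash, which is why a server that does no fixing-up of its own answers them differently.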

So it looks as if MS give themselves two bites at this fuxup: once in
their browser-like object, and once in their web server.

(Another reason why authors are misguided if they use MS software as
their only test of their web pages. But I digress.)

--
Jun 10 '06 #1
Alan J. Flavell inquired:

In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it is
to take its "normal" value in some expression.

It seems from my tests that IE6 makes no attempt to access the cited
URL directly - it replaces the "\" by "/" before even trying


Yes, IE mangles URLs from the address-bar in several ways before sending
them off over the interweb.

--
Jack.

[1] The expressions "obviously..." and "it's obvious that..." are
frequently encountered when the author is about to perpetrate some
inadvertent fallacy.
Jun 10 '06 #2
Jack wrote:
Alan J. Flavell inquired:

In http://lists.w3.org/Archives/Public/...5May/0004.html , I
found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?


I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it is
to take its "normal" value in some expression.


How can you mean "regardless of language", given that which characters
are escape characters depends entirely on what language is in use?
Jun 10 '06 #3
Alan J. Flavell wrote:
The only substantive mention of "\" which I can find is in section 7.3
under the main heading of "7. Security Considerations" :

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]"

Aside from this potential security exposure, it appears to me that the
cited URL, which I would like to have categorised as defective, would
be rated as OK by this latest RFC. And since the server returns the
desired resource when this misbegotten URL is presented, I can't even
rate it as a blunder - can I?

Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the changes?


May I ask what the source of risk is? You mention that backslash being
the path delimiter on a back-end file system, but that can't be the
problem, since the forward slash is the path delimiter on other file
systems, and the interpretation of the forward slash in URIs as a path
delimiter doesn't create risk on that account.
Jun 10 '06 #4
Harlan Messinger scripsit:
Jack wrote:

- -
I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it
is to take its "normal" value in some expression.


How can you mean "regardless of language", given that which characters
are escape characters depends entirely on what language is in use?


Well, I'd say that the _principle_ of escaping an escape character,
regardless of language (notation), is adequate within broad limits. Jack's
error is that he assumes that the backslash is an escape character in the
"language" of URLs, i.e. URL syntax.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jun 10 '06 #5
On Sat, 10 Jun 2006, Harlan Messinger wrote:
Alan J. Flavell wrote:
The only substantive mention of "\" which I can find is in section
7.3 under the main heading of "7. Security Considerations" :

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]" [...]
Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the
changes?
May I ask what the source of risk is? You mention that backslash
being the path delimiter on a back-end file system,


Well, I might have done so, if I had thought about it; but in fairness
it wasn't *my* mention, it was a quote from the RFC. :-}

but that can't be the problem, since
the forward slash is the path delimiter on other file systems,

I don't agree. In principle, the "/" has a defined meaning in a URL
(it's a hierarchy separator, if I can put it loosely), and anyone
interpreting a URL is required to attribute that meaning to it - no
matter what their local file system separator might be.

Whereas "\" has no defined meaning in the structure of a URL, and
could (given an insufficiently paranoid parser) possibly find its way
into a filesystem reference. Which could have significant
consequences on, say, Windows.

and the interpretation of the forward slash in URIs as a path
delimiter doesn't create risk on that account.


Because the URL "/" (the one that functions as a URL hierarchy
separator) never gets that far. By then it would have been turned
into the filesystem hierarchy separator, whatever that might be.
Yes, it might sometimes be "/", but don't let that fool you. It might
just as well have been turned into ":" for a different filesystem, or
into a hierarchical database query or whatever, in the general case.
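
A minimal Python sketch of that idea, for what it's worth: split on the URL's own "/" first, decode each segment, refuse anything the back-end file system would treat as special, and only then build the native path. The function names, the document root and the set of rejected characters are illustrative assumptions, not how any particular server does it.

```python
from pathlib import PureWindowsPath
from typing import List
from urllib.parse import unquote


def url_path_to_segments(url_path: str) -> List[str]:
    """Split on the URL hierarchy separator, then decode each segment."""
    segments = []
    for raw in url_path.split("/"):
        if raw == "":
            continue  # skip the empty pieces around leading or doubled slashes
        segment = unquote(raw)
        # Assumed policy: reject dot-segments and characters that a
        # Windows-like file system gives an operational meaning to.
        if segment in (".", "..") or any(c in segment for c in "/\\:"):
            raise ValueError(f"unsafe path segment: {segment!r}")
        segments.append(segment)
    return segments


def to_windows_path(root: str, url_path: str) -> PureWindowsPath:
    """Join the vetted segments with the native separator (hypothetical root)."""
    return PureWindowsPath(root, *url_path_to_segments(url_path))


print(to_windows_path("C:\\site", "/pdf/datasheet.pdf"))
```

With this ordering, a backslash (raw or smuggled in as %5C) is rejected before it can ever be read as a separator by the file system.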

I think that's the sort of thing that the RFC authors have in mind,
anyway.
Jun 10 '06 #6
Harlan Messinger wrote:
Jack wrote:
Alan J. Flavell inquired:

In
http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?


I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it
is to take its "normal" value in some expression.


How can you mean "regardless of language", given that which
characters are escape characters depends entirely on what language is
in use?


Try this:

"For any language x, that set of characters which are escape characters
in x should themselves be escaped if they are to take their normal
values in some expression."

I thought that was obviously my meaning, and it seems to require some
perverse gymnastics to get my original utterance to mean something
different.

--
Jack.
Jun 10 '06 #7
Alan J. Flavell
In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?


I suppose you could say backslashes *are* included in the
must-be-escaped list, if you recognise the list as implied. The
explicit list seems to have been silently dropped: the word 'excluded'
appears in RFC3986 only in unrelated contexts, and I can't find mention
of this removal anywhere in the changeover notes:

http://www.gbiv.com/protocols/uri/rev-2002/issues.html

Anyway, backslashes still can't occur in URLs, since no production
allows them.
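
One way to check that claim mechanically is to transcribe the relevant productions from RFC 3986 Appendix A (unreserved, sub-delims, pct-encoded, pchar, path-abempty) into a regular expression. This is a hand-written Python sketch of those productions only, with made-up sample paths, so treat it as illustrative rather than as a full URI validator.

```python
import re

# Character sets transcribed from RFC 3986 Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"

# path-abempty = *( "/" segment ), where segment = *pchar
PATH_ABEMPTY = re.compile(rf"(?:/{PCHAR}*)*")


def is_valid_path(path: str) -> bool:
    """True if the whole string matches the path-abempty production."""
    return PATH_ABEMPTY.fullmatch(path) is not None


# Hypothetical paths; the real one from the cited page is elided.
print(is_valid_path("/pdf/datasheet.pdf"))
print(is_valid_path("/pdf\\datasheet.pdf"))
print(is_valid_path("/pdf%5Cdatasheet.pdf"))
```

The raw backslash fails to match because it appears in none of the character sets, while its percent-encoded form is accepted.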

--
Jock

Jun 10 '06 #8
John Dunlop wrote:
Alan J. Flavell
In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?


I suppose you could say backslashes *are* included in the
must-be-escaped list, if you recognise the list as implied. The
explicit list seems to have been silently dropped: the word 'excluded'
appears in RFC3986 only in unrelated contexts, and I can't find mention
of this removal anywhere in the changeover notes:

http://www.gbiv.com/protocols/uri/rev-2002/issues.html

Anyway, backslashes still can't occur in URLs, since no production
allows them.


The real questions are: What is the meaning of a back-slash in a URL?
Does it have a special meaning, the way reserved characters (/, $, &, ?,
etc) have? If it has a special meaning, where is it documented so that
browser developers and Web page authors will know about it?

Note that it cannot be just another character in the name of a path or
file. RFC 3986, Appendix A, indicates a path can have a name consisting
only of alphabetic characters, numerals, -, +, ., _, ~, and
percent-encoded characters. A path may also have @, :, and certain
reserved characters; but all these have special meanings within a path
(taking us back to the third question in my first paragraph).

In a very loose sense, percent-encoding is a form of escaping a
character. However, a percent-encoded character might have a different
meaning in a URL than the related literal character. For example, "%25"
represents the character "%". Obviously, the former (just a character
in a string of characters) is not treated the same as the latter (the
signal for percent-encoding).
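
The distinction can be seen with a few lines of Python, using the standard library's urllib.parse as a convenient decoder (any conforming decoder should behave the same way for these inputs):

```python
from urllib.parse import quote, unquote

# "%25" is data: it decodes to a literal percent sign.
print(unquote("%25"))

# A literal "%" that is meant as data has to be written as "%25".
print(quote("%"))

# Decoding is a single pass: "%255C" yields the three characters "%5C",
# not a backslash. Only "%5C" itself decodes to a backslash.
print(unquote("%255C"))
print(unquote("%5C"))
```

So an encoded "%" is just a character, whereas a bare "%" is the start of an escape sequence.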

I have yet to see a use of back-slash in a URL that was not an error,
generally a typo by the Web page author.

--

David E. Ross
<http://www.rossde.com/>

Concerned about someone (e.g., Pres. Bush) snooping
into your E-mail? Use PGP.
See my <http://www.rossde.com/PGP/>
Jun 11 '06 #9
David E. Ross:
The real questions are:

I think these are different points of discussion, no more real or
imaginary than the original, but more off-topic. The original was
about the status of backslashes wrt URLs, which has a direct bearing on
whether or not a doc violates or conforms to the spec.

What is the meaning of a back-slash in a URL?

Since a sequence of characters containing a backslash can't be a URL,
I'll take that as a backslash percent-encoded, %5C. My answer then
would be it means pretty much whatever you want it to mean.

Does it have a special meaning, the way reserved characters (/, $, &, ?,
etc) have?

No, else it would be included in the Reserved set and its use would
be documented in RFC3986.

A path may also have @, :, and certain reserved characters; but all these have
special meanings within a path


Only scheme-specifically. 'Aside from dot-segments in hierarchical
paths, a path segment is considered opaque by the generic syntax.'
(sec. 3.3)

--
Jock

Jun 11 '06 #10
