RFC3986, backslash in URI/URLs


[Sorry, there isn't a newsgroup for discussing URLs as such - this
seemed a reasonably on-topic place to discuss it...?]

The story so far: on a somewhat unrelated newsgroup, my attention
fell upon the URL:
http://www.speedtouchdsl.com/prod706.htm
which contains a link to the purported URL:
http://www.speedtouchdsl.com/pdf\dat...06WL-780WL.pdf

Comparing the latter with other URLs in that area, it appeared that
the "\" was a probable blunder for "/". However, since their web
server is IIS, it appears that their server silently fixes-up this
blunder[1], and delivers the intended content. My recollection of
RFC1738 was that an unencoded "\" ought not to appear in a URL, so I
was initially inclined to rate this URL as broken...

However, this then led me down the trail of RFC2396, which 'updates
and merges "Uniform Resource Locators" [RFC1738] and "Relative Uniform
Resource Locators" [RFC1808]', and RFC3986, which "obsoletes rfc 1808
and updates rfc 1738".

In RFC2396 2.4.3, the backslash is listed under "Excluded US-ASCII
characters", under the subcategory of "unwise", with the "must"
requirement:

|Data corresponding to excluded characters must be escaped in order to
|be properly represented within a URI.

So far, so good.

But in RFC3986, this character "\" seems to have been stealthily
dropped from the list of characters needing to be escaped. I find no
mention of this change in Appendix D, "Changes from RFC2396".

The only substantive mention of "\" which I can find is in section 7.3
under the main heading of "7. Security Considerations" :

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]"

Aside from this potential security exposure, it appears to me that the
cited URL, which I would like to have categorised as defective, would
be rated as OK by this latest RFC. And since the server returns the
desired resource when this misbegotten URL is presented, I can't even
rate it as a blunder - can I?

Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the changes?

In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

regards

[1] Of course, this isn't a situation that I meet in my own
serveradmin-ing using Apache. If the author codes "\" instead of "/"
in a URL, and attempts to follow the link with a www-conforming
browser, the link does not work. If they use IE instead, however, it
appears that it silently fixes-up the error on the *client* side. It
seems from my tests that IE6 makes no attempt to access the cited URL
directly - it replaces the "\" by "/" before even trying (whereas
Mozilla replaces the "\" by "%5C", after which, Apache, he say "no").

So it looks as if MS give themselves two bites at this fuxup: once in
their browser-like object, and once in their web server.

(Another reason why authors are misguided if they use MS software as
their only test of their web pages. But I digress.)
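
For illustration, the two client-side fix-ups described in this footnote can be sketched in Python. This is only a rough model of the observed behaviour, not code from either browser; the function names and the sample path are invented for the example.

```python
from urllib.parse import quote


def encode_like_mozilla(path: str) -> str:
    """Percent-encode characters not allowed in a path, keeping "/" as it is."""
    # safe="/" leaves the hierarchy separator alone; "\" is not safe,
    # so it comes out as %5C.
    return quote(path, safe="/")


def rewrite_like_ie(path: str) -> str:
    """Silently turn every backslash into a forward slash."""
    return path.replace("\\", "/")


# Hypothetical path, not the elided one from the post above.
path = "/pdf\\example.pdf"
print(encode_like_mozilla(path))  # /pdf%5Cexample.pdf
print(rewrite_like_ie(path))      # /pdf/example.pdf
```

Note that quote() would also re-encode an existing "%", so the sketch only makes sense for a raw path that has not been percent-encoded yet.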

--
Jun 10 '06 #1
Alan J. Flavell inquired:

In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?
I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it is
to take its "normal" value in some expression.
It seems from my tests that IE6 makes no attempt to access the cited
URL directly - it replaces the "\" by "/" before even trying


Yes, IE mangles URLs from the address-bar in several ways before sending
them off over the interweb.

--
Jack.

[1] The expressions "obviously..." and "it's obvious that..." are
frequently encountered when the author is about to perpetrate some
inadvertent fallacy.
Jun 10 '06 #2
Jack wrote:
Alan J. Flavell inquired:

In http://lists.w3.org/Archives/Public/...5May/0004.html , I
found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?


I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it is
to take its "normal" value in some expression.


How can you mean "regardless of language", given that which characters
are escape characters depends entirely on what language is in use?
Jun 10 '06 #3
Alan J. Flavell wrote:
The only substantive mention of "\" which I can find is in section 7.3
under the main heading of "7. Security Considerations" :

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]"

Aside from this potential security exposure, it appears to me that the
cited URL, which I would like to have categorised as defective, would
be rated as OK by this latest RFC. And since the server returns the
desired resource when this misbegotten URL is presented, I can't even
rate it as a blunder - can I?

Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the changes?


May I ask what the source of risk is? You mention that backslash being
the path delimiter on a back-end file system, but that can't be the
problem, since the forward slash is the path delimiter on other file
systems, and the interpretation of the forward slash in URIs as a path
delimiter doesn't create risk on that account.
Jun 10 '06 #4
Harlan Messinger <hm************ *******@comcast .net> scripsit:
Jack wrote:

- -
I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it
is to take its "normal" value in some expression.


How can you mean "regardless of language", given that which characters
are escape characters depends entirely on what language is in use?


Well, I'd say that the _principle_ of escaping an escape character,
regardless of language (notation), is adequate within broad limits. Jack's
error is that he assumes that the backslash is an escape character in the
"language" of URLs, i.e. URL syntax.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jun 10 '06 #5
On Sat, 10 Jun 2006, Harlan Messinger wrote:
Alan J. Flavell wrote:
The only substantive mention of "\" which I can find is in section
7.3 under the main heading of "7. Security Considerations" :

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]" [...]
Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the
changes?
May I ask what the source of risk is? You mention that backslash
being the path delimiter on a back-end file system,


Well, I might have done so, if I had thought about it; but in fairness
it wasn't *my* mention, it was a quote from the RFC. :-}
but that can't be the problem, since
the forward slash is the path delimiter on other file systems,
I don't agree. In principle, the "/" has a defined meaning in a URL
(it's a hierarchy separator, if I can put it loosely), and anyone
interpreting a URL is required to attribute that meaning to it - no
matter what their local file system separator might be.

Whereas "\" has no defined meaning in the structure of a URL, and
could (given an insufficiently paranoid parser) possibly find its way
into a filesystem reference. Which could have significant
consequences on, say, Windows.
and the interpretation of the forward slash in URIs as a path
delimiter doesn't create risk on that account.


Because the URL "/" (the one that functions as a URL hierarchy
separator) never gets that far. By then it would have been turned
into the filesystem hierarchy separator, whatever that might be.
Yes, it might sometimes be "/", but don't let that fool you. It might
just as well have been turned into ":" for a different filesystem, or into
a hierarchical database query or whatever, in the general case.

I think that's the sort of thing that the RFC authors have in mind,
anyway.
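
A minimal sketch of that idea, assuming a server that maps URL paths onto a local directory tree: split on the URL separator first, decode each segment afterwards, and refuse any segment containing a character the file system might act on. The names RISKY, url_path_to_segments and to_local_path, and the root directories, are invented for the example; no real server is claimed to work this way.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import unquote

# Characters that some file systems treat specially
# (the list quoted from RFC 3986 section 7.3).
RISKY = set('/\\:[]')


def url_path_to_segments(url_path: str) -> list[str]:
    """Split on the URL separator, then decode and vet each segment."""
    segments = []
    for raw in url_path.split("/"):
        if raw == "":
            continue  # skip empty segments from leading or doubled slashes
        name = unquote(raw)
        if name in (".", "..") or RISKY & set(name):
            raise ValueError(f"refusing segment {name!r}")
        segments.append(name)
    return segments


def to_local_path(root, url_path: str):
    """Join vetted segments under root using the local separator."""
    # The separator comes from the path class, never from the URL text.
    return root.joinpath(*url_path_to_segments(url_path))


print(to_local_path(PurePosixPath("/srv/www"), "/pdf/file.pdf"))
print(to_local_path(PureWindowsPath("C:/www"), "/pdf/file.pdf"))
```

With this arrangement a request such as "/pdf%5C..%5Csecret.txt" raises ValueError instead of reaching the file system, because the decoded segment contains a backslash.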
Jun 10 '06 #6
Harlan Messinger wrote:
Jack wrote:
Alan J. Flavell inquired:

In
http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?


I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it
is to take its "normal" value in some expression.


How can you mean "regardless of language", given that which
characters are escape characters depends entirely on what language is
in use?


Try this:

"For any language x, that set of characters which are escape characters
in x should themselves be escaped if they are to take their normal
values in some expression."

I thought that was obviously my meaning, and it seems to require some
perverse gymnastics to get my original utterance to mean something
different.

--
Jack.
Jun 10 '06 #7
Alan J. Flavell
In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?


I suppose you could say backslashes *are* included in the
must-be-escaped list, if you recognise the list as implied. The
explicit list seems to have been silently dropped: the word 'excluded'
appears in RFC3986 only in unrelated contexts, and I can't find mention
of this removal anywhere in the changeover notes:

http://www.gbiv.com/protocols/uri/rev-2002/issues.html

Anyway, backslashes still can't occur in URLs, since no production
allows them.
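
As a rough check of that claim, here is a Python sketch that transcribes the path-abempty and pchar productions from RFC 3986 Appendix A, as I read them, into a regular expression. The constant names and is_valid_path are invented for the example, and the sketch covers only the path, not a whole URI.

```python
import re

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = r"A-Za-z0-9\-._~"
# sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
SUB_DELIMS = r"!$&'()*+,;="
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
# path-abempty = *( "/" segment ), where segment = *pchar
PATH_ABEMPTY = re.compile(rf"(?:/{PCHAR}*)*")


def is_valid_path(path: str) -> bool:
    """True if the whole string matches the path-abempty production."""
    return PATH_ABEMPTY.fullmatch(path) is not None


print(is_valid_path("/pdf/file.pdf"))     # True
print(is_valid_path("/pdf\\file.pdf"))    # False: "\" matches no production
print(is_valid_path("/pdf%5Cfile.pdf"))   # True: percent-encoded form is fine
```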

--
Jock

Jun 10 '06 #8
John Dunlop wrote:
Alan J. Flavell
In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?


I suppose you could say backslashes *are* included in the
must-be-escaped list, if you recognise the list as implied. The
explicit list seems to have been silently dropped: the word 'excluded'
appears in RFC3986 only in unrelated contexts, and I can't find mention
of this removal anywhere in the changeover notes:

http://www.gbiv.com/protocols/uri/rev-2002/issues.html

Anyway, backslashes still can't occur in URLs, since no production
allows them.


The real questions are: What is the meaning of a back-slash in a URL?
Does it have a special meaning, the way reserved characters (/, $, &, ?,
etc) have? If it has a special meaning, where is it documented so that
browser developers and Web page authors will know about it?

Note that it cannot just be another character in the name of a path or
file. RFC 3986, Appendix A, indicates a path can have a name consisting
only of alphabetic characters, numerals, -, +, ., _, ~, and
percent-encoded characters. A path may also have @, :, and certain
reserved characters; but all these have special meanings within a path
(taking us back to the third question in my first paragraph).

In a very loose sense, percent-encoding is a form of escaping a
character. However, a percent-encoded character might have a different
meaning in a URL than the related literal character. For example, "%25"
represents the character "%". Obviously, the former (just a character
in a string of characters) is not treated the same as the latter (the
signal for percent-encoding).
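
The distinction can be seen with Python's urllib.parse, used here purely as an illustration; the sample strings are made up.

```python
from urllib.parse import quote, unquote

# "%25" is data: it decodes to a literal "%" character.
print(unquote("%25"))        # %
# "%5C" likewise decodes to a literal backslash.
print(unquote("%5C"))        # \
# Decoding happens once: "%2525" yields the text "%25", not "%".
print(unquote("%2525"))      # %25
# Going the other way, a literal "%" has to be written as "%25".
print(quote("%", safe=""))   # %25
```

A bare "%" in a URL is the signal that two hex digits follow, which is why the literal character has to travel as "%25".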

I have yet to see a use of back-slash in a URL that was not an error,
generally a typo by the Web page author.

--

David E. Ross
<http://www.rossde.com/>

Concerned about someone (e.g., Pres. Bush) snooping
into your E-mail? Use PGP.
See my <http://www.rossde.com/PGP/>
Jun 11 '06 #9
David E. Ross:
The real questions are:
I think these are different points of discussion, no more real or
imaginary than the original, but more off-topic. The original was
about the status of backslashes wrt URLs, which has a direct bearing on
whether or not a doc violates or conforms to the spec.
What is the meaning of a back-slash in a URL?
Since a sequence of characters containing a backslash can't be a URL,
I'll take that as a backslash percent-encoded, %5C. My answer then
would be it means pretty much whatever you want it to mean.
Does it have a special meaning, the way reserved characters (/, $, &, ?,
etc) have?
No, else it would be included in the Reserved set and its use would
be documented in RFC3986.
A path may also have @, :, and certain reserved characters; but all these have
special meanings within a path


Only scheme-specifically. 'Aside from dot-segments in hierarchical
paths, a path segment is considered opaque by the generic syntax.'
(sec. 3.3)

--
Jock

Jun 11 '06 #10
