Commas in URLs?

AES
Encountered a URL containing a comma the other day -- the first time
I've ever noticed that, so far as I can recall. It worked fine,
however, and I gather commas are legal in URLs.

Out of curiosity, did a quick scan of an ASCII file of the 542 URLs in
my personal bookmark file and discovered exactly 3 that contained commas
(two with a single comma, one with three commas) -- so I guess they're
pretty rarely used, even if legal.

Seems like this is an unfortunate possibility and a potential source for
unexpected hassles, given that not a few applications and databases use
the comma as a separator or delimiter to separate text and/or numerical
entries into tables, database keyword files, and the like (not to
mention newsgroup names in the Newsgroups: field immediately above, or
email addresses in email clients).

Out of further curiosity, are there other reasons this usage is so rare,
even if it's legal?
Aug 16 '05 #1
7 Replies


AES <si*****@stanford.edu> wrote:
> - - I gather commas are legal in URLs.

The relevant specification is Internet-standard STD 66 (RFC 3986),
available as simple hypertext at
http://www.apps.ietf.org/rfc/rfc3986.html
and it says that the comma is a "reserved character". This makes the issue
complicated; anyway, as a general statement, "commas are legal in URLs" is
false.

> Seems like this is an unfortunate possibility and a potential source for
> unexpected hassles,

There's no shortage of such possibilities in the sublunar world, and...

> given that not a few applications and databases use
> the comma as a separator or delimiter to separate text and/or numerical
> entries into tables, database keyword files, and the like (not to
> mention newsgroup names in the Newsgroups: field immediately above, or
> email addresses in email clients).

... I think you just added to the confusion by mentioning things that have
nothing to do with URL syntax.

> Out of further curiosity, are there other reasons this usage is so rare,
> even if it's legal?


There's little need for comma as a separator in URLs.

Followups trimmed.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Aug 16 '05 #2
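
A minimal sketch of what "reserved" means in practice, assuming Python's
standard urllib.parse module (which nobody in the thread mentions): a comma
that is plain data can be percent-encoded as %2C, while a comma meant as a
literal delimiter can be kept by listing it in the safe parameter. The
sample segment is made up for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical path segment that happens to contain commas.
segment = "0,1186,00.html"

# By default quote() treats the comma as a character to percent-encode.
encoded = quote(segment)

# Listing "," in safe leaves it as a literal (reserved) character.
literal = quote(segment, safe="/,")

# Decoding the percent-encoded form recovers the original text.
decoded = unquote(encoded)

print(encoded)
print(literal)
print(decoded)
```

Whether the two spellings name the same resource is up to the server, which
is part of why the "is it legal?" question has no one-word answer.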

<posted & mailed>

AES wrote:
> Encountered a URL containing a comma the other day -- the first time
> I've ever noticed that, so far as I can recall. It worked fine,
> however, and I gather commas are legal in URLs.
>
> Out of curiosity, did a quick scan of an ASCII file of the 542 URLs in
> my personal bookmark file and discovered exactly 3 that contained commas
> (two with a single comma, one with three commas) -- so I guess they're
> pretty rarely used, even if legal.
>
> Seems like this is an unfortunate possibility and a potential source for
> unexpected hassles, given that not a few applications and databases use
> the comma as a separator or delimiter to separate text and/or numerical
> entries into tables, database keyword files, and the like (not to
> mention newsgroup names in the Newsgroups: field immediately above, or
> email addresses in email clients).
>
> Out of further curiosity, are there other reasons this usage is so rare,
> even if it's legal?


Commas are legal characters in URIs (e.g. in path segments) and my own
observation is that they are at least not uncommon. I often see long,
cryptic URLs like
<http://www.tagesschau.de/thema/0,1186,OID4589294_REF1_NAV_BAB,00.html>.
Here the comma is used by a WCMS as one of several possible delimiters to
encode data in a URL. Not sure which WCMS is at work on that site - perhaps
my observation is different, because this product is popular only in
Germany...

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Aug 16 '05 #3

Jukka K. Korpela wrote:
> AES <si*****@stanford.edu> wrote:
>> - - I gather commas are legal in URLs.
>
> The relevant specification is Internet-standard STD 66 (RFC 3986),
> available as simple hypertext at
> http://www.apps.ietf.org/rfc/rfc3986.html
> and it says that the comma is a "reserved character". This makes the issue
> complicated; anyway, as a general statement, "commas are legal in URLs" is
> false.


Certainly not legal in every position of a URI, but legal in path segments:

(from the document above, page 50)

[snip]

segment = *pchar

[snip]

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

[snip]

sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
           / "*" / "+" / "," / ";" / "="
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Aug 16 '05 #4
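
The grammar excerpt quoted in #4 can be checked mechanically. What follows
is only a rough sketch, assuming Python's re module; the names UNRESERVED,
SUB_DELIMS, PCT_ENCODED and is_valid_segment are invented here, and the
unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~") is taken from
RFC 3986 rather than from the lines quoted above.

```python
import re

# Character sets from RFC 3986; the variable names are illustrative only.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="          # note the comma in here
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# segment = *pchar
SEGMENT_RE = re.compile(rf"{PCHAR}*")


def is_valid_segment(segment: str) -> bool:
    """Return True if the string matches the 'segment' rule."""
    return SEGMENT_RE.fullmatch(segment) is not None


# The last path segment of the tagesschau URL from #3.
print(is_valid_segment("0,1186,OID4589294_REF1_NAV_BAB,00.html"))
# A space is not a pchar, and a slash separates segments.
print(is_valid_segment("a b"))
print(is_valid_segment("a/b"))
```

This only covers path segments; other URI components have their own rules
for which of the reserved characters may appear unencoded.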

AES
In article <Xn****************************@193.229.0.31>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
>> given that not a few applications and databases use
>> the comma as a separator or delimiter to separate text and/or numerical
>> entries into tables, database keyword files, and the like (not to
>> mention newsgroup names in the Newsgroups: field immediately above, or
>> email addresses in email clients).
>
> ... I think you just added to the confusion by mentioning things that have
> nothing to do with URL syntax.


Perhaps I wasn't clear enough about what I meant.

As seen by ordinary users, URLs appear as text strings. Many people
probably collect them in text files and process those text files using
text editors or other programs.

For example I've collected arrays of data that included URLs and other
stuff in multi-line comma-delimited text files and loaded these arrays
into Excel spreadsheets or Word tables, just because this was a fast,
simple, convenient way to sort and manipulate the data. Tab-delimited
entries are also commonly used for this purpose, but if you have many
entries per line the page gets very wide with tab-delimited files.

Many bibliographic citations contain URLs these days, and the widely
used bibliographic database app EndNote has a "URL" field in each entry.
It's also not unlikely, however, that someone would use a URL as one of
the multiple keywords in EndNote Keywords field, and the default
delimiter for individual keywords in that field is the comma.

Mathematica -- a powerful tool for manipulating text as well as numbers
-- uses the comma as the standard separator in its Lists; and it would
be very hard to change that.

I'm not saying that the people who defined the syntax for URLs had any
obligation to rule out commas because of this. But given the _very_
widespread use of the comma as a default separator, if they'd done so,
they would have avoided causing hassles in cases like these; and the
fact that so few URLs actually contain commas seems to indicate it might
not have been a bad decision.
Aug 17 '05 #5

AES wrote:
> I'm not saying that the people who defined the syntax for URLs had any
> obligation to rule out commas because of this. But given the _very_
> widespread use of the comma as a default separator, if they'd done so,
> they would have avoided causing hassles in cases like these; and the
> fact that so few URLs actually contain commas seems to indicate it might
> not have been a bad decision.


If you're using separator-delimited data, you enclose character-type
fields in quotation marks. That takes care of that problem.

Wanting to exclude commas from data is wishful thinking without
possibility of fulfillment. Free-form text is also often stored in
database tables: requests from customer service forms, notes entered in
issue tracking systems, and so on. These items are going to include
punctuation.
Aug 17 '05 #6
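
The quoting approach from #6 can be sketched as follows, assuming Python's
standard csv module (not a tool anyone in the thread mentions). The helper
names to_csv and from_csv are made up for this example; the URLs are the
ones that already appear in the thread.

```python
import csv
import io

rows = [
    ["Tagesschau",
     "http://www.tagesschau.de/thema/0,1186,OID4589294_REF1_NAV_BAB,00.html"],
    ["Yucca", "http://www.cs.tut.fi/~jkorpela/"],
]


def to_csv(data):
    """Write rows as comma-delimited text, quoting fields only when needed."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(data)
    return buffer.getvalue()


def from_csv(text):
    """Parse comma-delimited text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


csv_text = to_csv(rows)
round_tripped = from_csv(csv_text)

# A naive split on "," breaks the first row into too many pieces,
# while a real CSV parser respects the quotation marks.
naive_fields = csv_text.splitlines()[0].split(",")

print(csv_text)
print(len(naive_fields), len(round_tripped[0]))
```

The writer quotes only the field that contains commas, so the file stays
readable, and any consumer that honours the quoting gets the URL back
intact.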

<posted & mailed>

AES wrote:
> In article <Xn****************************@193.229.0.31>,
> "Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
>>> given that not a few applications and databases use
>>> the comma as a separator or delimiter to separate text and/or numerical
>>> entries into tables, database keyword files, and the like (not to
>>> mention newsgroup names in the Newsgroups: field immediately above, or
>>> email addresses in email clients).
>>
>> ... I think you just added to the confusion by mentioning things that
>> have nothing to do with URL syntax.
>
> Perhaps I wasn't clear enough about what I meant.
>
> As seen by ordinary users, URLs appear as text strings. Many people
> probably collect them in text files and process those text files using
> text editors or other programs.
>
> For example I've collected arrays of data that included URLs and other
> stuff in multi-line comma-delimited text files and loaded these arrays
> into Excel spreadsheets or Word tables, just because this was a fast,
> simple, convenient way to sort and manipulate the data. Tab-delimited
> entries are also commonly used for this purpose, but if you have many
> entries per line the page gets very wide with tab-delimited files.
>
> Many bibliographic citations contain URLs these days, and the widely
> used bibliographic database app EndNote has a "URL" field in each entry.
> It's also not unlikely, however, that someone would use a URL as one of
> the multiple keywords in EndNote Keywords field, and the default
> delimiter for individual keywords in that field is the comma.
>
> Mathematica -- a powerful tool for manipulating text as well as numbers
> -- uses the comma as the standard separator in its Lists; and it would
> be very hard to change that.
>
> I'm not saying that the people who defined the syntax for URLs had any
> obligation to rule out commas because of this. But given the _very_
> widespread use of the comma as a default separator, if they'd done so,
> they would have avoided causing hassles in cases like these; and the
> fact that so few URLs actually contain commas seems to indicate it might
> not have been a bad decision.


There are almost always issues, if you mix 'languages' where some characters
have special meaning. Just take URIs in HTML where the commonly used
character & has to be escaped. It's impossible to avoid this.

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Aug 18 '05 #7
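
To illustrate the HTML case from #7, here is a small sketch assuming
Python's standard html module. The function name href_attribute and the
example.com URL are hypothetical, chosen only to show a URL that contains
both a comma and an ampersand.

```python
from html import escape, unescape


def href_attribute(url: str) -> str:
    """Embed a URL in an HTML anchor, escaping characters special to HTML."""
    return f'<a href="{escape(url, quote=True)}">link</a>'


# Hypothetical URL with a comma (fine in HTML) and an ampersand (not fine).
url = "http://example.com/search?q=commas,urls&lang=en"

escaped_url = escape(url, quote=True)
markup = href_attribute(url)

print(markup)
# Unescaping the attribute value gives back the original URL.
print(unescape(escaped_url) == url)
```

The comma passes through untouched because it has no special meaning in
HTML; only the ampersand has to become &amp;. Each "language" has its own
set of troublesome characters, which is the point made above.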

On Thu, 18 Aug 2005, Benjamin Niemann wrote:
> There are almost always issues, if you mix 'languages' where some characters
> have special meaning.

Absolutely. (In fact it can be even worse if you aren't mixing
"languages", when you're trying to use one level of data in another
level. The principle is the same, really.)

> Just take URIs in HTML where the commonly used
> character & has to be escaped. It's impossible to avoid this.


As often as not, the failure to perform the escaping correctly, and/or
parsing the escaped data wrongly, will turn out to have security
implications, sometimes seriously so. Escaping needs to be handled very
carefully if the results are to be exposed to the web. Be careful
out there! (as they say).
Aug 18 '05 #8
