  #1  
August 16th, 2005, 10:35 PM
AES
Commas in URLs?

Encountered a URL containing a comma the other day -- the first time
I've ever noticed that, so far as I can recall. It worked fine,
however, and I gather commas are legal in URLs.

Out of curiosity, did a quick scan of an ASCII file of the 542 URLs in
my personal bookmark file and discovered exactly 3 that contained commas
(two with a single comma, one with three commas) -- so I guess they're
pretty rarely used, even if legal.
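A scan like this can be reproduced in a few lines of Python. This is a minimal sketch that assumes the bookmarks were exported as plain text with one URL per line; the function names `comma_stats` and `read_urls` and the file name are hypothetical.

```python
from collections import Counter


def read_urls(path):
    """Read one URL per line from a plain-text file, skipping blank lines."""
    with open(path, encoding="ascii") as f:
        return [line.strip() for line in f if line.strip()]


def comma_stats(urls):
    """Count URLs that contain commas and tally how many commas each has.

    Returns (number of URLs with at least one comma,
             Counter mapping comma count -> number of URLs with that count).
    """
    per_url = Counter(url.count(",") for url in urls if "," in url)
    return sum(per_url.values()), per_url
```

Usage would be `comma_stats(read_urls("bookmarks.txt"))` for a hypothetical export file. For a file like the one described above, it would report 3 URLs: two with one comma and one with three.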

Seems like this is an unfortunate possibility and a potential source for
unexpected hassles, given that not a few applications and databases use
the comma as a separator or delimiter to separate text and/or numerical
entries into tables, database keyword files, and the like (not to
mention newsgroup names in the Newsgroups: field immediately above, or
email addresses in email clients).

Out of further curiosity, are there other reasons this usage is so rare,
even if it's legal?
  #2  
August 16th, 2005, 11:05 PM
Jukka K. Korpela
Re: Commas in URLs?

AES <siegman@stanford.edu> wrote:
> - - I gather commas are legal in URLs.

The relevant specification is Internet-standard STD 66 (RFC 3986),
available as simple hypertext at
http://www.apps.ietf.org/rfc/rfc3986.html
and it says that the comma is a "reserved character". This makes the issue
complicated; anyway, as a general statement, "commas are legal in URLs" is
false.
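One place the comma's "reserved" status shows up in practice is URL-encoding libraries. Python's `urllib.parse.quote`, for example, percent-encodes a comma unless it is listed in `safe`. RFC 3986 (section 2.2) says that URIs differing only in whether a reserved character is written literally or percent-encoded are not equivalent, so which form to use is up to the application. A small illustration with an arbitrary sample string:

```python
from urllib.parse import quote, unquote

segment = "0,1186"

# By default only letters, digits, "_.-~" and the default safe "/" are left
# alone, so the reserved comma becomes %2C.
encoded = quote(segment)

# Listing "," in safe keeps it as a literal sub-delimiter instead.
literal = quote(segment, safe="/,")

# Decoding %2C yields the comma again, but the two URL forms are distinct strings.
decoded = unquote(encoded)
```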
> Seems like this is an unfortunate possibility and a potential source for
> unexpected hassles,

There's no shortage of such possibilities in the sublunar world, and...
> given that not a few applications and databases use
> the comma as a separator or delimiter to separate text and/or numerical
> entries into tables, database keyword files, and the like (not to
> mention newsgroup names in the Newsgroups: field immediately above, or
> email addresses in email clients).

... I think you just added to the confusion by mentioning things that have
nothing to do with URL syntax.
> Out of further curiosity, are there other reasons this usage is so rare,
> even if it's legal?

There's little need for the comma as a separator in URLs.

Followups trimmed.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

  #3  
August 16th, 2005, 11:05 PM
Benjamin Niemann
Re: Commas in URLs?

Jukka K. Korpela wrote:
> AES <siegman@stanford.edu> wrote:
>
>> - - I gather commas are legal in URLs.
>
> The relevant specification is Internet-standard STD 66 (RFC 3986),
> available as simple hypertext at
> http://www.apps.ietf.org/rfc/rfc3986.html
> and it says that the comma is a "reserved character". This makes the issue
> complicated; anyway, as a general statement, "commas are legal in URLs" is
> false.

Certainly not legal in every position of a URI, but legal in path segments:

(from the document above, page 50)

[snip]

segment = *pchar

[snip]

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

[snip]

sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
             / "*" / "+" / "," / ";" / "="
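The grammar quoted above can be turned into a quick check of a single path segment. This is a sketch that transcribes `segment`, `pchar` and `sub-delims` as quoted, plus `unreserved` and `pct-encoded` from the same appendix of RFC 3986, into a regular expression. The function name `is_valid_segment` is hypothetical, and it validates one segment only, not a whole URI.

```python
import re

# From RFC 3986, Appendix A:
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   pct-encoded = "%" HEXDIG HEXDIG
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
#   pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
#   segment     = *pchar
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
SEGMENT = re.compile(rf"{PCHAR}*")


def is_valid_segment(s: str) -> bool:
    """Return True if s matches the RFC 3986 'segment' production exactly."""
    return SEGMENT.fullmatch(s) is not None
```

With this, a segment such as `0,1186,00.html` is accepted because the comma is one of the `sub-delims`, while a raw space or a `/` is rejected.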


--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
  #4  
August 16th, 2005, 11:05 PM
Benjamin Niemann
Re: Commas in URLs?

<posted & mailed>

AES wrote:
> Encountered a URL containing a comma the other day -- the first time
> I've ever noticed that, so far as I can recall. It worked fine,
> however, and I gather commas are legal in URLs.
>
> Out of curiosity, did a quick scan of an ASCII file of the 542 URLs in
> my personal bookmark file and discovered exactly 3 that contained commas
> (two with a single comma, one with three commas) -- so I guess they're
> pretty rarely used, even if legal.
>
> Seems like this is an unfortunate possibility and a potential source for
> unexpected hassles, given that not a few applications and databases use
> the comma as a separator or delimiter to separate text and/or numerical
> entries into tables, database keyword files, and the like (not to
> mention newsgroup names in the Newsgroups: field immediately above, or
> email addresses in email clients).
>
> Out of further curiosity, are there other reasons this usage is so rare,
> even if it's legal?

Commas are legal characters in URIs (e.g. in path segments) and my own
observation is that they are at least not uncommon. I often see long,
cryptic URLs like
<http://www.tagesschau.de/thema/0,1186,OID4589294_REF1_NAV_BAB,00.html>.
The comma is used there by a WCMS (web content management system) as one of
several possible delimiters for encoding data in a URL. I'm not sure which
WCMS is at work on that site; perhaps my observation differs because this
product is popular only in Germany...
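Within one path segment the comma can act as a field delimiter, which the `sub-delims` rule permits. A sketch that pulls the comma-separated fields out of the last segment of the example URL; what each field means is specific to the (unidentified) WCMS, so the code only splits and does not interpret. The function name `comma_fields` is hypothetical.

```python
from typing import List
from urllib.parse import urlsplit


def comma_fields(url: str) -> List[str]:
    """Split the last path segment of a URL on commas."""
    path = urlsplit(url).path                 # "/thema/0,1186,...,00.html"
    last_segment = path.rsplit("/", 1)[-1]    # text after the final "/"
    return last_segment.split(",")


fields = comma_fields(
    "http://www.tagesschau.de/thema/0,1186,OID4589294_REF1_NAV_BAB,00.html"
)
```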

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
  #5  
August 17th, 2005, 03:55 AM
AES
Re: Commas in URLs?

In article <Xns96B59316C628jkorpelacstutfi@193.229.0.31>,
"Jukka K. Korpela" <jkorpela@cs.tut.fi> wrote:
> > given that not a few applications and databases use
> > the comma as a separator or delimiter to separate text and/or numerical
> > entries into tables, database keyword files, and the like (not to
> > mention newsgroup names in the Newsgroups: field immediately above, or
> > email addresses in email clients).
>
> ... I think you just added to the confusion by mentioning things that have
> nothing to do with URL syntax.

Perhaps I wasn't clear enough about what I meant.

As seen by ordinary users, URLs appear as text strings. Many people
probably collect them in text files and process those text files using
text editors or other programs.

For example I've collected arrays of data that included URLs and other
stuff in multi-line comma-delimited text files and loaded these arrays
into Excel spreadsheets or Word tables, just because this was a fast,
simple, convenient way to sort and manipulate the data. Tab-delimited
entries are also commonly used for this purpose, but if you have many
entries per line the page gets very wide with tab-delimited files.
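The hassle is easy to reproduce: splitting an unquoted comma-delimited line with a plain string split treats every comma inside a URL as a field boundary. A minimal illustration with a made-up three-field record (title, URL, year); the record layout is an assumption for the example.

```python
# A hypothetical record: title, URL, year, joined with commas and no quoting.
url = "http://www.tagesschau.de/thema/0,1186,OID4589294_REF1_NAV_BAB,00.html"
line = ",".join(["Example page", url, "2005"])

# A naive split cannot tell the URL's commas from the field separators,
# so the three intended fields come back as six.
naive_fields = line.split(",")
```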

Many bibliographic citations contain URLs these days, and the widely
used bibliographic database app EndNote has a "URL" field in each entry.
It's also not unlikely, however, that someone would use a URL as one of
the multiple keywords in EndNote Keywords field, and the default
delimiter for individual keywords in that field is the comma.

Mathematica -- a powerful tool for manipulating text as well as numbers
-- uses the comma as the standard separator in its Lists; and it would
be very hard to change that.

I'm not saying that the people who defined the syntax for URLs had any
obligation to rule out commas because of this. But given the _very_
widespread use of the comma as a default separator, if they'd done so,
they would have avoided causing hassles in cases like these; and the
fact that so few URLs actually contain commas seems to indicate it might
not have been a bad decision.
  #6  
August 17th, 2005, 03:15 PM
Harlan Messinger
Re: Commas in URLs?

AES wrote:
> I'm not saying that the people who defined the syntax for URLs had any
> obligation to rule out commas because of this. But given the _very_
> widespread use of the comma as a default separator, if they'd done so,
> they would have avoided causing hassles in cases like these; and the
> fact that so few URLs actually contain commas seems to indicate it might
> not have been a bad decision.

If you're using separator-delimited data, you enclose character-type
fields in quotation marks. That takes care of that problem.
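Python's standard `csv` module follows this convention: a field containing the delimiter is wrapped in quotation marks, and any quotation mark inside a field is doubled. A round-trip sketch assuming the default `excel` dialect; `to_csv_line` and `from_csv_line` are hypothetical helper names. The default `csv.QUOTE_MINIMAL` quotes only fields that need it; `csv.QUOTE_NONNUMERIC` (quote every non-numeric field) is closer to quoting all character-type fields, and `csv.QUOTE_ALL` quotes everything.

```python
import csv
import io


def to_csv_line(fields, quoting=csv.QUOTE_MINIMAL):
    """Serialize one record; fields containing commas or quotes get quoted."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def from_csv_line(line):
    """Parse one serialized record back into a list of strings."""
    return next(csv.reader(io.StringIO(line)))


url = "http://www.tagesschau.de/thema/0,1186,OID4589294_REF1_NAV_BAB,00.html"
record = ["Example page", url, "2005"]

# Only the URL field, which contains commas, is wrapped in quotation marks.
line = to_csv_line(record)
```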

Wanting to exclude commas from data is wishful thinking without
possibility of fulfillment. Free-form text is also often stored in
database tables: requests from customer service forms, notes entered in
issue tracking systems, and so on. These items are going to include
punctuation.
  #7  
August 18th, 2005, 01:35 PM
Benjamin Niemann
Re: Commas in URLs?

<posted & mailed>

AES wrote:
> In article <Xns96B59316C628jkorpelacstutfi@193.229.0.31>,
> "Jukka K. Korpela" <jkorpela@cs.tut.fi> wrote:
>
>> > given that not a few applications and databases use
>> > the comma as a separator or delimiter to separate text and/or numerical
>> > entries into tables, database keyword files, and the like (not to
>> > mention newsgroup names in the Newsgroups: field immediately above, or
>> > email addresses in email clients).
>>
>> ... I think you just added to the confusion by mentioning things that
>> have nothing to do with URL syntax.
>
> Perhaps I wasn't clear enough about what I meant.
>
> As seen by ordinary users, URLs appear as text strings. Many people
> probably collect them in text files and process those text files using
> text editors or other programs.
>
> For example I've collected arrays of data that included URLs and other
> stuff in multi-line comma-delimited text files and loaded these arrays
> into Excel spreadsheets or Word tables, just because this was a fast,
> simple, convenient way to sort and manipulate the data. Tab-delimited
> entries are also commonly used for this purpose, but if you have many
> entries per line the page gets very wide with tab-delimited files.
>
> Many bibliographic citations contain URLs these days, and the widely
> used bibliographic database app EndNote has a "URL" field in each entry.
> It's also not unlikely, however, that someone would use a URL as one of
> the multiple keywords in EndNote Keywords field, and the default
> delimiter for individual keywords in that field is the comma.
>
> Mathematica -- a powerful tool for manipulating text as well as numbers
> -- uses the comma as the standard separator in its Lists; and it would
> be very hard to change that.
>
> I'm not saying that the people who defined the syntax for URLs had any
> obligation to rule out commas because of this. But given the _very_
> widespread use of the comma as a default separator, if they'd done so,
> they would have avoided causing hassles in cases like these; and the
> fact that so few URLs actually contain commas seems to indicate it might
> not have been a bad decision.

There are almost always issues, if you mix 'languages' where some characters
have special meaning. Just take URIs in HTML where the commonly used
character & has to be escaped. It's impossible to avoid this.
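For the HTML case, the `&` separating query parameters is written as `&amp;` inside an attribute value, and the browser turns it back into `&` when it parses the page. A minimal sketch using Python's standard `html` module; the URL is a made-up example.

```python
import html

# A made-up URL whose query string uses "&" between parameters.
url = "http://example.com/search?q=commas&lang=en"

# Escape for use inside a double-quoted HTML attribute.
href = html.escape(url, quote=True)
anchor = '<a href="' + href + '">search</a>'

# Unescaping recovers the original URL, as an HTML parser would.
restored = html.unescape(href)
```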

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
  #8  
August 18th, 2005, 02:35 PM
Alan J. Flavell
Re: Commas in URLs?

On Thu, 18 Aug 2005, Benjamin Niemann wrote:
> There are almost always issues, if you mix 'languages' where some characters
> have special meaning.

Absolutely. (In fact it can be even worse if you aren't mixing
"languages", when you're trying to use one level of data in another
level. The principle is the same, really.)
> Just take URIs in HTML where the commonly used
> character & has to be escaped. It's impossible to avoid this.

As often as not, the failure to perform the escaping correctly, and/or
parsing the escaped data wrongly, will turn out to have security
implications, sometimes seriously so. It needs to be handled very
carefully if the results are to be exposed to the web. Be careful
out there! (as they say).
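A sketch of escaping in layers when untrusted values end up in a link: first percent-encode each query value for the URL, then HTML-escape the finished URL for the attribute and the link text for the element content. `build_link` is a hypothetical helper and `http://example.com/search` a placeholder; this covers only the escaping step and does not, for instance, validate the URL scheme.

```python
import html
from urllib.parse import urlencode


def build_link(base, params, text):
    """Build an <a> element from untrusted query values and link text."""
    # Layer 1 (URL): urlencode percent-encodes each key and value, so an
    # "&", "=", "," or '"' inside a value cannot act as a URL delimiter.
    url = base + "?" + urlencode(params)
    # Layer 2 (HTML): escape the URL for a double-quoted attribute value and
    # the text for element content, so neither can close the tag early.
    return '<a href="{}">{}</a>'.format(
        html.escape(url, quote=True), html.escape(text)
    )


link = build_link(
    "http://example.com/search", {"q": 'a,b&c="x"', "lang": "en"}, "<b>results</b>"
)
```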
 
