Commas in URLs?

AES
Encountered a URL containing a comma the other day -- the first time
I've ever noticed that, so far as I can recall. It worked fine,
however, and I gather commas are legal in URLs.

Out of curiosity, did a quick scan of an ASCII file of the 542 URLs in
my personal bookmark file and discovered exactly 3 that contained commas
(two with a single comma, one with three commas) -- so I guess they're
pretty rarely used, even if legal.

Seems like this is an unfortunate possibility and a potential source for
unexpected hassles, given that not a few applications and databases use
the comma as a separator or delimiter to separate text and/or numerical
entries into tables, database keyword files, and the like (not to
mention newsgroup names in the Newsgroups: field immediately above, or
email addresses in email clients).

Out of further curiosity, are there other reasons this usage is so rare,
even if it's legal?
Aug 16 '05 #1
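A scan like the one described above can be sketched in a few lines of Python. This is only an illustration: the `sample` list is hypothetical data standing in for a bookmark export, and `urls_with_commas` is a name made up for this example.

```python
def urls_with_commas(urls):
    """Return (url, comma_count) pairs for the URLs that contain a comma."""
    return [(u, u.count(",")) for u in urls if "," in u]


# Hypothetical sample data standing in for a bookmark export, one URL per entry.
sample = [
    "http://example.com/",
    "http://example.com/a,b.html",
    "http://example.com/x,y,z,w.html",
]

for url, n in urls_with_commas(sample):
    print(n, url)
```

For a real bookmark file, the list could be built by reading the file line by line and stripping whitespace from each line.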
AES <si*****@stanford.edu> wrote:
> - - I gather commas are legal in URLs.

The relevant specification is Internet-standard STD 66 (RFC 3986),
available as simple hypertext at
http://www.apps.ietf.org/rfc/rfc3986.html
and it says that the comma is a "reserved character". This makes the issue
complicated; anyway, as a general statement, "commas are legal in URLs" is
false.

> Seems like this is an unfortunate possibility and a potential source for
> unexpected hassles,

There's no shortage of such possibilities in the sublunar world, and...

> given that not a few applications and databases use
> the comma as a separator or delimiter to separate text and/or numerical
> entries into tables, database keyword files, and the like (not to
> mention newsgroup names in the Newsgroups: field immediately above, or
> email addresses in email clients).

... I think you just added to the confusion by mentioning things that have
nothing to do with URL syntax.

> Out of further curiosity, are there other reasons this usage is so rare,
> even if it's legal?

There's little need for comma as a separator in URLs.

Followups trimmed.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Aug 16 '05 #2
<posted & mailed>

AES wrote:
> Encountered a URL containing a comma the other day -- the first time
> I've ever noticed that, so far as I can recall. It worked fine,
> however, and I gather commas are legal in URLs.
>
> Out of curiosity, did a quick scan of an ASCII file of the 542 URLs in
> my personal bookmark file and discovered exactly 3 that contained commas
> (two with a single comma, one with three commas) -- so I guess they're
> pretty rarely used, even if legal.
>
> Seems like this is an unfortunate possibility and a potential source for
> unexpected hassles, given that not a few applications and databases use
> the comma as a separator or delimiter to separate text and/or numerical
> entries into tables, database keyword files, and the like (not to
> mention newsgroup names in the Newsgroups: field immediately above, or
> email addresses in email clients).
>
> Out of further curiosity, are there other reasons this usage is so rare,
> even if it's legal?


Commas are legal characters in URIs (e.g. in path segments), and in my own
observation they are at least not uncommon. I often see long,
cryptic URLs like
<http://www.tagesschau.de/thema/0,1186,OID4589294_REF1_NAV_BAB,00.html>.
Here the comma is used by a WCMS as one of several possible delimiters to
encode data in a URL. I'm not sure which WCMS is at work on that site; perhaps
my observation differs because this product is popular only in
Germany...

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Aug 16 '05 #3
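To illustrate the comma-as-delimiter pattern described in the post above, here is a minimal sketch that pulls the comma-separated fields out of the last path segment. The URL is a hypothetical one of similar shape, and nothing is assumed about what the individual fields mean to the CMS.

```python
from urllib.parse import urlsplit

# Hypothetical URL shaped like the CMS-generated ones described above.
url = "http://example.com/thema/0,1186,OID123_REF1,00.html"

# urlsplit leaves commas in the path untouched.
path = urlsplit(url).path

# Take the last path segment and split it on the comma delimiter.
last_segment = path.rsplit("/", 1)[-1]
fields = last_segment.split(",")

print(fields)
```

Splitting only after isolating the path segment keeps the commas from being confused with anything in the host or query parts of the URL.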
Jukka K. Korpela wrote:
> AES <si*****@stanford.edu> wrote:
>> - - I gather commas are legal in URLs.
>
> The relevant specification is Internet-standard STD 66 (RFC 3986),
> available as simple hypertext at
> http://www.apps.ietf.org/rfc/rfc3986.html
> and it says that the comma is a "reserved character". This makes the issue
> complicated; anyway, as a general statement, "commas are legal in URLs" is
> false.

Certainly not legal in every position of a URI, but legal in path segments:

(from the document above, page 50)

[snip]

segment = *pchar

[snip]

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

[snip]

sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
           / "*" / "+" / "," / ";" / "="
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Aug 16 '05 #4
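The grammar rules quoted in the post above can be turned into a small checker. This is a sketch only: the character classes are transcribed by hand from the `unreserved`, `pct-encoded`, `sub-delims` and `pchar` rules of RFC 3986, and `is_valid_segment` is a name invented for this example, not part of any library.

```python
import re

# Character classes transcribed from RFC 3986.
UNRESERVED = r"A-Za-z0-9\-._~"      # ALPHA / DIGIT / "-" / "." / "_" / "~"
SUB_DELIMS = r"!$&'()*+,;="         # note that the comma is in this set
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"    # "%" HEXDIG HEXDIG

# segment = *pchar
# pchar   = unreserved / pct-encoded / sub-delims / ":" / "@"
SEGMENT_RE = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})*")


def is_valid_segment(segment: str) -> bool:
    """True if `segment` matches the RFC 3986 `segment = *pchar` rule."""
    return SEGMENT_RE.fullmatch(segment) is not None


print(is_valid_segment("0,1186,00.html"))  # comma is a sub-delim, so allowed
print(is_valid_segment("a b"))             # a raw space is not a pchar
```

The check covers a single path segment only; other URI components have their own rules in the same specification.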
AES
In article <Xn****************************@193.229.0.31>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
>> given that not a few applications and databases use
>> the comma as a separator or delimiter to separate text and/or numerical
>> entries into tables, database keyword files, and the like (not to
>> mention newsgroup names in the Newsgroups: field immediately above, or
>> email addresses in email clients).
>
> ... I think you just added to the confusion by mentioning things that have
> nothing to do with URL syntax.


Perhaps I wasn't clear enough about what I meant.

As seen by ordinary users, URLs appear as text strings. Many people
probably collect them in text files and process those text files using
text editors or other programs.

For example, I've collected arrays of data that included URLs and other
stuff in multi-line comma-delimited text files and loaded these arrays
into Excel spreadsheets or Word tables, just because this was a fast,
simple, convenient way to sort and manipulate the data. Tab-delimited
entries are also commonly used for this purpose, but if you have many
entries per line the page gets very wide with tab-delimited files.

Many bibliographic citations contain URLs these days, and the widely
used bibliographic database app EndNote has a "URL" field in each entry.
It's also not unlikely, however, that someone would use a URL as one of
the multiple keywords in EndNote's Keywords field, and the default
delimiter for individual keywords in that field is the comma.

Mathematica -- a powerful tool for manipulating text as well as numbers
-- uses the comma as the standard separator in its Lists; and it would
be very hard to change that.

I'm not saying that the people who defined the syntax for URLs had any
obligation to rule out commas because of this. But given the _very_
widespread use of the comma as a default separator, if they'd done so,
they would have avoided causing hassles in cases like these; and the
fact that so few URLs actually contain commas seems to indicate it might
not have been a bad decision.
Aug 17 '05 #5
AES wrote:
> I'm not saying that the people who defined the syntax for URLs had any
> obligation to rule out commas because of this. But given the _very_
> widespread use of the comma as a default separator, if they'd done so,
> they would have avoided causing hassles in cases like these; and the
> fact that so few URLs actually contain commas seems to indicate it might
> not have been a bad decision.


If you're using separator-delimited data, you enclose character-type
fields in quotation marks. That takes care of that problem.

Wanting to exclude commas from data is wishful thinking without
possibility of fulfillment. Free-form text is also often stored in
database tables: requests from customer service forms, notes entered in
issue tracking systems, and so on. These items are going to include
punctuation.
Aug 17 '05 #6
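The quoting approach suggested in the post above can be seen with Python's standard `csv` module. This is a minimal sketch using hypothetical records; the default dialect quotes only the fields that contain the delimiter.

```python
import csv
import io

# Hypothetical records: a title plus a URL, one of which contains commas.
rows = [
    ["Example page", "http://example.com/a,b,c.html"],
    ["Plain page", "http://example.com/plain.html"],
]

# Write with the default dialect, which quotes only fields that need it.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# A CSV-aware reader recovers the original two fields per row.
parsed = list(csv.reader(io.StringIO(text)))

# A naive split on commas breaks the first row into four pieces.
naive = text.splitlines()[0].split(",")
print(len(parsed[0]), len(naive))
```

The hassle arises only when comma-delimited text is split by hand; a reader that understands the quoting convention handles URLs with commas like any other field.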
<posted & mailed>

AES wrote:
> In article <Xn****************************@193.229.0.31>,
> "Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
>>> given that not a few applications and databases use
>>> the comma as a separator or delimiter to separate text and/or numerical
>>> entries into tables, database keyword files, and the like (not to
>>> mention newsgroup names in the Newsgroups: field immediately above, or
>>> email addresses in email clients).
>>
>> ... I think you just added to the confusion by mentioning things that
>> have nothing to do with URL syntax.
>
> Perhaps I wasn't clear enough about what I meant.
>
> As seen by ordinary users, URLs appear as text strings. Many people
> probably collect them in text files and process those text files using
> text editors or other programs.
>
> For example, I've collected arrays of data that included URLs and other
> stuff in multi-line comma-delimited text files and loaded these arrays
> into Excel spreadsheets or Word tables, just because this was a fast,
> simple, convenient way to sort and manipulate the data. Tab-delimited
> entries are also commonly used for this purpose, but if you have many
> entries per line the page gets very wide with tab-delimited files.
>
> Many bibliographic citations contain URLs these days, and the widely
> used bibliographic database app EndNote has a "URL" field in each entry.
> It's also not unlikely, however, that someone would use a URL as one of
> the multiple keywords in EndNote's Keywords field, and the default
> delimiter for individual keywords in that field is the comma.
>
> Mathematica -- a powerful tool for manipulating text as well as numbers
> -- uses the comma as the standard separator in its Lists; and it would
> be very hard to change that.
>
> I'm not saying that the people who defined the syntax for URLs had any
> obligation to rule out commas because of this. But given the _very_
> widespread use of the comma as a default separator, if they'd done so,
> they would have avoided causing hassles in cases like these; and the
> fact that so few URLs actually contain commas seems to indicate it might
> not have been a bad decision.


There are almost always issues, if you mix 'languages' where some characters
have special meaning. Just take URIs in HTML where the commonly used
character & has to be escaped. It's impossible to avoid this.

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Aug 18 '05 #7
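The HTML case mentioned in the post above can be sketched with Python's standard `html.escape`. The URL is a hypothetical example; the point is that the ampersand must be escaped when the URL is embedded in an attribute, while the comma passes through unchanged.

```python
from html import escape

# Hypothetical URL containing both a comma and an ampersand.
url = "http://example.com/search?q=a,b&lang=en"

# Escape the URL before placing it inside an HTML attribute value.
link = f'<a href="{escape(url, quote=True)}">results</a>'

print(link)
```

A browser reverses the escaping when it reads the attribute, so the link still points at the original URL.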
On Thu, 18 Aug 2005, Benjamin Niemann wrote:
> There are almost always issues, if you mix 'languages' where some characters
> have special meaning.

Absolutely. (In fact it can be even worse if you aren't mixing
"languages", when you're trying to use one level of data in another
level. The principle is the same, really.)

> Just take URIs in HTML where the commonly used
> character & has to be escaped. It's impossible to avoid this.

As often as not, the failure to perform the escaping correctly, and/or
parsing the escaped data wrongly, will turn out to have security
implications, sometimes seriously so. It needs to be handled very
carefully, if the results are to be exposed to the web. Be careful
out there! (as they say).
Aug 18 '05 #8
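One way to handle "one level of data in another level" for URLs is to percent-encode a value before embedding it, so that any delimiter characters inside it cannot be mistaken for URL syntax. A minimal sketch with Python's standard `urllib.parse`, using a hypothetical value:

```python
from urllib.parse import quote, unquote

# Hypothetical data value containing a comma, spaces and an ampersand.
value = "Smith, John & Co"

# With safe="" every reserved character is percent-encoded.
encoded = quote(value, safe="")
print(encoded)

# Decoding restores the original value exactly.
decoded = unquote(encoded)
print(decoded)
```

Encoding at the point where the value enters the URL, and decoding exactly once at the point where it leaves, avoids the double-escaping and missed-escaping mistakes warned about above.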
