Re: Help add commas to int on console output

On Apr 11, 6:48 am, Jerry Coffin <jerry.cof...@gmail.com> wrote:

On Apr 1, 2:42 am, James Kanze <james.ka...@gmail.com> wrote:

[ ... ]

I'd also explain why do_grouping returns a string with binary
values, except that I can't figure that one out myself; it just
does.

I'm not sure which you're referring to: that it returns a
string, or that it uses binary values, or both.

The combination of the two.

Using binary values makes them independent of the character
encoding used. Since the locale is supposed to encapsulate
such things as the character encoding, having it depend on the
character encoding would sort of defeat the purpose.

Using a string instead of a pointer to const char is a bit
harder to be certain about. I suspect it was just somebody who
thought "we're designing this cool string class, why not use
it?"

*IF* it is appropriate to use a container here, that container
would be std::vector<>, not std::string. What is being returned
is not a string, it is an array. Calling it a string is just
obfuscation.

Arguably, given the use, it should have been an int const* in C.
Obviously, making it char const* saves some space (since no one
will ever have a thousands separation of more than 127), but
we're talking here of only a couple of bytes. But it is clear
that in C, what is being returned is an "array", not a string.

In all cases, I can't really imagine a case where it wouldn't be
a constant. Something like:

char const _thousands_sep[] = { 3, 0 } ;

So even in C++, I'd have gone with either char const* or int
const*. (Probably char const*, with the idea that this might
allow reusing some of the C implementation.)

Using a container (string or pointer to char) that allows
multiple values, as opposed to the single char returned by
other members like do_thousands_sep and do_decimal_point is
because it supports different grouping widths.

For example, you could have the first group contain two
digits, and the remainder contain three digits. I'm not sure
who uses this, but given its difference from the other
members, I'm pretty sure somebody must have thought it was
really needed.
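
As a minimal sketch of how those grouping values are consumed (not code
from this thread): the class name GroupedPunct and the helper
format_grouped are hypothetical, and the override keyword assumes C++11
or later. Each char in the string returned by do_grouping is a group
width counted from the right, stored as a small integer rather than as
a printable digit, and the last width repeats for the remaining digits.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical facet for illustration: the grouping widths are supplied
// by the caller, e.g. "\3" (3,3,3,...) or "\3\2" (3, then 2,2,2,...).
class GroupedPunct : public std::numpunct<char> {
public:
    explicit GroupedPunct(std::string grouping)
        : grouping_(std::move(grouping)) {}

protected:
    char do_thousands_sep() const override { return ','; }

    // Each element is a binary value (a width), not the character '3'.
    std::string do_grouping() const override { return grouping_; }

private:
    std::string grouping_;
};

// Formats value with the given grouping widths.
std::string format_grouped(long value, const std::string& grouping) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new GroupedPunct(grouping)));
    out << value;
    return out.str();
}
```

Under these assumptions, format_grouped(1234567, "\3") gives 1,234,567,
and format_grouped(1234567, "\3\2") gives 12,34,567, which is the
"first group differs from the rest" case described above.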

I'm aware of this. I'm not personally aware of any locale with
such a grouping, but I seem to remember someone vaguely saying
that one existed using 4, 2, 0; or something like that. (Of
course, it may be a case of premature genericity. But in a
standard, you can't go back and make it more generic if the need
later arises.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 27 '08 #1

peter koch

On 11 Apr., 11:32, James Kanze <james.ka...@gmail.com> wrote:

On Apr 11, 6:48 am, Jerry Coffin <jerry.cof...@gmail.com> wrote:

On Apr 1, 2:42 am, James Kanze <james.ka...@gmail.com> wrote:
[ ... ]
I'd also explain why do_grouping returns a string with binary
values, except that I can't figure that one out myself; it just
does.
I'm not sure which you're referring to: that it returns a
string, or that it uses binary values, or both.

The combination of the two.

Using binary values makes them independent of the character
encoding used. Since the locale is supposed to encapsulate
such things as the character encoding, having it depend on the
character encoding would sort of defeat the purpose.
Using a string instead of a pointer to const char is a bit
harder to be certain about. I suspect it was just somebody who
thought "we're designing this cool string class, why not use
it?"

*IF* it is appropriate to use a container here, that container
would be std::vector<>, not std::string. What is being returned
is not a string, it is an array. Calling it a string is just
obfuscation.

Arguably, given the use, it should have been an int const* in C.
Obviously, making it char const* saves some space (since no one
will ever have a thousands separation of more than 127), but
we're talking here of only a couple of bytes. But it is clear
that in C, what is being returned is an "array", not a string.

In all cases, I can't really imagine a case where it wouldn't be
a constant. Something like:

char const _thousands_sep[] = { 3, 0 } ;

So even in C++, I'd have gone with either char const* or int
const*. (Probably char const*, with the idea that this might
allow reusing some of the C implementation.)

Using a container (string or pointer to char) that allows
multiple values, as opposed to the single char returned by
other members like do_thousands_sep and do_decimal_point is
because it supports different grouping widths.
For example, you could have the first group contain two
digits, and the remainder contain three digits. I'm not sure
who uses this, but given its difference from the other
members, I'm pretty sure somebody must have thought it was
really needed.

I'm aware of this. I'm not personally aware of any locale with
such a grouping, but I seem to remember someone vaguely saying
that one existed using 4, 2, 0; or something like that. (Of
course, it may be a case of premature genericity. But in a
standard, you can't go back and make it more generic if the need
later arises.)

http://en.wikipedia.org/wiki/Thousan...ands_separator
gives the answer.
/Peter

Jun 27 '08 #2

Jerry Coffin

In article <d5f63e5c-53bc-439f-9dbf-6c652cb1a3f2@
8g2000hse.googlegroups.com>, ja*********@gmail.com says...

[ ... ]

Using a string instead of a pointer to const char is a bit
harder to be certain about. I suspect it was just somebody who
thought "we're designing this cool string class, why not use
it?"

*IF* it is appropriate to use a container here, that container
would be std::vector<>, not std::string. What is being returned
is not a string, it is an array. Calling it a string is just
obfuscation.

True -- I'm pretty sure that's a historical matter though. The string
class was added relatively early in the standardization process. The
vector template wasn't added until a _lot_ later. I suspect by the time
vector was added, nobody had the inclination to redesign locales to use
them -- especially since doing so probably would have delayed the
standard by quite a while (a year wouldn't surprise me at all...)

--
Later,
Jerry.

The universe is a figment of its own imagination.

Jun 27 '08 #3

James Kanze

On 11 avr, 13:19, peter koch <peter.koch.lar...@gmail.com> wrote:

On 11 Apr., 11:32, James Kanze <james.ka...@gmail.com> wrote:

[...]

I'm aware of this. I'm not personally aware of any locale with
such a grouping, but I seem to remember someone vaguely saying
that one existed using 4, 2, 0; or something like that. (Of
course, it may be a case of premature genericity. But in a
standard, you can't go back and make it more generic if the need
later arises.)

http://en.wikipedia.org/wiki/Thousan...ands_separator
gives the answer.

Not to why the value is a string. It does raise some
interesting issues, however: depending on the context, you may
or may not want thousand separators after the decimal point.

Also, the usual thousands separators in France are spaces, which
according to the above, is what ISO recommends. This means,
however, that you can't reread what you've written. (Maybe
non-breaking spaces? 0xA0 in Unicode?)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 27 '08 #4

James Kanze

On 11 avr, 19:25, Jerry Coffin <jcof...@taeus.com> wrote:

In article <d5f63e5c-53bc-439f-9dbf-6c652cb1a3f2@
8g2000hse.googlegroups.com>, james.ka...@gmail.com says...

[ ... ]

Using a string instead of a pointer to const char is a bit
harder to be certain about. I suspect it was just somebody who
thought "we're designing this cool string class, why not use
it?"

*IF* it is appropriate to use a container here, that container
would be std::vector<>, not std::string. What is being returned
is not a string, it is an array. Calling it a string is just
obfuscation.

True -- I'm pretty sure that's a historical matter though.

Maybe. But the locale stuff is a pure invention of the
committee; it didn't exist before, so backward compatibility was
no issue. And the locales don't mind using char* elsewhere
where string would really be more appropriate, e.g.
ctype<>::toupper.
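
For reference, a small sketch of the pointer-based interface being
referred to. The helper name to_upper_copy is hypothetical; the range
overload of std::ctype<char>::toupper takes a pair of raw pointers
[low, high) and converts in place, so a std::string has to be unpacked
into pointers before it can be used.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: returns an uppercased copy of text, using the
// ctype facet of the given locale (the "C" locale by default).
std::string to_upper_copy(std::string text,
                          const std::locale& loc = std::locale::classic()) {
    if (!text.empty()) {
        const std::ctype<char>& ct = std::use_facet<std::ctype<char>>(loc);
        // Range overload: raw pointers in, conversion done in place.
        ct.toupper(&text[0], &text[0] + text.size());
    }
    return text;
}
```

With the default "C" locale, to_upper_copy("grouping") gives GROUPING.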

The string class was added relatively early in the
standardization process. The vector template wasn't added
until a _lot_ later. I suspect by the time vector was added,
nobody had the inclination to redesign locales to use them --
especially since doing so probably would have delayed the
standard by quite a while (a year wouldn't surprise me at
all...)

Changing std::string to std::vector<char> certainly wouldn't
have delayed the standard, but as you say, probably no one
wanted to ever see <locale> again. The real question is why
std::string to begin with, rather than char const*.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 27 '08 #5

peter koch

On 12 Apr., 09:58, James Kanze <james.ka...@gmail.com> wrote:

On 11 avr, 13:19, peter koch <peter.koch.lar...@gmail.com> wrote:

On 11 Apr., 11:32, James Kanze <james.ka...@gmail.com> wrote:

[...]

I'm aware of this. I'm not personally aware of any locale with
such a grouping, but I seem to remember someone vaguely saying
that one existed using 4, 2, 0; or something like that. (Of
course, it may be a case of premature genericity. But in a
standard, you can't go back and make it more generic if the need
later arises.)
http://en.wikipedia.org/wiki/Thousan...ands_separator
gives the answer.

Not to why the value is a string. It does raise some
interesting issues, however: depending on the context, you may
or may not want thousand separators after the decimal point.

Also, the usual thousands separators in France are spaces, which
according to the above, is what ISO recommends. This means,
however, that you can't reread what you've written. (Maybe
non-breaking spaces? 0xA0 in Unicode?)

Oh - I only intended to provide a list of countries where groups do
not always contain three digits. I remembered Tibet or
Nepal, but the link above indicates that this is far more widespread.
The non-breaking space sounds like a good idea, but the
problem is that you would be unable to write formatted numbers in
ASCII, which does not contain such a character. On the other hand,
numbers formatted this way are intended for human reading, and I would
not mind much if reading them back by a program were not directly
supported.

/Peter

Jun 27 '08 #6

Jerry Coffin

In article <f7668950-c3bd-4dee-9e32-38fee74d72e9
@b64g2000hsa.googlegroups.com>, ja*********@gmail.com says...

On 11 avr, 19:25, Jerry Coffin <jcof...@taeus.com> wrote:

[ ... ]

True -- I'm pretty sure that's a historical matter though.

Maybe. But the locale stuff is a pure invention of the
committee; it didn't exist before, so backward compatibility was
no issue. And the locales don't mind using char* elsewhere
where string would really be more appropriate, e.g.
ctype<>::toupper.

Oh, I don't mean history before the standardization effort -- only
during it.

[ ... ]

Changing std::string to std::vector<char> certainly wouldn't
have delayed the standard,

By itself, no -- except that doing so would have reopened the whole
subject of locales, and I can hardly imagine a way to even discuss them
without leaving somebody (usually quite a few somebodys) quite rightly
feeling that their needs are being slighted or ignored completely.

Don't get me wrong: when C was new, even mandating that a character set
have both lower- and uppercase English characters was asking for a lot
(and, in fact, C89 didn't mandate it). Given those meager beginnings, I
think they've done an almost amazing job of grafting some degree of
support of I18n on long after the fact. Nonetheless, it is grafted on
and (as I'm sure you know better than I) for almost anybody outside the
US, things can get clumsy in a hurry. Heck, even for us inside the US,
things get clumsy in a hurry -- in fact, the original subject of this
very thread applies equally in the US as elsewhere.

The bottom line is that I'm reasonably certain that if the subject of
locales had been reopened at all, it would have been almost impossible
to just agree to use std::vector where appropriate, and leave it at
that. As to why they didn't use std::string elsewhere, I can't say for
sure, though as I recall Andrew Koenig once explained that passing an
std::string as the name of a file to open wasn't added because it led to
discussions of I18N of file names that nobody felt could be resolved at
that point, so it was dropped entirely. I suspect (even if it was never
made official) that much the same thinking went into deciding to just
leave locales as they were, and be done with it.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Jun 27 '08 #7

James Kanze

On 12 avr, 15:11, peter koch <peter.koch.lar...@gmail.com> wrote:

On 12 Apr., 09:58, James Kanze <james.ka...@gmail.com> wrote:

[...]

It sounds like a good idea with the non-breaking space, but
the problem is that you will be unable to write formatted
numbers in ASCII, which does not contain such a character.

So when was the last time you saw anyone using ASCII?
ISO 8859-1 has been pretty much standard everywhere I've worked,
for the last 15 or so years, although UTF-8 seems to be
replacing it---very slowly. Windows has been Unicode for a long
time as well.

On the other hand, numbers formatted this way are intended for
human reading, and I would not mind so much if reading them by
a program was not directly supported.

I more or less agree. Anytime you are writing for the machine,
you should use the "C" locale (and limit yourself to characters
in the basic execution character set). But it does happen that
output is designed for both---our log files are definitely read
by humans (and contain large enough numbers that thousands
separators would be nice) and are parsed by various programs as
well.
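
A minimal sketch of the "use the C locale for the machine" advice,
assuming hypothetical helper names format_for_machine and
parse_from_machine. Imbuing std::locale::classic() on the stream keeps
the output free of grouping regardless of what the global locale has
been set to.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats value for machine consumption.
std::string format_for_machine(long value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());  // "C" locale: no thousands separators
    out << value;
    return out.str();
}

// Hypothetical helper: parses text written by format_for_machine.
// Returns 0 if the text does not start with a number.
long parse_from_machine(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    long value = 0;
    in >> value;
    return value;
}
```

A log line meant for both audiences could carry the grouped form for
the human reader and this form for the parser.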

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 27 '08 #8

James Kanze

On 13 avr, 03:52, Jerry Coffin <jcof...@taeus.com> wrote:

In article <f7668950-c3bd-4dee-9e32-38fee74d72e9
@b64g2000hsa.googlegroups.com>, james.ka...@gmail.com says...

On 11 avr, 19:25, Jerry Coffin <jcof...@taeus.com> wrote:

[ ... ]

Changing std::string to std::vector<char> certainly wouldn't
have delayed the standard,

By itself, no -- except that doing so would have reopened the
whole subject of locales, and I can hardly imagine a way to
even discuss them without leaving somebody (usually quite a
few somebodys) quite rightly feeling that their needs are
being slighted or ignored completely.

Yes. You're certainly right about that. I guess my real
question is more along the lines of why they even used string to
begin with, given that the abstraction behind the char* in C
wasn't a string.

Don't get me wrong: when C was new, even mandating that a
character set have both lower- and uppercase English
characters was asking for a lot (and, in fact, C89 didn't
mandate it). Given those meager beginnings, I think they've
done an almost amazing job of grafting some degree of support
of I18n on long after the fact. Nonetheless, it is grafted on
and (as I'm sure you know better than I) for almost anybody
outside the US, things can get clumsy in a hurry. Heck, even
for us inside the US, things get clumsy in a hurry -- in fact,
the original subject of this very thread applies equally in
the US as elsewhere.

Quite. Starting with the fact that plain char can be signed
(resulting in characters having negative values).
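
To illustrate the signed char pitfall (a sketch, with the hypothetical
helper name ascii_upper): the <cctype> functions take an int that must
be representable as unsigned char or equal to EOF, so a plain char with
a negative value has to be converted first.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: uppercases text with the C library's toupper.
std::string ascii_upper(std::string text) {
    for (char& c : text) {
        // Convert to unsigned char first: where plain char is signed,
        // characters above 127 are negative, and passing a negative
        // value (other than EOF) to std::toupper is undefined behaviour.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return text;
}
```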

Globally, given the context, I think that the C committee did a
pretty good job. (The context was that the ANSI C committee was
ready to adopt the standard without any i18n support, and
someone from ISO mentioned that ISO would have to add it
anyway, thus making ISO C different from ANSI C. So the
committee went back and added all of the original i18n support
in a year, so that ISO C could be ANSI C.) I'm a lot less
enthusiastic about the C++ locales, although they do recognize
one thing missing in C, that you need a stream specific locale.
But the entire <locale> header seems to be designed to be
difficult to understand and to use.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 27 '08 #9

Ben Bacarisse

Jerry Coffin <jc*****@taeus.com> writes:

Don't get me wrong: when C was new, even mandating that a character set
have both lower- and uppercase English characters was asking for a lot
(and, in fact, C89 didn't mandate it).

Small point: C89 did mandate it. Both the execution and source "basic
character sets" must include both upper and lower case English
letters. In fact, it went further and allowed (different) multibyte
encodings in both the source and execution character sets.

--
Ben.

Jun 27 '08 #10

Jerry Coffin

In article <87************@bsb.me.uk>, be********@bsb.me.uk says...

Jerry Coffin <jc*****@taeus.com> writes:

Don't get me wrong: when C was new, even mandating that a character set
have both lower- and uppercase English characters was asking for a lot
(and, in fact, C89 didn't mandate it).

Small point: C89 did mandate it. Both the execution and source "basic
character sets" must include both upper and lower case English
letters. In fact, it went further and allowed (different) multibyte
encodings in both the source and execution character sets.

That much is true. What it didn't mandate is that you be able to
communicate those characters to the outside world -- for example, it
specifically allows input on the command line to always appear in one
case -- in which case (no pun intended) it's always supposed to look
like lowercase regardless of what the user actually entered.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Jun 27 '08 #11

Jerry Coffin

In article <ab5276a6-35c1-46e8-87ef-
b7**********@c65g2000hsa.googlegroups.com>, ja*********@gmail.com
says...

On 13 avr, 03:52, Jerry Coffin <jcof...@taeus.com> wrote:

[ ... ]

By itself, no -- except that doing so would have reopened the
whole subject of locales, and I can hardly imagine a way to
even discuss them without leaving somebody (usually quite a
few somebodys) quite rightly feeling that their needs are
being slighted or ignored completely.

Yes. You're certainly right about that. I guess my real
question is more along the lines of why they even used string to
begin with, given that the abstraction behind the char* in C
wasn't a string.

Right -- and I honestly doubt anybody really knows the answer to that
anymore. Even the people who designed it have probably lost conscious
memory of it (though their spouses might be able to tell us about times
they wake up in a cold sweat without an explanation...)

[ ... ]

Globally, given the context, I think that the C committee did a
pretty good job. (The context was that the ANSI C committee was
ready to adopt the standard without any i18n support, and
someone from ISO mentioned that ISO would have to add it
anyway, thus making ISO C different from ANSI C. So the
committee went back and added all of the original i18n support
in a year, so that ISO C could be ANSI C.) I'm a lot less
enthusiastic about the C++ locales, although they do recognize
one thing missing in C, that you need a stream specific locale.
But the entire <locale> header seems to be designed to be
difficult to understand and to use.

I think it tends to show some of the shortcomings of OOP in general --
it went along with the idea that you'd just derive a new class, override
a function or two, and be on your way.

In a way, they even did reasonably well: ignoring the overhead of a
class header and such, you can often create and use a facet (for
example) in half dozen lines of code or so. The problem, of course, is
that you need to read and understand hundreds of pages of dense
documentation before you figure out how to write those half dozen lines
of code -- and even then, I think most who do it have just worked out a
template that works, and modify a few specific parts to do what we want;
real understanding of the whole mechanism is relatively rare.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Jun 27 '08 #12
