Portable 'lowercase' function for stl string?

Hi,
I'm re-writing some code that had relied on some platform/third-party
dependent utility functions, as I want to make it more portable.
Is there a standard C/C++/stl routine for changing an stl string to all
lowercase?
(I know how to do it manually, but in the interests of portability...)

Thanks

Steve
Feb 21 '06 #1
Steve Edwards wrote:
Hi,
I'm re-writing some code that had relied on some platform/third-party
dependent utility functions, as I want to make it more portable.
Is there a standard C/C++/stl routine for changing an stl string to all
lowercase?


I think it depends on the locale.
To be portable you could use your own implementation of that kind of
functions according to the character set of your choice (Unicode would
be a good one).
Feb 21 '06 #2
loufoque wrote:

I think it depends on the locale.
To be portable you could use your own implementation of that kind of
functions according to the character set of your choice (Unicode would
be a good one).


Unicode would be a poor choice if, for example, your characters are
encoded in ASCII and you care about speed. Case conversions (and just
about any other character manipulation) in Unicode can be rather slow,
because of the size of the character set and the resulting complexity of
the data representation for character attributes (you really don't want
to carry around a bunch of 64K arrays). With ASCII, on the other hand,
converting to lowercase is just a test and an addition.
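
A minimal sketch of that "test and an addition", assuming the execution character set really is ASCII (the names ascii_tolower and ascii_lowercase are made up for illustration and are not standard library functions):

```cpp
#include <string>

// Lowercase one character, assuming the ASCII encoding:
// a range test plus an addition ('a' - 'A' is 32 in ASCII).
inline char ascii_tolower(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + ('a' - 'A')) : c;
}

// Lowercase a whole string in place.
// Anything outside 'A'..'Z' is left untouched.
inline void ascii_lowercase(std::string& s)
{
    for (char& c : s) {
        c = ascii_tolower(c);
    }
}
```

The range test relies on 'A'..'Z' being contiguous, which holds for ASCII but not for every character set (EBCDIC, for instance).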

--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 21 '06 #3
* Pete Becker:
loufoque wrote:

I think it depends on the locale.
To be portable you could use your own implementation of that kind of
functions according to the character set of your choice (Unicode would
be a good one).


Unicode would be a poor choice if, for example, your characters are
encoded in ASCII and you care about speed. Case conversions (and just
about any other character manipulation) in Unicode can be rather slow,
because of the size of the character set and the resulting complexity of
the data representation for character attributes (you really don't want
to carry around a bunch of 64K arrays). With ASCII, on the other hand,
converting to lowercase is just a test and an addition.


I think that's incorrect.

To convince me otherwise, could you give an example where case
conversion of an arbitrary ASCII text is necessarily faster than the
same case conversion of the same text in fixed a size per character
Unicode representation (e.g. USC2 limited to BMP, or USC4)?

Consider that ASCII is a subset of Unicode.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 21 '06 #4
In article <i4********************@giganews.com>,
Pete Becker <pe********@acm.org> wrote:
loufoque wrote:

I think it depends on the locale.
To be portable you could use your own implementation of that kind of
functions according to the character set of your choice (Unicode would
be a good one).


Unicode would be a poor choice if, for example, your characters are
encoded in ASCII and you care about speed. Case conversions (and just
about any other character manipulation) in Unicode can be rather slow,
because of the size of the character set and the resulting complexity of
the data representation for character attributes (you really don't want
to carry around a bunch of 64K arrays). With ASCII, on the other hand,
converting to lowercase is just a test and an addition.


Thanks, ASCII is fine for my needs, I'll do it manually. (I'm just
surprised, given how many equally simple tasks _do_ have a defined
library function.)

Steve
Feb 21 '06 #5
Steve Edwards wrote:
Hi,
I'm re-writing some code that had relied on some platform/third-party
dependent utility functions, as I want to make it more portable.
Is there a standard C/C++/stl routine for changing an stl string to all
lowercase?
(I know how to do it manually, but in the interests of portability...)


Well, there is no unique way. A simple solution is to use a loop that calls
std::tolower for each character, but this doesn't work for every locale,
since in many languages, some letters don't have a straight 1:1 mapping
between lower case and upper case.
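
A minimal sketch of that simple loop, written with std::transform (the helper name to_lower_copy is an assumption for illustration). It follows the current C locale and handles only single-char, 1:1 mappings:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Return a lowercased copy using the C library's std::tolower,
// which follows the current C locale ("C" unless setlocale was called).
inline std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       // Taking the argument as unsigned char avoids
                       // undefined behaviour for negative char values.
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

Calling to_lower_copy("MiXeD") in the default "C" locale yields "mixed".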

Feb 21 '06 #6
Alf P. Steinbach wrote:
* Pete Becker:
loufoque wrote:

I think it depends on the locale.
To be portable you could use your own implementation of that kind of
functions according to the character set of your choice (Unicode
would be a good one).

Unicode would be a poor choice if, for example, your characters are
encoded in ASCII and you care about speed. Case conversions (and just
about any other character manipulation) in Unicode can be rather slow,
because of the size of the character set and the resulting complexity
of the data representation for character attributes (you really don't
want to carry around a bunch of 64K arrays). With ASCII, on the other
hand, converting to lowercase is just a test and an addition.

I think that's incorrect.

To convince me otherwise, could you give an example where case
conversion of an arbitrary ASCII text is necessarily faster than the
same case conversion of the same text in fixed a size per character
Unicode representation (e.g. USC2 limited to BMP, or USC4)?

Consider that ASCII is a subset of Unicode.


Case conversions in Unicode can't assume that the characters they're
dealing with are ASCII.

--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 21 '06 #7
Steve Edwards wrote:

Thanks, ASCII is fine for my needs, I'll do it manually. (I'm just
surprised, given how many equally simple tasks _do_ have a defined
library function.)


If you're only interested in the native character set, you've got the
builtin C functions toupper and tolower. You can also do some stuff with
C++ locales to get the same result.
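
One way the locale route might look, as a sketch only: it uses the std::ctype<char> facet, and the function name to_lower_in_place and the default of the classic "C" locale are assumptions made for this example.

```cpp
#include <locale>
#include <string>

// Lowercase a string in place using the ctype facet of the given locale.
inline void to_lower_in_place(std::string& s,
                              const std::locale& loc = std::locale::classic())
{
    if (s.empty()) {
        return;  // nothing to convert
    }
    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
    // ctype::tolower(low, high) converts the range [low, high) in place.
    facet.tolower(&s[0], &s[0] + s.size());
}
```

Passing a different std::locale changes the mapping, but it is still one char in, one char out.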

--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 21 '06 #8
* Pete Becker:
Alf P. Steinbach wrote:
* Pete Becker:
loufoque wrote:
I think it depends on the locale.
To be portable you could use your own implementation of that kind of
functions according to the character set of your choice (Unicode
would be a good one).
Unicode would be a poor choice if, for example, your characters are
encoded in ASCII and you care about speed. Case conversions (and just
about any other character manipulation) in Unicode can be rather
slow, because of the size of the character set and the resulting
complexity of the data representation for character attributes (you
really don't want to carry around a bunch of 64K arrays). With ASCII,
on the other hand, converting to lowercase is just a test and an
addition.

I think that's incorrect.

To convince me otherwise, could you give an example where case
conversion of an arbitrary ASCII text is necessarily faster than the
same case conversion of the same text in fixed a size per character
Unicode representation (e.g. USC2 limited to BMP, or USC4)?

Consider that ASCII is a subset of Unicode.


Case conversions in Unicode can't assume that the characters they're
dealing with are ASCII.


Well, that's not much of an example! ;-)

To quote yourself, again, "With ASCII, converting to lowercase is just a
test and an addition".

How would it be more if the same text is represented in UCS2 or UCS4?
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 21 '06 #9
Alf P. Steinbach wrote:
* Pete Becker:
Alf P. Steinbach wrote:
* Pete Becker:

loufoque wrote:

>
> I think it depends on the locale.
> To be portable you could use your own implementation of that kind
> of functions according to the character set of your choice (Unicode
> would be a good one).

Unicode would be a poor choice if, for example, your characters are
encoded in ASCII and you care about speed. Case conversions (and
just about any other character manipulation) in Unicode can be
rather slow, because of the size of the character set and the
resulting complexity of the data representation for character
attributes (you really don't want to carry around a bunch of 64K
arrays). With ASCII, on the other hand, converting to lowercase is
just a test and an addition.

I think that's incorrect.

To convince me otherwise, could you give an example where case
conversion of an arbitrary ASCII text is necessarily faster than the
same case conversion of the same text in fixed a size per character
Unicode representation (e.g. USC2 limited to BMP, or USC4)?

Consider that ASCII is a subset of Unicode.

Case conversions in Unicode can't assume that the characters they're
dealing with are ASCII.

Well, that's not much of an example! ;-)


It wasn't meant to be. You misrepresented what I said, and I gave an
accurate response.

To quote yourself, again, "With ASCII, converting to lowercase is just a
test and an addition".

How would it be more if the same text is represented in UCS2 or UCS4?


Read what I said again, this time without the attitude. But don't bother
replying, because I don't have any more time to waste on your sophomoric
games.

--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 21 '06 #10
Pete Becker wrote:
Steve Edwards wrote:

Thanks, ASCII is fine for my needs, I'll do it manually. (I'm just
surprised, given how many equally simple tasks _do_ have a defined
library function.)


If you're only interested in the native character set, you've got the
builtin C functions toupper and tolower. You can also do some stuff with
C++ locales to get the same result.


Whoops, sorry, got distracted by the noise. You undoubtedly know about
these. Yes, you have to call them multiple times to transform a text
sequence.

--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 21 '06 #11
* Pete Becker:
Alf P. Steinbach wrote:
* Pete Becker:
Alf P. Steinbach wrote:

* Pete Becker:

> loufoque wrote:
>
>>
>> I think it depends on the locale.
>> To be portable you could use your own implementation of that kind
>> of functions according to the character set of your choice
>> (Unicode would be a good one).
>
>
>
> Unicode would be a poor choice if, for example, your characters are
> encoded in ASCII and you care about speed. Case conversions (and
> just about any other character manipulation) in Unicode can be
> rather slow, because of the size of the character set and the
> resulting complexity of the data representation for character
> attributes (you really don't want to carry around a bunch of 64K
> arrays). With ASCII, on the other hand, converting to lowercase is
> just a test and an addition.

I think that's incorrect.

To convince me otherwise, could you give an example where case
conversion of an arbitrary ASCII text is necessarily faster than the
same case conversion of the same text in fixed a size per character
Unicode representation (e.g. USC2 limited to BMP, or USC4)?

Consider that ASCII is a subset of Unicode.
Case conversions in Unicode can't assume that the characters they're
dealing with are ASCII.

Well, that's not much of an example! ;-)


It wasn't meant to be. You misrepresented what I said, and I gave an
accurate response.

To quote yourself, again, "With ASCII, converting to lowercase is just
a test and an addition".

How would it be more if the same text is represented in UCS2 or UCS4?


Read what I said again, this time without the attitude. But don't bother
replying, because I don't have any more time to waste on your sophomoric
games.


That's the second time in a row you've elected to use a personal attack
when cornered in a technical matter, Pete.

Are you the same Pete Becker who used to work at Dinkumware, and was
technically able and courteous?

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 21 '06 #12
Alf P. Steinbach wrote:

That's the second time in a row you've elected to use a personal attack
when cornered in a technical matter, Pete.


I wasn't aware that I had been "cornered." You claimed that what I said
wasn't correct, but didn't say why. You did, however, misrepresent what
I said, and you have not corrected that. That's all that needs to be said.

--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 21 '06 #13
* Pete Becker:
Alf P. Steinbach wrote:

That's the second time in a row you've elected to use a personal
attack when cornered in a technical matter, Pete.


I wasn't aware that I had been "cornered." You claimed that what I said
wasn't correct, but didn't say why. You did, however, misrepresent what
I said, and you have not corrected that. That's all that needs to be said.


What you wrote earlier was technically incorrect, at the novice level,
and when questioned you offered an ad hominem attack in response.

What you write now is, as I count them, four lies.

Are you the same Pete Becker who used to work at Dinkumware, and was
technically able and courteous?

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 21 '06 #14
"Alf P. Steinbach" <al***@start.no> wrote in message
news:46************@individual.net...
* Pete Becker:
Alf P. Steinbach wrote:

That's the second time in a row you've elected to use a personal attack
when cornered in a technical matter, Pete.


I wasn't aware that I had been "cornered." You claimed that what I said
wasn't correct, but didn't say why. You did, however, misrepresent what I
said, and you have not corrected that. That's all that needs to be said.


What you wrote earlier was technically incorrect, at the novice level, and
when questioned you offered an ad hominem attack in response.

What you write now is, as I count them, four lies.

Are you the same Pete Becker who used to work at Dinkumware, and was
technically able and courteous?


Back off, Dude. I just reviewed this entire exchange and can
attest that everything Pete said was accurate. To accuse
*anybody* in this forum of lying is way out of line.

And yes, this is the same Pete Becker who used to be an employee
of Dinkumware and still works with Dinkumware. IM(extensive)E
he remains technically able and courteous.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 21 '06 #15
* P.J. Plauger:
"Alf P. Steinbach" <al***@start.no> wrote in message
news:46************@individual.net...
* Pete Becker:
Alf P. Steinbach wrote:
That's the second time in a row you've elected to use a personal attack
when cornered in a technical matter, Pete.
I wasn't aware that I had been "cornered." You claimed that what I said
wasn't correct, but didn't say why. You did, however, misrepresent what I
said, and you have not corrected that. That's all that needs to be said.

What you wrote earlier was technically incorrect, at the novice level, and
when questioned you offered an ad hominem attack in response.

What you write now is, as I count them, four lies.

Are you the same Pete Becker who used to work at Dinkumware, and was
technically able and courteous?


Back off, Dude. I just reviewed this entire exchange and can
attest that everything Pete said was accurate. To accuse
*anybody* in this forum of lying is way out of line.


It's natural to defend one's friends.

But I don't think it's a good idea to want to be associated with his
slights and ad hominem attacks, nor the technical level of competence he
displayed here.

You know well that Pete has not been accurate in even one sentence in
this thread, from the point of not-answering my question, and you know
well that I'm not the kind of person to let a second such attack in a
row (as this was) go by, turning the other cheek, as I did with the
first, no matter how much respect I have had for the person previously.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 21 '06 #16
P.J. Plauger wrote:
Back off, Dude. I just reviewed this entire exchange and can
attest that everything Pete said was accurate. To accuse
*anybody* in this forum of lying is way out of line.


For those of us who are interested in learning more, I'll engage.
Since you've put yourself on the spot and attested to Pete's accuracy,
would you please briefly explain how Pete's original response is
correct?

Feb 21 '06 #17
"Alf P. Steinbach" <al***@start.no> wrote in message
news:46************@individual.net...
:* Pete Becker:
: > Alf P. Steinbach wrote:
: >>
: >> That's the second time in a row you've elected to use a personal
: >> attack when cornered in a technical matter, Pete.
: >
: > I wasn't aware that I had been "cornered." You claimed that what I said
: > wasn't correct, but didn't say why. You did, however, misrepresent what
: > I said, and you have not corrected that. That's all that needs to be said.
:
: What you wrote earlier was technically incorrect, at the novice level,
: and when questioned you offered an ad hominem attack in response.
:
: What you write now is, as I count them, four lies.
:
: Are you the same Pete Becker who used to work at Dinkumware, and was
: technically able and courteous?

I only know Pete from this NG. He definitely is technically able,
but I never found him to be excessively courteous (or patient).
This being given, I think we can accept him as he is, take the
good and leave the rest.

I don't think this is worth much of an argument. I assume that,
for non-ASCII, Pete was thinking of converting the case of the
many letters with diacritical marks, and those of non-latin
alphabets. This reasonably seems to require more work...

Peace,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Feb 21 '06 #18
Squeamizh wrote:
P.J. Plauger wrote:
Back off, Dude. I just reviewed this entire exchange and can
attest that everything Pete said was accurate. To accuse
*anybody* in this forum of lying is way out of line.

For those of us who are interested in learning more, I'll engage.
Since you've put yourself on the spot and attested to Pete's accuracy,
would you please briefly explain how Pete's original response is
correct?


Look, it's simple: I said that case conversions under Unicode can be
rather slow compared to straight ASCII, and Alf challenged me to prove
that they're always slower. I declined to try to prove something that I
didn't say.

--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 21 '06 #19
Pete Becker wrote:

Whoops, better correct that before I get accused again of lying:

Look, it's simple: I said that case conversions

of characters in the ASCII character set

under Unicode can be
rather slow compared to straight ASCII, and Alf challenged me to prove
that they're always slower. I declined to try to prove something that I
didn't say.


--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 21 '06 #20
* Pete Becker:
* Squeamizh:
* P.J. Plauger:
Back off, Dude. I just reviewed this entire exchange and can
attest that everything Pete said was accurate. To accuse
*anybody* in this forum of lying is way out of line.


For those of us who are interested in learning more, I'll engage.
Since you've put yourself on the spot and attested to Pete's accuracy,
would you please briefly explain how Pete's original response is
correct?


Look, it's simple: I said that case conversions under Unicode can be
rather slow compared to straight ASCII, and Alf challenged me to prove
that they're always slower. I declined to try to prove something that I
didn't say.


Heh, I'm still following this thread... ;-)

And you're misrepresenting the earlier exchange.

You wrote, originally,

"Unicode would be a poor choice if, for example, your characters are
encoded in ASCII and you care about speed".

And you explained this by

"because of the size of the character set and the resulting complexity
of the data representation for character attributes"

Perhaps that's not what you /meant/ to write, but that's what you wrote,
and that, including the explanation that followed in the same para, was
what I asked for an example of,

"could you give an example where case conversion of an arbitrary ASCII
text is necessarily faster than the same case conversion of the same
text in fixed a size per character Unicode representation (e.g. USC2
limited to BMP, or USC4)?"
(transposition typos not intentional and not corrected here).

I can think of a case where uppercasing or lowercasing Unicode will
likely be slower than ASCII for the same text, namely for a really large
text that must be in-memory, where one encounters more paging. But that
has nothing to do with the size of the character set, nor the resulting
complexity of the data representation for character attributes. In
other cases Unicode might generally be faster than ASCII.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 21 '06 #21
"Squeamizh" <sq*****@hotmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...
P.J. Plauger wrote:
Back off, Dude. I just reviewed this entire exchange and can
attest that everything Pete said was accurate. To accuse
*anybody* in this forum of lying is way out of line.


For those of us who are interested in learning more, I'll engage.
Since you've put yourself on the spot and attested to Pete's accuracy,
would you please briefly explain how Pete's original response is
correct?


Pete has himself clarified the misreading that launched the flames,
but to cut to the chase...

The C toupper and tolower date from a simpler time when you had
at most 256 characters, each with a one-to-one mapping between
upper and lower case. Unicode has (depending on how you count)
tens of thousands to millions of characters. Even if you ignore
the possibility of one-to-many conversions (which Unicode mostly
does) you either have to maintain *huge* lookup tables or compress
them and spend time searching. Thus, one way or the other, "simply"
going to Unicode when you don't have to assuredly costs you more
code space, more execution time, or both.
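
A rough sketch of the "compress them and spend time searching" option, assuming a sorted table of simple 1:1 mappings searched with std::lower_bound. The six table entries are a tiny illustrative excerpt, and the names CasePair, kCaseTable and simple_tolower are hypothetical. A real table would be generated from the Unicode Character Database and be far larger.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

struct CasePair {
    char32_t upper;
    char32_t lower;
};

// Deliberately incomplete excerpt, sorted by `upper`.
static const CasePair kCaseTable[] = {
    {0x00C0, 0x00E0},  // LATIN CAPITAL LETTER A WITH GRAVE
    {0x00C9, 0x00E9},  // LATIN CAPITAL LETTER E WITH ACUTE
    {0x00D6, 0x00F6},  // LATIN CAPITAL LETTER O WITH DIAERESIS
    {0x0391, 0x03B1},  // GREEK CAPITAL LETTER ALPHA
    {0x03A9, 0x03C9},  // GREEK CAPITAL LETTER OMEGA
    {0x0410, 0x0430},  // CYRILLIC CAPITAL LETTER A
};

inline char32_t simple_tolower(char32_t c)
{
    // Fast path: for ASCII this is still just a test and an addition.
    if (c < 0x80) {
        return (c >= U'A' && c <= U'Z') ? static_cast<char32_t>(c + 0x20) : c;
    }
    // Slow path: binary search in the sorted table.
    const CasePair* first = std::begin(kCaseTable);
    const CasePair* last = std::end(kCaseTable);
    const CasePair* it = std::lower_bound(
        first, last, c,
        [](const CasePair& p, char32_t value) { return p.upper < value; });
    return (it != last && it->upper == c) ? it->lower : c;
}
```

Even with the ASCII fast path, the non-ASCII branch costs a range check per character plus the table itself, which is the space and time trade-off described above.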

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Feb 21 '06 #22
Alf P. Steinbach wrote:

You wrote, originally,

"Unicode would be a poor choice if, for example, your characters are
encoded in ASCII and you care about speed".

And you explained this by

"because of the size of the character set and the resulting complexity
of the data representation for character attributes"

Perhaps that's not what you /meant/ to write, but that's what you wrote,


Yes, that is what I wrote, and it's what I /meant/ to write and it's
what I still mean.

and that, including the explanation that followed in the same para, was
what I asked for an example of,

"could you give an example where case conversion of an arbitrary ASCII
text is necessarily faster than the same case conversion of the same
text in fixed a size per character Unicode representation (e.g. USC2
limited to BMP, or USC4)?"
(transposition typos not intentional and not corrected here).


Sigh. I did not say that case conversion in ASCII is "necessarily"
faster. It's a better choice because it won't be slower and could be
faster, depending on whether the Unicode translation is special-cased
for ASCII. And, of course, it avoids the extra code and data that full
Unicode entails.

--

Pete Becker
Roundhouse Consulting, Ltd.
Feb 22 '06 #23

Pete Becker wrote:
Alf P. Steinbach wrote:

and that, including the explanation that followed in the same para, was
what I asked for an example of,

"could you give an example where case conversion of an arbitrary ASCII
text is necessarily faster than the same case conversion of the same
text in fixed a size per character Unicode representation (e.g. USC2
limited to BMP, or USC4)?"
(transposition typos not intentional and not corrected here).


Sigh. I did not say that case conversion in ASCII is "necessarily"
faster. It's a better choice because it won't be slower and could be
faster, depending on whether the Unicode translation is special-cased
for ASCII. And, of course, it avoids the extra code and data that full
Unicode entails.


Besides, your statement didn't qualify "fixed size per character".
Alf, unfairly in my opinion, altered the course of the argument in
favor of such. Encodings such as UTF8 are quite commonly used...why
did Alf specify a certain subset of encodings? I wouldn't have walked
into that trap either.

Feb 22 '06 #24
* ro**********@gmail.com:
* Pete Becker:
Alf P. Steinbach:

and that, including the explanation that followed in the same para, was
what I asked for an example of,

"could you give an example where case conversion of an arbitrary ASCII
text is necessarily faster than the same case conversion of the same
text in fixed a size per character Unicode representation (e.g. USC2
limited to BMP, or USC4)?"
(transposition typos not intentional and not corrected here).

Sigh. I did not say that case conversion in ASCII is "necessarily"
faster. It's a better choice because it won't be slower and could be
faster, depending on whether the Unicode translation is special-cased
for ASCII. And, of course, it avoids the extra code and data that full
Unicode entails.


Besides, your statement didn't qualify "fixed size per character".
Alf, unfairly in my opinion, altered the course of the argument in
favor of such. Encodings such as UTF8 are quite commonly used...why
did Alf specify a certain subset of encodings? I wouldn't have walked
into that trap either.


If you want speed you have to use an encoding that supports that.

One could argue that Pete was talking about some ASCII encoding of
Unicode, that it was the encoding method, not the character set, that
would be slow, but then the statement ("Unicode would be a poor choice"
... [because of these Unicode attributes]) would be self-contradictory.

Generally Unicode with a fixed size encoding is as fast as or near as
fast as you can get text operations. With ASCII you have to handle
individual bytes. The main question is then whether

char x = *p;
...
*p = x;

for some pointer p, is faster than, slower than, or the same as e.g.

int x = *q;
...
*q = x;

for processing each individual char, when *q is properly aligned.

On my PC they seem to be the same, because I get the same timing results
for ASCII and Unicode case conversion when I put in the assumption that
the text is in ASCII range. On some older RISC machines (perhaps some
new ones too?) byte access was reportedly slow compared to properly
aligned word access, for an individual item; the processor had to do
shifting and masking to do bytes. That must be weighed up against an extra
comparison for the Unicode conversion when you don't have the
assumption of ASCII range, said extra comparison checking whether a
more general conversion must be invoked and executed when the char is
not an ASCII uppercase or lowercase or whatever the subset is that is to
be converted. Also, there is the question of whether that comparison
can simply disappear timing-wise in the parallelism in the processor.

I don't have a RISC machine at hand to check this out...

But depending on how those factors work out, if byte access is slow,
then Pete's statement above that "it won't be slower" is simply incorrect.
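
A sketch of how one might set up that comparison, under the stated assumption that the text is in ASCII range. The names lower_narrow, lower_wide and time_ns are made up for this example, and any timings depend entirely on the compiler, optimisation flags and hardware, so no numbers are claimed here.

```cpp
#include <chrono>
#include <string>

// Byte-wide loop: one char loaded and stored per character.
inline void lower_narrow(std::string& s)
{
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c + ('a' - 'A'));
    }
}

// 32-bit loop: one char32_t per character, same test and addition.
// Assumes every code point is in the ASCII range.
inline void lower_wide(std::u32string& s)
{
    for (char32_t& c : s) {
        if (c >= U'A' && c <= U'Z') c = static_cast<char32_t>(c + 0x20);
    }
}

// Time a single call of f, in nanoseconds, with a steady clock.
template <typename F>
long long time_ns(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

A fair measurement would run both loops many times over identical text and compare the averages. Note that the wide buffer is four times larger, which is where the paging effect mentioned earlier in the thread would show up.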
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 22 '06 #25
Ivan Vecerina wrote:
I don't think this is worth much of an argument. I assume that,
for non-ASCII, Pete was thinking of converting the case of the
many letters with diacritical marks, and those of non-latin
alphabets. This reasonably seems to require more work...


Somewhere it was proposed that the same ASCII text was encoded within
the Unicode.

I would assume that there is a *possibility* that there is an overhead
in space or time that comes with the ability to convert unicode, even if
all of the characters in the text are those that fit within the ASCII
subset. I also suspect that there is often a bias towards ASCII text
for operations on unicode and that any overhead for this case is
somewhere between negligible and non-existent.

It seems to me that Pete was arguing the possibility, and that Alf was
arguing the probability.

Ben Pope
--
I'm not just a number. To many, I'm known as a string...
Feb 22 '06 #26
On 21 Feb 2006 18:30:58 -0800, ro**********@gmail.com wrote:
Besides, your statement didn't qualify "fixed size per character".
Alf, unfairly in my opinion, altered the course of the argument in
favor of such.


Wow, what's that sucking sound... oh it's Noah's lips against
Pete's... well, you can figure out the rest. Apparently it's better to
be a sycophant than good in corporate America today. The "Alpha's"
love it.

politics, n: From the Latin 'poly', meaning many,
and 'tic', meaning little bloodsucking insects.
Feb 22 '06 #27
"Ben Pope" <be***************@gmail.com> wrote in message
news:43**********************@taz.nntpserver.com...
: Ivan Vecerina wrote:
: > I don't think this is worth much of an argument. I assume that,
: > for non-ASCII, Pete was thinking of converting the case of the
: > many letters with diacritical marks, and those of non-latin
: > alphabets. This reasonably seems to require more work...
....
: It seems to me that Pete was arguing the possibility, and that Alf was
: arguing the probability.
Yep. Nothing worth the stir IMO, although I do sympathize with Alf,
based on past experience ;)

--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form

Feb 22 '06 #28

JustBoo wrote:
On 21 Feb 2006 18:30:58 -0800, ro**********@gmail.com wrote:
Besides, your statement didn't qualify "fixed size per character".
Alf, unfairly in my opinion, altered the course of the argument in
favor of such.


Wow, what's that sucking sound... oh it's Noah's lips against
Pete's... well, you can figure out the rest. Apparently it's better to
be a sycophant than good in corporate America today. The "Alpha's"
love it.


Have you EVER made a useful contribution to this group?

Feb 22 '06 #29
Alf P. Steinbach wrote:
* Pete Becker:
Alf P. Steinbach wrote:
I think that [Unicode is slower-MS] is incorrect.

To convince me otherwise, could you give an example where case
conversion of an arbitrary ASCII text is necessarily faster than the
same case conversion of the same text in fixed a size per character
Unicode representation (e.g. USC2 limited to BMP, or USC4)?

Consider that ASCII is a subset of Unicode.

Case conversions in Unicode can't assume that the characters they're
dealing with are ASCII.


Well, that's not much of an example! ;-)

To quote yourself, again, "With ASCII, converting to lowercase is just a
test and an addition".

How would it be more if the same text is represented in UCS2 or UCS4?


Simple. Assume the following string L"i". That's an ASCII text encoded as
UCS2 or UCS4 (for the purposes of the discussion). If it were just "i", the
uppercase variant would be just "I". Not so in Unicode, where the uppercase
would depend on the locale, and could be a dotted uppercase I (in Turkish).
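
To make the Turkish case concrete, here is a hand-rolled sketch that takes the language rule as an explicit parameter. The names CaseRule and upper_with_rule are hypothetical, and only the ASCII letters and the dotted i are handled. In Turkish the uppercase of 'i' is U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE), not 'I'.

```cpp
// Which language rule the caller wants applied.
enum class CaseRule { Default, Turkic };

// Uppercase a single code point under a caller-supplied rule.
inline char32_t upper_with_rule(char32_t c, CaseRule rule)
{
    if (rule == CaseRule::Turkic && c == U'i') {
        return static_cast<char32_t>(0x0130);  // capital I with dot above
    }
    // Everything else in this sketch: plain ASCII test and subtraction.
    return (c >= U'a' && c <= U'z') ? static_cast<char32_t>(c - 0x20) : c;
}
```

The same input code point gives two different answers depending on the rule, so the result cannot be decided from the text alone.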

As Pete said: you can't assume that the characters you're dealing with are
ASCII, /even if your input is ASCII/ !

HTH,
Michiel Salters

Feb 23 '06 #30
* Mi*************@tomtom.com:
Alf P. Steinbach wrote:
* Pete Becker:
Alf P. Steinbach wrote:
I think that [Unicode is slower-MS] is incorrect.

To convince me otherwise, could you give an example where case
conversion of an arbitrary ASCII text is necessarily faster than the
same case conversion of the same text in fixed a size per character
Unicode representation (e.g. USC2 limited to BMP, or USC4)?

Consider that ASCII is a subset of Unicode.

Case conversions in Unicode can't assume that the characters they're
dealing with are ASCII.

Well, that's not much of an example! ;-)

To quote yourself, again, "With ASCII, converting to lowercase is just a
test and an addition".

How would it be more if the same text is represented in UCS2 or UCS4?


Simple. Assume the following string L"i". That's an ASCII text encoded
as UCS2 or UCS4 (for the purposes of the discussion). If it were just "i",
the uppercase variant would be just "I". Not so in Unicode, where the
uppercase would depend on the locale, and could be a dotted uppercase I (in
Turkish).


Heh heh... The Turkish alphabet is broken beyond repair; there is no
general solution to the problem you have chosen as example. And
methinks you know that, and chose it for exactly that reason... :-)

As Pete said: you can't assume that the characters you're dealing with
are ASCII, /even if your input is ASCII/ !


That's incorrect.

Cheers,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 23 '06 #31
