<string> to lowercase

Zombie

Hi, what is the correct way of converting contents of a <string> to
lowercase?
There are no methods of <string> class to do this so I fall back on
strlwr().
But the c_str() method returns a const pointer which cannot be used
with strlwr() as it does the conversion in place. So, I use the
following logic of copying the contents to a dynamically allocated
char* array and then doing the conversion:

-----------------------------
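string str = "faLSe";
char* pc_str = NULL;

pc_str = new char[str.length() + 1];
memset(pc_str, 0, str.length() + 1); // sizeof(pc_str) is only the size of the pointer

strcpy(pc_str, str.c_str());
strlwr(pc_str); // strlwr() is a non-standard extension, not ISO C/C++
// pc_str now contains "false"
delete[] pc_str;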
-----------------------------

Is there any other, less cumbersome way of doing the same?

Thanks for your time.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Jul 22 '05 #1


Matt Wharton

You might look at using the 'for_each' algorithm in conjunction with the
string's iterators and 'tolower'.

-Matt
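A minimal sketch of that for_each suggestion, assuming C++11 or later for the lambda (which postdates this thread); the function name is illustrative. Because for_each discards the value returned by the function it calls, the lambda takes each character by reference and assigns tolower's result back. The cast to unsigned char keeps negative char values away from tolower, a problem raised later in the thread.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercase str in place using std::for_each.
// Each character is passed by reference so the lambda can overwrite it;
// the cast to unsigned char avoids undefined behavior for negative chars.
void lowercase_in_place(std::string& str)
{
    std::for_each(str.begin(), str.end(), [](char& c) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    });
}
```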

"Zombie" <na***********@yahoo.com> wrote in message
news:90**************************@posting.google.com...

Hi, what is the correct way of converting contents of a <string> to
lowercase?

Jul 22 '05 #2

Kai-Uwe Bux

Zombie wrote:

Hi, what is the correct way of converting contents of a <string> to
lowercase?

The following uses the current locale to convert a string to lowercase:

#include <string>
#include <iostream>
#include <locale>

template < typename Iter >
void range_tolower ( Iter beg, Iter end ) {
for( Iter iter = beg; iter != end; ++iter ) {
*iter = std::tolower( *iter );
}
}

void string_tolower ( std::string & str ) {
range_tolower( str.begin(), str.end() );
}

int main ( void ) {
std::string test ( "Test" );
string_tolower( test );
std::cout << test << std::endl;
}
Best

Kai-Uwe
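For comparison, a sketch that uses an explicit locale through the std::ctype<char> facet, whose tolower(low, high) member lowercases a whole char range in place. It assumes C++11 or later, and the function name is hypothetical. It is still a one-char-to-one-char mapping, so it shares the limitations discussed later in the thread.

```cpp
#include <locale>
#include <string>

// Lowercase str in place through the ctype<char> facet of loc.
// ctype<char>::tolower(low, high) converts every char in [low, high).
// By default loc is a copy of the current global C++ locale.
void string_tolower_facet(std::string& str,
                          const std::locale& loc = std::locale())
{
    if (str.empty())
        return;  // nothing to convert
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    ct.tolower(&str[0], &str[0] + str.size());
}
```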


Jul 22 '05 #3

Ashes

Hi

You can also use the transform() algorithm:

#include <string>
#include <algorithm>
#include <ctype.h> // declares ::tolower

void ConvertToLowerCase(std::string& str)
{
std::transform(str.begin(),
str.end(),
str.begin(),
tolower);
// You may need to cast tolower above to (int(*)(int)) to select
// the right overload - this works as is on VC 7.1 but may not
// compile on other compilers
}

Regards
Ashley
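A sketch of the same transform() call with a lambda in place of tolower itself, assuming C++11 or later; the function name lowered is hypothetical. The lambda has exactly one signature, so there is no overloaded name to choose among and no (int(*)(int)) cast is needed, and its unsigned char parameter keeps negative char values out of tolower.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Return a lowercased copy of s (s is taken by value and modified).
// The lambda's unsigned char parameter converts each char before tolower.
std::string lowered(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}
```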

Jul 22 '05 #4

Xenos

"Zombie" <na***********@yahoo.com> wrote in message
news:90**************************@posting.google.com...

Hi, what is the correct way of converting contents of a <string> to
lowercase?

Is there any other, less cumbersome way of doing the same?

off the top of my head:

void to_lowercase(std::string& s)
{
for (std::string::iterator i = s.begin(); i != s.end(); ++i)
*i = tolower(*i);
}

OR

std::string to_lowercase(const std::string& s)
{
std::string t;
for (std::string::const_iterator i = s.begin(); i != s.end(); ++i)
t += tolower(*i);
return t;
}

Or you could use a function object with one of the standard library
templates such as for_each or transform.
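A sketch of the function-object approach in pre-C++11 style (no lambdas); the names ToLowerChar and to_lowercase_functor are hypothetical. The object's operator() casts to unsigned char before calling tolower and converts the result back to char, and std::transform writes each result over the original character.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Function object: lowercases one char via the <cctype> tolower,
// casting to unsigned char first so negative chars are safe.
struct ToLowerChar
{
    char operator()(char c) const
    {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
};

// Apply ToLowerChar to every character of s, in place.
void to_lowercase_functor(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(), ToLowerChar());
}
```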


Jul 22 '05 #5

Joe C

"Zombie" <na***********@yahoo.com> wrote in message
news:90**************************@posting.google.com...

Hi, what is the correct way of converting contents of a <string> to
lowercase?

Is there any other, less cumbersome way of doing the same?

These two little functions change the case of strings. Note that they assume
ASCII and as such are not portable. I've been told that these are the worst
functions ever... but they work for me.
If you want to change the actual string rather than returning a new
string, just pass it by reference to a void function and change the
characters of the original string in place.

Here they are:
#include <iostream>
#include <string>

using namespace std;

string lcase(string in);
string ucase(string in);

int main(){
string str("A mIxEd CaSe StRiNg 123!@@#");

cout << str << endl
<< lcase(str) << endl
<< ucase(str) << endl;

return 0;
}

string lcase(string in){
string stringout;
for(string::size_type i = 0; i < in.size(); ++i)
if(!(in[i] & 128) && ((in[i] & 95) > 64) && ((in[i] & 31) <= 26))
stringout += static_cast<char>(in[i] | 32); //turn on the lcase bit
else stringout += in[i]; //character wasn't a letter...don't change
return stringout;
}

string ucase(string in){
string stringout;
for(string::size_type i = 0; i < in.size(); ++i)
if(!(in[i] & 128) && ((in[i] & 95) > 64) && ((in[i] & 31) <= 26))
stringout += static_cast<char>(in[i] & 223); //turn off the lcase bit
else stringout += in[i]; //character wasn't a letter...don't change
return stringout;
}
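A sketch of the in-place variant mentioned above, using range comparisons instead of bit masks. Like the original it assumes ASCII (contiguous 'A'..'Z' and 'a'..'z'); the function names are hypothetical.

```cpp
#include <string>

// ASCII-only, in place: A-Z become a-z; digits, punctuation and
// bytes outside the ASCII letter ranges are left untouched.
void lcase_in_place(std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] = static_cast<char>(s[i] - 'A' + 'a');
}

// ASCII-only, in place: a-z become A-Z.
void ucase_in_place(std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (s[i] >= 'a' && s[i] <= 'z')
            s[i] = static_cast<char>(s[i] - 'a' + 'A');
}
```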

Jul 22 '05 #6

Andre Kostur

"Matt Wharton" <no****@noSpam.com> wrote in news:1092853381.442698
@cswreg.cos.agilent.com:

You might look at using the 'for_each' algorithm in conjunction with the
string's iterators and 'tolower'.

Wouldn't you want std::transform ?


Jul 22 '05 #7

Brian Stone

na***********@yahoo.com (Zombie) wrote in message news:<90**************************@posting.google.com>...

Hi, what is the correct way of converting contents of a <string> to
lowercase?

The easiest way I know is to use the transform() function from the
<algorithm> library. Here's an example of how to apply this to a
string to convert the case...

#include <iostream>
#include <string>
#include <algorithm>
#include <functional>
#include <cctype>

using namespace std;

int main ( int argc, char **argv )
{
string A = "TeStInG!";

cout << A << endl; // output: TeStInG!
transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );
cout << A << endl; // output: testing!
transform ( A.begin(), A.end(), A.begin(), ptr_fun(::toupper) );
cout << A << endl; // output: TESTING!
}

-- Brian Stone
South Dakota School of Mines & Technology
UAV Team Lead Programmer


Jul 22 '05 #8

Tommy Andreasen

Zombie wrote:

Hi, what is the correct way of converting contents of a <string> to
lowercase?

I usually do it like this:

std::transform(str.begin(), str.end(), str.begin(),
std::ptr_fun(std::tolower));

Tommy -


Jul 22 '05 #9

Old Wolf

Kai-Uwe Bux <jk********@gmx.net> wrote:

> Zombie wrote:
> > Hi, what is the correct way of converting contents of a <string> to
> > lowercase?
> The following uses the current locale to convert a string to lowercase:
>
> #include <string>
> #include <iostream>
> #include <locale>
>
> template < typename Iter >
> void range_tolower ( Iter beg, Iter end ) {
> for( Iter iter = beg; iter != end; ++iter ) {
> *iter = std::tolower( *iter );
> }
> }

Unfortunately, std::tolower requires an argument in the range
0...UCHAR_MAX. So you can go:

*iter = std::tolower( (unsigned char)*iter );

and hope that it gets converted back to char properly afterwards, or:

if (*iter >= 0 && *iter <= UCHAR_MAX)
*iter = std::tolower(*iter);

> void string_tolower ( std::string & str ) {
> range_tolower( str.begin(), str.end() );
> }
>
> int main ( void ) {
> std::string test ( "Test" );
> string_tolower( test );
> std::cout << test << std::endl;
> }
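The first option relies on tolower's int result converting back to the original char. A minimal sketch (function names hypothetical) that wraps the cast and checks the unsigned char to char round trip for every char value: since C++20 the standard guarantees this conversion is modular, while earlier standards left it implementation-defined, although it holds on common two's-complement platforms.

```cpp
#include <cctype>
#include <climits>

// The cast-based option as a helper: unsigned char on the way into
// the <cctype> tolower, then the int result converted back to char.
char tolower_char(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// True if every char value survives char -> unsigned char -> char.
// Guaranteed since C++20; implementation-defined before that.
bool char_round_trip_ok()
{
    for (int v = CHAR_MIN; v <= CHAR_MAX; ++v) {
        char c = static_cast<char>(v);
        unsigned char u = static_cast<unsigned char>(c);
        if (static_cast<char>(u) != c)
            return false;
    }
    return true;
}
```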


Jul 22 '05 #10

Peter Koch Larsen

"Zombie" <na***********@yahoo.com> skrev i en meddelelse
news:90**************************@posting.google.c om...

Hi, what is the correct way of converting contents of a <string> to
lowercase?
Well... this is actually a rather complicated question. For an explanation
as to why, take a look at the thread "Case insensitive comparison of
std::strings" in comp.lang.c++.moderated. For a "basic" conversion, for_each
and tolower would be okay (perhaps combined with some locale, but I am not
familiar with these).

Is there any other, less cumbersome way of doing the same?
Yes - that approach surely seems too cumbersome.

Thanks for your time.

Kind regards
Peter
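A rough sketch of the "basic" case-insensitive comparison, assuming C++11 or later; iequals is a hypothetical name. It compares character by character through the <cctype> tolower, so it inherits the same limits: one-to-one mappings only, and results that depend on the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if a and b are equal ignoring case.
// Each pair of chars is converted to unsigned char, lowercased and compared.
bool iequals(const std::string& a, const std::string& b)
{
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}
```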

Jul 22 '05 #11

Kai-Uwe Bux

Old Wolf wrote:

> Kai-Uwe Bux <jk********@gmx.net> wrote:
> > Zombie wrote:
> > > Hi, what is the correct way of converting contents of a <string> to
> > > lowercase?
> >
> > The following uses the current locale to convert a string to lowercase:
> >
> > #include <string>
> > #include <iostream>
> > #include <locale>
> >
> > template < typename Iter >
> > void range_tolower ( Iter beg, Iter end ) {
> > for( Iter iter = beg; iter != end; ++iter ) {
> > *iter = std::tolower( *iter );
> > }
> > }
>
> Unfortunately, std::tolower requires an argument in the range
> 0...UCHAR_MAX. So you can go:
>
> *iter = std::tolower( (unsigned char)*iter );
>
> and hope that it gets converted back to char properly afterwards, or:
>
> if (*iter >= 0 && *iter <= UCHAR_MAX)
> *iter = std::tolower(*iter);

I was under the impression that std::tolower, being a template, would be
instantiated for the deduced type <char> when the argument is *iter, where iter
is a std::string::iterator. Now, if it is a template, why should it be
restricted to 0..UCHAR_MAX, effectively forcing the type to be unsigned
char? That does not seem to make any sense -- of course, this does not
imply it isn't so. In any case, I looked up tolower in the standard and did
not see any hint at UCHAR_MAX. Probably, I was looking at the wrong
section. Could you point me to the source?
Best

Kai-Uwe Bux


Jul 22 '05 #12

kanze

no**********@stoneentertainment.com (Brian Stone) wrote in message
news:<ac**************************@posting.google.com>...

> The easiest way I know is to use the transform() function from the
> <algorithm> library. Here's an example of how to apply this to a
> string to convert the case...
>
> #include <iostream>
> #include <string>
> #include <algorithm>
> #include <functional>
> #include <cctype>
>
> using namespace std;
>
> int main ( int argc, char **argv )
> {
> string A = "TeStInG!";
>
> cout << A << endl; // output: TeStInG!
> transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );
> cout << A << endl; // output: testing!
> transform ( A.begin(), A.end(), A.begin(), ptr_fun(::toupper) );
> cout << A << endl; // output: TESTING!
> }

Is it just me, or what? There have been a number of postings suggesting
this, either with or without the call to ptr_fun. Now, it has some
obvious and well known problems when it encounters a character encoding
that is negative, and toupper( 'ß' ) doesn't (and cannot) work at all,
but I can understand anglocentric programmers missing this in a quick
response. On the other hand, I have been unable to find a compiler where
this even compiles, in any of the suggested variants, on any system: it
fails to compile (with or without the ptr_fun) with g++ (3.4.0), Sun CC
(5.1) and VC++ (6.0).

In fact, the only variant which compiled (and that got a warning from
Sun CC) is yours, with ::tolower and ::toupper. And you are playing on a
bug in practically every implementation of <cctype>, which exposes
::tolower and ::toupper (rather than only having them available in
std::, as the standard requires).

As far as I know (and ignoring the issues of passing an out of bounds
value to the functions), the correct way to write the call to transform
is something like:

std::transform( str.begin(), str.end(),
str.begin(),
std::ptr_fun( (int (*)( int ))std::tolower ) ) ;

Even better would be something like:

std::transform(
str.begin(), str.end(),
str.begin(),
boost::bind(
std::ptr_fun(
(char (*)( char, std::locale const& ))std::tolower ),
_1,
std::locale() ) ) ;

(Some of the Boost experts should verify this. I still have enough older
compilers to support that I can't actively use Boost, as much as it
would facilitate my code.)

This should at least give defined behavior in every case, even if it
gives the wrong results sometimes.

Of course, the original poster asked for something that wasn't
awkward:-).
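With C++11 or later, a lambda can stand in for the boost::bind expression above and call the two-argument std::tolower from <locale> directly, with no function-pointer cast. A sketch under that assumption; the function name is hypothetical, and it is still a char-by-char mapping with the same limits.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Lowercase str in place with the two-argument std::tolower from <locale>.
// The lambda captures loc by reference; loc outlives the transform call.
// By default loc is a copy of the current global C++ locale.
void to_lower_with_locale(std::string& str,
                          const std::locale& loc = std::locale())
{
    std::transform(str.begin(), str.end(), str.begin(),
                   [&loc](char c) { return std::tolower(c, loc); });
}
```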

--
James Kanze GABI Software http://www.gabi-soft.fr
Object-oriented IT consulting
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Jul 22 '05 #13

kanze

no**********@stoneentertainment.com (Brian Stone) wrote in message
news:<ac**************************@posting.google.com>...

na***********@yahoo.com (Zombie) wrote in message
news:<90**************************@posting.google.com>...
> Hi, what is the correct way of converting contents of a <string> to
> lowercase?

The easiest way I know is to use the transform() function from the
<algorithm> library. Here's an example of how to apply this to a
string to convert the case...

#include <iostream>
#include <string>
#include <algorithm>
#include <functional>
#include <cctype>

using namespace std;

int main ( int argc, char **argv )
{
string A = "TeStInG!";

cout << A << endl; // output: TeStInG!
transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );
cout << A << endl; // output: testing!
transform ( A.begin(), A.end(), A.begin(), ptr_fun(::toupper) );
cout << A << endl; // output: TESTING!
}

1. This isn't guaranteed to compile, for at least two reasons. The
obvious one is that you've forgotten to include <ostream>. The less
obvious one is that any C++ header may include any other C++
headers; if <iostream> includes <locale> (actually
quite likely, since often <iostream> just includes everything in the
iostream section of the library, and both basic_ios and
basic_streambuf need <locale>), then the call to ptr_fun will be
ambiguous.

Formally, in fact, I think that the standard guarantees that it
won't compile, since there shouldn't be a tolower nor a toupper in
global namespace. (But I could be wrong about this. I don't really
understand the interactions between "using namespace" and the ::
specifier.) In practice, however, I don't know of a single
implementation which is conformant in this regard.

2. If it compiles, and uses the tolower in <cctype>, then you have
undefined behavior, at least if plain char is signed (as it is on
most systems). Passing a negative value to the tolower function in
<cctype> is undefined behavior.


Jul 22 '05 #14

kanze

ol*****@inspire.net.nz (Old Wolf) wrote in message
news:<84**************************@posting.google.com>...

> Kai-Uwe Bux <jk********@gmx.net> wrote:
> > Zombie wrote:
> > > Hi, what is the correct way of converting contents of a <string>
> > > to lowercase?
> > The following uses the current locale to convert a string to
> > lowercase:
> >
> > #include <string>
> > #include <iostream>
> > #include <locale>
> >
> > template < typename Iter >
> > void range_tolower ( Iter beg, Iter end ) {
> > for( Iter iter = beg; iter != end; ++iter ) {
> > *iter = std::tolower( *iter );
> > }
> > }
> Unfortunately, std::tolower requires an argument in the range
> 0...UCHAR_MAX.

No, it takes a second parameter, a std::locale. E.g.:

*iter = std::tolower( *iter, std::locale() ) ;

At least in a conforming implementation (see below).

> So you can go:
>
> *iter = std::tolower( (unsigned char)*iter );

This works on a conforming implementation as well, as long as you
include <cctype> (not just <locale>). Conforming implementations are
still pretty rare, however, and I've found that leaving the std:: off
and including <ctype.h> seems to be about the only thing that works
portably.

And of course, since in this case you are using the C version of
tolower, you have to ensure that the input is an unsigned char. And, as
you say, hope that the results don't get mangled when you convert back
to char -- realistically, the amount of code that mangling them would
break (even though the standard allows it) is so large that no
implementation would dare...

Finally, of course, none of the solutions really work, because there is
no one-to-one mapping of upper case characters to lower case characters.

> and hope that it gets converted back to char properly afterwards, or:
>
> if (*iter >= 0 && *iter <= UCHAR_MAX)
> *iter = std::tolower(*iter);

Actually, the only case where such a mapping makes sense is when you are
using pure ASCII, with no accented characters. So:

assert( *iter >= 0 && *iter <= 127 ) ;

(assuming ASCII, obviously -- this isn't really portable).


Jul 22 '05 #15

Matt Wharton

"Andre Kostur" <nn******@kostur.net> wrote in message
news:Xn*******************************@207.35.177.134...

"Matt Wharton" <no****@noSpam.com> wrote in news:1092853381.442698
@cswreg.cos.agilent.com:
> You might look at using the 'for_each' algorithm in conjunction with the
> string's iterators and 'tolower'.

Wouldn't you want std::transform ?

Yes, quite right; that would be better. My mistake.

-Matt


Jul 22 '05 #16

Kevin W.

> using namespace std;

transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );

A question: what does the double-colon mean in this context, and from
which library does the tolower function come?

--
Kevin W :-)
Opera/CSS/webdev blog: http://www.exclipy.com/
Using Opera: http://www.opera.com/m2/


Jul 22 '05 #17

Francis Glassborow

In article <op**************@localhost.localdomain>, Kevin W.
<co*****@in.sig> writes

using namespace std;
transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );

A question: what does the double-colon mean in this context, and from
which library does the tolower function come?

As a using directive is in operation, ::tolower() forces the lookup to
be only in the global namespace + any other names injected with using
declarations. This form of disambiguation is one of the few advantages
of using directives.
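A small sketch of the lookup rule behind ::tolower, with hypothetical names lib, f and g: a qualified name ::m first looks for declarations of m directly in the global namespace, and follows using-directives only if there are none (the 3.4.3.2 wording discussed later in the thread).

```cpp
// Hypothetical namespace standing in for std.
namespace lib {
    int f() { return 1; }
    int g() { return 2; }
}

using namespace lib;

// A real declaration of g in the global namespace.
int g() { return 3; }

// ::f: no f is declared in the global namespace itself, so qualified
// lookup follows the using-directive and finds lib::f.
int call_f() { return ::f(); }

// ::g: the global namespace declares g, so the using-directive is
// ignored and only ::g is found (an unqualified g() would be ambiguous).
int call_g() { return ::g(); }
```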

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Jul 22 '05 #18

llewelly

ka***@gabi-soft.fr writes:

no**********@stoneentertainment.com (Brian Stone) wrote in message
news:<ac**************************@posting.google. com>...
The easiest way I know is to use the transform() function from the
<algorithm> library. Here's an example of how to apply this to a
string to convert the case...
#include <iostream>
#include <string>
#include <algorithm>
#include <functional>
#include <cctype>

using namespace std;

int main ( int argc, char **argv )
{
string A = "TeStInG!";

cout << A << endl; // output: TeStInG!
transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );
cout << A << endl; // output: testing!
transform ( A.begin(), A.end(), A.begin(), ptr_fun(::toupper) );
cout << A << endl; // output: TESTING!
}

[snip]

In fact, the only variant which compiled (and that got a warning from
Sun CC) is yours, with ::tolower and ::toupper. And you are playing on a
bug in practically every implementation of <cctype>, which exposes
::tolower and ::toupper (rather than only having them available in
std::, as the standard requires).

[snip]

The 'using namespace std;' at global scope makes std::tolower
and std::toupper be available at global scope. (See 3.4.3.2)

Even without the 'using namespace std', we have 17.4.3.1.3/5:

# Each function signature from the Standard C library declared
# with external linkage is reserved to the implementation for use
# as a function signature with both extern "C" and extern "C++"
# linkage, (168) or as a name of namespace scope in the global
# namespace.

Jul 22 '05 #19

kanze

Francis Glassborow <fr*****@robinton.demon.co.uk> wrote in message
news:<ZR**************@robinton.demon.co.uk>...

> In article <op**************@localhost.localdomain>, Kevin W.
> <co*****@in.sig> writes
> > using namespace std;
> > transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );
> > A question: what does the double-colon mean in this context, and from
> > which library does the tolower function come?
> As a using directive is in operation, ::tolower() forces the lookup to
> be only in the global namespace + any other names injected with using
> declarations.

Are you sure?

I ask because this doesn't seem to be the behavior I'm seeing with most
compilers. If I compile exactly the original program, but with an
#include <locale> as well (so that a couple of other tolower are
available too), I still don't get an error about an ambiguous function;
both g++ and Sun CC chose uniquely the tolower in <cctype> (which in
both implementations, is actually in global namespace, instead of in
std:: as the standard requires). Sun CC, of course, does warn that I'm
trying to use an `extern "C"' function in a context which requires an
`extern "C++"' one. (I think that the ::tolower that g++ picks up is
also an `extern "C"'. If so, his code only works because of two
successive compiler errors.)
> This form of disambiguation is one of the few advantages of using
> directives.

It may be, but if so, it isn't very portable in practice, since not all
compilers implement it correctly:-).

And maybe I'm just dumb, but I find that the complexity here is getting
beyond what I can master.


Jul 22 '05 #20

kanze

"Kevin W." <co*****@in.sig> wrote in message
news:<op**************@localhost.localdomain>...

using namespace std;
> transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );
A question: what does the double-colon mean in this context, and from
which library does the tolower function come?

It tells the compiler only to look in global namespace, not in std::.

In the actual code in question, according to the standard, there was no
function tolower in global namespace. In practice, however, most, if
not all implementations are broken in this regard, and the C version of
tolower (the one in <cctype>) is visible in global namespace.

Of course, he could have gotten the same effect (I think -- I'm not
really that sure of the standard here, and of course, no two compilers
do exactly the same thing anyway) by including <ctype.h>.


Jul 22 '05 #21

kanze

llewelly <ll*********@xmission.dot.com> wrote in message
news:<86************@Zorthluthik.local.bar>...

ka***@gabi-soft.fr writes:
> no**********@stoneentertainment.com (Brian Stone) wrote in message
> news:<ac**************************@posting.google.com>...
>> The easiest way I know is to use the transform() function from the
>> <algorithm> library. Here's an example of how to apply this to a
>> string to convert the case...
>>
>> #include <iostream>
>> #include <string>
>> #include <algorithm>
>> #include <functional>
>> #include <cctype>
>>
>> using namespace std;
>>
>> int main ( int argc, char **argv )
>> {
>> string A = "TeStInG!";
>>
>> cout << A << endl; // output: TeStInG!
>> transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );
>> cout << A << endl; // output: testing!
>> transform ( A.begin(), A.end(), A.begin(), ptr_fun(::toupper) );
>> cout << A << endl; // output: TESTING!
>> }

[snip]
> In fact, the only variant which compiled (and that got a warning
> from Sun CC) is yours, with ::tolower and ::toupper. And you are
> playing on a bug in practically every implementation of <cctype>,
> which exposes ::tolower and ::toupper (rather than only having them
> available in std::, as the standard requires).

[snip]

> The 'using namespace std;' at global scope makes std::tolower
> and std::toupper be available at global scope. (See 3.4.3.2)

Now if only someone would interpret the second paragraph of that section
into something I could understand.

There is a statement to the effect that "using-directives are ignored in
any namespace, including X, directly containing one or more declarations
of m". If there is a toupper declared in :: (the standard forbids it,
but all of the implementations I know do have one), then the
using-directives in :: should be ignored. Which would make this code
work. But it wouldn't work if the code, including the using directive,
were in another namespace.

This is all getting a bit beyond me. What I do know is that the
compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower and
toupper as ambiguous here, even if I include <locale> (so that there are
any number of toupper and tolower functions available). Whereas they do
if I leave the :: out. So whatever the standard says...

The whole thing has gotten me totally confused. For the moment, I'll
just stick with my current solution :

#include <ctype.h>

namespace {
struct ToUpper
{
char operator()( char ch ) const
{
return ::toupper( (unsigned char)ch ) ;
}
};
}

// ...

transform( a.begin(), a.end(), a.begin(), ToUpper() ) ;

It also has the advantage of not having undefined behavior when one of
the char values happens to be negative. And it should still work even
if an implementor eventually does come up with a conforming <cctype>.

(Of course, in practice I never use toupper or tolower anyway, because I
often have to deal with things like 'ß'. But I do use other functions
in <ctype.h>, with other algorithms, and when I do, I use something like
this.)

> Even without the 'using namespace std', we have 17.4.3.1.3/5:
>
> # Each function signature from the Standard C library declared
> # with external linkage is reserved to the implementation for use
> # as a function signature with both extern "C" and extern "C++"
> # linkage, (168) or as a name of namespace scope in the global
> # namespace.

That's a reservation of the name; it doesn't mean that the name is
visible. What it means is that if you define an `isupper' function at
global scope, your program might not work.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jul 22 '05 #22

Gabriel Dos Reis

ka***@gabi-soft.fr writes:

[...]

| > The 'using namespace std;' at global scope makes std::tolower
| > and std::toupper be available at global scope. (See 3.4.3.2)
|
| Now if only someone would interpret the second paragraph of that section
| into something I could understand.

I think only compiler writers care about what it means ;-p

| There is a statement to the effect that "using-directives are ignored in
| any namespace, including X, directly containing one or more declarations
| of m". If there is a toupper declared in :: (the standard forbids it,
| but all of the implementations I know do have one), then the
| using-directives in :: should be ignored. Which would make this code
| work. But it wouldn't work if the code, including the using directive,
| were in another namespace.

The executive summary is that if you write X::m, then any actual
declaration of "m" in "X" hides any other declarations that would have
been found by searching namespaces nominated, directly or indirectly,
in using-directives reachable from X. If no actual declaration
for "m" is made in "X", then the result of the name lookup will be
that of applying the rule recursively to the nominated namespaces.
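The two rules in this summary can be checked with a small, self-contained sketch. The namespaces `N`, `X` and `Y` and the variable `m` are invented for the illustration.

```cpp
namespace N {
    int m = 1;
}

namespace X {
    using namespace N;   // N::m would be reachable through X ...
    int m = 2;           // ... but X declares its own m, which hides it
}

namespace Y {
    using namespace N;   // Y declares no m of its own
}

// X contains a declaration of m, so qualified lookup stops there and the
// using-directive is not consulted: no ambiguity with N::m.
int fromX() { return X::m; }

// Y contains no m, so lookup continues in the namespace Y nominates.
int fromY() { return Y::m; }
```

Here fromX() returns 2 (X::m) and fromY() returns 1 (N::m).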

| This is all getting a bit beyond me. What I do know is that the
| compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower and
| toupper as ambiguous here, even if I include <locale> (so that there are
| any number of toupper and tolower functions available). Whereas they do
| if I leave the :: out. So whatever the standard says...

because the declaration of ::toupper "hides" other declarations for
toupper in the used namespace std.
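The same rule, applied to the global namespace, accounts for what Sun CC and g++ do with ::toupper. In this sketch the namespace `lib` plays the role of std and the two `shout` functions play the roles of std::toupper and the global ::toupper; every name is invented for the illustration, so the behavior does not depend on what any particular <cctype> declares.

```cpp
namespace lib {
    // Stands in for an overload reachable only through "using namespace".
    int shout(int c) { return c + 100; }
}

using namespace lib;

// Stands in for the ::toupper that real headers put at global scope.
int shout(int c) { return c + 1; }

int callQualified()
{
    // ::shout finds the global declaration. Because the global namespace
    // declares shout itself, the using-directive for lib is not consulted,
    // so there is no ambiguity. An unqualified call shout(1) would instead
    // see both ::shout and lib::shout and be ambiguous.
    return ::shout(1);
}
```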

--
Gabriel Dos Reis
gd*@integrable-solutions.net

Jul 22 '05 #23

kanze

Gabriel Dos Reis <gd*@integrable-solutions.net> wrote in message
news:<m3************@uniton.integrable-solutions.net>...

> ka***@gabi-soft.fr writes:
>
> [...]
>
> | > The 'using namespace std;' at global scope makes std::tolower and
> | > std::toupper be available at global scope. (See 3.4.3.2)
> |
> | Now if only someone would interpret the second paragraph of that
> | section into something I could understand.
>
> I think only compiler writers care about what it means ;-p

I think that there's a lot like that in the standard:-). Maybe one day,
I'll get the occasion to write a compiler, instead of just complaining
about them. (The current crop of compilers are pretty lousy -- they
always do what I say, and never what I mean:-).)

> | There is a statement to the effect that "using-directives are
> | ignored in any namespace, including X, directly containing one or
> | more declarations of m". If there is a toupper declared in :: (the
> | standard forbids it, but all of the implementations I know do have
> | one), then the using-directives in :: should be ignored. Which
> | would make this code work. But it wouldn't work if the code,
> | including the using directive, were in another namespace.
>
> The executive summary is that if you write X::m, then any actual
> declaration of "m" in "X" hides any other declarations that would have
> been found by searching namespaces nominated, directly or indirectly,
> in using-directives reachable from X. If no actual declaration
> for "m" is made in "X", then the result of the name lookup will be
> that of applying the rule recursively to the nominated namespaces.

In sum, what I had intuitively expected (and what the compilers I use
seem to implement). So why did the other posters say that "using
namespace std" meant that ::toupper would find a toupper in std::.

But wait a minute. If I say explicitly that the only toupper that I
want considered is the one in global namespace (e.g. ::toupper), and
there isn't one in global namespace, the compiler will look elsewhere?
That doesn't sound intuitively right -- I would expect an error.

> | This is all getting a bit beyond me. What I do know is that the
> | compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower and
> | toupper as ambiguous here, even if I include <locale> (so that there
> | are any number of toupper and tolower functions available). Whereas
> | they do if I leave the :: out. So whatever the standard says...
>
> because the declaration of ::toupper "hides" other declarations for
> toupper in the used namespace std.

Except that, of course, if the libraries were conforming, there wouldn't
have been a ::toupper in global namespace:-).

Anyhow, I still contend that the only "correct" solution using transform
involves something like:
boost::bind( (char (*)(char, std::locale const&))&std::toupper,
_1, std::locale() )
For a pretty weak definition of correct, even then -- any toupper that
doesn't convert "Maße" to "MASSE" is irremediably broken, and won't be
acceptable to some of my customers.
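Every variant built on transform maps one char to one char, so none of them can turn "Maße" into "MASSE". The following sketch (the function name `upperClassic` is hypothetical) applies the two-argument std::toupper from <locale> with the classic locale to a UTF-8 encoded "Maße", in which ß occupies the two bytes 0xC3 0x9F; under that encoding assumption, the ß bytes simply pass through unchanged.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: uppercases each char independently using the
// classic ("C") locale's ctype facet, via std::toupper(char, locale).
std::string upperClassic(std::string s)
{
    const std::locale& loc = std::locale::classic();
    for (std::string::iterator it = s.begin(); it != s.end(); ++it)
        *it = std::toupper(*it, loc);   // one char in, one char out
    return s;
}
```

The result is "MA" followed by the unchanged ß bytes and "E", not "MASSE": expanding one character into two is simply outside what a per-character mapping can express.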

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jul 22 '05 #24

Gabriel Dos Reis

ka***@gabi-soft.fr writes:

[...]

| > | There is a statement to the effect that "using-directives are
| > | ignored in any namespace, including X, directly containing one or
| > | more declarations of m". If there is a toupper declared in :: (the
| > | standard forbids it, but all of the implementations I know do have
| > | one), then the using-directives in :: should be ignored. Which
| > | would make this code work. But it wouldn't work if the code,
| > | including the using directive, were in another namespace.
|
| > The executive summary is that if you write X::m, then any actual
| > declaration of "m" in "X" hides any other declarations that would have
| > been found by searching namespaces nominated, directly or indirectly,
| > in using-directives reachable from X. If no actual declaration
| > for "m" is made in "X", then the result of the name lookup will be
| > that of applying the rule recursively to the nominated namespaces.
|
| In sum, what I had intuitively expected (and what the compilers I use
| seem to implement). So why did the other posters say that "using
| namespace std" meant that ::toupper would find a toupper in std::.

I cannot speak for them and I hope they will clarify what they meant.
And to tell the truth, I've lost most of those postings.

| But wait a minute. If I say explicitly that the only toupper that I
| want considered is the one in global namespace (e.g. ::toupper), and
| there isn't one in global namespace, the compiler will look elsewhere?

Yes. This is called in TC++PL3 "namespace composition" -- you trick
people into thinking that the "m" they're referring to in X::m comes
from your "X", whereas you may have just "stolen" it through
using-directives. E.g.

namespace N {
int m;
}

namespace X {
using namespace N; // compose X with N
}

int main()
{
return X::m; // finds N::m;
}

| That doesn't sound intuitively right -- I would expect an error.

I guess, it depends on how you look at the "::".
A view is that it is a scope resolution operator, i.e. it
disambiguates when there is a scope problem -- either there is no
visible declaration or there are too many visible declarations from
different scopes.

| > | This is all getting a bit beyond me. What I do know is that the
| > | compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower and
| > | toupper as ambiguous here, even if I include <locale> (so that there
| > | are any number of toupper and tolower functions available). Whereas
| > | they do if I leave the :: out. So whatever the standard says...
|
| > because the declaration of ::toupper "hides" other declarations for
| > toupper in the used namespace std.
|
| Except that, of course, if the libraries were conforming, there wouldn't
| have been a ::toupper in global namespace:-).

Yes, but you (James) won't quibble with me on that, right? :-)

| Anyhow, I still contend that the only "correct" solution using transform
| involves something like:
| boost::bind( (char (*)(char, std::locale const&))&std::toupper,
| _1, std::locale() )

I really do dislike the cast notation in front of std::toupper. It is
not a cast, it is an abuse of notation (manual overload resolution).
People should not be tricked into thinking that someone is doing a
weird cast from std::toupper; let's sequester cast notations to cast.

--
Gabriel Dos Reis
gd*@integrable-solutions.net

Jul 22 '05 #25

kanze

Gabriel Dos Reis <gd*@integrable-solutions.net> wrote in message
news:<m3************@uniton.integrable-solutions.net>...

> ka***@gabi-soft.fr writes:
>
> [...]
>
> | > | There is a statement to the effect that "using-directives are
> | > | ignored in any namespace, including X, directly containing one or
> | > | more declarations of m". If there is a toupper declared in ::
> | > | (the standard forbids it, but all of the implementations I know
> | > | do have one), then the using-directives in :: should be ignored.
> | > | Which would make this code work. But it wouldn't work if the
> | > | code, including the using directive, were in another namespace.
> |
> | > The executive summary is that if you write X::m, then any actual
> | > declaration of "m" in "X" hides any other declarations that would
> | > have been found by searching namespaces nominated, directly or
> | > indirectly, in using-directives reachable from X. If no actual
> | > declaration for "m" is made in "X", then the result of the name
> | > lookup will be that of applying the rule recursively to the
> | > nominated namespaces.
> |
> | In sum, what I had intuitively expected (and what the compilers I
> | use seem to implement). So why did the other posters say that
> | "using namespace std" meant that ::toupper would find a toupper in
> | std::.
>
> I cannot speak for them and I hope they will clarify what they meant.
> And to tell the truth, I've lost most of those postings.

I may have misunderstood what they were trying to say, but I got the
impression from them that what they were saying was that the « using
namespace std » was why the compiler was finding a toupper in std
namespace.

> | But wait a minute. If I say explicitly that the only toupper that I
> | want considered is the one in global namespace (e.g. ::toupper), and
> | there isn't one in global namespace, the compiler will look
> | elsewhere?
>
> Yes. This is called in TC++PL3 "namespace composition" -- you trick
> people into thinking that the "m" they're referring to in X::m comes
> from your "X", whereas you may have just "stolen" it through
> using-directives. E.g.
>
> namespace N {
> int m;
> }
>
> namespace X {
> using namespace N; // compose X with N
> }
>
> int main()
> {
> return X::m; // finds N::m;
> }
>
> | That doesn't sound intuitively right -- I would expect an error.
>
> I guess, it depends on how you look at the "::".

Yes. It occurred to me shortly after posting that this is sort of the
way the :: works within a class hierarchy. It will look deeper than the
classname given, but only if it doesn't find the name in the first
class.

The analogy is far from exact, but it is enough to make me suspicious of
my "intuitively". The situation has enough variety that nothing about it
is really intuitive.
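The class-hierarchy analogy can be made concrete with a small sketch; `Base`, `Derived`, `Shadow` and `id` are invented names. Qualified lookup of `Derived::id` reaches into `Base` only because `Derived` declares no `id` itself, while `Shadow::id` stops at the class that declares it.

```cpp
struct Base {
    static int id() { return 1; }
};

// Declares no id of its own, so Derived::id is found in Base.
struct Derived : Base {
};

// Declares its own id, which hides Base::id for qualified lookup.
struct Shadow : Base {
    static int id() { return 2; }
};

int viaDerived() { return Derived::id(); }   // calls Base::id
int viaShadow()  { return Shadow::id(); }    // calls Shadow::id
```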

> A view is that it is a scope resolution operator, i.e. it
> disambiguates when there is a scope problem -- either there is no
> visible declaration or there are too many visible declarations from
> different scopes.
>
> | > | This is all getting a bit beyond me. What I do know is that the
> | > | compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower
> | > | and toupper as ambiguous here, even if I include <locale> (so
> | > | that there are any number of toupper and tolower functions
> | > | available). Whereas they do if I leave the :: out. So whatever
> | > | the standard says...
> |
> | > because the declaration of ::toupper "hides" other declarations
> | > for toupper in the used namespace std.
> |
> | Except that, of course, if the libraries were conforming, there
> | wouldn't have been a ::toupper in global namespace:-).
>
> Yes, but you (James) won't quibble with me on that, right? :-)

Well, I don't think that it's your fault, even if you actively work on
one of the libraries:-).

Realistically, I wonder if the standard doesn't ask too much here. Maybe
it should make it unspecified whether <cctype> introduces the names into
global scope or not. Theoretically, I find what the standard requires
much better, but that doesn't do me any good if all of the implementors
ignore the requirement.

> | Anyhow, I still contend that the only "correct" solution using
> | transform involves something like:
> | boost::bind( (char (*)(char, std::locale const&))&std::toupper,
> | _1, std::locale() )
>
> I really do dislike the cast notation in front of std::toupper. It is
> not a cast, it is an abuse of notation (manual overload resolution).
> People should not be tricked into thinking that someone is doing a
> weird cast from std::toupper; let's sequester cast notations to cast.

I said correct, and I even put correct in quotes; I certainly didn't say
that it was elegant, nor that I liked it:-). For once, in fact, I agree
with you 100%.

From a practical point of view: it's what the standard says, and most, if
not all, implementations seem to be conformant on this particular point.
From an even more practical point of view: it's overly complex, totally
illegible, and so not really maintainable. In production code, I'd
always write a custom function which used the standard function, so that
overload resolution would handle the issue automatically.
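A minimal sketch of such a custom function, assuming the global locale is the one wanted: because the hypothetical `upcaseChar` has a single signature, it can be handed to std::transform without any cast, and the call to std::toupper inside it is resolved by ordinary overload resolution. Like any per-character mapping, it still cannot expand ß.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical wrapper with exactly one signature, so naming it is never
// ambiguous. Inside, std::toupper(char, const std::locale&) is selected
// by normal overload resolution on the two arguments.
char upcaseChar(char ch)
{
    return std::toupper(ch, std::locale());   // copy of the global locale
}

// Hypothetical helper: uppercases a copy of the string.
std::string upcase(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), upcaseChar);
    return s;
}
```

Constructing std::locale() on every call is simple but not free; a functor that stores the locale once would be the obvious refinement.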

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jul 22 '05 #26

llewelly

ka***@gabi-soft.fr writes:

> Gabriel Dos Reis <gd*@integrable-solutions.net> wrote in message
> news:<m3************@uniton.integrable-solutions.net>...
>> ka***@gabi-soft.fr writes:
>> [...]

>> | > | There is a statement to the effect that "using-directives are
>> | > | ignored in any namespace, including X, directly containing one or
>> | > | more declarations of m". If there is a toupper declared in ::
>> | > | (the standard forbids it, but all of the implementations I know
>> | > | do have one), then the using-directives in :: should be ignored.
>> | > | Which would make this code work. But it wouldn't work if the
>> | > | code, including the using directive, were in another namespace.

>> | > The executive summary is that if you write X::m, then any actual
>> | > declaration of "m" in "X" hides any other declarations that would
>> | > have been found by searching namespaces nominated, directly or
>> | > indirectly, in using-directives reachable from X. If no actual
>> | > declaration for "m" is made in "X", then the result of the name
>> | > lookup will be that of applying the rule recursively to the
>> | > nominated namespaces.

>> | In sum, what I had intuitively expected (and what the compilers I
>> | use seem to implement). So why did the other posters say that
>> | "using namespace std" meant that ::toupper would find a toupper in
>> | std::.

>> I cannot speak for them and I hope they will clarify what they meant.
>> And to tell the truth, I've lost most of those postings.

> I may have misunderstood what they were trying to say, but I got the
> impression from them that what they were saying was that the « using
> namespace std » was why the compiler was finding a toupper in std
> namespace.

That's my understanding. I get it from 3.4.3/4 :

# A name prefixed by the unary scope operator :: (5.1) is looked
# up in global scope, in the translation unit where it is
# used. The name shall be declared in global namespace scope or
# shall be a name whose declaration is visible in global scope
# because of a using-directive (3.4.3.2). The use of :: allows a
# global name to be referred to even if its identifier has been
# hidden (3.3.7).

Note that I think 17.4.3.1.3/4 (see below) allows 'toupper' to be
found in the global namespace even without the 'using namespace
std'; AFAICT the 'using namespace std' serves only to require
that the name found have the semantics you expect for 'toupper',
as opposed to implementation-defined semantics.

[snip]

> Realistically, I wonder if the standard doesn't ask too much here. Maybe
> it should make it unspecified whether <cctype> introduces the names into
> global scope or not. [snip]

I think the standard already goes farther than that, see 17.4.3.1.3/4:

# Each name from the Standard C library declared with external
# linkage is reserved to the implementation for use as a name with
# extern "C" linkage, both in namespace std and in the global
# namespace.

There has been some dispute here and on comp.std.c++ about what this
means, but my interpretation (which I would prefer to be wrong) is
that if a name such as 'toupper' is *not* brought into the global
namespace by a sequence such as:

#include <cctype>
using namespace std;

'::toupper' has implementation-defined semantics.

Someday, I am going to make time to test snippets such as:

//note - no headers #included.
extern "C" double qsort(double,double,double,double);

int main()
{
double d= qsort(1.0,1.0,1.0,1.0);
return (int)d;
}

on several different implementations. (On g++ 3.4 on freebsd, it
compiles with no errors or warnings, and dumps core at runtime. )

> Theoretically, I find what the standard requires
> much better,

I think you're mistaken about what it requires, though I wish you
were right.

> but that doesn't do me any good if all of the implementors
> ignore the requirement.

Agreed. I do wonder if #including the C++ <cxxx> headers is actually
more dangerous than #including the older equivalents inherited from
C89.

>> | Anyhow, I still contend that the only "correct" solution using
>> | transform involves something like:
>> | boost::bind( (char (*)(char, std::locale const&))&std::toupper,
>> | _1, std::locale() )
>>
>> I really do dislike the cast notation in front of std::toupper. It is
>> not a cast, it is an abuse of notation (manual overload resolution).
>> People should not be tricked into thinking that someone is doing a
>> weird cast from std::toupper; let's sequester cast notations to cast.

> I said correct, and I even put correct in quotes; I certainly didn't say
> that it was elegant, nor that I liked it:-). For once, in fact, I agree
> with you 100%.

> From a practical point of view: it's what the standard says, and most, if
> not all, implementations seem to be conformant on this particular point.
> From an even more practical point of view: it's overly complex, totally
> illegible, and so not really maintainable.

In some cases (though not the above) I find it cleaner to specify the
template arguments to the function template explicitly.
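As an illustration of this remark, here is a sketch with an invented function template `twice`: naming the specialization with explicit template arguments selects it directly, where the alternative would be a cast such as `(int (*)(int))&twice`. The example deliberately uses a user-defined template rather than a standard library function.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical function template standing in for an overloaded name.
template <class T>
T twice(T x)
{
    return x + x;
}

// Hypothetical helper: doubles every element of a copy of v.
std::vector<int> doubled(std::vector<int> v)
{
    // twice<int> names exactly one specialization, so std::transform can
    // deduce its function-object type without any cast notation.
    std::transform(v.begin(), v.end(), v.begin(), twice<int>);
    return v;
}
```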

> In production code, I'd
> always write a custom function which used the standard function, so that
> overload resolution would handle the issue automatically.

[snip]

Agreed.

Jul 22 '05 #27

kanze

llewelly <ll*********@xmission.dot.com> wrote in message
news:<86************@Zorthluthik.local.bar>...

> ka***@gabi-soft.fr writes:
> > Gabriel Dos Reis <gd*@integrable-solutions.net> wrote in message
> > news:<m3************@uniton.integrable-solutions.net>...
> >> ka***@gabi-soft.fr writes:
>
> >> [...]
>
> >> | > | There is a statement to the effect that "using-directives
> >> | > | are ignored in any namespace, including X, directly
> >> | > | containing one or more declarations of m". If there is a
> >> | > | toupper declared in :: (the standard forbids it, but all of
> >> | > | the implementations I know do have one), then the
> >> | > | using-directives in :: should be ignored. Which would make
> >> | > | this code work. But it wouldn't work if the code, including
> >> | > | the using directive, were in another namespace.
>
> >> | > The executive summary is that if you write X::m, then any
> >> | > actual declaration of "m" in "X" hides any other declarations
> >> | > that would have been found by searching namespaces nominated,
> >> | > directly or indirectly, in using-directives reachable from
> >> | > X. If no actual declaration for "m" is made in "X", then the
> >> | > result of the name lookup will be that of applying the rule
> >> | > recursively to the nominated namespaces.
>
> >> | In sum, what I had intuitively expected (and what the compilers
> >> | I use seem to implement). So why did the other posters say that
> >> | "using namespace std" meant that ::toupper would find a toupper
> >> | in std::.
>
> >> I cannot speak for them and I hope they will clarify what they
> >> meant. And to tell the truth, I've lost most of those postings.
>
> > I may have misunderstood what they were trying to say, but I got
> > the impression from them that what they were saying was that the
> > « using namespace std » was why the compiler was finding a
> > toupper in std namespace.
>
> That's my understanding. I get it from 3.4.3/4 :
>
> # A name prefixed by the unary scope operator :: (5.1) is looked
> # up in global scope, in the translation unit where it is
> # used. The name shall be declared in global namespace scope or
> # shall be a name whose declaration is visible in global scope
> # because of a using-directive (3.4.3.2). The use of :: allows a
> # global name to be referred to even if its identifier has been
> # hidden (3.3.7).

There are two things involved here. First, although the standard
doesn't allow it, there is a toupper in global namespace. Second, there
are a number of toupper in std:: (since in the implementation I use,
<iostream> does indirectly pull in <locale>). What I don't understand
is this: having done "using namespace std;",

- if I write ::toupper, and the toupper in global namespace isn't
available, do I find the toupper in std::, and

- if the above is true, and I do have a toupper in global namespace,
why isn't the function call ambiguous.

(I think that to really understand this, I'm going to have to find time
to write up some simple examples of my own. It's difficult following
toupper, because you are never 100% sure what the library may have done
with it.)

> Note that I think 17.4.3.1.3/4 (see below) allows 'toupper' to be
> found in the global namespace even without the 'using namespace
> std'; AFAICT the 'using namespace std' serves only to require
> that the name found have the semantics you expect for 'toupper',
> as opposed to implementation-defined semantics.

I'm not sure. It certainly means that I cannot define a toupper of my
own in global namespace.

I think the intent is just a practical one (for compiler implementers).
Regardless of the namespace in which I declare or define an ``extern
"C"'' function, the name must appear to the linker as if the function
were defined in the global namespace, since C can't do it any
differently. If the name isn't reserved to the implementation, I could
legally define a function of this name myself, and the linker would take
it instead of the one from the C library. That doesn't mean that my
code can see the name outside of std::.

But there may be other "special features" of ``extern "C"'' that I'm not
familiar with, which do make it visible.
[snip]
> > Realistically, I wonder if the standard doesn't ask too much here.
> > Maybe it should make it unspecified whether <cctype> introduces the
> > names into global scope or not. [snip]

> I think the standard already goes farther than that, see 17.4.3.1.3/4:
>
> # Each name from the Standard C library declared with external
> # linkage is reserved to the implementation for use as a name
> # with extern "C" linkage, both in namespace std and in the
> # global namespace.

The name is reserved to the implementation. That doesn't mean that I
can see it. Or does it?

I think a clarification is in order.

> There has been some dispute here and on comp.std.c++ about what this
> means, but my interpretation (which I would prefer to be wrong) is
> that if a name such as 'toupper' is *not* brought into the global
> namespace by a sequence such as:
>
> #include <cctype>
> using namespace std;
>
> '::toupper' has implementation-defined semantics.

What interests me is what happens without the "using namespace std;".
Is the compiler still allowed to find toupper?

> Someday, I am going to make time to test snippets such as:
>
> //note - no headers #included.
> extern "C" double qsort(double,double,double,double);
>
> int main()
> {
> double d= qsort(1.0,1.0,1.0,1.0);
> return (int)d;
> }
>
> on several different implementations. (On g++ 3.4 on freebsd, it
> compiles with no errors or warnings, and dumps core at runtime. )

As far as I can see, you've violated §17.4.3.1.3/5, so your code has
undefined behavior. (Or does this paragraph only apply if you include
at least one standard header?)

> > Theoretically, I find what the standard requires much better,
>
> I think you're mistaken about what it requires, though I wish you
> were right.

I think it is open to interpretation.

Personally, for the moment, I stick with the good old <ctype.h> -- at
least I know what I'm getting:-). Sort of: <ctype.h> can (in fact, I
think it should) also expose the names in std::. In practice, I don't
think it does, most of the time.

> > but that doesn't do me any good if all of the implementors ignore
> > the requirement.

> Agreed. I do wonder if #including the C++ <cxxx> headers is actually
> more dangerous than #including the older equivalents inherited from
> C89.

That's been my fear. I don't know whether it is really justified by the
standard, but it does seem that what actual implementations do is less
clear than in the case of <ctype.h>. (Of course, one of the actual
implementations I still have to deal with is g++ 2.95.2. Which
complicates the issue because of its particular handling of std::.)

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jul 22 '05 #28

Gabriel Dos Reis

ka***@gabi-soft.fr writes:

| llewelly <ll*********@xmission.dot.com> wrote in message
| news:<86************@Zorthluthik.local.bar>...
| > ka***@gabi-soft.fr writes:
|
| > > Gabriel Dos Reis <gd*@integrable-solutions.net> wrote in message
| > > news:<m3************@uniton.integrable-solutions.net>...
| > >> ka***@gabi-soft.fr writes:
|
| > >> [...]
|
| > >> | > | There is a statement to the effect that "using-directives
| > >> | > | are ignored in any namespace, including X, directly
| > >> | > | containing one or more declarations of m". If there is a
| > >> | > | toupper declared in :: (the standard forbids it, but all of
| > >> | > | the implementations I know do have one), then the
| > >> | > | using-directives in :: should be ignored. Which would make
| > >> | > | this code work. But it wouldn't work if the code, including
| > >> | > | the using directive, were in another namespace.
|
| > >> | > The executive summary is that if you write X::m, then any
| > >> | > actual declaration of "m" in "X" hides any other declarations
| > >> | > that would have been found by searching namespaces nominated,
| > >> | > directly or indirectly, in using-directives reachable from
| > >> | > X. If no actual declaration for "m" is made in "X", then the
| > >> | > result of the name lookup will be that of applying the rule
| > >> | > recursively to the nominated namespaces.
|
| > >> | In sum, what I had intuitively expected (and what the compilers
| > >> | I use seem to implement). So why did the other posters say that
| > >> | "using namespace std" meant that ::toupper would find a toupper
| > >> | in std::.
|
| > >> I cannot speak for them and I hope they will clarify what they
| > >> meant. And to tell the truth, I've lost most of those postings.
|
| > > I may have misunderstood what they were trying to say, but I got
| > > the impression from them that what they were saying was that the
| > > « using namespace std » was why the compiler was finding a
| > > toupper in std namespace.
|
| > That's my understanding. I get it from 3.4.3/4 :
|
| > # A name prefixed by the unary scope operator :: (5.1) is looked
| > # up in global scope, in the translation unit where it is
| > # used. The name shall be declared in global namespace scope or
| > # shall be a name whose declaration is visible in global scope
| > # because of a using-directive (3.4.3.2). The use of :: allows a
| > # global name to be referred to even if its identifier has been
| > # hidden (3.3.7).
|
| There are two things involved here. First, although the standard
| doesn't allow it, there is a toupper in global namespace. Second, there
| are a number of toupper in std:: (since in the implementation I use,
| <iostream> does indirectly pull in <locale>). What I don't understand
| is this: having done "using namespace std;",
|
| - if I write ::toupper, and the toupper in global namespace isn't
| available, do I find the toupper in std::, and

Yes, I already explained why.
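That first answer (::toupper reaches the one in std:: when the global namespace has no declaration of its own) can be checked without involving the library at all. `lib` and `onlyInLib` are invented names standing in for std and toupper.

```cpp
namespace lib {
    int onlyInLib() { return 7; }
}

// Nominates lib for lookups in the global namespace.
using namespace lib;

int callViaGlobalQualifier()
{
    // The global namespace declares no onlyInLib, so qualified lookup of
    // ::onlyInLib continues into the namespace nominated by the
    // using-directive and finds lib::onlyInLib.
    return ::onlyInLib();
}
```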

| - if the above is true, and I do have a toupper in global namespace,
| why isn't the function call ambiguous.

Because the declaration at the global scope "hides" other declarations
available through used namespaces. See my previous explanation.

| (I think that to really understand this, I'm going to have to find time
| to write up some simple examples of my own. It's difficult following
| toupper, because you are never 100% sure what the library may have done
| with it.)

Probably.

--
Gabriel Dos Reis
gd*@integrable-solutions.net

Jul 22 '05 #29

Gabriel Dos Reis

llewelly <ll*********@xmission.dot.com> writes:

| ka***@gabi-soft.fr writes:
|
| > Gabriel Dos Reis <gd*@integrable-solutions.net> wrote in message
| > news:<m3************@uniton.integrable-solutions.net>...
| >> ka***@gabi-soft.fr writes:
| >
| >> [...]
| >
| >> | > | There is a statement to the effect that "using-directives are
| >> | > | ignored in any namespace, including X, directly containing one or
| >> | > | more declarations of m". If there is a toupper declared in ::
| >> | > | (the standard forbids it, but all of the implementations I know
| >> | > | do have one), then the using-directives in :: should be ignored.
| >> | > | Which would make this code work. But it wouldn't work if the
| >> | > | code, including the using directive, were in another namespace.
| >
| >> | > The executive summary is that if you write X::m, then any actual
| >> | > declaration of "m" in "X" hides any other declarations that would
| >> | > have been found by searching namespaces nominated, directly or
| >> | > indirectly, in using-directives reachable from X. If no actual
| >> | > declaration for "m" is made in "X", then the result of the name
| >> | > lookup will be that of applying the rule recursively to the
| >> | > nominated namespaces.
| >
| >> | In sum, what I had intuitively expected (and what the compilers I
| >> | use seem to implement). So why did the other posters say that
| >> | "using namespace std" meant that ::toupper would find a toupper in
| >> | std::.
| >
| >> I cannot speak for them and I hope they will clarify what they meant.
| >> And to tell the truth, I've lost most of those postings.
| >
| > I may have misunderstood what they were trying to say, but I got the
| > impression from them that what they were saying was that the «using
| > namespace std» was why the compiler was finding a toupper in std
| > namespace.
|
| That's my understanding. I get it from 3.4.3/4 :
|
| # A name prefixed by the unary scope operator :: (5.1) is looked
| # up in global scope, in the translation unit where it is
| # used. The name shall be declared in global namespace scope or
| # shall be a name whose declaration is visible in global scope
| # because of a using-directive (3.4.3.2). The use of :: allows a
| # global name to be referred to even if its identifier has been
| # hidden (3.3.7).

What this says is that with ::toupper, you find those in the
global namespace *if* there are corresponding declarations there;
*otherwise*, you find those available through searching of used
namespaces (directly or indirectly). In particular, you do NOT find
both categories.
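The rule described here can be checked with a small example. The namespace lib and the functions f and g are hypothetical, chosen only to mimic a library that declares a name in a namespace made visible by a using-directive, and possibly also in the global namespace.

```cpp
// Hypothetical library namespace, nominated by the using-directive below.
namespace lib {
    int f() { return 1; }
    int g() { return 1; }
}

using namespace lib;

// A global g with the same signature as lib::g. Declaring it is fine;
// only an unqualified call g() at global scope would be ambiguous.
int g() { return 2; }

// No f is declared in the global namespace, so qualified lookup of ::f
// falls through to the namespaces nominated by using-directives and
// finds lib::f.
int call_f() { return ::f(); }

// A g *is* declared in the global namespace, so lookup of ::g stops
// there: lib::g is not considered, and the call is not ambiguous.
int call_g() { return ::g(); }
```

call_f() returns 1 and call_g() returns 2: the declaration at global scope hides the one reachable through the using-directive, rather than competing with it.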

| Note that I think 17.4.3.1.3/4 (see below) allows 'toupper' to be
| found in the global namespace even without the 'using namespace
| std';

No, it does not. What that paragraph means is that a user cannot
define them at global scope, or declare them with C language linkage.

| AFAICT the 'using namespace std' serves only to require
| that the name found have the semantics you expect for 'toupper',
| as opposed to implementation-defined semantics.
|
| [snip]
| > Realistically, I wonder if the standard doesn't ask too much here. Maybe
| > it should make it unspecified whether <cctype> introduces the names into
| > global scope or not.
| [snip]
|
| I think the standard already goes farther than that, see 17.4.3.1.3/4:
|
| # Each name from the Standard C library declared with external
| # linkage is reserved to the implementation for use as a name with
| # extern "C" linkage, both in namespace std and in the global
| # namespace.
|
| There has been some dispute here and on comp.std.c++ about what this
| means, but my interpretation (which I would rather were wrong) is
| that if a name such as 'toupper' is *not* brought into the global
| namespace by a sequence such as:
|
| #include<cstddef>
| using namespace std;
|
| '::toupper' has implementation-defined semantics.

I don't see how you derive that. Certainly <cstddef> is not described
as defining names in the global namespace.

| Someday, I am going to make time to test snippets such as:
|
| //note - no headers #included.
| extern "C" double qsort(double,double,double,double);

Strictly speaking, you get into undefined behaviour territory.
Precisely because of the very paragraph you quote above.

--
Gabriel Dos Reis
gd*@integrable-solutions.net

Jul 22 '05 #30

Zombie wrote:

Hi, what is the correct way of converting contents of a <string> to
lowercase?
There are no methods of <string> class to do this

I know I'm a little late responding, but I was reading this thread and
browsing through my compiler docs and the standard, and I was wondering
if there would be something wrong with:

std::string t("MiXeD cAsE");
std::ctype<std::string::value_type>().tolower(t.begin(),t.end());

Or have I missed something really obvious? Character sets other than
ASCII 0...127?

LR

Jul 23 '05 #31

sanjay

I think you didn't even try compiling your code. std::ctype::~ctype is
protected and won't let you compile.
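One way to get locale-aware lowercasing without constructing a ctype object at all is the two-argument std::tolower from <locale>, which is specified to use the ctype facet of the locale it is given. This is a sketch assuming C++11; to_lower_with_locale is a hypothetical name, and the default argument uses the current global locale.

```cpp
#include <locale>
#include <string>

// Returns a lowercased copy of s using std::tolower(charT, const locale&).
// The facet is obtained from the locale inside the standard function, so
// this code never creates or destroys a ctype object itself (ctype's
// destructor is protected).
std::string to_lower_with_locale(const std::string& s,
                                 const std::locale& loc = std::locale())
{
    std::string out;
    out.reserve(s.size());
    for (char c : s)
        out += std::tolower(c, loc);
    return out;
}
```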

Thanks,
Sanjay.

LR wrote:

Zombie wrote:
Hi, what is the correct way of converting contents of a <string> to
lowercase?
There are no methods of <string> class to do this
I know I'm a little late responding, but I was reading this thread
and browsing through my compiler docs and the standard, and I was
wondering if there would be something wrong with:

std::string t("MiXeD cAsE");
std::ctype<std::string::value_type>().tolower(t.begin(),t.end());

Or have I missed something really obvious? Character sets other than
ASCII 0...127?

LR

Jul 23 '05 #32

kanze

LR wrote:

Zombie wrote:
Hi, what is the correct way of converting contents of a
<string> to lowercase? There are no methods of <string>
class to do this

I know I'm a little late responding, but I was reading this
thread and browsing through my compiler docs and the standard,
and I was wondering if there would be something wrong with:

std::string t("MiXeD cAsE");
std::ctype<std::string::value_type>().tolower(t.begin(),t.end());

Or have I missed something really obvious?

Just that ctype::tolower doesn't take string iterators as
parameters. (I'm also a bit dubious about using the default
constructor of ctype -- I'm not sure what the resulting class is
supposed to do. The whole point of ctype is that you get it
from a defined locale.)

Character sets other than ASCII 0...127?

That shouldn't be a problem. The conversion is obviously
locale specific, but for a given locale, and an adequately
loose definition of "work", it should work.
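Following both of James's points (ctype<char>::tolower works on a char range, not on string iterators, and the facet should come from a locale), a corrected version of LR's snippet might look like this. It is a sketch assuming C++11, where std::string storage is guaranteed contiguous; to_lower_in_place is a hypothetical name.

```cpp
#include <locale>
#include <string>

// Lowercases s in place with the ctype<char> facet of loc. The facet is
// fetched with use_facet instead of being default-constructed, and the
// range overload tolower(char*, const char*) converts the whole buffer
// in a single call.
void to_lower_in_place(std::string& s, const std::locale& loc = std::locale())
{
    if (s.empty())
        return;
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    ct.tolower(&s[0], &s[0] + s.size());
}
```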

--
James Kanze GABI Software
Consulting in object-oriented computing
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jul 23 '05 #33

msb222

I have one minor point to add to this discussion... While tolower,
std::tolower, and the locale machinery are all well and good for
lower-casing text strings in general, there is a non-trivial
performance cost to this operation. Depending on the implementation
(of those I have measured, including VC++7), in our code, which does
case-insensitive symbol lookups at runtime, I am able to squeeze
almost 15% more performance out of it by using a cache strategy.

What we did was to write our own wrapper function object called
fast_tolower... at startup time, an array of 256 bytes is created (if
you are using wide characters, you would end up with a 65536-element
array of wchar_t, which would be about 128k of memory... which seems
like a lot, but our program does massive crunching on the order of
gigs, so it's worth it). That array then gets populated with the
results of calling tolower() on every value in that byte range.

So all the fast_tolower lookup does is a constant array access, and
then we use that with std::transform. It's definitely overkill for a
small little program, but if you have a large system working on lots of
text or symbols that need to be case insensitive (dealing with one
locale of course) I think it's a good idea to do it this way.
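The caching scheme described above can be sketched as follows. The class layout and the name fast_lower_in_place are assumptions for illustration, not the poster's actual code. The table has one entry per unsigned char value, is filled once from std::tolower, and therefore reflects the C locale in effect when it is built.

```cpp
#include <algorithm>
#include <cctype>
#include <climits>
#include <string>

// Hypothetical fast_tolower: one table entry per unsigned char value,
// filled once from std::tolower, so each conversion is one array access.
class fast_tolower
{
public:
    fast_tolower()
    {
        for (int c = 0; c <= UCHAR_MAX; ++c)
            table_[c] = static_cast<char>(std::tolower(c));
    }

    char operator()(char c) const
    {
        return table_[static_cast<unsigned char>(c)];
    }

private:
    char table_[UCHAR_MAX + 1];
};

// Applies the table with std::transform. The table is a function-local
// static built on first use; the lambda refers to it directly (objects
// with static storage need no capture), so the table is never copied.
void fast_lower_in_place(std::string& s)
{
    static const fast_tolower lower;
    std::transform(s.begin(), s.end(), s.begin(),
                   [](char c) { return lower(c); });
}
```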

One way you could generalize this cache is to templatize it on the
locale info, so that there would be one static cache for each locale
you are actually using, if you care about case info in that locale.
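A per-locale variant could look like this. Instead of the template parameter suggested above, the sketch takes a std::locale at construction, which is a substitution on my part. The name locale_lower_table is hypothetical. The table is filled with a single call to the range overload ctype<char>::tolower(char*, const char*), and a program would keep one instance per locale it actually uses.

```cpp
#include <climits>
#include <locale>
#include <string>

// Hypothetical per-locale lowercase cache built from a ctype<char> facet.
class locale_lower_table
{
public:
    explicit locale_lower_table(const std::locale& loc)
    {
        // Start with the identity mapping, then lowercase every entry
        // in one call to the facet's range overload.
        for (int c = 0; c <= UCHAR_MAX; ++c)
            table_[c] = static_cast<char>(c);
        std::use_facet<std::ctype<char> >(loc).tolower(table_, table_ + UCHAR_MAX + 1);
    }

    char operator()(char c) const
    {
        return table_[static_cast<unsigned char>(c)];
    }

    void apply(std::string& s) const
    {
        for (char& c : s)
            c = (*this)(c);
    }

private:
    char table_[UCHAR_MAX + 1];
};
```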

Marc

Jul 23 '05 #34

ka***@gabi-soft.fr wrote:

LR wrote:
Zombie wrote:
Hi, what is the correct way of converting contents of a
<string> to lowercase?

std::string t("MiXeD cAsE");
std::ctype<std::string::value_type>().tolower(t.begin(),t.end());

Or have I missed something really obvious?

Just that ctype::tolower doesn't take string iterators as
parameters. (I'm also a bit dubious about using the default
constructor of ctype -- I'm not sure what the resulting class is
supposed to do. The whole point of ctype is that you get it
from a defined locale.)

Thanks for responding, and also to Sanjay, who pointed out the problem
of std::ctype's dtor being protected. I completely missed that. But my
compiler (VC++ 6.0) did compile and run the code I posted. More recent
MS compilers don't compile it. I also tried simply constructing a
std::ctype<char> at www.comeaucomputing.com and it seemed to compile.

LR

Jul 23 '05 #35

kanze

msb222 wrote:

I have one minor point to add to this discussion... While the
tolower and std::tolower, and locale stuff is all well and
good for lower-casing text strings in general, there is a
non-trivial performance cost to this operation.

I imagine that that is why the functions taking char* (instead
of just a single char) were added to the locale mechanism. In
most cases, it probably doesn't matter, but there are
conceivably cases where a virtual function call (as opposed to
an inlined function with no call) might be too expensive.

Depending on the implementation (of those I have measured,
including VC++7), in our code, which does case-insensitive
symbol lookups at runtime, I am able to squeeze almost 15%
more performance out of the code by using a cache strategy.
What we did was to write our own wrapper function object
called fast_tolower... at startup time, an array is created of
256 bytes

If this isn't what the implementation of ctype does, there's
something wrong with it. At least in the specialization for
char.

(if you are using wide characters, you would end up with a
65536-element array of wchar_t, which would be about 128k of
memory... which seems like a lot, but our program does massive
crunching on the order of gigs, so it's worth it). That then
gets populated with the results of tolower() for all of the
values in that byte range.

To be really useful, wchar_t should be at least 21 bits. On the
machines I usually work on, it's 32 bits. And over 4 billion
4-byte elements isn't going to cut it.

In practice, of course, most of the code blocks don't have
upper/lower case, so using an additional level of indirection,
and only implementing the full table for blocks with at least
one upper/lower case character would probably be acceptable.
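The two-level table James suggests can be sketched like this. The block size of 256, the class name two_level_lower, and the upper bound of 0x10FFFF are assumptions for illustration. A block gets its own table only if std::towlower changes at least one character in it, so in practice only the few blocks that have case cost memory. The tables reflect the C locale in effect at construction.

```cpp
#include <cwchar>
#include <cwctype>
#include <vector>

// Hypothetical two-level lowercase table for wide characters.
class two_level_lower
{
public:
    two_level_lower()
    {
        // Highest code point considered: 0x10FFFF, or WCHAR_MAX where
        // wchar_t is narrower (e.g. 16 bits).
        const unsigned long limit =
            static_cast<unsigned long>(WCHAR_MAX) < 0x10FFFFul
                ? static_cast<unsigned long>(WCHAR_MAX)
                : 0x10FFFFul;
        blocks_.resize(limit / kBlock + 1);
        for (unsigned long b = 0; b < blocks_.size(); ++b) {
            std::vector<wchar_t> table(kBlock);
            bool has_case = false;
            for (unsigned long i = 0; i < kBlock; ++i) {
                const unsigned long cp = b * kBlock + i;
                const wchar_t lower = cp <= limit
                    ? static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(cp)))
                    : static_cast<wchar_t>(cp);
                table[i] = lower;
                if (static_cast<unsigned long>(lower) != cp)
                    has_case = true;
            }
            if (has_case)
                blocks_[b].swap(table);  // keep the table only if needed
        }
    }

    wchar_t operator()(wchar_t c) const
    {
        const unsigned long cp = static_cast<unsigned long>(c);
        const unsigned long b = cp / kBlock;
        if (b >= blocks_.size() || blocks_[b].empty())
            return c;  // out of range, or a block with no case mappings
        return blocks_[b][cp % kBlock];
    }

private:
    static const unsigned long kBlock = 256;
    std::vector<std::vector<wchar_t> > blocks_;
};
```

The extra indirection costs one more memory access per character, in exchange for not allocating a full table for the whole code space.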

So all the fast_tolower lookup does is a constant array
access, and then we use that with std::transform. It's
definitely overkill for a small little program, but if you
have a large system working on lots of text or symbols that
need to be case insensitive (dealing with one locale of
course) I think it's a good idea to do it this way.

If profiling shows that your compiler's implementation of
ctype::tolower( char*, char const* ) is flaky, it's definitely a
solution to be considered. You might also want to consider it
simply because it means that you can pass the function arbitrary
iterators, rather than only char*'s -- if it avoids an otherwise
unnecessary copy, you have a speed advantage there as well.
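An iterator-based helper along the lines James mentions might look like this. The names lower_copy and lower_string are hypothetical. Note the trade-off: it calls the facet once per character, so it gives up the single-call advantage of the char* range overload, but it accepts any input iterators and avoids copying the text into a contiguous buffer first. It could be combined with one of the table approaches if the per-character call turns out to matter.

```cpp
#include <iterator>
#include <locale>
#include <string>

// Lowercases [first, last) into out using the ctype<char> facet of loc.
// Works with any input/output iterators whose value type is char.
template <typename InputIt, typename OutputIt>
OutputIt lower_copy(InputIt first, InputIt last, OutputIt out,
                    const std::locale& loc = std::locale())
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    for (; first != last; ++first, ++out)
        *out = ct.tolower(*first);
    return out;
}

// Example use with string iterators and a back_inserter.
std::string lower_string(const std::string& s)
{
    std::string result;
    lower_copy(s.begin(), s.end(), std::back_inserter(result));
    return result;
}
```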

Of course, if you want an internationalized environment, and to
do the conversion correctly, it gets a lot more complicated, and
the standard functions quickly become unusable (since toupper
will sometimes return two characters for a single lower case,
and things like that).

--
James Kanze GABI Software
Consulting in object-oriented computing
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jul 23 '05 #36

Ben Hutchings

msb222 wrote:
<snip>

What we did was to write our own wrapper function structure called
fast_tolower... at startup time, an array is created of 256 bytes
<snip>
So all the fast_tolower lookup does is a constant array access, and
then we use that with std::transform.

<snip>

If you look at any implementation of std::tolower, I'm fairly sure
you'll find it does the same! The speed advantage of fast_tolower
probably comes either from inlining (if tolower is not inline), static
linking to the array (if the C library is dynamically linked) or the
avoidance of conversions.

--
Ben Hutchings
Having problems with C++ templates? Your questions may be answered by
<http://womble.decadentplace.org.uk/c++/template-faq.html>.

Jul 23 '05 #37

Jul 23 '05 #38
