Case insensitive set of strings

Hi,

I want a const static std::set of strings which is case insensitive
for the values.

So I have the following, which seems to work, but something doesn't seem
right about it. Is there a better way, or are there any gotchas in my code
below?
TIA

Adrian

#include <iostream>
#include <functional>
#include <algorithm>
#include <set>
#include <string>
#include <iterator>

class Test
{
public:
    void p()
    {
        std::copy(fields.begin(), fields.end(),
                  std::ostream_iterator<std::string>(std::cout, ","));
        std::cout << std::endl;
    }
private:
    struct nocase_cmp : public std::binary_function<const std::string &,
                                                    const std::string &, bool>
    {
        struct nocase_char_cmp : public std::binary_function<char, char, bool>
        {
            bool operator()(char a, char b)
            {
                return std::toupper(a) < std::toupper(b);
            }
        };
        bool operator()(const std::string &a, const std::string &b)
        {
            return std::lexicographical_compare(a.begin(), a.end(),
                                                b.begin(), b.end(),
                                                nocase_char_cmp());
        }
    };

    typedef std::set<std::string, nocase_cmp> Field_names_t;
    static const Field_names_t fields;
};
const char *f[]={
    "string1",
    "string2",
    "string3",
    "STRIng1",
    "string5"};

const Test::Field_names_t Test::fields(f, f+5);

int main(int argc, char *argv[])
{
    Test t;
    t.p();

    return 0;
}

Apr 17 '07 #1
Adrian wrote:
> Hi,
>
> I want a const static std::set of strings which is case insensitive
> for the values.
>
> So I have the following, which seems to work, but something doesn't seem
> right about it. Is there a better way, or are there any gotchas in my code
> below?

I don't see anything obviously wrong with your code. However if your
set is _constant_ then std::set may be overkill (and incur needless time
and space penalties). A possibly more efficient approach would be to
use a sorted vector and the various binary search functions of the
standard library (lower_bound, upper_bound, binary_search, etc.).
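
A minimal sketch of the sorted-vector alternative mentioned above, assuming C++11 and the sample strings from the original post. The names nocase_less, make_sorted_fields and is_field are invented for illustration, and the unsigned char cast is there to keep the argument of std::toupper in its valid range:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive "less than" for single characters.
bool nocase_char_less(char a, char b)
{
    return std::toupper(static_cast<unsigned char>(a))
         < std::toupper(static_cast<unsigned char>(b));
}

// Case-insensitive "less than" for whole strings.
bool nocase_less(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end(),
                                        nocase_char_less);
}

// Hypothetical helper: builds the field list and sorts it once.
std::vector<std::string> make_sorted_fields()
{
    std::vector<std::string> v = {"string5", "string1", "string3", "string2"};
    std::sort(v.begin(), v.end(), nocase_less);
    return v;
}

// Hypothetical lookup: true if key matches an entry, ignoring case.
bool is_field(const std::vector<std::string>& sorted, const std::string& key)
{
    return std::binary_search(sorted.begin(), sorted.end(), key, nocase_less);
}
```

The vector has to be sorted with the same comparison that is later handed to std::binary_search, otherwise the search result is meaningless.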

Mark
Apr 17 '07 #2
On Apr 17, 9:30 pm, Adrian <n...@bluedreamer.com> wrote:
> I want a const static std::set of strings which is case insensitive
> for the values.
> So I have the following, which seems to work, but something doesn't seem
> right about it. Is there a better way, or are there any gotchas in my code
> below?

Your code has undefined behavior.

> #include <iostream>
> #include <functional>
> #include <algorithm>
> #include <set>
> #include <string>
> #include <iterator>

Don't forget:
#include <cctype>
(or <locale>, if you use the toupper functions from there).

> class Test
> {
> public:
>     void p()
>     {
>         std::copy(fields.begin(), fields.end(),
>                   std::ostream_iterator<std::string>(std::cout, ","));
>         std::cout << std::endl;
>     }
> private:
>     struct nocase_cmp : public std::binary_function<const std::string &,
>                                                     const std::string &, bool>
>     {
>         struct nocase_char_cmp : public std::binary_function<char, char, bool>
>         {
>             bool operator()(char a, char b)

The function should be const, I think.

>             {
>                 return std::toupper(a) < std::toupper(b);

Calling the single argument form of toupper with a char as
argument is undefined behavior. The argument type is int, with
the constraint that the value of the int must be either EOF, or
in the range [0...UCHAR_MAX]. If char is signed, it won't be in
range when converted (implicitly) to int.

There are two solutions here: either explicitly convert the char
to unsigned char before calling toupper, e.g.:

return toupper( static_cast< unsigned char >( a ) )
< toupper( static_cast< unsigned char >( b ) ) ;

or use the toupper member function of the std::ctype facet. (In that
case, I would use something like:

    class nocase_char_cmp
    {
    public:
        typedef std::ctype< char >
                            ctype ;
        explicit            nocase_char_cmp(
                                std::locale const& l = std::locale() )
            :   my_ctype( &std::use_facet< ctype >( l ) )
        {
        }

        bool                operator()( char a, char b ) const
        {
            return my_ctype->toupper( a ) < my_ctype->toupper( b ) ;
        }

    private:
        ctype const*        my_ctype ;
    } ;

)

If you have a lot of case insensitive comparisons, it might be
worth writing a case insensitive collate facet (or there might
even be one available ready-made); in that case, just use an
std::locale with this facet as the comparison object of the set
(std::locale has an operator() which compares two strings using
its collate facet), and you're done with it.
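
Putting the first fix together: a sketch of the original comparator with the unsigned char cast and const call operators applied. It is hoisted out of class Test for brevity, std::binary_function is dropped since std::set does not need it, and make_fields is a hypothetical helper, not something from the original post:

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

struct nocase_cmp
{
    struct nocase_char_cmp
    {
        // const, and with the argument forced into [0, UCHAR_MAX].
        bool operator()(char a, char b) const
        {
            return std::toupper(static_cast<unsigned char>(a))
                 < std::toupper(static_cast<unsigned char>(b));
        }
    };

    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(),
                                            nocase_char_cmp());
    }
};

typedef std::set<std::string, nocase_cmp> Field_names_t;

// Hypothetical helper: builds the set from the thread's sample data.
Field_names_t make_fields()
{
    const char* f[] = { "string1", "string2", "string3", "STRIng1", "string5" };
    return Field_names_t(f, f + 5);
}
```

Built from the five sample strings, the set holds four entries, because "STRIng1" compares equivalent to "string1" and is not inserted a second time.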

>             }
>         };
>         bool operator()(const std::string &a, const std::string &b)
>         {
>             return std::lexicographical_compare(a.begin(), a.end(),
>                                                 b.begin(), b.end(),
>                                                 nocase_char_cmp());
>         }
>     };
>     typedef std::set<std::string, nocase_cmp> Field_names_t;
>     static const Field_names_t fields;
> };
> const char *f[]={
>     "string1",
>     "string2",
>     "string3",
>     "STRIng1",
>     "string5"};

Try throwing in some characters whose encoding results in a
negative number, and see what happens. (On my machine, just
about any accented character will do the trick. In my test
suites, I'll generally make sure that there is a ÿ somewhere,
since in the most frequent encoding, it is 0xFF, which, when
stored into a char, becomes -1, or EOF. You'd be surprised how
many programs stop when they encounter this character in a
file.)
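
The range problem can be looked at in isolation with a small helper. This is only a sketch: to_ctype_arg and safe_toupper are made-up names, and the value 255 mentioned afterwards assumes 8-bit chars.

```cpp
#include <cctype>

// Converts a char to the int value the <cctype> functions expect:
// always in [0, UCHAR_MAX], whether plain char is signed or unsigned.
int to_ctype_arg(char c)
{
    return static_cast<unsigned char>(c);
}

// Upper-cases a single char without ever passing a negative argument.
char safe_toupper(char c)
{
    return static_cast<char>(std::toupper(to_ctype_arg(c)));
}
```

With 8-bit chars, to_ctype_arg(static_cast<char>(-1)) gives 255 whether plain char is signed or unsigned, so a character such as 0xFF can no longer be mistaken for a negative value.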

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 18 '07 #3
On Apr 17, 11:42 pm, Mark P <use...@fall2005REMOVE.fastmailCAPS.fm>
wrote:
> I don't see anything obviously wrong with your code. However if your
> set is _constant_ then std::set may be overkill (and incur needless time
> and space penalties). A possibly more efficient approach would be to
> use a sorted vector and the various binary search functions of the
> standard library (lower_bound, upper_bound, binary_search, etc.).

In that case, you probably want char const*[], and use
lower_bound on it. (Sort the original data in the editor, e.g.
mark the block, and pipe it through the system utility sort with
the correct options for case insensitivity.) That way, you get
static initialization and thus avoid any order of initialization
problems. (Depending on the use, it might even be simpler to
not bother sorting it, and use std::find. Unless the actual
table has hundreds of entries, you're probably not likely to
notice the difference.)
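
A sketch of the unsorted variant, assuming C++11 (for the lambda) and the thread's sample strings. Since the match has to ignore case, it uses std::find_if with a predicate rather than plain std::find; field_table, nocase_equal and is_field are names invented here:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>

// Statically initialized table: no constructor runs at startup, so
// there is no order-of-initialization issue.
static char const* const field_table[] = {
    "string1", "string2", "string3", "string5",
};
static std::size_t const field_count =
    sizeof field_table / sizeof field_table[0];

// Case-insensitive equality of two C strings.
bool nocase_equal(char const* a, char const* b)
{
    while (*a != '\0' && *b != '\0') {
        if (std::toupper(static_cast<unsigned char>(*a))
            != std::toupper(static_cast<unsigned char>(*b)))
            return false;
        ++a;
        ++b;
    }
    return *a == *b;    // equal only if both strings ended together
}

// Hypothetical lookup: linear search over the table.
bool is_field(char const* key)
{
    char const* const* end = field_table + field_count;
    return std::find_if(field_table, end,
                        [key](char const* entry) {
                            return nocase_equal(entry, key);
                        }) != end;
}
```

For a table of this size the linear scan is unlikely to matter; the point is that the data needs no dynamic initialization at all.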

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 18 '07 #4
On Apr 18, 2:09 am, James Kanze <james.ka...@gmail.com> wrote:
> Your code has undefined behavior.

Thanks James,

I assumed the implicit conversion would work without thinking much
about it - a little test proves it won't :-)

Thanks for the facet stuff - will have to read more as I've not used
them before.

Adrian

Apr 18 '07 #5
On Apr 18, 2:14 am, James Kanze <james.ka...@gmail.com> wrote:
> In that case, you probably want char const*[], and use
> lower_bound on it. (Sort the original data in the editor, e.g.
> mark the block, and pipe it through the system utility sort with
> the correct options for case insensitivity.) That way, you get
> static initialization and thus avoid any order of initialization
> problems. (Depending on the use, it might even be simpler to
> not bother sorting it, and use std::find. Unless the actual
> table has hundreds of entries, you're probably not likely to
> notice the difference.)

I have control over the strings in the set - it's the incoming data
that has to be matched case-insensitively against the set.

I started with const char *[] but the set removes the issue of including
duplicates.

Adrian
Apr 18 '07 #6
Adrian wrote:
> I have control over the strings in the set - it's the incoming data
> that has to be matched case-insensitively against the set.
>
> I started with const char *[] but the set removes the issue of including
> duplicates.

Look at std::unique if that's your concern. The comparative advantage
of a set is that it handles dynamic data efficiently.
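
A sketch of how std::unique could take over the duplicate removal, assuming the strings live in a std::vector and C++11 is available. std::unique only removes adjacent duplicates, so the range is sorted first with the same case-insensitive ordering; all names here are invented for illustration:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

bool nocase_char_less(char a, char b)
{
    return std::toupper(static_cast<unsigned char>(a))
         < std::toupper(static_cast<unsigned char>(b));
}

bool nocase_less(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end(),
                                        nocase_char_less);
}

// Two strings are equivalent if neither orders before the other.
bool nocase_equiv(const std::string& a, const std::string& b)
{
    return !nocase_less(a, b) && !nocase_less(b, a);
}

// Hypothetical helper: sorts v and drops case-insensitive duplicates.
// stable_sort keeps equivalent strings in their original order, so the
// first spelling encountered is the one that survives.
void sort_and_dedup(std::vector<std::string>& v)
{
    std::stable_sort(v.begin(), v.end(), nocase_less);
    v.erase(std::unique(v.begin(), v.end(), nocase_equiv), v.end());
}
```

After this the vector is both sorted and duplicate-free, so it can be searched directly with std::lower_bound or std::binary_search using nocase_less.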

Mark
Apr 18 '07 #7
On Apr 18, 3:32 pm, Adrian <n...@bluedreamer.com> wrote:
> I have control over the strings in the set - it's the incoming data
> that has to be matched case-insensitively against the set.

In other words, you have to determine whether the incoming data
is in the set or not.

> I started with const char *[] but the set removes the issue of including
> duplicates.

If the set is const, presumably you haven't put any duplicates
in there to begin with. I'll often do things like:

char const* table[] =
{
"string A",
"string B",
// ...
} ;

then filter the lines between (but not including) the { and the
} through "sort"; adding a -u option to sort would eliminate
duplicates, and putting a tr '[:upper:]' '[:lower:]' at the
head of the pipe will ensure that everything is lower case.
Then, use std::find or std::lower_bound, accordingly. (std::find
is easier, and probably sufficient if the set only has a couple
of dozen elements.)

Although likely not an issue in your code, the advantage of
doing this is that all data initialization is static, and you
avoid order of initialization issues.
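
A sketch of the lookup side of this approach, assuming the table has already been lower-cased, sorted in plain byte order (for example with LC_ALL=C) and stripped of duplicates. field_table, to_lower, entry_less and in_table are hypothetical names, and the entries are the thread's sample strings:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Lower-case, sorted, duplicate-free; statically initialized.
static char const* const field_table[] = {
    "string1", "string2", "string3", "string5",
};
static std::size_t const field_count =
    sizeof field_table / sizeof field_table[0];

// Returns a lower-cased copy of the incoming key.
std::string to_lower(std::string s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Orders a table entry against a key; strcmp uses the same byte order
// the table is assumed to be sorted in.
bool entry_less(char const* entry, std::string const& key)
{
    return std::strcmp(entry, key.c_str()) < 0;
}

// Hypothetical lookup: binary search for the lower-cased key.
bool in_table(std::string const& incoming)
{
    std::string const key = to_lower(incoming);
    char const* const* end = field_table + field_count;
    char const* const* pos =
        std::lower_bound(field_table, end, key, entry_less);
    return pos != end && key == *pos;
}
```

Only the incoming string is converted at run time; the table itself stays plain static data, which is what avoids the order of initialization question.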

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 19 '07 #8
