By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,702 Members | 1,101 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,702 IT Pros & Developers. It's quick & easy.

remove certain characters from a string

P: n/a
I'm writing a function to remove certain characters from strings. For
example, I often get strings with commas... they look like this:

"12,384"

I'd like to take that string, remove the comma and return "12384"

What is the most efficient, fastest way to approach this?

Thanks,

Brad
Jun 27 '08 #1
Share this Question
Share on Google+
26 Replies


P: n/a
Brad wrote:
I'm writing a function to remove certain characters from strings. For
example, I often get strings with commas... they look like this:

"12,384"

I'd like to take that string, remove the comma and return "12384"

What is the most efficient, fastest way to approach this?
Probably a character by character copy, skipping the characters you want
to remove.

--
Ian Collins.
Jun 27 '08 #2

P: n/a
Brad <br**@16systems.comkirjutas:
I'm writing a function to remove certain characters from strings. For
example, I often get strings with commas... they look like this:

"12,384"

I'd like to take that string, remove the comma and return "12384"

What is the most efficient, fastest way to approach this?

Thanks,

Brad
std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);
}

this ok?
Paavo
Jun 27 '08 #3

P: n/a
Paavo Helde wrote:
>
std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);
}

this ok?
Paavo
Thanks, works well with one character... How would I make it work with
several... like so:

#include <iostream>

std::string clean(std::string the_string)
{
char bad[] = {'.', ',', ' ', '|', '\0'};
std::string::size_type n = 0;
while ((n=the_string.find(bad, n))!=the_string.npos)
{
the_string.erase(n,1);
}
std::cout << the_string << std::endl;
return the_string;
}

int main()
{
clean("12,34.45|78 9");
return 0;
}

Jun 27 '08 #4

P: n/a
Brad schrieb:
Thanks, works well with one character... How would I make it work with
several... like so:
struct CheckForBad
{
const string chars;
CheckForBad(string const& chars)
: chars(sortUniq(chars))
{}
static string sortUniq(string tmp)
{
sort(tmp.begin(), tmp.end());
tmp.erase(
unique(tmp.begin(), tmp.end()),
tmp.end()
);
return tmp;
}
bool operator() (char const c) const
{
return binary_search(
chars.begin(), chars.end(),
c
);
}
};

string removeStuff(string text, string const& stuff)
{
text.erase(
remove_if(
text.begin(), text.end(),
CheckForBad(stuff)
),
text.end()
);
return text;
}

//or:
string removeStd(string text)
{
static const CheckForBad check(".,| ");
text.erase(
remove_if(
text.begin(), text.end(),
check
),
text.end()
);
return text;
}

not tested

Regards,
Frank
Jun 27 '08 #5

P: n/a
On May 24, 5:01 pm, Paavo Helde <nob...@ebi.eewrote:
Brad <b...@16systems.comkirjutas:
I'm writing a function to remove certain characters from strings. For
example, I often get strings with commas... they look like this:
"12,384"
I'd like to take that string, remove the comma and return "12384"
What is the most efficient, fastest way to approach this?
Thanks,
Brad

std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);

}

this ok?
Paavo
This is ugly... based on what Pavvo posted. It works. What do you guys
think? Bad?

#include <iostream>

std::string clean(std::string the_string)
{
char bad[] = {'.', ',', ' ', '|', '\0'};
std::string::size_type n = 0;

while ((n=the_string.find(bad[0], n))!=the_string.npos)
{
the_string.erase(n,1);
}

n = 0;
while ((n=the_string.find(bad[1], n))!=the_string.npos)
{
the_string.erase(n,1);
}

n = 0;
while ((n=the_string.find(bad[2], n))!=the_string.npos)
{
the_string.erase(n,1);
}

n = 0;
while ((n=the_string.find(bad[3], n))!=the_string.npos)
{
the_string.erase(n,1);
}

std::cout << the_string << std::endl;
return the_string;
}

int main()
{
clean("12,,34..56||78 9");
return 0;
}
Jun 27 '08 #6

P: n/a
Brad <br**@16systems.comkirjutas:
Paavo Helde wrote:
>>
std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);
}

this ok?
Paavo

Thanks, works well with one character... How would I make it work with
several... like so:
use 'find_first_of()' instead of 'find()'.

I see you have zero character listed as "bad". It has to be explicitly
encoded into the std::string, e.g.

....find_first_of( std::string("., |\0", 5), k);

regards
Paavo
Jun 27 '08 #7

P: n/a
Brad wrote:
Paavo Helde wrote:
>>
std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);
}

this ok?
Paavo

Thanks, works well with one character... How would I make it work with
several... like so:

#include <iostream>

std::string clean(std::string the_string)
{
char bad[] = {'.', ',', ' ', '|', '\0'};
std::string::size_type n = 0;
while ((n=the_string.find(bad, n))!=the_string.npos)
{
the_string.erase(n,1);
}
std::cout << the_string << std::endl;
return the_string;
}
If you want a simple loop:

std::string clean( const std::string& the_string )
{
static const std::string bad("., |");

std::string result;

for( std::string::const_iterator c = the_string.begin();
c != the_string.end(); ++c )
{
if( bad.find( *c ) == std::string::npos )
{
result += *c;
}
}

return result;
}

--
Ian Collins.
Jun 27 '08 #8

P: n/a
Brad <br**@16systems.comwrote:
I'm writing a function to remove certain characters from strings. For
example, I often get strings with commas... they look like this:

"12,384"

I'd like to take that string, remove the comma and return "12384"

What is the most efficient, fastest way to approach this?
The most obvious (if you want to replace the contents) is the
erase-remove idiom.

str.erase( remove( str.begin(), str.end(), ',' ), str.end() );
Thanks, works well with one character... How would I make it work with
several... like so:

#include <iostream>

std::string clean(std::string the_string)
{
char bad[] = {'.', ',', ' ', '|', '\0'};
std::string::size_type n = 0;
while ((n=the_string.find(bad, n))!=the_string.npos)
{
the_string.erase(n,1);
}
std::cout << the_string << std::endl;
return the_string;
}

int main()
{
clean("12,34.45|78 9");
return 0;
}
To make it work with several characters:

struct contains_any_of : unary_function< char, bool >
{
string str;
contains_any_of( const string& s ): str( s ) { }
bool operator()( char c ) const {
return find( str.begin(), str.end(), c ) != str.end();
}
};

str.erase( remove_if( str.begin(), str.end(),
contains_any_of( str2 ) ), str.end() );

Or to conform to your interface:

string clean( const string& str ) {
string result;
const char* const bad = "., |\0";
remove_copy_if( str.begin(), str.end(), back_inserter( result ),
contains_any_of( string( bad, bad + 5 ) ) );
return result;
}

Note: the "string( bad, bad + 5)" bit is necessary, otherwise the '\0'
will be clipped out, but maybe you didn't actually want to remove nulls
from the middle of the string?
Jun 27 '08 #9

P: n/a
Hi!

Looking at the various solutions to the original problem, I wanted to
state my design goals so one could make a resonable decision about which
code to use.

I try to keep the algorithmic complexity low. I try to reuse code, that
is I use the STL and therefore I stick to its idioms.

Regards,
Frank
Jun 27 '08 #10

P: n/a
On May 24, 7:43 pm, Frank Birbacher <bloodymir.c...@gmx.netwrote:
Hi!

Looking at the various solutions to the original problem, I wanted to
state my design goals so one could make a resonable decision about which
code to use.

I try to keep the algorithmic complexity low. I try to reuse code, that
is I use the STL and therefore I stick to its idioms.
Hmm... I see a lot of really complex and strange code here when it's
not really necessary. Most of what people posted requires multiple
passes through the string, or a lot of shifting of bytes around (e.g.
something like Paavo's "while (string contains char) remove_char" is
going to do -way- more moving of data than necessary -- it shifts the
entire end of the string back every time through the loop). Sticking
to generic STL calls for finding and removing characters in the string
gains you nothing unless you are going to be finding and removing
elements from generic containers that don't provide random access
iterators (in which case the generic programming is a benefit). The
use of remove_if, such as in Frank's example, will get you equal
performance to the example below (remove_if may very well be
implemented the same way), except Frank's sort + binary search is
likely to have more overhead then a simple linear search for your
original requirements of removing a set of 3 or 4 bad characters only
(however, for removing large character sets, a binary search will
perform better, the sort is unnecessary if the input is sorted to
begin with -- but you can do the search in -constant- time, with no
pre-sorting either, if you make some assumptions about the max value
of a char and load valid characters into a lookup table first). You
know that you are using a string (or any type with random access
iterators). Just do something like this:

In-place, single pass through string, no unnecessary copies or moves:

void remove_chars (const string &bad, string &str) {
string::iterator s, d;
for (s = str.begin(), d = s; s != str.end(); ++ s)
if (bad.find(*s) == string::npos)
*(d ++) = *s;
str.resize(d - str.begin());
}

That works because 'd' will always be behind or at the same position
as 's'. That in-place version can be made to work with generic
iterators as well as random access iterators if you replace the
resize() call with "erase(d, str.end())". Here is the same thing,
places result in destination buffer:

void remove_chars (const string &bad, const string &str, string
&clean) {
string::const_iterator s;
clean = "";
clean.reserve(str.size()); // do not perform extra realloc + copies.
for (s = str.begin(); s != str.end(); ++ s)
if (bad.find(*s) == string::npos)
clean += *s;
}

Example use:

{
string s = "a m|e|s|s|y s,t,r,i,n,g", c;
remove_chars("mesy", s, c);
remove_chars("|,", s);
cout << c << endl << s << endl;
}
Jason
Jun 27 '08 #11

P: n/a
ja************@gmail.com wrote:
>
void remove_chars (const string &bad, const string &str, string
&clean) {
string::const_iterator s;
clean = "";
clean.reserve(str.size()); // do not perform extra realloc + copies.
for (s = str.begin(); s != str.end(); ++ s)
if (bad.find(*s) == string::npos)
clean += *s;
}
Isn't that just about (excluding the reserve) identical to the solution
I posted?

--
Ian Collins.
Jun 27 '08 #12

P: n/a
Paavo Helde wrote:
>What is the most efficient, fastest way to approach this?

std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);
}
That's certainly not the most efficient way to do it.
Jun 27 '08 #13

P: n/a
Juha Nieminen <no****@thanks.invalidkirjutas:
Paavo Helde wrote:
>>What is the most efficient, fastest way to approach this?

std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);
}

That's certainly not the most efficient way to do it.
I would argue this is the fastest way to *write* the solution ;-)

OK, here latter parts of the string may be moved in memory multiple times,
which may be not so efficient for large strings. I do not like pompous
helper classes needed for std::remove_if(), so if the strings are large and
efficiency is important, I would suggest the solution posted by Jason.

Regards
Paavo
Jun 27 '08 #14

P: n/a
On May 25, 12:12 am, Ian Collins <ian-n...@hotmail.comwrote:
Isn't that just about (excluding the reserve) identical to the solution
I posted?
Yes it is identical. I was responding to the big blocks of code I saw
being posted. The OP wandered down the "while (character found) remove
character" path instead of using your suggestion right off the bat.
I'm sorry, I should have mentioned it.

Jason
Jun 27 '08 #15

P: n/a
Paavo Helde <no****@ebi.eewrote:
Juha Nieminen <no****@thanks.invalidkirjutas:
Paavo Helde wrote:
What is the most efficient, fastest way to approach this?
>
std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);
}
That's certainly not the most efficient way to do it.

I would argue this is the fastest way to *write* the solution ;-)
Why do you think the above is faster to write than:

str.erase(remove(str.begin(), str.end(), ','), str.end());

It seems to me that the above is faster to write and more efficient.
I do not like pompous helper classes needed for std::remove_if(),
so if the strings are large and efficiency is important, I would
suggest the solution posted by Jason.
Jason's solution is simply to write the remove_if algorithm out by hand.
Jun 27 '08 #16

P: n/a
Hi!

Daniel T. schrieb:
Jason's solution is simply to write the remove_if algorithm out by hand.
Which doesn't necessarily make the code more readable. But readability
contraditcs writing code fast/short. And writing code fast/short doesn't
mean the code will work. <ironic>But anyways, who actually wants code
that works?</ironic>

Well, remove_if is tested, the while-loop solution is not. I recently
was terrified by similar char-shifting code. I dislike it.

Frank
Jun 27 '08 #17

P: n/a
Frank Birbacher <bl************@gmx.netwrote:
Daniel T. schrieb:
Jason's solution is simply to write the remove_if algorithm out
by hand.

Which doesn't necessarily make the code more readable. But
readability contraditcs writing code fast/short.
That isn't necessarily true. Fast/short code can be some of the most
readable there is, if it is done right.
And writing code fast/short doesn't mean the code will work.
Neither does it mean the code will fail to work.
Well, remove_if is tested, the while-loop solution is not.
Yes, and tested code is more likely to work that untested code. :-)
Jun 27 '08 #18

P: n/a
"Daniel T." <da******@earthlink.netkirjutas:
Paavo Helde <no****@ebi.eewrote:
>Juha Nieminen <no****@thanks.invalidkirjutas:
Paavo Helde wrote:

What is the most efficient, fastest way to approach this?

std::string s("12,384,586");
std::string::size_type k = 0;
while((k=s.find(',',k))!=s.npos) {
s.erase(k, 1);
}

That's certainly not the most efficient way to do it.

I would argue this is the fastest way to *write* the solution ;-)

Why do you think the above is faster to write than:

str.erase(remove(str.begin(), str.end(), ','), str.end());
Because I have not used std::remove() much and would have to look up all
the details in manuals ;-). But you are right, using tested algorithms is
in general better to implementing the algorithm by oneself. I am just
waiting for C++0x for more handy lambdas for using e.g. remove_if() more
conveniently.

Best regards
Paavo

Jun 27 '08 #19

P: n/a
In article <3329c5e1-0b62-46da-92e7-714fa83b5b69
@m44g2000hsc.googlegroups.com>, ja************@gmail.com says...

[ ... ]
In-place, single pass through string, no unnecessary copies or moves:

void remove_chars (const string &bad, string &str) {
string::iterator s, d;
for (s = str.begin(), d = s; s != str.end(); ++ s)
if (bad.find(*s) == string::npos)
*(d ++) = *s;
str.resize(d - str.begin());
}
While this is strictly linear in terms of the manipulation of the target
string, it's still O(N*N) with respect to searches of the string holding
characters to remove.

If your list of characters to remove is very long, you might want to use
a bitmap indexed by the characters to determine whether a particular
character is allowed or not:

bool bitmap[UCHAR_MAX] = { false };

while (char *pos=bad; *pos; ++pos)
bitmap[*pos] = true;

Then instead of:

if (bad.find(*s) == string::npos)

you'd use something like:

if (!bitmap[*s])

giving constant complexity for each search. If you have a large
character set, that could waste a lot of storage though. If that's a
problem, you could sort the "bad" string, and use a binary search
instead. For a list of only 5 characters (or so) that would be a waste
of time and effort -- but if (for example) its a few hundred characters
long, the time saved could be considerable.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #20

P: n/a
Jerry Coffin <jc*****@taeus.comwrote:
ja************@gmail.com says...

[ ... ]
In-place, single pass through string, no unnecessary copies or moves:

void remove_chars (const string &bad, string &str) {
string::iterator s, d;
for (s = str.begin(), d = s; s != str.end(); ++ s)
if (bad.find(*s) == string::npos)
*(d ++) = *s;
str.resize(d - str.begin());
}

While this is strictly linear in terms of the manipulation of the target
string, it's still O(N*N) with respect to searches of the string holding
characters to remove.

If your list of characters to remove is very long, you might want to use
a bitmap indexed by the characters to determine whether a particular
character is allowed or not:

bool bitmap[UCHAR_MAX] = { false };

while (char *pos=bad; *pos; ++pos)
bitmap[*pos] = true;

Then instead of:

if (bad.find(*s) == string::npos)

you'd use something like:

if (!bitmap[*s])

giving constant complexity for each search. If you have a large
character set, that could waste a lot of storage though. If that's a
problem, you could sort the "bad" string, and use a binary search
instead. For a list of only 5 characters (or so) that would be a waste
of time and effort -- but if (for example) its a few hundred characters
long, the time saved could be considerable.
Good comment! I didn't think of that angle. The nice thing about using
the remove_copy_if algorithm and a functor (the solution I presented
yesterday) is that the time and effort to convert to a binary search is
minimal...

struct contains_any_of : unary_function< char, bool >
{
string str;
contains_any_of( const string& s ): str( s ) { }
bool operator()( char c ) const {
return find( str.begin(), str.end(), c ) != str.end();
}
};

str.erase( remove_if( str.begin(), str.end(),
contains_any_of( str2 ) ), str.end() );

Changing it simply requires a small chanage in the functor:

struct contains_any_of : unary_function< char, bool >
{
set<charstr;
contains_any_of( const string& s ): str( s.begin(), s.end() ) { }
bool operator()( char c ) const {
return str.count( c ) == 1;
}
};

You are right, much more efficient.
Jun 27 '08 #21

P: n/a
On May 25, 6:33 am, "jason.cipri...@gmail.com"
<jason.cipri...@gmail.comwrote:
On May 25, 12:12 am, Ian Collins <ian-n...@hotmail.comwrote:
Isn't that just about (excluding the reserve) identical to the solution
I posted?

Yes it is identical. I was responding to the big blocks of code I saw
being posted. The OP wandered down the "while (character found) remove
character" path instead of using your suggestion right off the bat.
I'm sorry, I should have mentioned it.

Jason
ja************@gmail.com wrote:
On May 25, 12:12 am, Ian Collins <ian-n...@hotmail.comwrote:
>Isn't that just about (excluding the reserve) identical to the solution
I posted?

Yes it is identical. I was responding to the big blocks of code I saw
being posted. The OP wandered down the "while (character found) remove
character" path instead of using your suggestion right off the bat.
I'm sorry, I should have mentioned it.

Jason
I wanted to thank everyone for this thread. Very good information. Are
there any C++ books that go into these sorts of considerations...
specifically performance when considering how to best optimize code or
does it just come with experience?

Thanks again,

Brad
Jun 27 '08 #22

P: n/a
Now we're going in circles :-)

Ian Collins wrote:
Probably a character by character copy, skipping the characters you
want to remove.
Frank Birbacher wrote:
>...
sort(tmp.begin(), tmp.end());
tmp.erase(
unique(tmp.begin(), tmp.end()),
tmp.end()
);
...
return binary_search(
Jason wrote:
Frank's sort + binary search is
likely to have more overhead then a simple linear search for your
original requirements of removing a set of 3 or 4 bad characters only
(however, for removing large character sets, a binary search will
perform better, the sort is unnecessary if the input is sorted to
begin with -- but you can do the search in -constant- time, with no
pre-sorting either, if you make some assumptions about the max value
of a char and load valid characters into a lookup table first)
(well; no assumptions, there's UCHAR_MAX, oops)

Jerry Coffin wrote:
If your list of characters to remove is very long, you might want to use
a bitmap indexed by the characters to determine whether a particular
character is allowed or not:
...
If that's a
problem, you could sort the "bad" string, and use a binary search
instead. For a list of only 5 characters (or so) that would be a waste
of time and effort -- but if (for example) its a few hundred characters
long, the time saved could be considerable.
Jun 27 '08 #23

P: n/a
Hi!

by*******@gmail.com schrieb:
I wanted to thank everyone for this thread. Very good information. Are
there any C++ books that go into these sorts of considerations...
specifically performance when considering how to best optimize code or
does it just come with experience?
Two rules of program optimization:
http://wordaligned.org/articles/the-...m-optimisation

And:
http://en.wikipedia.org/wiki/Optimiz...en_to_optimize

Frank
Jun 27 '08 #24

P: n/a
"Daniel T." <da******@earthlink.netwrites:
struct contains_any_of : unary_function< char, bool >
{
set<charstr;
contains_any_of( const string& s ): str( s.begin(), s.end() ) { }
bool operator()( char c ) const {
return str.count( c ) == 1;
Maybe better `return str.find(c) != str.end();' ? That seems more
natural
to me.
}
};
--
William

http://williamxu.net9.org

I hate small towns because once you've seen the cannon in the park
there's nothing else to do.
-- Lenny Bruce

Jun 27 '08 #25

P: n/a
Daniel T. wrote:
Jerry Coffin <jc*****@taeus.comwrote:
>ja************@gmail.com says...

[ ... ]
In-place, single pass through string, no unnecessary copies or moves:

void remove_chars (const string &bad, string &str) {
string::iterator s, d;
for (s = str.begin(), d = s; s != str.end(); ++ s)
if (bad.find(*s) == string::npos)
*(d ++) = *s;
str.resize(d - str.begin());
}

While this is strictly linear in terms of the manipulation of the target
string, it's still O(N*N) with respect to searches of the string holding
characters to remove.

If your list of characters to remove is very long, you might want to use
a bitmap indexed by the characters to determine whether a particular
character is allowed or not:

bool bitmap[UCHAR_MAX] = { false };

while (char *pos=bad; *pos; ++pos)
bitmap[*pos] = true;

Then instead of:

if (bad.find(*s) == string::npos)

you'd use something like:

if (!bitmap[*s])

giving constant complexity for each search. If you have a large
character set, that could waste a lot of storage though. If that's a
problem, you could sort the "bad" string, and use a binary search
instead. For a list of only 5 characters (or so) that would be a waste
of time and effort -- but if (for example) its a few hundred characters
long, the time saved could be considerable.

Good comment! I didn't think of that angle. The nice thing about using
the remove_copy_if algorithm and a functor (the solution I presented
yesterday) is that the time and effort to convert to a binary search is
minimal...

struct contains_any_of : unary_function< char, bool >
{
string str;
contains_any_of( const string& s ): str( s ) { }
bool operator()( char c ) const {
return find( str.begin(), str.end(), c ) != str.end();
}
};

str.erase( remove_if( str.begin(), str.end(),
contains_any_of( str2 ) ), str.end() );

Changing it simply requires a small chanage in the functor:

struct contains_any_of : unary_function< char, bool >
{
set<charstr;
contains_any_of( const string& s ): str( s.begin(), s.end() ) { }
bool operator()( char c ) const {
return str.count( c ) == 1;
}
};

You are right, much more efficient.
Hm, this might well be a case where logarithmic is slower than linear for
all practically relevant cases. Due to its tree-like implementation,
std::set is not very local in memory and can cause cache misses. For a
logarithmic solution, I would instead consider something based on
std::vector, e.g., like so:
struct matches_any_of : std::unary_function< char, bool {

std::vector<charthe_str;

template < typename Iterator >
void assign ( Iterator from, Iterator to ) {
the_str.assign( from, to );
std::sort( the_str.begin(), the_str.end() );
the_str.erase( std::unique( the_str.begin(), the_str.end() ),
the_str.end() );
}

template < typename Iterator >
matches_any_of ( Iterator from, Iterator to ) {
assign( from, to );
}

matches_any_of ( char const * str ) {
assign( str, str+std::strlen( str ) );
}

matches_any_of ( std::string const & str ) {
assign( str.begin(), str.end() );
}

bool operator() ( char chr ) const {
return( std::binary_search
( the_str.begin(), the_str.end(), chr ) );
}

};
Of course, one should measure before betting on any of the solutions to be
the most efficient.
Best

Kai-Uwe Bux
Jun 27 '08 #26

P: n/a
In article <48***********************@read.cnntp.org>,
jk********@gmx.net says...
Daniel T. wrote:
Jerry Coffin <jc*****@taeus.comwrote:
[ ... ]
giving constant complexity for each search. If you have a large
character set, that could waste a lot of storage though. If that's a
problem, you could sort the "bad" string, and use a binary search
instead. For a list of only 5 characters (or so) that would be a waste
of time and effort -- but if (for example) its a few hundred characters
long, the time saved could be considerable.
[ ... ]
You are right, much more efficient.

Hm, this might well be a case where logarithmic is slower than linear for
all practically relevant cases. Due to its tree-like implementation,
std::set is not very local in memory and can cause cache misses. For a
logarithmic solution, I would instead consider something based on
std::vector, e.g., like so:
Yes, this is much closer to what I suggested -- I had suggested sorting
the string (i.e. sorting the characters in the string) but there's
little practical difference between sorting characters in a string and
sorting them in a vector instead. In any case, I agree that a set is
rarely (if ever) likely to be what you really want.

A set works well if you're going to modify the contents on a regular
basis, as it allows logarithmic inserts and deletes. When/if the
collection remains (nearly) static after creation, you're generally
better off using something with contiguous storage (string, vector,
deque,...).

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #27

This discussion thread is closed

Replies have been disabled for this discussion.