Bytes IT Community

std::map performance

I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -O3 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?

The structure of the program is as follows:

std::map<std::string,long> words;
int main()
{
char buf[maxbuf];
while (/*Not end of input*/)
{
//Get next word into buf
insert(buf);
}
//Print out the words
}

void insert(char *s)
{
long &l=words[s];
if (l<numeric_limits<long>::max())
++l;
}
Jul 22 '05 #1
44 Replies


jmoy wrote:
while (/*Not end of input*/)


Odds are that this is your bottleneck. How are you reading the file?
Jacques.
Jul 22 '05 #2


"jmoy" <na******@yahoo.co.in> wrote in message
news:8d*************************@posting.google.co m...
I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -O3 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?


Your program looks fine to me, I can't comment on your implementation.

I would expect a hand coded binary tree to be faster than std::map provided
the data is unordered. Try your hand coded binary tree on a large file that
is already in alphabetical order and I think you'll see a big performance
hit (assuming your binary tree code is alphabetically ordered). In other
words you are paying a price for using std::map over a binary tree but
std::map guarantees better worst case performance.

A hash table would have been better; it's coming to standard C++ soon!

john
Jul 22 '05 #3

Hi,

[snip]
void insert(char *s)
{
long &l=words[s];
if (l<numeric_limits<long>::max())
++l;
}


in my experience using the []-operator with maps isn't a very good thing if
you want performance.
I would suggest trying both a version with find() and a conditional insert()
afterwards, and a version with the std::pair<iterator,bool> insert() method.

If you try it, please post results.

Cheers
Max


Jul 22 '05 #4

On 26 May 2004 23:42:53 -0700, na******@yahoo.co.in (jmoy) wrote:
I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -O3 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?

The structure of the program is as follows:

std::map<std::string,long> words;
How about unsigned long?
int main()
{
char buf[maxbuf];
while (/*Not end of input*/)
{
//Get next word into buf
insert(buf);
}
//Print out the words
}

void insert(char *s)
{
long &l=words[s];
if (l<numeric_limits<long>::max())
++l;
}


I assume the only difference between the hand-coded and std::map
versions is the insert method? How does your hand coded version store
the strings? As allocated char*s?

The basic problem with the map version is that a new std::string
object has to be allocated even when the map already contains the word
in question. The solution is to read into a std::string, not a char
buffer. e.g.

int main()
{
std::string word;
word.reserve(100);
std::map<std::string, unsigned long> words;
while (std::cin >> word)
++words[word];

for(std::map<std::string, unsigned long>::iterator i=words.begin(),
end = words.end(); i != end; ++i)
{
std::cout << i->first << ": " << i->second << '\n';
}
}

Because "word" should maintain capacity, a new memory allocation won't
(or shouldn't) be needed every iteration of the loop.

Tom
--
C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
Jul 22 '05 #5

na******@yahoo.co.in (jmoy) wrote in message news:<8d*************************@posting.google.c om>...
I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -O3 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?


About the only way for any of us to give an answer would be to do the
same thing you probably should: break out the profiler, and see what
part of your code is really taking the time. Without that, almost any
guess anybody makes is just that: a guess -- and unless the person in
question has profiled a fair amount of similar code on the same (or
extremely similar) implementation, it's not even likely to be an
informed guess.
--
Later,
Jerry.

The universe is a figment of its own imagination.
Jul 22 '05 #6

"John Harrison" <jo*************@hotmail.com> wrote in message
news:2hlijbFc8c0gU1@uni-
I would expect a hand coded binary tree to be faster than std::map provided the data is unordered. Try your hand coded binary tree on a large file that is already in alphabetical order and I think you'll see a big performance
hit (assuming your binary tree code is alphabetically ordered). In other
words you are paying a price for using std::map over a binary tree but
std::map guarantees better worst case performance.


Can you elaborate? Why is std::map slower? Has it something to do with
balancing?
Jul 22 '05 #7

"M. Hrabowski" <mh***@ira.uka.de> wrote in message
news:c9*************@news.t-
in my experience using the []-operator with maps isn't a very good thing if you want performance.
Why is this? Or do you have some code or empirical data?
I would suggest to try both a version with find() and conditional insert()
afterwards and a version with the std::pair<iterator,bool> insert()

method.

Using find followed by insert seems less efficient because in find you scan
nodes to realize the element is not there, and then in insert you scan these
nodes again to place the element there. But the STLPort implementation of
operator[] seems more efficient. It finds the closest element, and then
inserts the new element there using the closest element as a hint.

_Tp& operator[](const key_type& __k) {
iterator __i = lower_bound(__k);
// __i->first is greater than or equivalent to __k.
if (__i == end() || key_comp()(__k, (*__i).first))
__i = insert(__i, value_type(__k, _Tp()));
return (*__i).second;
}
Jul 22 '05 #8

"Jacques Labuschagne" <ja*****@clawshrimp.com> wrote in message
news:augtc.4219
jmoy wrote:

while (/*Not end of input*/)


Odds are that this is your bottleneck. How are you reading the file?


But both methods (the std::map and hand coded map) use the same file
scanning algorithm, so while the file scanning can be optimized, it is not
the issue here.
Jul 22 '05 #9


"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message
news:VK********************@bgtnsc04-news.ops.worldnet.att.net...
"John Harrison" <jo*************@hotmail.com> wrote in message
news:2hlijbFc8c0gU1@uni-
I would expect a hand coded binary tree to be faster than std::map provided
the data is unordered. Try your hand coded binary tree on a large file that
is already in alphabetical order and I think you'll see a big performance
hit (assuming your binary tree code is alphabetically ordered). In other
words you are paying a price for using std::map over a binary tree but
std::map guarantees better worst case performance.


Can you elaborate? Why is std::map slower? Has it something to do with
balancing?


Balancing is what I was thinking of. Also the possibility (it wasn't clear
from the OP's post) that his binary tree code was tailored in some way to
his application.

john
Jul 22 '05 #10

In the handcoded version, do you use std::string there too, or const char*
or char*.. You could be paying for constructing strings...

"jmoy" <na******@yahoo.co.in> wrote in message
news:8d*************************@posting.google.co m...
I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own (I guess I cannot use
char* because of the value semantics of the standard containers). I am
using GNU libstdc++3 and gcc 3.3 with the -03 (all optimizations)
flag. Is this a problem with my program, my C++ implementation, or
just the necessary overhead of using a more general class?

The structure of the program is as follows:

std::map<std::string,long> words;
main()
{
char buf[maxbuf];
while (/*Not end of input*/)
{
//Get next word into buf
insert(buf);
}
//Print out the words
}

void insert(char *s)
{
long &l=words[s];
if (l<numeric_limits<long>::max())
++l;
}

Jul 22 '05 #11

> _Tp& operator[](const key_type& __k) {
iterator __i = lower_bound(__k);
// __i->first is greater than or equivalent to __k.
if (__i == end() || key_comp()(__k, (*__i).first))
__i = insert(__i, value_type(__k, _Tp()));
return (*__i).second;
}


This is the version on my platform...

mapped_type& operator[](const key_type& _Keyval)
{
iterator _Where = insert(value_type(_Keyval, mapped_type())).first;
return ((*_Where).second);
}

Having not seen it before I assumed that it would do a find followed by an
insert if necessary, just like the version you posted.

As can be seen in my version insert is called unconditionally which results in
numerous constructor/destructor calls for key_type, mapped_type and value_type.

If my client code replaces calls to operator[] with code that uses the public
find and insert functions the cost of updating an existing element is reduced
by a slew of constructor/destructor calls.

Can anyone offer insight as to why the authors might have chosen this
particular approach?

Jul 22 '05 #12

"DaKoadMunky" <da*********@aol.com> wrote in message
news:20***************************@mb-m18.aol.com...
_Tp& operator[](const key_type& __k) {
iterator __i = lower_bound(__k);
// __i->first is greater than or equivalent to __k.
if (__i == end() || key_comp()(__k, (*__i).first))
__i = insert(__i, value_type(__k, _Tp()));
return (*__i).second;
}
This is the version on my platform...

mapped_type& operator[](const key_type& _Keyval)
{
iterator _Where = insert(value_type(_Keyval, mapped_type())).first;
return ((*_Where).second);
}

Having not seen it before I assumed that it would do a find followed by an
insert if necessary, just like the version you posted.

As can be seen in my version insert is called unconditionally which results in
numerous constructor/destructor calls for key_type, mapped_type and value_type.

If my client code replaces calls to operator[] with code that uses the public
find and insert functions the cost of updating an existing element is reduced
by a slew of constructor/destructor calls.

Can anyone offer insight as to why the authors might have chosen this
particular approach?


23.3.1.2 map element access

T& operator[](const key_type& x);

Returns: (*((insert(make_pair( x, T()))).first)).second.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 22 '05 #13

"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:fUstc.18088
"DaKoadMunky" <da*********@aol.com> wrote in message

_Tp& operator[](const key_type& __k) {
iterator __i = lower_bound(__k);
// __i->first is greater than or equivalent to __k.
if (__i == end() || key_comp()(__k, (*__i).first))
__i = insert(__i, value_type(__k, _Tp()));
return (*__i).second;
}
This is the version on my platform...

mapped_type& operator[](const key_type& _Keyval)
{
iterator _Where = insert(value_type(_Keyval, mapped_type())).first;
return ((*_Where).second);
}

Having not seen it before I assumed that it would do a find followed by an
insert if necessary, just like the version you posted.

As can be seen in my version insert is called unconditionally which results in
numerous constructor/destructor calls for key_type, mapped_type and value_type.

If my client code replaces calls to operator[] with code that uses the public
find and insert functions the cost of updating an existing element is reduced
by a slew of constructor/destructor calls.

Can anyone offer insight as to why the authors might have chosen this
particular approach?


23.3.1.2 map element access

T& operator[](const key_type& x);

Returns: (*((insert(make_pair( x, T()))).first)).second.


Function operator[] has only to behave as if it were written this way.

Anyway, the OP said he's using gcc 3.3, which I think uses the STLPort,
which is the version at the top of this email.
Jul 22 '05 #14

"John Harrison" <jo*************@hotmail.com> wrote in message
news:2hmlfkFec4btU1@uni-
"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message

Can you elaborate? Why is std::map slower? Has it something to do with
balancing?


Balancing is what I was thinking of. Also the possibility (it wasn't clear
from the OP's post) that his binary tree code was tailored in some way to
his application.


But balancing is supposed to make it faster. In an unbalanced tree, lookup
is worst case O(N), but in a balanced it is always O(N). In any case, I
don't think the act of balancing the tree should make the algorithm twice as
slow, which is what the OP reported. The answer probably has to do with
extra calls to the copy constructor, as indicated by tom_usenet and Jesper.
Jul 22 '05 #15


"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message
news:A2********************@bgtnsc04-news.ops.worldnet.att.net...
"John Harrison" <jo*************@hotmail.com> wrote in message
news:2hmlfkFec4btU1@uni-
"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message
Can you elaborate? Why is std::map slower? Has it something to do with balancing?


Balancing is what I was thinking of. Also the possibility (it wasn't clear from the OP's post) that his binary tree code was tailored in some way to his application.


But balancing is supposed to make it faster. In an unbalanced tree,
lookup is worst case O(N), but in a balanced it is always O(N).

O(log N)

Balancing introduces a constant overhead, with balancing insertion and
deletion might take twice as long as without (say). In the case of
randomised or unordered data a binary tree will be balanced without any
extra effort, therefore a binary tree could be faster than a balanced tree
by a constant factor given the right sort of data.
In any case, I
don't think the act of balancing the tree should make the algorithm twice as slow, which is what the OP reported. The answer probably has to do with
extra calls to the copy constructor, as indicated by tom_usenet and Jesper.


Yes I think you're right.

john
Jul 22 '05 #16

Use map::find() instead of map::operator[]; that will most likely speed
things up, as operator[] serves two purposes: insert and lookup.
The SGI STL docs note that operator[] for map is really only for
convenience and, strictly speaking, unnecessary!

http://www.sgi.com/tech/stl/Map.html

do let us know if you see any perf diff.

--roshan
Jul 22 '05 #17

na******@yahoo.co.in (jmoy) wrote in message news:<8d*************************@posting.google.c om>...
I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees.
[snip]
The structure of the program is as follows:
[snip]
std::map<std::string,long> words;
void insert(char *s)
{
long &l=words[s];
if (l<numeric_limits<long>::max())
++l;
}


Thanks everybody for your interest. To eliminate possible overhead
from std::string, I used char * (code below). But the gain is only
about 5%. Profiling shows that both in the earlier and later versions
most of the time is taken by the function _Rb_tree::lower_bound which
is called by map::lower_bound (in the original case this is in turn
called by map::operator[]).

By the way my handcoded binary tree is extremely naive and I agree
that it will have a much worse worst case performance. But an average
case performance hit of 100% seems too high a cost for insurance.

Just to check whether the slowdown may be due to function call
overhead, I made my insert() an inline function. This actually led to
an increase in running time! Why might this be happening? Can it be
due to cache effects?

Going the other way round, I changed insert() in the handcoded version
to do nothing but call another function do_insert() to do the actual
processing. This however does not lead to any appreciable slowdown.

The code for the std::map version using char* is:

struct Cstring_less
{
bool operator()(const char *s, const char *t) const{
return strcmp(s,t)<0;
}
};
typedef std::map<const char *,long,Cstring_less> Map;
Map words;

const long maximum=std::numeric_limits<long>::max();

void insert(const char *s)
{
Map::iterator p=words.lower_bound(s);
if (p==words.end()||strcmp(p->first,s)!=0){//Not found
char *dup=new char[strlen(s)+1];
strcpy(dup,s);
std::pair<const char * const,long> tuple(dup,1L);
p=words.insert(p,tuple);
}else{//Found
if (p->second<maximum)
p->second++;
}
}
Jul 22 '05 #18

"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message
news:z2********************@bgtnsc04-news.ops.worldnet.att.net...
Can anyone offer insight as to why the authors might have chosen this
particular approach?


23.3.1.2 map element access

T& operator[](const key_type& x);

Returns: (*((insert(make_pair( x, T()))).first)).second.


Function operator[] has only to behave as if it were written this way.

Anyway, the OP said he's using gcc 3.3, which I think uses the STLPort,
which is the version at the top of this email.


Agreed. What part of the above can you eliminate if a particularly
thorough validation suite is checking for full conformance?

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 22 '05 #19

"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:vcFtc.15555
23.3.1.2 map element access

T& operator[](const key_type& x);

Returns: (*((insert(make_pair( x, T()))).first)).second.
Function operator[] has only to behave as if it were written this way.

Agreed. What part of the above can you eliminate if a particularly
thorough validation suite is checking for full conformance?


All of it? If T::T() has side effects then the STL way eliminates this.
But just as the return value optimization can ignore side effects in the
copy constructor, so should operator[]. Other than that, I can't think of
anything else. Do you know? (I may not check newsgroups till Tuesday or
Wednesday.)
Jul 22 '05 #20

"jmoy" <na******@yahoo.co.in> wrote in message
Thanks everybody for your interest. To eliminate possible overhead
from std::string, I used char * (code below). But the gain is only
about 5%. Profiling shows that both in the earlier and latter versions
most of the time is taken by the function _Rb_tree::lower_bound which
is called by map::lower_bound (in the original case this is in turn
called by map::operator[]).
Take a look at lower_bound in tree.h (or some name like _tree.h or
stl_tree.h). It's pretty straightforward. Then compare it to your own
version and see what's different. Also, what does your profiler say about
functions that lower_bound calls, as maybe the bottleneck is here?
By the way my handcoded binary tree is extremely naive and I agree
that it will have a much worse worst case performance. But an average
case performance hit of 100% seems too high a cost for insurance.
Right.
Just to check whether the slowdown may be due to function call
overhead, I made my insert() an inline function. This actually led to
an increase in running time! Why might this be happening? Can it be
due to cache effects?
Maybe instruction page caching. But just a wild guess.
Instead of the code below, you could try tom_usenet's reading into a string
idea, which seems like it does the same thing, but looks simpler.
struct Cstring_less
{
bool operator()(const char *s, const char *t) const{
return strcmp(s,t)<0;
}
};
typedef std::map<const char *,long,Cstring_less> Map;
Map words;

const long maximum=std::numeric_limits<long>::max();

void insert(const char *s)
{
Map::iterator p=words.lower_bound(s);
if (p==words.end()||strcmp(p->first,s)!=0){//Not found
char *dup=new char[strlen(s)+1];
strcpy(dup,s);
std::pair<const char * const,long> tuple(dup,1L);
p=words.insert(p,tuple);
}else{//Found
if (p->second<maximum)
p->second++;
}
}

Jul 22 '05 #21

In article <r4******************@bgtnsc05-news.ops.worldnet.att.net>,
"Siemel Naran" <Si*********@REMOVE.att.net> wrote:
"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:vcFtc.15555
> 23.3.1.2 map element access
>
> T& operator[](const key_type& x);
>
> Returns: (*((insert(make_pair( x, T()))).first)).second.

Function operator[] has only to behave as if it were written this way.

Agreed. What part of the above can you eliminate if a particularly
thorough validation suite is checking for full conformance?


All of it? If T::T() has side effects then the STL way eliminates this.
But just as the return value optimization can ignore side effects in the
copy constructor, so should operator[]. Other than that, I can't think of
anything else. Do you know? (I may not check newsgroups till Tuesday or
Wednesday.)


<nod> See:

http://www.open-std.org/jtc1/sc22/wg...fects.html#334

for the LWG's intent.

-Howard
Jul 22 '05 #22

"Howard Hinnant" <hi*****@metrowerks.com> wrote in message
news:hi***************************@syrcnyrdrs-03-ge0.nyroc.rr.com...
In article <r4******************@bgtnsc05-news.ops.worldnet.att.net>,
"Siemel Naran" <Si*********@REMOVE.att.net> wrote:
"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:vcFtc.15555
> > 23.3.1.2 map element access
> >
> > T& operator[](const key_type& x);
> >
> > Returns: (*((insert(make_pair( x, T()))).first)).second.
>
> Function operator[] has only to behave as if it were written this way.

Agreed. What part of the above can you eliminate if a particularly
thorough validation suite is checking for full conformance?


All of it? If T::T() has side effects then the STL way eliminates this.
But just as the return value optimization can ignore side effects in the
copy constructor, so should operator[]. Other than that, I can't think of anything else. Do you know? (I may not check newsgroups till Tuesday or Wednesday.)


<nod> See:

http://www.open-std.org/jtc1/sc22/wg...fects.html#334

for the LWG's intent.


Exactly. It took a defect report to clarify the LWG's intent because
the bald statement in the C++ Standard leaves next to no wiggle room.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 22 '05 #23

na******@yahoo.co.in (jmoy) wrote in message news:<8d*************************@posting.google.c om>...
I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own.


I ultimately figured out why this is so. std::map::find() is a general
algorithm that has to separately check whether (key1<key2),
(key1>key2) or (key1==key2). Actually since one of these has to be
true it only needs to check the first two. Thus it calls strcmp()
twice for each node it traverses. However my handcoded version knows
about strcmp() which it calls only once and then just tests the saved
return value. Since strcmp() is about the most expensive operation
that takes place in the inner loop, this explains the factor of two
slowdown.

My measurements show that in reality strcmp() is called about 1.6
times more in the std::map version. This is because std::map's trees
are more balanced and hence fewer nodes have to be traversed on
average for each call to find().

Given this, shouldn't my implementation have provided a specialization
of std::map for the case where the key is a std::string or char*?
Jul 22 '05 #24

"jmoy" <na******@yahoo.co.in> wrote in message
news:8d**************************@posting.google.c om...
na******@yahoo.co.in (jmoy) wrote in message

news:<8d*************************@posting.google.c om>...
I am a C programmer graduating to C++. As an exercise I wrote a
program to count the number of times that different words occur in a
text file. Though a hash table might have been a better choice, I
chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees. The result is not changed very much if I replace std::string as
the key with a simple string class of my own.


I ultimately figured out why this is so. std::map::find() is a general
algorithm that has to separately check whether (key1<key2),
(key1>key2) or (key1==key2). Actually since one of these has to be
true it only needs to check the first two. Thus it calls strcmp()
twice for each node it traverses.


Nope.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 22 '05 #25

"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:<zj*****************@nwrddc01.gnilink.net>...
"jmoy" <na******@yahoo.co.in> wrote in message
news:8d**************************@posting.google.c om...
na******@yahoo.co.in (jmoy) wrote in message

news:<8d*************************@posting.google.c om>...
I chose to use std::map. However, the program runs at less than half the
speed of an equivalent program that uses ordinary handcoded binary
trees.


I ultimately figured out why this is so. std::map::find() is a general
algorithm that has to separately check whether (key1<key2),
(key1>key2) or (key1==key2). Actually since one of these has to be
true it only needs to check the first two. Thus it calls strcmp()
twice for each node it traverses.


Nope.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


Oops, I was wrong and you are right. However my basic point remains.
The actual loop in lower_bound from my implementation (GNU libstdc++3)
is

while (__x != 0)
if (!_M_key_compare(_S_key(__x), __k))
__y = __x, __x = _S_left(__x);
else
__x = _S_right(__x);

return iterator(__y);

While you are right that the loop does not call strcmp() twice on each
node, it pays a cost for doing so by going deeper into the tree
unnecessarily even when _S_key(__x)==__k (where an optimal
implementation would have returned) and calling strcmp() on the deeper
nodes. Either way the inefficiency lies in throwing away the
information contained in the three-valued (-ve,0,+ve) strcmp() by
converting it to the two-valued (true,false) predicate key_compare().

To reiterate, on my test data strcmp() is called 249860018 times by
the std::map version and 150069159 times by the handcoded version and
I don't think that there is a bug in my counter code.
Jul 22 '05 #26

"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:NOPtc.2957
"Howard Hinnant" <hi*****@metrowerks.com> wrote in message

<nod> See:

http://www.open-std.org/jtc1/sc22/wg...fects.html#334

for the LWG's intent.


Exactly. It took a defect report to clarify the LWG's intent because
the bald statement in the C++ Standard leaves next to no wiggle room.


Howard, thanks for the link.

Silly question. What is LWG?
Jul 22 '05 #27

"jmoy" <na******@yahoo.co.in> wrote in message
Oops, I was wrong and you are right. However my basic point remains.
The actual loop in lower_bound from my implementation (GNU libstdc++3)
is

while (__x != 0)
if (!_M_key_compare(_S_key(__x), __k))
__y = __x, __x = _S_left(__x);
else
__x = _S_right(__x);

return iterator(__y);

While you are right that the loop does not call strcmp() twice on each
node, it pays a cost for doing so by going deeper into the tree
unnecessarily even when _S_key(__x)==k (where an optimal
implementation would have returned) and calling strcmp() on the deeper
nodes. Either way the inefficiency lies in throwing away the
information contained in the three-valued (-ve,0,+ve) strcmp() by
converting it to the two-valued (true,false) predicate key_compare().

To reiterate, on my test data strcmp() is called 249860018 times by
the std::map version and 150069159 times by the handcoded version and
I don't think that there is a bug in my counter code.
Interesting. One question I have is: Is std::map inherently inefficient
because the compare function returns true or false rather than an integer?
With true and false, how do we test for equality? Seems we have to call
(!less(a,b) && !less(b,a)), which means 2 calls to the less function. I'm
sure there were performance oriented people on the standards committee, and
they thought through the issue.

At the same time, comment on my other favorite function: std::sort. Is the
version using a less argument returning true or false less efficient than
the version returning an integer? I sure hope not!

Jmoy, to test your premise, please consider trying the following
experiment -- though maybe better experiments exist. Make your stream of
words contain only unique words. Then the balanced binary tree should have
fewer calls to std::strcmp than your hand coded tree, and should be faster.
Now some detail. You say:
To reiterate, on my test data strcmp() is called 249860018 times by
the stl::map version and 150069159 times by the handcoded version and


I don't know how you would generate 150,069,159 words (maybe by counters
like when we count from 1 to 9 then 10 then 11 to 99 then 100 then 101 to
199 etc), and also worry that the std::map<std::string word, int count>
might cause your system to run out of memory. Maybe you could just test on
fewer words.
Jul 22 '05 #28


"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message
news:Js********************@bgtnsc05-news.ops.worldnet.att.net...
"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:NOPtc.2957
"Howard Hinnant" <hi*****@metrowerks.com> wrote in message

<nod> See:

http://www.open-std.org/jtc1/sc22/wg...fects.html#334

for the LWG's intent.


Exactly. It took a defect report to clarify the LWG's intent because
the bald statement in the C++ Standard leaves next to no wiggle room.


Howard, thanks for the link.

Silly question. What is LWG?


Library Working Group(?)

Jeff F
Jul 22 '05 #29

In article <c9**********@bluegill.adi.com>,
"Jeff Flinn" <NO****@nowhere.com> wrote:
"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message
news:Js********************@bgtnsc05-news.ops.worldnet.att.net...
"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:NOPtc.2957
"Howard Hinnant" <hi*****@metrowerks.com> wrote in message

> <nod> See:
>
> http://www.open-std.org/jtc1/sc22/wg...fects.html#334
>
> for the LWG's intent.

Exactly. It took a defect report to clarify the LWG's intent because
the bald statement in the C++ Standard leaves next to no wiggle room.


Howard, thanks for the link.

Silly question. What is LWG?


Library Working Group(?)


Right. When the committee meets, it spends most of its time divided up
into working groups. The current active working groups are Core,
Library and Evolution. Core and Library have an ongoing issues lists
posted at:

http://www.open-std.org/jtc1/sc22/wg21/

Issues have various status indicating what has (or has not) been done
with them, from New (nobody has looked at it), to TC (it is now an
official part of the standard).

-Howard
Jul 22 '05 #30

"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message news:<_S********************@bgtnsc05-news.ops.worldnet.att.net>...
"jmoy" <na******@yahoo.co.in> wrote in message
While you are right that the loop does not call strcmp() twice on each
node, it pays a cost for doing so by going deeper into the tree
unnecessarily even when _S_key(__x)==k (where an optimal
implementation would have returned) and calling strcmp() on the deeper
nodes. Either way the inefficiency lies in throwing away the
information contained in the three-valued (-ve,0,+ve) strcmp() by
converting it to the two-valued (true,false) predicate key_compare().

To reiterate, on my test data strcmp() is called 249860018 times by
the std::map version and 150069159 times by the handcoded version and
I don't think that there is a bug in my counter code.
Interesting. One question I have is: Is std::map inherently inefficient
because the compare function returns true or false rather than an integer?
With true and false, how do we test for equality? Seems we have to call
(!less(a,b) && !less(b,a)), which means 2 calls to the less function. I'm
sure there were performance oriented people on the standards committee, and
they thought through the issue.


As far as I understand, std::map is inherently inefficient whenever
computing (k1<k2) and (k2<k1) together is significantly cheaper than
computing them separately. As for why it was designed this way, I have
no guesses and I would like to know what other people on this group
think.
At the same, comment on my other favorite function: std::sort. Is the
version using a less argument returning true or false less efficient than
the version return an integer? I sure hope not!
Not for the implementations of sort that I can think of, but I will
have to check.
Jmoy, to test your premise, please consider trying the following
experiment -- though maybe better experiments exist. Make your stream of
words contain only unique words. Then the balanced binary tree should have
fewer calls to std::strcmp than your hand coded tree, and should be faster.
Agreed. But this is a very special case. And if I rewrote my handcoded
version to use balanced trees it would be as efficient as std::map in
this case and more efficient in all other cases.
Now some detail. You say:

I don't know how you would generate 150,069,159 words (maybe by counters
like when we count from 1 to 9 then 10 then 11 to 99 then 100 then 101 to
199 etc), and also worry that the std::map<std::string word, int count>
might cause your system to run out of memory. Maybe you could just test on
fewer words.


My test data--which consists of the contents of a fairly large web
site--contains about 130,000 unique words. 150,069,159 is the total
number of times strcmp() is called. The peak memory usage of the
program (code+data) is only about 7MB which is not much for my desktop
machine.
Jul 22 '05 #31

"jmoy" <na******@yahoo.co.in> wrote in message
news:8d************************@posting.google.com ...
As far as I understand, std::map is inherently inefficient whenever
computing (k1<k2) and (k2<k1) together is significantly cheaper than
computing them separately. As for why it was designed this way, I have
no guesses and I would like to know what other people on this group
think.


You need to make both tests at most once per search, and then only
for a container that requires unique keys. So the deeper the tree,
the less extra overhead -- log(N)+1 vs. log(N) ain't no thang.
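That scheme can be sketched with std::lower_bound over a sorted range standing in for the tree descent: the whole search uses only less-than, and the single extra test is paid once at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Descend using less-than only, then spend one extra comparison at the
// end to distinguish "found" from "not found": log(N)+1 tests in total.
template <class It, class K>
It find_sorted(It first, It last, const K &k) {
    It it = std::lower_bound(first, last, k); // log(N) less-than tests
    if (it != last && !(k < *it))             // the one extra test
        return it;
    return last;
}
```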

As for why it was designed that way, there's a certain minimalism
about a strict weak ordering, which is based purely on the ability
to test whether one key is less than another. A predicate that
returns a three-way result may have to perform two tests internally
to develop the ternary result. Yes, you luck out if all you have to
do is a numeric subtract, but that is far from the general case.

So comparing the number of tests you see doesn't mean much if
upwards of half are hidden in one case.

Moreover, tests are often cheap, so simply counting tests is
often a poor indicator of overall performance.

On top of everything else, if you *really* want a fast binary
tree you probably shouldn't even try to balance it. Go read
Knuth. His very first observation is that trees almost always
grow up to be reasonably balanced with no heroics at all in
the building. You want a (nearly) balanced tree in a professional
library to ensure against pathological cases -- as eagerly
supplied by the authors of validation suites -- and to guarantee
worst-case behavior in the field.
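A sketch of how little code such an unbalanced tree needs, here as a counting insert with a three-way compare (the node layout is illustrative, not the original poster's actual code):

```cpp
#include <cassert>
#include <cstring>

struct WNode {
    const char *word;
    long count;
    WNode *left = nullptr;
    WNode *right = nullptr;
};

// Plain unbalanced insert: no rebalancing at all, relying on the input
// order being effectively random to keep the tree shallow.
void bump(WNode *&root, const char *w) {
    WNode **p = &root;
    while (*p) {
        int c = std::strcmp(w, (*p)->word);
        if (c == 0) {              // word already present: just count it
            ++(*p)->count;
            return;
        }
        p = (c < 0) ? &(*p)->left : &(*p)->right;
    }
    *p = new WNode{w, 1};          // new word becomes a leaf
}
```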

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 22 '05 #32

"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:62lvc.24912
As far as I understand, std::map is inherently inefficient whenever
computing (k1<k2) and (k2<k1) together is significantly cheaper than
computing them separately. As for why it was designed this way, I have
no guesses and I would like to know what other people on this group
think.
You need to make both tests at most once per search, and then only
for a container that requires unique keys. So the deeper the tree,
the less extra overhead -- log(N)+1 vs. log(N) ain't no thang.


The OP reports his hand coded tree is twice as fast. That's big!
As for why it was designed that way, there's a certain minimalism
about a strict weak ordering, which is based purely on the ability
to test whether one key is less than another. A predicate that
returns a three-way result may have to perform two tests internally
to develop the ternary result. Yes, you luck out if all you have to
do is a numeric subtract, but that is far from the general case.

So comparing the number of tests you see doesn't mean much if
upwards of half are hidden in one case.

Moreover, tests are often cheap, so simply counting tests is
often a poor indicator of overall performance.

On top of everything else, if you *really* want a fast binary
tree you probably shouldn't even try to balance it. Go read
Knuth. His very first observation is that trees almost always
grow up to be reasonably balanced with no heroics at all in
the building. You want a (nearly) balanced tree in a professional
library to ensure against pathological cases -- as eagerly
supplied by the authors of validation suites --and to guarantee
worst-case behavior in the field.


Great answer. It seems a stream of words is inherently random, so you don't
need balancing.
Jul 22 '05 #33

"Howard Hinnant" <hi*****@metrowerks.com> wrote in message news:hinnant-
Right. When the committee meets, it spends most of its time divided up
into working groups. The current active working groups are Core,
Library and Evolution. Core and Library have an ongoing issues lists
posted at:


What is the Evolution group about?
Jul 22 '05 #34

"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message
news:qT********************@bgtnsc05-news.ops.worldnet.att.net...
"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:62lvc.24912
As far as I understand, std::map is inherently inefficient whenever
computing (k1<k2) and (k2<k1) together is significantly cheaper than
computing them separately. As for why it was designed this way, I have
no guesses and I would like to know what other people on this group
think.
You need to make both tests at most once per search, and then only
for a container that requires unique keys. So the deeper the tree,
the less extra overhead -- log(N)+1 vs. log(N) ain't no thang.


The OP reports his hand coded tree is twice as fast. That's big!


Moderately. But note what I said next:
As for why it was designed that way, there's a certain minimalism
about a strict weak ordering, which is based purely on the ability
to test whether one key is less than another. A predicate that
returns a three-way result may have to perform two tests internally
to develop the ternary result. Yes, you luck out if all you have to
do is a numeric subtract, but that is far from the general case.

Here, the alternate form lucks out because the predicate derives
from strcmp, which is expensive to compute (for longish strings)
but which gives a ternary result essentially for free. Mazeltov.

If a factor of two is so important to you that you think it's
worth reimplementing a red-black tree then go ahead. My bet is
you'll spend more time debugging, and chasing mysterious runtime
crashes, than you'll ever save by avoiding that "big" factor of two.
But don't let me stop you.
On top of everything else, if you *really* want a fast binary
tree you probably shouldn't even try to balance it. Go read
Knuth. His very first observation is that trees almost always
grow up to be reasonably balanced with no heroics at all in
the building. You want a (nearly) balanced tree in a professional
library to ensure against pathological cases -- as eagerly
supplied by the authors of validation suites --and to guarantee
worst-case behavior in the field.


Great answer. It seems a stream of words is inherently random, so you
don't need balancing.


Indeed. If you're going to hand optimize, it's always best to first
try to do *less* before you try to do *more*.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 22 '05 #35

In article <sT********************@bgtnsc05-news.ops.worldnet.att.net>,
"Siemel Naran" <Si*********@REMOVE.att.net> wrote:
"Howard Hinnant" <hi*****@metrowerks.com> wrote in message news:hinnant-
Right. When the committee meets, it spends most of its time divided up
into working groups. The current active working groups are Core,
Library and Evolution. Core and Library have an ongoing issues lists
posted at:


What is the Evolution group about?


The EWG is looking at potential language changes for C++0X. You can get
a flavor of what they're looking at by looking at some of the papers
submitted to the committee at:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/

-Howard
Jul 22 '05 #36

> Here, the alternate form lucks out because the predicate derives
from strcmp, which is expensive to compute (for longish strings)
but which gives a ternary result essentially for free. Mazeltov.


Is it really luck? Most of my comparable classes are composed
of strings and numbers which both give ternary results for free.
Even the second tier classes use the lower classes for comparison
and hence have the free ternary result.

My experience is not as broad as others but it appears that the
'for free' ternary result is more the rule than the exception.

Every design decision has trade offs and there is no free lunch,
but I have often though that ternary based STL would have been
slightly better.

Of course it is water under the bridge now and as you aptly pointed
out a factor of x2 is hardly worth reinventing the wheel.

Cheers...David
Jul 22 '05 #37

"David A. Ferguson" <Da**********************@hotmail.com> wrote in message
news:rzuvc.2369$CW.192@lakeread05...
Here, the alternate form lucks out because the predicate derives
from strcmp, which is expensive to compute (for longish strings)
but which gives a ternary result essentially for free. Mazeltov.
Is it really luck? Most of my comparable classes are composed
of strings and numbers which both give ternary results for free.
Even the second tier classes use the lower classes for comparison
and hence have the free ternary result.

My experience is not as broad as others but it appears that the
'for free' ternary result is more the rule than the exception.


Hasn't been my experience. But more to the point, ternary wins
significantly only if:

1) the tree isn't very deep

2) the tree tolerates only unique entries

3) the cost of the ternary compare is rather less than twice the
cost of a single less-than compare, and

4) the cost of compares dominates the cost of fiddling nodes
in a tree, whether to erase, insert, or find
Every design decision has trade offs and there is no free lunch,
but I have often though that ternary based STL would have been
slightly better.
Having written and rewritten the STL containers and algorithms
several times now, in two or three different dialects, I strongly
believe in the elegance of strict weak ordering. I say this
knowing full well that there are cases, such as the one in this
thread, where certain optimizations can improve performance.
Of course it is water under the bridge now and as you aptly pointed
out a factor of x2 is hardly worth reinventing the wheel.


My sentiment, obviously.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 22 '05 #38

"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:<62*******************@nwrddc01.gnilink.net>...
"jmoy" <na******@yahoo.co.in> wrote in message
news:8d************************@posting.google.com ...
As far as I understand, std::map is inherently inefficient whenever
computing (k1<k2) and (k2<k1) together is significantly cheaper than
computing them separately. As for why it was designed this way, I have
no guesses and I would like to know what other people on this group
think.
You need to make both tests at most once per search, and then only
for a container that requires unique keys. So the deeper the tree,
the less extra overhead -- log(N)+1 vs. log(N) ain't no thang.

I don't think so. When you are traversing a tree with unique keys to
find a particular key, if you do both the tests you can terminate the
traversal early when you find an exact match. The cost of not doing so
will be higher for a deeper tree. This cost is zero for two corner
cases: an input consisting entirely of distinct words and an input
consisting entirely of repetitions of a single word. The worst case
will lie somewhere in between. Though I have not done a formal
analysis I think that the extra cost in the worst case will not be an
additive but a multiplicative one.
In short: not log(N)+1 but C.log(N) [my guess]
... Yes, you luck out if all you have to
do is a numeric subtract, but that is far from the general case.
I think this case is quite general. First, this is the case for all
fundamental types. As my example shows, this is also the case for
strings, and they are quite frequently used as keys. Moreover, this is
even the case when comparison for a complicated type is based on
lexicographic ordering and comparison of simple types: for example
comparing two names by comparing the strings representing the last
names and if they are equal then comparing the strings representing
the first names. The last I think is by far the most general case:
even the comparison of strings is just lexicographic ordering of
arrays of characters.
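The composite-key case can be sketched like so (the struct and field names are made up for illustration): the second strcmp runs only when the first one ties, and the ternary result falls out for free.

```cpp
#include <cassert>
#include <cstring>

struct Name {
    const char *last;
    const char *first;
};

// Lexicographic three-way compare: last name first, first name only on a tie.
int cmp_name(const Name &a, const Name &b) {
    int c = std::strcmp(a.last, b.last);
    return c != 0 ? c : std::strcmp(a.first, b.first);
}

// The binary predicate std::map needs; it discards the "equal" outcome.
bool less_name(const Name &a, const Name &b) {
    return cmp_name(a, b) < 0;
}
```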
So comparing the number of tests you see doesn't mean much if
upwards of half are hidden in one case.

See above.
Moreover, tests are often cheap, so simply counting tests is
often a poor indicator of overall performance.
Something being cheap is not good enough reason for using it
wastefully. Where a comparison based on ternary tests is optimal (and
I have argued above that it is likely to be generally so), using it
imposes no overheads on other parts of std::map, so why should we not
use it? More so since, while an individual comparison is cheap, taken
together comparisons make up a large proportion of the running times
of find and insert, which in turn are likely to be the most frequently
used operations on map.
On top of everything else, if you *really* want a fast binary
tree you probably shouldn't even try to balance it. Go read
Knuth.


My handcoded version was unbalanced (thanks in equal parts to my
laziness and Knuth's book). But that is beside the point anyway.
Jul 22 '05 #39

"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:<hW*******************@nwrddc01.gnilink.net>...
"David A. Ferguson" <Da**********************@hotmail.com> wrote in message
news:rzuvc.2369$CW.192@lakeread05...
Every design decision has trade offs and there is no free lunch,
but I have often though that ternary based STL would have been
slightly better. Of course it is water under the bridge now and as you aptly pointed
out a factor of x2 is hardly worth reinventing the wheel.


My sentiment, obviously.


The factor of x2 in my experiments is an overstatement, since for most
applications inserting and searching maps in a binary tree will take
only a small proportion of their time. But even then I think that the
STL map should allow users to provide a ternary comparison criteria,
perhaps with template magic to make binary comparisons the default in
order to maintain compatibility. Then we can reinvent the wheel for
situations where it matters while making the change invisible for
users who don't care.
Jul 22 '05 #40

na******@yahoo.co.in (jmoy) wrote in message news:<8d**************************@posting.google.com>...
"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:<zj*****************@nwrddc01.gnilink.net>...
"jmoy" <na******@yahoo.co.in> wrote in message
news:8d**************************@posting.google.com...
na******@yahoo.co.in (jmoy) wrote in message news:<8d*************************@posting.google.com>...
> I chose to use std::map. However, the program runs at less than half the
> speed of an equivalent program that uses ordinary handcoded binary
> trees.

I ultimately figured out why this is so. std::map::find() is a general
algorithm that has to separately check whether (key1<key2),
(key1>key2) or (key1==key2). Actually since one of these has to be
true it only needs to check the first two. Thus it calls strcmp()
twice for each node it traverses.


Nope.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


Oops, I was wrong and you are right. However my basic point remains.
The actual loop in lower_bound from my implementation (GNU libstdc++3)
is

    while (__x != 0)
      if (!_M_key_compare(_S_key(__x), __k))
        __y = __x, __x = _S_left(__x);
      else
        __x = _S_right(__x);

    return iterator(__y);

While you are right that the loop does not call strcmp() twice on each
node, it pays a cost for doing so by going deeper into the tree
unnecessarily even when _S_key(__x)==k (where an optimal
implementation would have returned) and calling strcmp() on the deeper
nodes. Either way the inefficiency lies in throwing away the
information contained in the three-valued (-ve,0,+ve) strcmp() by
converting it to the two-valued (true,false) predicate key_compare().


Hmm .. what if you "roll your own" predicate for text key comparison
instead of using strcmp? Can you do better on performance? You can
fairly easily write one that obeys strict weak ordering to use with
the std::map implementation. Something like:

// for NTBS: a strict weak ordering, true iff a sorts before b
bool mylexcmp(const char *a, const char *b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return (unsigned char)*a < (unsigned char)*b;
}

and similar for std::string ... of course there is no guarantee that
my naive version above won't be slower than strcmp, but I expect there
is a way to write it that would be faster, since the comparison
criterion is less strict.

AFAICS this is more consistent with the "you don't pay for what you
don't use" philosophy underlying C++ (and C). OTOH, it may be that
the performance gain with the strict weak version is negligible, since
you only save the explicit check for equality (and the conversion of
the return value). However, if your test case is in English (or
another language), most of the repeated values (which are the source
of the performance difference IIUC) will likely be short words (a,
the, and, of, etc.), which will have the highest performance gain
using the above technique, so it might make a significant difference
after all for this particular case.
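For what it's worth, such a comparator drops straight into std::map for C-string keys (a sketch; it falls back on strcmp rather than a hand-rolled loop, and since the map then stores only pointers, the caller must keep the strings alive):

```cpp
#include <cassert>
#include <cstring>
#include <map>

// Strict weak ordering over NTBS keys: true iff a sorts before b.
struct CStrLess {
    bool operator()(const char *a, const char *b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Word counting without copying each word into a std::string:
// std::map<const char *, long, CStrLess> counts;
```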

As you can perhaps tell, I don't have much experience with these kind
of performance issues (I deal mostly with numerical code), so it is
hard for me to guess at the effect ... I was just thinking (always
dangerous 8*).

Dave Moore
Jul 22 '05 #41

In article <8d**************************@posting.google.com>,
na******@yahoo.co.in (jmoy) wrote:
You need to make both tests at most once per search, and then only
for a container that requires unique keys. So the deeper the tree,
the less extra overhead -- log(N)+1 vs. log(N) ain't no thang.

I don't think so. When you are traversing a tree with unique keys to
find a particular key, if you do both the tests you can terminate the
traversal early when you find an exact match. The cost of not doing so
will be higher for a deeper tree. This cost is zero for two corner
cases: an input consisting entirely of distinct words and an input
consisting entirely of repetitions of a single word. The worst case
will lie somewhere in between. Though I have not done a formal
analysis I think that the extra cost in the worst case will not be an
additive but a multiplicative one.
In short: not log(N)+1 but C.log(N) [my guess]


Consider a perfectly balanced (and full) binary tree with L levels. It
will contain N elements with N and L related by:

N = 2^L - 1

The typical std::map/set implementation of find will do L+1 comparison
tests when searching for each element in the tree (as previously
described).

Your algorithm can also be analyzed in this context by finding the
"average depth" of each element. That is, if each element in the tree
is searched for, the average number of comparisons can be found by
finding the average depth in the tree.

The average depth of each element in the tree can be found by summing
the depth of each element and dividing by the total number of elements:

Sum[i*2^(i-1), {i, 1, L}] / (2^L - 1)

which simplifies to:

(2^L * (L - 1) + 1) / (2^L - 1)

As L grows large (doesn't have to be that large), the above formula for
average depth becomes approximately:

L - 1

That is, on average, when searching for an element that exists in the
tree, you will find it on the second-to-deepest level.

So the comparison is L+1 vs L-1, or in the earlier notation: log(N)+1 vs
log(N)-1.
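The closed form above can be cross-checked numerically (a quick sketch):

```cpp
#include <cassert>
#include <cstdint>

// Total depth over all nodes of a full binary tree with L levels,
// summed directly: 1*2^0 + 2*2^1 + ... + L*2^(L-1).
std::uint64_t total_depth(int L) {
    std::uint64_t s = 0;
    for (int i = 1; i <= L; ++i)
        s += std::uint64_t(i) << (i - 1);
    return s;
}

// The closed form quoted above: 2^L * (L - 1) + 1.
std::uint64_t total_depth_closed(int L) {
    return (std::uint64_t(1) << L) * std::uint64_t(L - 1) + 1;
}
```

Dividing by N = 2^L - 1 gives an average depth that rounds to L - 1 already at modest L.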

-Howard
Jul 22 '05 #42

"jmoy" <na******@yahoo.co.in> wrote in message
"P.J. Plauger" <pj*@dinkumware.com> wrote in message news:<62lvc.24912

... Yes, you luck out if all you have to
do is a numeric subtract, but that is far from the general case.

I think this case is quite general. First, this is the case for all
fundamental types. As my example shows, this is also the case for
strings, and they are quite frequently used as keys. Moreover, this is
even the case when comparison for a complicated type is based on
lexicographic ordering and comparison of simple types: for example
comparing two names by comparing the strings representing the last
names and if they are equal then comparing the strings representing
the first names. The last I think is by far the most general case:
even the comparison of strings is just lexicographic ordering of
arrays of characters.


Yes, I think std::map<std::string, Something> is quite common.

Anyway, perhaps the ternary tree is better for your purposes. Someone
posted an implementation of this to this newsgroup some months ago. You can
search on google. I have to get to work now, and can search for it later if
you can't find it.


Jul 22 '05 #43

"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message news:<ohIvc.26642
Anyway, perhaps the ternary tree is better for your purposes. Someone
posted an implementation of this to this newsgroup some months ago. You can
search on google. I have to get to work now, and can search for it later if
you can't find it.


Actually it was in comp.lang.c++.moderated

Go to groups.google.com and search for

insubject:"Ternary search tree" group:comp.lang.c++.moderated
Jul 22 '05 #44

Howard Hinnant <hi*****@metrowerks.com> wrote in message news:<hi***************************@syrcnyrdrs-02-ge0.nyroc.rr.com>...
In article <8d**************************@posting.google.com>,
na******@yahoo.co.in (jmoy) wrote:
You need to make both tests at most once per search, and then only
for a container that requires unique keys. So the deeper the tree,
the less extra overhead -- log(N)+1 vs. log(N) ain't no thang.

I don't think so. ... Though I have not done a formal
analysis I think that the extra cost in the worst case will not be an
additive but a multiplicative one.
In short: not log(N)+1 but C.log(N) [my guess]


Consider a perfectly balanced (and full) binary tree with L levels. It
will contain N elements with N and L related by:

N = 2^L - 1

The typical std::map/set implementation of find will do L+1 comparison
tests when searching for each element in the tree (as previously
described).

Your algorithm can also be analyzed in this context by finding the
"average depth" of each element.
...
The average depth of each element in the tree can be found by summing
the depth of each element and dividing by the total number of elements:
...
That is, on average, when searching for an element that exists in the
tree, you will find it on the second-to-deepest level.

So the comparison is L+1 vs L-1, or in the earlier notation: log(N)+1 vs
log(N)-1.

-Howard


Thanks a lot---I should have been paying more attention to my books.
Jul 22 '05 #45
