How to deal with a platform-dependent issue.

Mark P

I wrote some code which makes heavy use of std::map. In fact, for a
typical instance, I may have on the order of 100K maps, each with only a
small number of elements.

(In case you're curious, this is for a computational geometry
application-- each map represents an edge of a polygon and its elements
are intermediate markers located along the edge, mapping relative
position on the edge to some auxiliary data.)

This worked just fine when compiled with gcc on Linux, but after
recently running on Sun with the CC compiler (using the RogueWave STL
implementation, I believe) I found horrible memory performance. Memory
usage jumped by a factor of 10 or more which is unacceptable for my
application.

After spending the better part of a day trying to figure out the
problem, it seems that the RogueWave map performs bulk allocation so
that as soon as one item is inserted into the map, it allocates space
for 32. (I found this by writing an allocator to forward memory ops to
malloc/free and log each such call.) This is much more than the space
needed for a typical case (where I might have say 3 or 4 entries only),
hence the tremendous increase in memory usage compared to the gcc
version (which allocates space one entry at a time).

OK, so I realize this is fundamentally a platform issue and not entirely
on-topic, but I'd welcome any advice. For example, is there a language
feature I don't know about that could circumvent this allocation
strategy? Is there another approach that achieves the functionality of
a map without making explicit use thereof? Any other recommendations?

Thanks for your help,
Mark

Jul 23 '05 #1

Subscribe Post Reply

1850

Michael Olea

Mark P wrote:

I wrote some code which makes heavy use of std::map. In fact, for a
typical instance, I may have on the order of 100K maps, each with only a
small number of elements.

(In case you're curious, this is for a computational geometry
application-- each map represents an edge of a polygon and its elements
are intermediate markers located along the edge, mapping relative
position on the edge to some auxiliary data.)

[...]

I don't know if this will help or not, but you might want to take a look at:

http://www.boost.org/libs/property_m...perty_map.html

Jul 23 '05 #2

Alf P. Steinbach

* Mark P:

I wrote some code which makes heavy use of std::map. In fact, for a
typical instance, I may have on the order of 100K maps, each with only a
small number of elements.

(In case you're curious, this is for a computational geometry
application-- each map represents an edge of a polygon and its elements
are intermediate markers located along the edge, mapping relative
position on the edge to some auxiliary data.)
I don't see how that would work, but, accepting that it does work:

This worked just fine when compiled with gcc on Linux, but after
recently running on Sun with the CC compiler (using the RogueWave STL
implementation, I believe) I found horrible memory performance. Memory
usage jumped by a factor of 10 or more which is unacceptable for my
application.

After spending the better part of a day trying to figure out the
problem, it seems that the RogueWave map performs bulk allocation so
that as soon as one item is inserted into the map, it allocates space
for 32. (I found this by writing an allocator to forward memory ops to
malloc/free and log each such call.) This is much more than the space
needed for a typical case (where I might have say 3 or 4 entries only),
hence the tremendous increase in memory usage compared to the gcc
version (which allocates space one entry at a time).
Just out of curiosity, by 'allocator' do you mean replacing global 'new'
or did you write an allocator to pass as template argument to 'map'?

If the latter then the bulk allocation is part of the 'map'
implementation (e.g. a B-tree like thing).

And that would preclude a solution based on supplying your own allocator
template argument.

OK, so I realize this is fundamentally a platform issue and not entirely
on-topic, but I'd welcome any advice. For example, is there a language
feature I don't know about that could circumvent this allocation
strategy? Is there another approach that achieves the functionality of
a map without making explicit use thereof? Any other recommendations?

It seems you need a map-like class optimized for 3 to 4 elements, but
also able to handle larger sizes.

I'd try out an implementation based on std::vector first: linear search
is almost always the fastestest ( :-) ) for a very small number of
elements; wrap it in a class providing the 'map' operations you use, and
set the capacity to e.g. 2 or 4 on construction; measure what's best.

If the number of maps with larger number of elements is sufficiently
large that the simple approach isn't efficient enough then perhaps
consider more complicated things, like, you already have an example of
an implementation that is efficient enough, and another approach might
be to dynamically switch internal implementation of your wrappers
(what's the pattern name for that, something like handle/envelope?),
although that would introduce some overhead by itself.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Jul 23 '05 #3

Mark P

Alf P. Steinbach wrote:

* Mark P:
I wrote some code which makes heavy use of std::map. In fact, for a
typical instance, I may have on the order of 100K maps, each with only a
small number of elements.

(In case you're curious, this is for a computational geometry
application-- each map represents an edge of a polygon and its elements
are intermediate markers located along the edge, mapping relative
position on the edge to some auxiliary data.)

I don't see how that would work, but, accepting that it does work:

Well, I haven't told you what the application is :) It actually works
quite nicely for its intended purpose (memory issues below notwithstanding).
This worked just fine when compiled with gcc on Linux, but after
recently running on Sun with the CC compiler (using the RogueWave STL
implementation, I believe) I found horrible memory performance. Memory
usage jumped by a factor of 10 or more which is unacceptable for my
application.

After spending the better part of a day trying to figure out the
problem, it seems that the RogueWave map performs bulk allocation so
that as soon as one item is inserted into the map, it allocates space
for 32. (I found this by writing an allocator to forward memory ops to
malloc/free and log each such call.) This is much more than the space
needed for a typical case (where I might have say 3 or 4 entries only),
hence the tremendous increase in memory usage compared to the gcc
version (which allocates space one entry at a time).

Just out of curiosity, by 'allocator' do you mean replacing global 'new'
or did you write an allocator to pass as template argument to 'map'?

The latter.
If the latter then the bulk allocation is part of the 'map'
implementation (e.g. a B-tree like thing).
Understood. I believe it's a red-black tree. STL implementations are a
bit inscrutable in my experience, but somewhere buried in the map
implementation is a parameter __buffer_size which at first glance seems
to be responsible for the bulk allocation.

And that would preclude a solution based on supplying your own allocator
template argument.
Also understood.
OK, so I realize this is fundamentally a platform issue and not entirely
on-topic, but I'd welcome any advice. For example, is there a language
feature I don't know about that could circumvent this allocation
strategy? Is there another approach that achieves the functionality of
a map without making explicit use thereof? Any other recommendations?

It seems you need a map-like class optimized for 3 to 4 elements, but
also able to handle larger sizes.

Yep, that's be great. Got one?

I'd try out an implementation based on std::vector first: linear search
is almost always the fastestest ( :-) ) for a very small number of
elements; wrap it in a class providing the 'map' operations you use, and
set the capacity to e.g. 2 or 4 on construction; measure what's best.

That's certainly an idea to consider. The problem is that this
structure is dynamically updated, requiring insertions in the middle
(w.r.t. the sort), range operations, and so on. Still it's worth
considering so thanks for the suggestion.
If the number of maps with larger number of elements is sufficiently
large that the simple approach isn't efficient enough then perhaps
consider more complicated things, like, you already have an example of
an implementation that is efficient enough, and another approach might
be to dynamically switch internal implementation of your wrappers
(what's the pattern name for that, something like handle/envelope?),
although that would introduce some overhead by itself.

I've come up with a couple other ideas too:

1. Right now the map holds objects which are about 12 words each. I
could store pointers instead so that the extra unused nodes are smaller.
Unfortunately there's still the size of the node separate from its
data which adds a few words at least, and managing raw pointers can be a
pain.

2. Rather than have one map per edge I could have, say, one map per
polygon and use a more complicated sort to keep all the intermediate
points straight. This also seems like a pain, it breaks the edge
abstraction, and there's an efficiency penalty from searching in a
larger structure.

I think I'd be satisfied, at least provisionally, with a structure that
supports linear time searches and constant time inserts. This suggests
a list except that the list implementation also allocates in blocks of
32 after the first element. On the plus side there's less overhead
associated with the nodes of a list compared to those of a map.

All of which makes wonder, as I pause to wax philosophical, why the
S(tandard)TL isn't more standardized.

Jul 23 '05 #4

Ian

Mark P wrote:

This worked just fine when compiled with gcc on Linux, but after
recently running on Sun with the CC compiler (using the RogueWave STL
implementation, I believe) I found horrible memory performance. Memory
usage jumped by a factor of 10 or more which is unacceptable for my
application.

A bit OT, but have you tried the stlport supplied with CC?

Ian

Jul 23 '05 #5

Chris Newton

Mark P wrote:

Alf P. Steinbach wrote:
I'd try out an implementation based on std::vector first: linear search
is almost always the fastestest ( :-) ) for a very small number of
elements; wrap it in a class providing the 'map' operations you use, and
set the capacity to e.g. 2 or 4 on construction; measure what's best.

That's certainly an idea to consider. The problem is that this
structure is dynamically updated, requiring insertions in the middle
(w.r.t. the sort), range operations, and so on. Still it's worth
considering so thanks for the suggestion.

If the insertions are expensive (I'm not clear on how big the items
you're storing are) then maybe some sort of list structure would be
appropriate?

HTH,
Chris

Jul 23 '05 #6

Alf P. Steinbach

* Chris Newton:

Mark P wrote:
Alf P. Steinbach wrote:
I'd try out an implementation based on std::vector first: linear search
is almost always the fastestest ( :-) ) for a very small number of
elements; wrap it in a class providing the 'map' operations you use, and
set the capacity to e.g. 2 or 4 on construction; measure what's best.

That's certainly an idea to consider. The problem is that this
structure is dynamically updated, requiring insertions in the middle
(w.r.t. the sort), range operations, and so on. Still it's worth
considering so thanks for the suggestion.

If the insertions are expensive (I'm not clear on how big the items
you're storing are) then maybe some sort of list structure would be
appropriate?

Actually the vector wins handily over a list in that respect, because
per-element dynamic allocations are avoided (a dynamic allocation is
very expensive compared to moving ten or twenty or forty words/bytes).

Anyway, with just a little added support a vector can do list'ing too,
providing at the same time (1) constant time random read/write access,
(2) constant time insert/remove at an iterator position, where (3) the
iterator can be moved backward and forward one element in constant time.
The cost is that moving the cursor involves copying an element, and that
there can be only one insert/remove iterator at a time, and the
gain is the constant time random read/write access, as well as avoidance
of dynamic allocations and deallocations (or alternatively, avoidance of
having to use a free-list) by using a reasonable buffer size.

This little data structure is called a cursor gap array, and the idea is
simply to collect all the free space in the vector at the cursor (aka
iterator) position, which means the iterator is moved one step by
copying an element from one side of the cursor gap to the other side.

But hey, I'm off-topic, and besides, measuring the actual performance
would be appropriate before even considering adding such support.

This should probably be in [comp.programming]... ;-)

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Jul 23 '05 #7

Chris Newton

Alf P. Steinbach wrote:

Actually the vector wins handily over a list in that respect, because
per-element dynamic allocations are avoided (a dynamic allocation is
very expensive compared to moving ten or twenty or forty words/bytes).

In isolation yes, but since the OP is apparently happy to play with
allocators I don't see why that would necessarily be a problem...

Cheers,
Chris

Jul 23 '05 #8

Mark P

Ian wrote:

Mark P wrote:

This worked just fine when compiled with gcc on Linux, but after
recently running on Sun with the CC compiler (using the RogueWave STL
implementation, I believe) I found horrible memory performance.
Memory usage jumped by a factor of 10 or more which is unacceptable
for my application.

A bit OT, but have you tried the stlport supplied with CC?

Ian

Actually yes, though after my original post. (I'm new to CC and didn't
know there was an alternative until later browsing the CC
documentation.) stlport *does* do the desired thing, namely allocate
one tree node at a time.

The problem is that this is part of a large piece of software and I'm
not sure everyone else developing it will feel comfortable with such a
large change (i.e., using a completely different library
implementation). What I'm wondering now is whether there's some way for
me to copy the relevant files of the stlport source to some side
directory, rename map to, say, portmap, and use it like that without
interfering with the default library implementation. Is such a thing
feasible or would it be a hopeless task?

Jul 23 '05 #9

Larry I Smith

Mark P wrote:

Ian wrote:
Mark P wrote:

This worked just fine when compiled with gcc on Linux, but after
recently running on Sun with the CC compiler (using the RogueWave STL
implementation, I believe) I found horrible memory performance.
Memory usage jumped by a factor of 10 or more which is unacceptable
for my application.

A bit OT, but have you tried the stlport supplied with CC?

Ian

Actually yes, though after my original post. (I'm new to CC and didn't
know there was an alternative until later browsing the CC
documentation.) stlport *does* do the desired thing, namely allocate
one tree node at a time.

The problem is that this is part of a large piece of software and I'm
not sure everyone else developing it will feel comfortable with such a
large change (i.e., using a completely different library
implementation). What I'm wondering now is whether there's some way for
me to copy the relevant files of the stlport source to some side
directory, rename map to, say, portmap, and use it like that without
interfering with the default library implementation. Is such a thing
feasible or would it be a hopeless task?

I'm not that familiar with stlport, but in many stl implementations
the files are highly co-dependent. 'map' for example might
include other headers, which might include other headers, etc;
any of these could define/use objects specific to the implementation.
So, taking a few files from one implementation and trying to use
them with another would probably be very problematic.

Larry

Jul 23 '05 #10

Ian

Mark P wrote:

Ian wrote:
Mark P wrote:

This worked just fine when compiled with gcc on Linux, but after
recently running on Sun with the CC compiler (using the RogueWave STL
implementation, I believe) I found horrible memory performance.
Memory usage jumped by a factor of 10 or more which is unacceptable
for my application.

A bit OT, but have you tried the stlport supplied with CC?

Ian

Actually yes, though after my original post. (I'm new to CC and didn't
know there was an alternative until later browsing the CC
documentation.) stlport *does* do the desired thing, namely allocate
one tree node at a time.

The problem is that this is part of a large piece of software and I'm
not sure everyone else developing it will feel comfortable with such a
large change (i.e., using a completely different library
implementation). What I'm wondering now is whether there's some way for
me to copy the relevant files of the stlport source to some side
directory, rename map to, say, portmap, and use it like that without
interfering with the default library implementation. Is such a thing
feasible or would it be a hopeless task?

Hopeless.

I thought from your original post this was a fresh port to Solaris. The
transition is fairly painless, I've just done it with a large application.

Ian

Jul 23 '05 #11

Mark P

Chris Newton wrote:

Mark P wrote:
Alf P. Steinbach wrote:
I'd try out an implementation based on std::vector first: linear search
is almost always the fastestest ( :-) ) for a very small number of
elements; wrap it in a class providing the 'map' operations you use, and
set the capacity to e.g. 2 or 4 on construction; measure what's best.

That's certainly an idea to consider. The problem is that this
structure is dynamically updated, requiring insertions in the middle
(w.r.t. the sort), range operations, and so on. Still it's worth
considering so thanks for the suggestion.

If the insertions are expensive (I'm not clear on how big the items
you're storing are) then maybe some sort of list structure would be
appropriate?

HTH,
Chris

Sadly the implementation also allocates list nodes in blocks of 32 after
the first insertion. So inserting 1 item is memory cheap, but if you
insert 2 you may as well insert 33, since 2-33 are allocated together.
On the plus side, the memory overhead per list node is less than that
per map (i.e., tree) node.

I broached this in a previous post but no one commented on it-- are
there any efforts to further standardize STL containers with respect to
these sorts of implementation details?

Jul 23 '05 #12

James Dennett

Mark P wrote:

Chris Newton wrote:
Mark P wrote:
Alf P. Steinbach wrote:

I'd try out an implementation based on std::vector first: linear search
is almost always the fastestest ( :-) ) for a very small number of
elements; wrap it in a class providing the 'map' operations you use,
and
set the capacity to e.g. 2 or 4 on construction; measure what's best.

That's certainly an idea to consider. The problem is that this
structure is dynamically updated, requiring insertions in the middle
(w.r.t. the sort), range operations, and so on. Still it's worth
considering so thanks for the suggestion.

If the insertions are expensive (I'm not clear on how big the items
you're storing are) then maybe some sort of list structure would be
appropriate?

HTH,
Chris

Sadly the implementation also allocates list nodes in blocks of 32 after
the first insertion. So inserting 1 item is memory cheap, but if you
insert 2 you may as well insert 33, since 2-33 are allocated together.
On the plus side, the memory overhead per list node is less than that
per map (i.e., tree) node.

I broached this in a previous post but no one commented on it-- are
there any efforts to further standardize STL containers with respect to
these sorts of implementation details?

Amounts of memory used are not likely to be standardized; they
are viewed as quality of implementation (QoI) issues. The ideal
trade-offs vary depending on the environment -- embedded systems
would tend often to favour space over speed, whereas for larger
systems it's often (though not always) beneficial to use more
space to improve performance.

Certain QoI issues tend to be common enough that they become
quasi-standard; an empty vector that used dynamic allocation
would be considered pretty poor, for example.

I don't expect to see support from standards group for mandating
things like amount of memory used or constants of performance.

-- James

Jul 23 '05 #13

How to deal with a platform-dependent issue.

Similar topics