
performance of freestore management

Over the years of using C++ I've begun noticing that freestore
management functions (malloc/free) become performance bottlenecks in
complex object-oriented libraries. This is usually because these
functions acquire a mutex lock on the heap. Since the software I'm
writing is targeted at a number of embedded platforms as well as the
PC, it's somewhat difficult to use anything but the standard
implementation given with the compiler.

I've noticed the performance of even STL classes such as std::list is
bottlenecked by allocators. In VS.NET 2005 in particular,
std::allocator uses malloc even for small allocations, which apparently
has an overhead of over 30 bytes per block (in release mode), so
list<int> takes about 40 bytes per node. Ouch. Aside from all the mutex
locking mentioned previously, wasting memory in this manner also
reduces performance because the cache is much less efficiently
utilized. RAM access is usually the next worst bottleneck, because the
gap between CPU power and memory bandwidth seems only to widen over
time.
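To put rough numbers on that (just a sketch - the real node layout is
implementation-defined, and the struct below is only an approximation of
what the list node holds):

#include <cstdio>

// Approximation of a std::list<int> node on a 32-bit build: two links plus
// the payload. The actual layout is up to the library implementation.
struct ListNodeApprox {
    ListNodeApprox* prev;
    ListNodeApprox* next;
    int value;
};

int main()
{
    // Typically 12 bytes on a 32-bit build; add the ~30 bytes of per-block
    // bookkeeping mentioned above and you end up at roughly 40 bytes of
    // memory traffic per stored int.
    std::printf("approx node payload: %u bytes\n",
                static_cast<unsigned>(sizeof(ListNodeApprox)));
    return 0;
}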

Since I'm actively attempting to implement useful design techniques
such as RAII, my code is increasingly rife with allocation of resource
management objects. shared_ptr's also double the amount of freestore
allocations (since they also allocate a reference counter).
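To illustrate the two-allocation point, here's a small sketch that counts
heap blocks by replacing the global operator new; the expected count of 2
assumes the default Boost configuration, where the reference-count block is
allocated with plain new:

#include <cstdio>
#include <cstdlib>
#include <new>
#include <boost/shared_ptr.hpp>

static int g_allocs = 0;

// Count every heap block the program requests through operator new.
void* operator new(std::size_t n) throw(std::bad_alloc)
{
    ++g_allocs;
    void* p = std::malloc(n);
    if (!p) throw std::bad_alloc();
    return p;
}

void operator delete(void* p) throw()
{
    std::free(p);
}

int main()
{
    boost::shared_ptr<int> p(new int(42));
    // Expect 2: one block for the int, one for the reference-count object.
    std::printf("allocations so far: %d\n", g_allocs);
    return 0;
}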

I've found some remedy for these problems in boost::fast_pool_allocator,
in particular when instantiated using null_mutex. This can virtually
eliminate the above-mentioned problems with STL containers and
shared_ptr, but in a multithreaded program you sometimes still can't
get rid of the lock. Looking into the future, it would seem smart to
implement multithreaded architectures for high-performance apps,
because of the developments in multi-core hardware. This makes it ever
more important to keep your program as lock-free as you can.
Ironically, it seems it only gets harder to write high-performance apps
these days. I miss the days when you could focus on your inner-loop
calculations and when lookup tables actually made things faster; running
time was much more closely related to the academic notion of complexity
than it is nowadays.
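For reference, this is roughly how I'm wiring it into std::list; the
template parameters are those of the Boost version I'm using, so check
your own headers, since the defaults have moved around between releases:

#include <list>
#include <boost/pool/pool_alloc.hpp>

// A list for single-threaded (or externally synchronized) use whose nodes
// come from a Boost pool with the locking compiled out via null_mutex.
typedef boost::fast_pool_allocator<
    int,
    boost::default_user_allocator_new_delete,
    boost::details::pool::null_mutex> UnlockedIntAlloc;

typedef std::list<int, UnlockedIntAlloc> FastIntList;

int main()
{
    FastIntList l;
    for (int i = 0; i < 1000; ++i)
        l.push_back(i);   // nodes come from the pool: no per-node malloc,
                          // no heap lock on this path
    return 0;
}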

Maybe I'm just a little too concerned about these issues; I don't know.

Oct 6 '06 #1

yonil wrote:
Over the years of using C++ I've begun noticing that freestore
management functions (malloc/free) become performance bottlenecks in
complex object-oriented libraries. This is usually because these
functions acquire a mutex lock on the heap.
I assume you have profiling data to back up that statement? Also, this
depends on the implementation, of course.
Since the software I'm
writing is targeted at a number of embedded platforms as well as the
PC, it's somewhat difficult to use anything but the standard
implementation given with the compiler.
Not at all. You are allowed to use your own allocator. Write your own,
find a free one or buy one. There are several out there.
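For instance, a minimal C++03-style allocator skeleton looks something like
this; pool_malloc/pool_free are placeholders for whatever custom heap you
plug in (here they just forward to the global operators):

#include <cstddef>
#include <new>

// Placeholder hooks - swap in your pool, arena, or per-thread heap here.
inline void* pool_malloc(std::size_t n) { return ::operator new(n); }
inline void  pool_free(void* p)         { ::operator delete(p); }

template <typename T>
class PoolAllocator {
public:
    typedef T value_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template <typename U> struct rebind { typedef PoolAllocator<U> other; };

    PoolAllocator() {}
    template <typename U> PoolAllocator(const PoolAllocator<U>&) {}

    pointer address(reference x) const { return &x; }
    const_pointer address(const_reference x) const { return &x; }

    pointer allocate(size_type n, const void* = 0)
    { return static_cast<pointer>(pool_malloc(n * sizeof(T))); }

    void deallocate(pointer p, size_type) { pool_free(p); }

    void construct(pointer p, const T& v) { new (p) T(v); }
    void destroy(pointer p) { p->~T(); }

    size_type max_size() const
    { return static_cast<size_type>(-1) / sizeof(T); }
};

// All instances are interchangeable, so they compare equal.
template <typename T, typename U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }

Used as std::list<int, PoolAllocator<int> >, for example.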
I've noticed the performance of even STL classes such as std::list is
bottlenecked by allocators.
This comes as a surprise to me. First surprise is that you need
std::list at all - this class is rarely the right choice. The second
surprise is that the bottleneck is the allocator.
In VS.NET 2005 in particular,
std::allocator uses malloc even for small allocations, which apparently
has an overhead of over 30 bytes per block (in release mode), so
list<int> takes about 40 bytes per node. Ouch. Aside from all the mutex
locking mentioned previously, wasting memory in this manner also
reduces performance because the cache is much less efficiently
utilized. RAM access is usually the next worst bottleneck, because the
gap between CPU power and memory bandwidth seems only to widen over
time.
You should go to a Microsoft group to get a quality response to the
questions posed above.
Since I'm actively attempting to implement useful design techniques
such as RAII, my code is increasingly rife with allocation of resource
management objects. shared_ptr's also double the amount of freestore
allocations (since they also allocate a reference counter).
I've found some remedy for these problems in boost::fast_pool_allocator,
in particular when instantiated using null_mutex. This can virtually
eliminate the above-mentioned problems with STL containers and
shared_ptr, but in a multithreaded program you sometimes still can't
get rid of the lock.
You could get rid of some locks by using lock-free stuff where
appropriate.
Apart from that, it is wisest to avoid overly intensive data sharing when
multithreading - but this is again off-topic.
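As a sketch of the lock-free idea, here is a Treiber-style free list built
on the Win32 interlocked primitives (Windows-specific, and illustration
only: the naive Pop() is exposed to the classic ABA problem if blocks are
recycled concurrently, so a real implementation needs tagged pointers or
similar):

#include <windows.h>

// Intrusive node: the free block itself stores the link.
struct FreeBlock {
    FreeBlock* next;
};

class LockFreeFreeList {
    FreeBlock* volatile head_;
public:
    LockFreeFreeList() : head_(0) {}

    void Push(FreeBlock* b)
    {
        for (;;) {
            FreeBlock* old = head_;
            b->next = old;
            // Link b in front of the old head; retry if another thread
            // changed the head in the meantime.
            if (InterlockedCompareExchangePointer(
                    (PVOID volatile*)&head_, b, old) == old)
                return;
        }
    }

    FreeBlock* Pop()
    {
        for (;;) {
            FreeBlock* old = head_;
            if (!old)
                return 0;                 // list is empty
            FreeBlock* next = old->next;  // the ABA hazard lives here
            if (InterlockedCompareExchangePointer(
                    (PVOID volatile*)&head_, next, old) == old)
                return old;
        }
    }
};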
Looking into the future, it would seem smart to
implement multithreaded architectures for high-performance apps,
because of the developments in multi-core hardware. This makes it ever
more important to keep your program as lock-free as you can.
Ironically, it seems it only gets harder to write high-performance apps
these days. I miss the days when you could focus on your inner-loop
calculations and when lookup tables actually made things faster; running
time was much more closely related to the academic notion of complexity
than it is nowadays.

Maybe I'm just a little too concerned about these issues; I don't know.
/Peter

Oct 6 '06 #2
yonil wrote:
Since I'm actively attempting to implement useful design techniques
such as RAII, my code is increasingly rife with allocation of resource
management objects. shared_ptr's also double the amount of freestore
allocations (since they also allocate a reference counter).
Why not put objects on the stack, or inside other objects?
Maybe I'm just a little too concerned about these issues; I don't know.
Did you profile your application and find allocation is a bottleneck?

--
Phlip
http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
Oct 6 '06 #3
Why not put objects on the stack, or inside other objects?

I do it whenever possible, though sometimes it just can't be helped,
and I must use the heap. I'm really trying to think of how to minimize
heap usage these days whenever I'm concerned with performance.
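For the record, the shape I aim for whenever I can is plain composition and
automatic storage (the names below are made up; nothing here touches the
freestore at all):

#include <cstring>

class Buffer {
    char data_[256];
public:
    Buffer() { std::memset(data_, 0, sizeof data_); }
    char* data() { return data_; }
};

class Session {
    Buffer rx_;          // held by value: constructed in place, no heap block
    Buffer tx_;
public:
    void handle()
    {
        Buffer scratch;  // automatic storage, gone at end of scope
        std::memcpy(scratch.data(), rx_.data(), sizeof(Buffer));
        std::memcpy(tx_.data(), scratch.data(), sizeof(Buffer));
    }
};

int main()
{
    Session s;           // the whole object graph lives on the stack
    s.handle();
    return 0;
}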
Did you profile your application and find allocation is a bottleneck?
I've actually had trouble profiling my C++ app because it has multiple
threads, and the collector only tracks how much wall-clock time is spent
inside each function - not how much CPU time was actually spent
running it. If you have a thread that waits on a semaphore 99% of the
time, any function in the call stack during the wait will appear to the
profiler to be extremely time-consuming, somewhat skewing the
statistics. Context switching may create arbitrary peaks in the time
apparently spent inside all sorts of unimportant functions, and the
exact behaviour depends on the synchronization state of the program.
It's quite difficult for me to give exact numbers for how much of the CPU
time is wasted purely because of locks, or because of
allocation-related locks. I do see a lot of CPU % going to ntdll, and
that usually indicates a synchronization bottleneck.

So yeah, much of what I said is speculation on my part. I think perhaps
the real problem is that I don't have a good profiling tool for
multithreaded apps :)

Oct 6 '06 #4
