Bytes | Software Development & Data Engineering Community

performance of freestore management

Over the years of using C++ I've begun noticing that freestore
management functions (malloc/free) become performance bottlenecks in
complex object-oriented libraries. This is usually because these
functions acquire a mutex lock on the heap. Since the software I'm
writing is targeted at a number of embedded platforms as well as the
PC, it's somewhat difficult to use anything but the standard
implementation given with the compiler.

I've noticed the performance of even STL classes such as std::list is
bottlenecked by allocators. On VS.NET 2005 in particular,
std::allocator uses malloc even for small allocations, which apparently
has an overhead of over 30 bytes per block (in release mode), so a
list<int> takes about 40 bytes per node. Ouch. Aside from the mutex
locking mentioned previously, wasting memory in this manner also
reduces performance because the cache is used much less efficiently.
RAM access is usually the next worst bottleneck, because the gap
between CPU speed and memory bandwidth only seems to grow over time.

Since I'm actively trying to apply useful design techniques such as
RAII, my code is increasingly rife with allocations of resource
management objects. shared_ptrs also double the number of freestore
allocations (since each one also allocates a reference counter).

I've found some remedy for these problems in boost::fast_pool_allocator,
in particular when it is instantiated with null_mutex. This can virtually
eliminate the above-mentioned problems with STL containers and
shared_ptr, but in a multithreaded program you sometimes still can't
get rid of the lock. Looking into the future, it would seem smart to
implement multithreaded architectures for high-performance apps,
because of the developments in multi-core hardware. This makes it ever
more important to keep your program as lock-free as you can.
Ironically, it seems it only gets harder to write high-performance apps
these days. I miss the times when you could focus on your inner-loop
calculations and lookup tables actually made things faster; running time
was much more closely related to the academic notion of complexity than
it is nowadays.

Maybe I'm just a little too concerned about these issues; I don't know.

Oct 6 '06 #1

yonil wrote:
> Over the years of using C++ I've begun noticing that freestore
> management functions (malloc/free) become performance bottlenecks in
> complex object-oriented libraries. This is usually because these
> functions acquire a mutex lock on the heap.

I assume you have profiling to back up that statement? Also, this
depends on the implementation, of course.

> Since the software I'm writing is targeted at a number of embedded
> platforms as well as the PC, it's somewhat difficult to use anything
> but the standard implementation given with the compiler.

Not at all. You are allowed to use your own allocator. Write your own,
find a free one or buy one. There are several out there.

> I've noticed the performance of even STL classes such as std::list is
> bottlenecked by allocators.

This comes as a surprise to me. The first surprise is that you need
std::list at all; this class is rarely the right choice. The second
surprise is that the bottleneck is the allocator.

> On VS.NET 2005 in particular,
> std::allocator uses malloc even for small allocations, which apparently
> has an overhead of over 30 bytes per block (in release mode), so a
> list<int> takes about 40 bytes per node. Ouch. Aside from the mutex
> locking mentioned previously, wasting memory in this manner also
> reduces performance because the cache is used much less efficiently.
> RAM access is usually the next worst bottleneck, because the gap
> between CPU speed and memory bandwidth only seems to grow over time.

You should go to a Microsoft group to get quality responses to the
questions posed above.

> Since I'm actively trying to apply useful design techniques such as
> RAII, my code is increasingly rife with allocations of resource
> management objects. shared_ptrs also double the number of freestore
> allocations (since each one also allocates a reference counter).
> I've found some remedy for these problems in boost::fast_pool_allocator,
> in particular when it is instantiated with null_mutex. This can virtually
> eliminate the above-mentioned problems with STL containers and
> shared_ptr, but in a multithreaded program you sometimes still can't
> get rid of the lock.

You could get rid of some locks by using lock-free techniques where
appropriate.
Apart from that, it is wisest to avoid intensive data sharing when
multithreading, but this is again off-topic.
/Peter

Oct 6 '06 #2
yonil wrote:
> Since I'm actively trying to apply useful design techniques such as
> RAII, my code is increasingly rife with allocations of resource
> management objects. shared_ptrs also double the number of freestore
> allocations (since each one also allocates a reference counter).

Why not put objects on the stack, or inside other objects?

> Maybe I'm just a little too concerned about these issues; I don't know.

Did you profile your application and find allocation is a bottleneck?

--
Phlip
http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
Oct 6 '06 #3
> Why not put objects on the stack, or inside other objects?

I do it whenever possible, though sometimes it just can't be helped,
and I must use the heap. I'm really trying to think of how to minimize
heap usage these days whenever I'm concerned with performance.
> Did you profile your application and find allocation is a bottleneck?

I've actually had trouble profiling my C++ app because it has multiple
threads, and the profiler's collector only tracks how much wall-clock
time is spent inside each function, not how much CPU time was actually
consumed running it. If you have a thread that waits on a semaphore 99%
of the time, any function on the call stack during the wait will appear
to the profiler to be extremely time-consuming, which skews the
statistics. Context switching may create arbitrary peaks in the time
apparently spent inside all sorts of unimportant functions, and the
exact behaviour depends on the synchronization state of the program.
It's quite difficult for me to give exact numbers for how much of the
CPU time is wasted purely because of locks, or because of
allocation-related locks. I do see a lot of CPU % going to ntdll, and
that usually indicates a synchronization bottleneck.

So yeah, much of what I said is speculation on my part. I think perhaps
the real problem is that I don't have a good profiling tool for
multithreaded apps :)

Oct 6 '06 #4
