
performance of freestore management

Over the years of using C++ I've begun noticing that freestore
management functions (malloc/free) become performance bottlenecks in
complex object-oriented libraries. This is usually because these
functions acquire a mutex lock on the heap. Since the software I'm
writing is targeted at a number of embedded platforms as well as the
PC, it's somewhat difficult to use anything but the standard
implementation given with the compiler.

I've noticed the performance of even STL classes such as std::list is
bottlenecked by allocators. In VS.NET 2005 in particular,
std::allocator uses malloc even for small allocations, which apparently
has an overhead of over 30 bytes per block (in release mode), so
list<int> takes about 40 bytes per node. Ouch. Aside from all the mutex
locking mentioned previously, wasting memory in this manner also
reduces performance because the cache is much less efficiently
utilized. RAM access is usually the next worst bottleneck, because the
gap between CPU power and memory bandwidth seems only to widen over
time.
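To put rough numbers on that (just a sketch - the real node layout is
implementation-defined, and the struct below is only an approximation of
what the list node holds):

#include <cstdio>

// Approximation of a std::list<int> node on a 32-bit build: two links plus
// the payload. The actual layout is up to the library implementation.
struct ListNodeApprox {
    ListNodeApprox* prev;
    ListNodeApprox* next;
    int value;
};

int main()
{
    // Typically 12 bytes on a 32-bit build; add the ~30 bytes of per-block
    // bookkeeping mentioned above and you end up at roughly 40 bytes of
    // memory traffic per stored int.
    std::printf("approx node payload: %u bytes\n",
                static_cast<unsigned>(sizeof(ListNodeApprox)));
    return 0;
}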

Since I'm actively attempting to implement useful design techniques
such as RAII, my code is increasingly rife with allocation of resource
management objects. shared_ptr's also double the amount of freestore
allocations (since they also allocate a reference counter).
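To illustrate the two-allocation point, here's a small sketch that counts
heap blocks by replacing the global operator new; the expected count of 2
assumes the default Boost configuration, where the reference-count block is
allocated with plain new:

#include <cstdio>
#include <cstdlib>
#include <new>
#include <boost/shared_ptr.hpp>

static int g_allocs = 0;

// Count every heap block the program requests through operator new.
void* operator new(std::size_t n) throw(std::bad_alloc)
{
    ++g_allocs;
    void* p = std::malloc(n);
    if (!p) throw std::bad_alloc();
    return p;
}

void operator delete(void* p) throw()
{
    std::free(p);
}

int main()
{
    boost::shared_ptr<int> p(new int(42));
    // Expect 2: one block for the int, one for the reference-count object.
    std::printf("allocations so far: %d\n", g_allocs);
    return 0;
}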

I've found some remedy for these problems in boost::fast_pool_allocator,
in particular when instantiated using null_mutex. This can virtually
eliminate the above-mentioned problems with STL containers and
shared_ptr, but in a multithreaded program you sometimes still can't
get rid of the lock. Looking into the future, it would seem smart to
implement multithreaded architectures for high-performance apps,
because of the developments in multi-core hardware. This makes it ever
more important to keep your program as lock-free as you can.
Ironically, it seems it only gets harder to write high-performance apps
these days. I miss the days when you could focus on your inner-loop
calculations and when lookup tables actually made things faster; running
time was much more closely related to the academic notion of complexity
than it is nowadays.
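For reference, this is roughly how I'm wiring it into std::list; the
template parameters are those of the Boost version I'm using, so check
your own headers, since the defaults have moved around between releases:

#include <list>
#include <boost/pool/pool_alloc.hpp>

// A list for single-threaded (or externally synchronized) use whose nodes
// come from a Boost pool with the locking compiled out via null_mutex.
typedef boost::fast_pool_allocator<
    int,
    boost::default_user_allocator_new_delete,
    boost::details::pool::null_mutex> UnlockedIntAlloc;

typedef std::list<int, UnlockedIntAlloc> FastIntList;

int main()
{
    FastIntList l;
    for (int i = 0; i < 1000; ++i)
        l.push_back(i);   // nodes come from the pool: no per-node malloc,
                          // no heap lock on this path
    return 0;
}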

Maybe I'm just a little too concerned about these issues; I don't know.

Oct 6 '06 #1

yonil wrote:
Over the years of using C++ I've begun noticing that freestore
management functions (malloc/free) become performance bottlenecks in
complex object-oriented libraries. This is usually because these
functions acquire a mutex lock on the heap.
I assume you have profiling data to back up that statement? Also, this
depends on the implementation, of course.
Since the software I'm
writing is targeted at a number of embedded platforms as well as the
PC, it's somewhat difficult to use anything but the standard
implementation given with the compiler.
Not at all. You are allowed to use your own allocator. Write your own,
find a free one or buy one. There are several out there.
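For instance, a minimal C++03-style allocator skeleton looks something like
this; pool_malloc/pool_free are placeholders for whatever custom heap you
plug in (here they just forward to the global operators):

#include <cstddef>
#include <new>

// Placeholder hooks - swap in your pool, arena, or per-thread heap here.
inline void* pool_malloc(std::size_t n) { return ::operator new(n); }
inline void  pool_free(void* p)         { ::operator delete(p); }

template <typename T>
class PoolAllocator {
public:
    typedef T value_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template <typename U> struct rebind { typedef PoolAllocator<U> other; };

    PoolAllocator() {}
    template <typename U> PoolAllocator(const PoolAllocator<U>&) {}

    pointer address(reference x) const { return &x; }
    const_pointer address(const_reference x) const { return &x; }

    pointer allocate(size_type n, const void* = 0)
    { return static_cast<pointer>(pool_malloc(n * sizeof(T))); }

    void deallocate(pointer p, size_type) { pool_free(p); }

    void construct(pointer p, const T& v) { new (p) T(v); }
    void destroy(pointer p) { p->~T(); }

    size_type max_size() const
    { return static_cast<size_type>(-1) / sizeof(T); }
};

// All instances are interchangeable, so they compare equal.
template <typename T, typename U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }

Used as std::list<int, PoolAllocator<int> >, for example.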
I've noticed the performance of even STL classes such as std::list is
bottlenecked by allocators.
This comes as a surprise to me. First surprise is that you need
std::list at all - this class is rarely the right choice. The second
surprise is that the bottleneck is the allocator.
In VS.NET 2005 in particular,
std::allocator uses malloc even for small allocations, which apparently
has an overhead of over 30 bytes per block (in release mode), so
list<int> takes about 40 bytes per node. Ouch. Aside from all the mutex
locking mentioned previously, wasting memory in this manner also
reduces performance because the cache is much less efficiently
utilized. RAM access is usually the next worst bottleneck, because the
gap between CPU power and memory bandwidth seems only to widen over
time.
You should go to a Microsoft group to get a quality response to the
questions posed above.
Since I'm actively attempting to implement useful design techniques
such as RAII, my code is increasingly rife with allocation of resource
management objects. shared_ptr's also double the amount of freestore
allocations (since they also allocate a reference counter).
I've found some remedy for these problems in boost::fast_pool_allocator,
in particular when instantiated using null_mutex. This can virtually
eliminate the above-mentioned problems with STL containers and
shared_ptr, but in a multithreaded program you sometimes still can't
get rid of the lock.
You could get rid of some locks by using lock-free stuff where
appropriate.
Apart from that, it is wisest to avoid overly intensive data sharing when
multithreading - but this is again off-topic.
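As a sketch of the lock-free idea, here is a Treiber-style free list built
on the Win32 interlocked primitives (Windows-specific, and illustration
only: the naive Pop() is exposed to the classic ABA problem if blocks are
recycled concurrently, so a real implementation needs tagged pointers or
similar):

#include <windows.h>

// Intrusive node: the free block itself stores the link.
struct FreeBlock {
    FreeBlock* next;
};

class LockFreeFreeList {
    FreeBlock* volatile head_;
public:
    LockFreeFreeList() : head_(0) {}

    void Push(FreeBlock* b)
    {
        for (;;) {
            FreeBlock* old = head_;
            b->next = old;
            // Link b in front of the old head; retry if another thread
            // changed the head in the meantime.
            if (InterlockedCompareExchangePointer(
                    (PVOID volatile*)&head_, b, old) == old)
                return;
        }
    }

    FreeBlock* Pop()
    {
        for (;;) {
            FreeBlock* old = head_;
            if (!old)
                return 0;                 // list is empty
            FreeBlock* next = old->next;  // the ABA hazard lives here
            if (InterlockedCompareExchangePointer(
                    (PVOID volatile*)&head_, next, old) == old)
                return old;
        }
    }
};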
Looking into the future, it would seem smart to
implement multithreaded architectures for high-performance apps,
because of the developments in multi-core hardware. This makes it ever
more important to keep your program as lock-free as you can.
Ironically, it seems it only gets harder to write high-performance apps
these days. I miss the days when you could focus on your inner-loop
calculations and when lookup tables actually made things faster; running
time was much more closely related to the academic notion of complexity
than it is nowadays.

Maybe I'm just a little too concerned about these issues; I don't know.
/Peter

Oct 6 '06 #2
yonil wrote:
Since I'm actively attempting to implement useful design techniques
such as RAII, my code is increasingly rife with allocation of resource
management objects. shared_ptr's also double the amount of freestore
allocations (since they also allocate a reference counter).
Why not put objects on the stack, or inside other objects?
Maybe I'm just a little too concerned about these issues; I don't know.
Did you profile your application and find allocation is a bottleneck?

--
Phlip
http://www.greencheese.us/ZeekLand <-- NOT a blog!!!
Oct 6 '06 #3
Why not put objects on the stack, or inside other objects?

I do it whenever possible, though sometimes it just can't be helped,
and I must use the heap. I'm really trying to think of how to minimize
heap usage these days whenever I'm concerned with performance.
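For the record, the shape I aim for whenever I can is plain composition and
automatic storage (the names below are made up; nothing here touches the
freestore at all):

#include <cstring>

class Buffer {
    char data_[256];
public:
    Buffer() { std::memset(data_, 0, sizeof data_); }
    char* data() { return data_; }
};

class Session {
    Buffer rx_;          // held by value: constructed in place, no heap block
    Buffer tx_;
public:
    void handle()
    {
        Buffer scratch;  // automatic storage, gone at end of scope
        std::memcpy(scratch.data(), rx_.data(), sizeof(Buffer));
        std::memcpy(tx_.data(), scratch.data(), sizeof(Buffer));
    }
};

int main()
{
    Session s;           // the whole object graph lives on the stack
    s.handle();
    return 0;
}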
Did you profile your application and find allocation is a bottleneck?
I've actually had trouble profiling my C++ app because it has multiple
threads, and the collector only tracks how much wall-clock time is spent
inside each function - not how much CPU time was actually spent
running it. If you have a thread that waits on a semaphore 99% of the
time, any function in the call stack during the wait will appear to the
profiler to be extremely time-consuming, somewhat skewing the
statistics. Context switching may create arbitrary peaks in the time
apparently spent inside all sorts of unimportant functions, and the
exact behaviour depends on the synchronization state of the program.
It's quite difficult for me to give exact numbers for how much of the CPU
time is wasted purely because of locks, or because of
allocation-related locks. I do see a lot of CPU % going to ntdll, and
that usually indicates a synchronization bottleneck.

So yeah, much of what I said is speculation on my part. I think perhaps
the real problem is that I don't have a good profiling tool for
multithreaded apps :)

Oct 6 '06 #4
