Bytes | Software Development & Data Engineering Community

Malloc Query

Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager? Is there any significant advantage in providing
user-defined malloc() and free()? I want to exclude realtime or
memory-constrained embedded systems here; I am talking about
general-purpose applications. Can a user-defined memory manager handle
memory allocation and de-allocation much better than the OS would have
done? Any inputs/comments on the same are highly appreciated.

Thanks and Regards,
Pushpa
Oct 15 '08 #1
pu**********@gmail.com writes:
Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager? Is there any significant advantage in providing
user-defined malloc() and free()? I want to exclude realtime or
memory-constrained embedded systems here; I am talking about
general-purpose applications. Can a user-defined memory manager handle
memory allocation and de-allocation much better than the OS would have
done?
Certainly in principle you could write a better malloc() than the one
your system provides. However, the authors of your system's standard
library (hopefully) put a lot of thought and effort into writing a good
general-purpose allocator, so you might have a hard time doing better.

Now, the phrase "general-purpose" gives a caveat. There are a lot of
possible factors to consider in writing an allocator: speed, efficient
usage of memory, robustness in the face of user error, avoiding
fragmentation, returning memory to the OS, etc. These involve
tradeoffs. Your library's malloc() will have chosen some balance
between these, and it's possible that you want a different balance, in
which case your own implementation could do "better". Another factor is
that if you know something about how you are going to use your allocator
(e.g. you will be mainly allocating 73-byte chunks and rarely freeing
them), you can optimize appropriately, which the system's malloc() can't
do.

I'd suggest if you are considering this, to first look at some of the
many freely available memory allocator implementations out there. Some
of them are quite advanced. Then, once you have a few possible
candidates, benchmark them and see which is the best for your
application.
Oct 15 '08 #2
pu**********@gmail.com wrote:
Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager?
You need to understand that the overhead of malloc() and free() is an
intrinsic consequence of the job that they're required to perform.
You'll find it difficult, though not necessarily impossible, to create a
faster version of malloc() than the one provided by the standard library.

The only way you can reasonably hope for a significantly faster memory
manager than malloc()/free() is to define it with more limited
capabilities than malloc()/free(). For example: a memory manager
that always returns allocations of exactly the same size can be
significantly faster than malloc(). A memory manager that never actually
needs to release memory can also be significantly faster.
Oct 15 '08 #3
pu**********@gmail.com wrote:
Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead.
"Many users" claim all manner of foolishness. The claim quoted
here is empty of useful content and hence not really debatable; it
can be proven or disproven merely by choosing suitable or unsuitable
contexts. It's like saying "helicopters are inefficient" without
describing the task at hand.
Can we really eliminate that overhead with user-defined functions / a
user-defined memory manager? Is there any significant advantage in
providing user-defined malloc() and free()? I want to exclude realtime
or memory-constrained embedded systems here; I am talking about
general-purpose applications. Can a user-defined memory manager handle
memory allocation and de-allocation much better than the OS would have
done? Any inputs/comments on the same are highly appreciated.
If you are much, much smarter than the people who wrote the memory
manager(s) provided with the system, you may be able to make huge
improvements. You might also achieve huge improvements in specific
applications by taking advantage of application-specific knowledge the
library writers lacked. But if you're writing a "general purpose"
memory manager and you're only slightly smarter than the implementors,
you will probably do only a little bit better. If that.

The big gains, in my experience, come not from replacing malloc()
and friends but from building richer APIs atop them (or atop other
low-level memory managers). Consider: malloc() gets only one piece
of information, the number of bytes requested. Could malloc() do a
better job if it knew whether the allocation would be short-lived or
long-lived? Might it benefit from knowing that some allocations were
associated with others, having similar lifetimes and being accessed at
pretty much the same times? Would knowledge of which allocations would
keep constant size and which were likely to be reallocated be of use?
I'm sure you can think of ways a memory manager might make productive
use of information of this nature, information malloc() doesn't have.

However, this sort of information tends to be a bit application-
specific, which takes us away from the "general purpose" goal. If you
plan to use the narrow API of malloc() et al. and you measure the
performance across a wide range of disparate applications, you're not
likely to beat the system-provided manager(s) by a huge margin.

--
Er*********@sun.com
Oct 15 '08 #4
On Oct 15, 10:04 am, pushpakul...@gmail.com wrote:
Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead.
Some applications may be bottlenecked by rapid allocation and
releasing of resources. If you measure this as a problem in your
application via profiling, then it is worth investigating alternatives.
Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager?
No. We might reduce it.
Is there any
significant advantage in providing user-defined malloc() and free()?
It depends. The BSD memory allocator is famous for its excellence,
and so it is unlikely that you are going to write a better one. Even
at that, it may be worthwhile to create an alternative for special
situations.
I want to exclude realtime or memory-constrained embedded systems
here; I am talking about general-purpose applications. Can a
user-defined memory manager handle memory allocation and de-allocation
much better than the OS would have done?
Imagine that you are a compiler vendor. If you want to compete with
the other compiler vendors, you will have to write a good memory
manager. If other teams of people write better memory managers, you
are probably going to look at them and use their good ideas (unless
they are patented -- and even then you might just pay the royalty if
the benefit is large enough). The compiler memory manager will have
to have intimate knowledge of the operating system memory manager if it
is to perform well. I have written memory managers (most often
sub-allocators that manage a large memory pool with rapid alloc/free
cycles). However, this is rarely the bottleneck in a large
application, and it is a total waste of time to do this sort of thing
for applications that are not bound by this limit.

Did you hear about Dr. Finklebean? He invented a cure for a disease
that didn't exist. But then he caught the disease. Worst of all, the
cure did not work.
So don't be like Dr. Finklebean unless you have already caught the
disease and you need to find a solution for it.

If you want your code to be fast, write efficient algorithms.
Everything else pales in comparison.
IMO-YMMV.

Oct 15 '08 #5
pu**********@gmail.com wrote:
Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager?
Chances are, if you have to ask the question, then the answer is "no".

Brian
Oct 15 '08 #6
On Wed, 15 Oct 2008 10:04:52 -0700 (PDT), pu**********@gmail.com
wrote:
>Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager? Is there any significant advantage in providing
user-defined malloc() and free()? I want to exclude realtime or
memory-constrained embedded systems here; I am talking about
general-purpose applications. Can a user-defined memory manager handle
memory allocation and de-allocation much better than the OS would have
done? Any inputs/comments on the same are highly appreciated.

Whether you can write one that is better depends in part upon the
quality of the allocator supplied by the OS, in part upon your
requirements, and, of course, in part upon you. Writing a high
performance allocator is not a trivial exercise. Writing one (or
using someone else's) is probably not worthwhile if your
objective is to squeeze out more performance from the raw
allocation routines.

However in my opinion there are good reasons for using user
defined functions. They fall into three main categories -
robustness, error management, and information management. I
wrote and use a storage allocator for those reasons. See
http://home.tiac.net/~cri_a/source_c...ace/readme.txt for a
discussion of my reasoning.

That said, what you should do is write and use special purpose
storage management routines. Just as an example, suppose you
have a data structure with a lot of nodes that is transient. Do
you allocate each node separately and then free each node
separately when you are done with the data structure? That's
expensive. Instead you can allocate an array of nodes and then
deallocate all of the nodes with one call to free. There are
many such techniques.


Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
Oct 15 '08 #7
On Oct 15, 10:24 am, Nate Eldredge <n...@vulcan.lan> wrote:
pushpakul...@gmail.com writes:
Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager? Is there any significant advantage in providing
user-defined malloc() and free()? I want to exclude realtime or
memory-constrained embedded systems here; I am talking about
general-purpose applications. Can a user-defined memory manager handle
memory allocation and de-allocation much better than the OS would have
done?

Certainly in principle you could write a better malloc() than the one
your system provides. However, the authors of your system's standard
library (hopefully) put a lot of thought and effort into writing a good
general-purpose allocator, so you might have a hard time doing better.
You would think so ... but I was able to write a malloc/free that
highly outperformed the System V allocator on my first try, without
using any really advanced tricks (just a header at -sizeof(header)
offset from the pointer that contains the size, a footer, a linked
list in a power of two array of size bounded free entries, and active
coalescence).
Now, the phrase "general-purpose" gives a caveat. There are a lot of
possible factors to consider in writing an allocator: speed, efficient
usage of memory,
After a little while of thinking about this, you should realize that
these are all actually the same thing.
[...] robustness in the face of user error,
Ah. Now that's a different problem. However, it turns out that the
cost of this is relatively low in the grand scheme of things.
[...] avoiding fragmentation,
I've never understood why people obsess over this. If instead you
just try to make your allocations have the highest probability of
staying in your cache, then you have a far better heuristic that makes
fragmentation obsession a moot point.
[...] returning memory to the OS, etc.
Yeah, I dunno how important this is. The more you give the memory
back, the more you end up negotiating for your memory, which again
slows you down.
[...] These involve tradeoffs.
Many of them do not, and there are some obvious prioritizations that
can be done.
[...] Your library's malloc() will have chosen some balance
between these,
More likely they were tuned to some arbitrary benchmark. My own
investigation on this specific problem indicates this has not led to
the right answer most of the time.
[...] and it's possible that you want a different balance, in
which case your own implementation could do "better". Another factor is
that if you know something about how you are going to use your allocator
(e.g. you will be mainly allocating 73-byte chunks and rarely freeing
them), you can optimize appropriately, which the system's malloc() can't
do.

I'd suggest if you are considering this, to first look at some of the
many freely available memory allocator implementations out there. Some
of them are quite advanced. Then, once you have a few possible
candidates, benchmark them and see which is the best for your
application.
Actually, the only thing interesting in any of these is how they deal
with multi-threading. Otherwise, it's just really hard to imagine any
truly interesting strategizing that doesn't cause as many problems as
it tries to solve.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Oct 15 '08 #8
Paul Hsieh <we******@gmail.com> writes:
On Oct 15, 10:24 am, Nate Eldredge <n...@vulcan.lan> wrote:
>pushpakul...@gmail.com writes:
Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager? Is there any significant advantage in providing
user-defined malloc() and free()? I want to exclude realtime or
memory-constrained embedded systems here; I am talking about
general-purpose applications. Can a user-defined memory manager handle
memory allocation and de-allocation much better than the OS would have
done?

Certainly in principle you could write a better malloc() than the one
your system provides. However, the authors of your system's standard
library (hopefully) put a lot of thought and effort into writing a good
general-purpose allocator, so you might have a hard time doing better.

You would think so ... but I was able to write a malloc/free that
highly outperformed the System V allocator on my first try, without
using any really advanced tricks (just a header at -sizeof(header)
offset from the pointer that contains the size, a footer, a linked
list in a power of two array of size bounded free entries, and active
coalescence).
Nate Eldredge could have emphasized his "hopefully" side comment a bit
more (I'll skip the grammatical nit about the word). Case in point
(more modern than the System V example): the FreeBSD allocator added
enough performance on many systems to be worth adopting in Firefox.
>Now, the phrase "general-purpose" gives a caveat. There are a lot of
possible factors to consider in writing an allocator: speed, efficient
usage of memory,

After a little while of thinking about this, you should realize that
these are all actually the same thing.
Not at all. Many allocators can improve performance by increasing the
granularity of their underlying allocations, but that increases the
amount of internal fragmentation, which by definition is wasted
space. "Memory is cheap" tends to keep this from mattering on PCs,
but not always, and not in other environments.
>[...] robustness in the face of user error,

Ah. Now that's a different problem. However, it turns out that the
cost of this is relatively low in the grand scheme of things.
>[...] avoiding fragmentation,

I've never understood why people obsess over this. If instead you
just try to make your allocations have the highest probability of
staying in your cache, then you have a far better heuristic that makes
fragmentation obsession a moot point.
External fragmentation can be a horrendous problem over sufficiently
difficult allocation patterns. Those patterns are unlikely for a lot
of applications, but not for others. In particular, "long-lived"
applications with allocations of varying sizes and lifetimes are more
likely to hit such problems than other applications generally are.
>[...] returning memory to the OS, etc.

Yeah, I dunno how important this is. The more you give the memory
back, the more you end up negotiating for your memory, which again
slows you down.
It depends on your system and what else is happening in the system.
It's not something that can be optimized based on your application's
characteristics alone.
>[...] These involve tradeoffs.

Many of them do not, and there are some obvious prioritizations that
can be done.
>[...] Your library's malloc() will have chosen some balance
between these,

More likely they were tuned to some arbitrary benchmark. My own
investigation on this specific problem indicates this has not led to
the right answer most of the time.
Or even a non-arbitrary benchmark doesn't cover all situations, and
may become less meaningful as other improvements in the system change
the factors that its design trades off.
>[...] and it's possible that you want a different balance, in
which case your own implementation could do "better". Another factor is
that if you know something about how you are going to use your allocator
(e.g. you will be mainly allocating 73-byte chunks and rarely freeing
them), you can optimize appropriately, which the system's malloc() can't
do.

I'd suggest if you are considering this, to first look at some of the
many freely available memory allocator implementations out there. Some
of them are quite advanced. Then, once you have a few possible
candidates, benchmark them and see which is the best for your
application.

Actually, the only thing interesting in any of these is how they deal
with multi-threading. Otherwise, it's just really hard to imagine any
truly interesting strategizing that doesn't cause as many problems as
it tries to solve.
Multi-threading, along with multiprocessing, qualifies as one of the
top problems that has caused many system allocators to become less
optimal than they used to be. Furthermore, improvements in those
areas often *have* to be system-dependent.

--
Lowell Gilbert, embedded/networking software engineer
http://be-well.ilk.org/~lowell/
Oct 15 '08 #9
Paul Hsieh wrote:
On Oct 15, 10:24 am, Nate Eldredge <n...@vulcan.lan> wrote:
....
Now, the phrase "general-purpose" gives a caveat. There are a lot of
possible factors to consider in writing an allocator: speed, efficient
usage of memory,

After a little while of thinking about this, you should realize that
these are all actually the same thing.
It doesn't seem so to me. Efficient use of memory means being able to
deliver to the user as large a percentage of the available memory as
possible. That's a very different criterion from delivering the
memory as fast as possible. Many of the things I'm aware of that speed
up memory allocation involve rendering some of that memory
unavailable. For example, consider the popular approach of rounding up
allocation requests to the nearest power of 2, and allocating
different powers of 2 from different blocks of memory. The difference
between the rounded size of the request and the actual size requested
is memory that is unavailable for any useful task. It reduces efficient
use of memory, but makes allocation faster.
[...] avoiding fragmentation,
....
I've never understood why people obsess over this. If instead you
just try to make your allocations have the highest probability of
staying in your cache, then you have a far better heuristic that makes
fragmentation obsession a moot point.
My understanding is that the key reason for avoiding fragmentation is
that if a request comes in for a single block of memory larger than
the largest fragment currently available, then it cannot be satisfied,
even though the total amount of memory available might be orders of
magnitude larger than the size requested. I don't see how the approach
you describe makes that possibility something you don't need to worry
about.
[...] returning memory to the OS, etc.

Yeah, I dunno how important this is. The more you give the memory
back, the more you end up negotiating for your memory, which again
slows you down.
Yes, but the memory you don't give back slows other programs down; in
many contexts, that's a serious issue.
Oct 15 '08 #10
On October 15, 2008 16:45, in comp.lang.c, Paul Hsieh (we******@gmail.com)
wrote:
On Oct 15, 10:24 am, Nate Eldredge <n...@vulcan.lan> wrote:
>pushpakul...@gmail.com writes:
Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead.
[snip]
>Now, the phrase "general-purpose" gives a caveat. There are a lot of
possible factors to consider in writing an allocator: speed, efficient
usage of memory,

After a little while of thinking about this, you should realize that
these are all actually the same thing.
I beg to differ. Speed is not the same as efficient use of memory, which is
not the same as "reduction in fragmentation".

In general, a "first fit" allocator will be faster than a "best fit"
allocator, and a "best fit" allocator will make better use of memory than
a "first fit" allocator. However, a "best fit" allocator "tends to increase
the number of very small blocks, and proliferation of small blocks is
usually undesirable." (Knuth TAOCP Vol 1, Chapter 2.5).
[snip]
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
Oct 15 '08 #11

<pu**********@gmail.com> wrote in message
news:9b**********************************@b31g2000prf.googlegroups.com...
Hi all,

Many users claim that the malloc() and free() provided by the language
library incur a considerable amount of call overhead. Can we really
eliminate that overhead with user-defined functions / a user-defined
memory manager? Is there any significant advantage in providing
user-defined malloc() and free()? I want to exclude realtime or
memory-constrained embedded systems here.
I use a special allocator, for sizes up to 1KB, built /on top/ of malloc()
and free().

This outperforms the malloc() associated with my gcc for x86 by about 3 to 1
(with other libraries this figure can vary). But, I use a few extra
restrictions (for example freed small allocations are returned to a local
pool but never submitted to free()).

With more restrictions and for even more specialised applications, more
speedup can be achieved: you can get allocation and deallocation down to a
few machine instructions each.

You should profile your use of malloc/free to see if they are actual
bottlenecks.

--
bartc

Oct 16 '08 #12


