Why does ANSI not define a function to determine the size of (m)allocated mem? (like _msize)

bilbothebagginsbab5 AT freenet DOT de

Hello, hello.

So. I've read what I could find on Google (Groups) for this, also the FAQ of
comp.lang.c.

But still I do not understand why there is no standard method to "(...)
query the malloc package to find out how big an allocated block is"
(Question 7.27).

Is it explained somewhere why? It would seem to me that free()
and realloc() would have to know the size of the allocated space, and I would
like to know the reason this information is not "disclosed" by the std-library.

Best regards,
Martin

Nov 14 '05 #1

Mike Wahler

"bilbothebagginsbab5 AT freenet DOT de" <"bilbothebagginsbab5 AT freenet DOT
de"> wrote in message news:41********@e-post.inode.at...

> Hello, hello.
>
> So. I've read what I could find on google(groups) for this, also the faq of comp.lang.c.
>
> But still I do not understand why there is no standard method to "(...)
> query the malloc package to find out how big an allocated block is".
> (Question 7.27)

First, most obvious question is: Why do you need to know?

You'll have to go visit the folks at comp.std.c to discuss
how and why the language is as it is. However, my stance is
that it's simply not necessary. When you allocate, you know how
much you're allocating. Simply 'remember' this value (store
it in a variable), and refer to it when needed.

> Is there somewhere explained why - because it would seem to me, that free()
> and realloc(..) would have to know the size of allocated space,

They do need to 'know' (but this 'knowledge' might be
implemented at a lower level, i.e. in the OS itself; iow
'free()' might simply query the OS for this info[1]). The
mechanical details of allocation/deallocation are left up
to the implementation, and are not specified by the language.
You as the programmer don't need to know. 'free()' is required
to Do The Right Thing(tm).

> and I would
> like to know the reason this information is not "disclosed" by a
> std-library.

Again, you'll need to ask in comp.std.c

To me, it's simple. Not needed. The smaller the library, the
less stuff imposed on folks that don't want or need it. IMO
a Good Thing(tm).

[1] If your implementation indeed does work this way and documents
exactly what it does and how it works, you might want to check
your OS API to see if the same info can be obtained via an API
call, or perhaps a (nonstandard) library extension. But personally
I would not go to such lengths without a very compelling reason
to do so.

-Mike

Nov 14 '05 #2

mfhaigh

bilbothebagginsbab5 AT freenet DOT de wrote:

Hello, hello.

So. I've read what I could find on google(groups) for this, also the faq of comp.lang.c.

But still I do not understand why there is not standard method to "(...) query the malloc package to find out how big an allocated block is". ( Question 7.27)

Is there somewhere explained why - because it would seem to me, that free() and realloc(..) would have to know the size of allocated space, and I would like to know the reason this information is not "disclosed" by a std-library.

Why should it be disclosed? I see no compelling reason. If you need
the size of a block later, then keep track of it when you allocate it.
You could also easily wrap malloc and free to provide this
functionality.
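
The wrapper idea above could look roughly like the following. This is a minimal sketch and not code from the thread: the names sized_malloc, sized_msize and sized_free are hypothetical, and it assumes a C11 compiler, because it relies on max_align_t from <stddef.h> to keep the returned pointer suitably aligned. It reports the size the caller requested, not whatever the underlying allocator actually reserved.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* size_t, max_align_t (C11) */
#include <stdlib.h>   /* malloc, free */

/* Header stored in front of every block. The union pads it to the
   strictest fundamental alignment, so the user pointer stays aligned. */
typedef union {
    size_t size;          /* bytes the caller asked for */
    max_align_t align;    /* never read; present for padding only */
} sized_header;

/* Hypothetical wrapper: like malloc(), but remembers the request. */
void *sized_malloc(size_t n)
{
    if (n > (size_t)-1 - sizeof(sized_header))
        return NULL;                      /* header + n would overflow */
    sized_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;                         /* user data starts after header */
}

/* Returns the size originally requested for p.
   p must come from sized_malloc(). */
size_t sized_msize(const void *p)
{
    assert(p != NULL);
    return ((const sized_header *)p - 1)->size;
}

/* Releases a block obtained from sized_malloc(); NULL is a no-op. */
void sized_free(void *p)
{
    if (p != NULL)
        free((sized_header *)p - 1);
}
```

After a successful p = sized_malloc(33), sized_msize(p) returns 33. Pointers obtained from plain malloc() must never be passed to sized_msize() or sized_free(), since they have no header in front of them.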

Different implementations have different allocation strategies to fit
their needs. Adding an additional, rarely used requirement will
negatively impact some implementations in terms of memory usage and
execution time.
Mark F. Haigh
mf*****@sbcglobal.net

Nov 14 '05 #3

Peter Nilsson

mf*****@sbcglobal.net wrote:

... If you need
the size of a block later, then keep track of it when you allocate it. You could also easily wrap malloc and free to provide this
functionality.

You can do it if you know _all_ the types being allocated with malloc
by a given program, but there's no bullet proof general purpose way in
standard C alone.

--
Peter

Nov 14 '05 #4

Michael Mair

Peter Nilsson wrote:

mf*****@sbcglobal.net wrote:
... If you need
the size of a block later, then keep track of it when you allocate
it.
You could also easily wrap malloc and free to provide this
functionality.

You can do it if you know _all_ the types being allocated with malloc
by a given program, but there's no bullet proof general purpose way in
standard C alone.

Have I missed something? What speaks against
    struct my_malloc_entry {
        void *data;
        size_t size;
    };
or the linked list equivalent and keeping track of address
and size? After handling size==0 and checking whether malloc()
was successful, you store the data you need. If you are asked
for the size, you go through the array/list/whatever. At freeing,
you either "invalidate" the memory or actually free it.
If you want, you can implement a test mode which does not free
the memory and enables you to ask for potentially invalid
pointers or whatever.
All in standard C.
Or you allocate a large chunk of memory and manage "dealing out"
parts of it by yourself. In standard C.
Either way, the requested functionality _can_ be provided.
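
The first approach described above (a side table of address/size pairs) might be sketched as follows. This is an illustration under stated assumptions, not the poster's code: the names track_malloc, track_size and track_free are hypothetical, the table is a singly linked list with linear lookup, and a request of size 0 is served as a 1-byte allocation so that every entry has a distinct address.

```c
#include <stddef.h>   /* size_t, NULL */
#include <stdlib.h>   /* malloc, free */

/* One record per live allocation, kept in a singly linked list. */
struct track_entry {
    void *data;
    size_t size;
    struct track_entry *next;
};

static struct track_entry *track_head = NULL;

/* Hypothetical wrapper: allocate n bytes and record the request. */
void *track_malloc(size_t n)
{
    struct track_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->data = malloc(n ? n : 1);   /* size 0 handled as a 1-byte block */
    if (e->data == NULL) {
        free(e);
        return NULL;
    }
    e->size = n;
    e->next = track_head;
    track_head = e;
    return e->data;
}

/* Looks up the recorded size of p. Sets *found to 1 if p is a live
   tracked block, otherwise sets it to 0 and returns 0. */
size_t track_size(const void *p, int *found)
{
    for (const struct track_entry *e = track_head; e != NULL; e = e->next) {
        if (e->data == p) {
            if (found != NULL)
                *found = 1;
            return e->size;
        }
    }
    if (found != NULL)
        *found = 0;
    return 0;
}

/* Frees p and removes its record; untracked pointers are ignored. */
void track_free(void *p)
{
    for (struct track_entry **pp = &track_head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->data == p) {
            struct track_entry *dead = *pp;
            *pp = dead->next;
            free(dead->data);
            free(dead);
            return;
        }
    }
}
```

Because malloc() itself hands out every block, alignment is never an issue here; the price is an extra allocation per block and a lookup that is linear in the number of live blocks.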

I do not claim that these are efficient ways of doing it, hence
probably not "general purpose" enough for some people.
Maybe you thought only of storing the struct immediately before
the malloc()ed memory; it would at least explain your remark
about the types.

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Nov 14 '05 #5

Dik T. Winter

In article <11*********************@z14g2000cwz.googlegroups. com> mf*****@sbcglobal.net writes:

bilbothebagginsbab5 AT freenet DOT de wrote: ....
But still I do not understand why there is not standard method to "(...)
query the malloc package to find out how big an allocated block is".

.... Why should it be disclosed? I see no compelling reason. If you need
the size of a block later, then keep track of it when you allocate it.
You could also easily wrap malloc and free to provide this
functionality.

But there are allocators that will allocate more than requested in many
cases. If you are tight on memory and want to keep track of what you
are actually using, you might wish for such a capability.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Nov 14 '05 #6

E. Robert Tisdale

bilbothebagginsbab5 wrote:

So. I've read what I could find on google(groups) for this,
also the faq of comp.lang.c.

But still I do not understand why there is not standard method to "(...)
query the malloc package to find out how big an allocated block is".
(Question 7.27)

Is there somwhere explained why - because it would seem to me that
free() and realloc(..) would have to know the size of allocated space,
and I would like to know the reason [why]
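this information is not "disclosed" by a std-library.

$ cat main.c
#include <stdio.h>
#include <stdlib.h>

/* Non-portable: reads the word just before p, where this particular
   implementation happens to keep the block size. As far as the
   standard is concerned, this is undefined behavior. */
size_t allocation(const void* p) {
    return ((const size_t*)p)[-1] - 1;
}

int main(int argc, char* argv[]) {
    if (1 < argc) {
        const size_t n = atoi(argv[1]);
        const void* p = malloc(n);
        if (p != NULL) {
            /* %zu is the C99 conversion for size_t */
            fprintf(stdout, "size = %zu\n", allocation(p));
            free((void*)p);
        }
    }
    return EXIT_SUCCESS;
}
$ gcc -Wall -std=c99 -pedantic -o main main.c
$ ./main 33
size = 40
$ gcc --version
gcc (GCC) 3.4.1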

The ANSI/ISO C standards don't specify such a function
because it isn't necessary.
The standard doesn't even require the implementation
to keep track of the size of the memory allocated.
But all viable implementations *do* keep track
of the size information and you only need to find out where.
My implementation reserves an extra double word (8 bytes)
just before p to store the size information and sets

((size_t*)p)[-1] = ((size + 11)/8)*8 + 1

Nov 14 '05 #7

Martin T.

bilbothebagginsbab5 AT freenet DOT de wrote:

Hello, hello.

But still I do not understand ...
Best regards,
Martin

Dear Folks.
May I thank everybody who shared some thoughts on the issue.

After a good night's sleep, I came up with the following conclusion:

What I didn't take into account was that, as some have mentioned,
the implementation only has to guarantee that _at_least_ the size of
memory requested is allocated.
So if the implementation reserves more memory, it only has to keep
track of the memory it actually reserved, and not the size which the
programmer wants to use.
So if _msize() cannot tell me what I've requested, but only what is
actually reserved, it's pretty useless for the thing I wanted it for.
(Well, probably it would have worked anyway, since I have an allocated
array of a pretty big struct, so it seems *very* unlikely to me that
the system would reserve excess memory larger than this struct ... but I
don't think I will take the risk :-) )

best regards,
Martin

Nov 14 '05 #8

Lawrence Kirby

On Fri, 17 Dec 2004 00:43:52 +0100, Michael Mair wrote:

....

> Have I missed something? What speaks against
>     struct my_malloc_entry {
>         void *data;
>         size_t size;
>     };
> or the linked list equivalent and keeping track of address
> and size? After handling size==0 and checking whether malloc()
> was successful, you store the data you need. If you are asked
> for the size, you go through the array/list/whatever. At freeing,
> you either "invalidate" the memory or actually free it.
> If you want, you can implement a test mode which does not free
> the memory and enables you to ask for potentially invalid
> pointers or whatever.
> All in standard C.

That's fine because you're leaving malloc() to handle all alignment
issues.

> Or you allocate a large chunk of memory and manage "dealing out" parts
> of it by yourself. In standard C. Either way, the requested
> functionality _can_ be provided.

This is problematic because to deal out parts of an allocated area you
must ensure that each part is correctly aligned. There is no portable way
of doing this.
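
For readers on newer compilers: the objection above applies to the C standards available when this thread was written. C11 later added max_align_t and _Alignof, which make a simple "deal out parts of one chunk" allocator expressible. The following is a hedged sketch under that C11 assumption; the names struct arena, arena_init, arena_alloc and arena_destroy are hypothetical, and the allocator only ever bumps forward (individual pieces cannot be freed).

```c
#include <stddef.h>   /* size_t, max_align_t (C11) */
#include <stdlib.h>   /* malloc, free */

/* A bump allocator that deals out pieces of one malloc()ed chunk. */
struct arena {
    unsigned char *base;  /* start of the chunk, aligned by malloc() */
    size_t cap;           /* total bytes in the chunk */
    size_t used;          /* bytes dealt out so far; always a multiple
                             of the strictest fundamental alignment */
};

/* Returns 1 on success, 0 if the chunk could not be allocated. */
int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = (a->base != NULL) ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Deals out n bytes, rounding n up so that the next piece is again
   aligned for any fundamental type. Returns NULL when out of room. */
void *arena_alloc(struct arena *a, size_t n)
{
    const size_t al = _Alignof(max_align_t);
    if (n > (size_t)-1 - (al - 1))
        return NULL;                       /* rounding would overflow */
    size_t rounded = (n + al - 1) / al * al;
    if (rounded > a->cap - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Releases the whole chunk at once. */
void arena_destroy(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

Rounding every request up to the strictest alignment wastes a few bytes on small objects; that is the trade made here for keeping the arithmetic simple.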

Lawrence

Nov 14 '05 #9

websnarf

> First, most obvious question is: Why do you need to know?

Performance reasons. Generally malloc actually allocates somewhat more
memory than was requested. If malloc is being used to back a resizable
array, and there is actually memory available that wasn't originally
asked for, then knowing the real size can let you postpone the realloc
as you, perhaps, add entries to your vector. There are definitely
quantifiable cases where my string library http://bstring.sf.net/ could
improve its performance with this knowledge if it was available.

> They do need to 'know' (but this 'knowledge' might be
> implemented at a lower-level, i.e. in the OS itself; iow
> 'free()' might simply query the OS for this info[1]).
> [...] The smaller the library, the
> less stuff imposed on folks that don't want or need it.

I think you are missing the OP's point. He is saying that there is
code linked into your application that accesses the allocated size
anyways (hidden in the code for free() and realloc()), whether you want
it to or not. You just don't have any direct access to it.

The only real advantage of the standard not exposing this might be if
this, in fact, was not exactly true. For example, the size might
*change* as a side effect of other allocations. I.e., the size might
increase because the sliver between it and an adjacent allocation is
too small and so it gets attached to a previous allocation to decrease
fragmentation or leaks. Or the allocation scheme may use a heuristic
to "predict" impending "reallocs", which may cause it to decrease the
size as a result of later heuristic failures or something.

But, from my own research into memory allocation schemes, none of these
ideas are representative of good high-performance, memory-stingy,
or fragmentation-reducing solutions. I.e., I think it would be a good idea
to go ahead and add such a thing to the standard. However, there are a
lot more things I would like to add to the standard regarding memory
allocation as well.

Nov 14 '05 #10

Mike Wahler

<we******@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...

>> First, most obvious question is: Why do you need to know?
>
> Performance reasons.
> Generally malloc actually allocates somewhat more
> memory than was requested.

It might, it might not. It's required to allocate
*at least* the requested size, and allowed to allocate
more. But your program is only allowed to legally access
the specific amount requested. Hands off the 'extra'
if you want your program's behavior to remain well-defined.

> If malloc is being used to back a resizable
> array, and there is actually memory available that wasn't originally
> asked for, then knowing the real size can let you postpone the realloc

No. See above.

> as you, perhaps, add entries to your vector. There are definitely
> quantifiable cases where my string library http://bstring.sf.net/ could
> improve its performance with this knowledge if it was available.

I don't see how, not in a standard manner.

>> They do need to 'know' (but this 'knowledge' might be
>> implemented at a lower-level, i.e. in the OS itself; iow
>> 'free()' might simply query the OS for this info[1]).
>> [...] The smaller the library, the
>> less stuff imposed on folks that don't want or need it.
>
> I think you are missing the OP's point. He is saying that there is
> code linked into your application that accesses the allocated size
> anyways (hidden in the code for free() and realloc()), whether you want
> it to or not. You just don't have any direct access to it.

Right. But accessing it is outside the realm of standard C.

> The only real advantage of the standard not exposing this might be if
> this, in fact, was not exactly true. For example, the size might
> *change* as a side effect of other allocations.

The implementation is free to perform whatever internal
machinations it likes in order to provide the required
behavior. But again, access and manipulation of such
'internals' is necessarily nonstandard, platform-specific.

> I.e., the size might
> increase because the sliver between it and an adjacent allocation is
> too small and so it gets attached to a previous allocation to decrease
> fragmentation or leaks. Or the allocation scheme may use a heuristic
> to "predict" impending "reallocs", which may cause it to decrease the
> size as a result of later heuristic failures or something.
>
> But, from my own research into memory allocation schemes

BUT: C is intentionally designed to leave the selection of
such 'schemes' to the implementation, in the interest of
maximal portability.

> none of these
> ideas are representative of good high performance or memory stinginess
> or fragment reducing solutions. I.e., I think it would be a good idea
> to go ahead and add such a thing to the standard. However, there are a
> lot more things I would like to add to the standard regarding memory
> allocation as well.

Yes, I know, many people have their own favorite things they
want added to the language. But I doubt such things as this
'low level' memory stuff will be, in the interest of keeping
things as abstract as possible, to keep the language as portable
as possible.

I do realize that many times it can be necessary to work
more 'intimately' with a given platform in the interest of
e.g. performance. But such things are (imo properly) outside
the scope of the standard language.

-Mike

Nov 14 '05 #11

Keith Thompson

"Mike Wahler" <mk******@mkwahler.net> writes:

<we******@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...
> First, most obvious question is: Why do you need to know?

Performance reasons.
Generally malloc actually allocates somewhat more
memory that was requested.

It might, it might not. It's required to allocate
*at least* the requested size, and allowed to allocate
more. But your program is only allowed to legally access
the specific amount requested. Hands off the 'extra'
if you want your program's behavior to remain well-defined.

[...]

Certainly, given the current language definition.

IMHO, it wouldn't be unreasonable to add something like the following
to <stdlib.h>:

size_t bytes_allocated(void *ptr);

It invokes undefined behavior in the same circumstances as free(ptr),
except that bytes_allocated(NULL) also invokes undefined behavior.
Otherwise, ptr is a pointer earlier returned by malloc(), calloc(), or
realloc(), and the function returns a number of bytes that the
implementation guarantees the program is able to access, which is at
least the number of bytes requested in the *alloc() call.

An implementation that just returns the number of bytes requested
would be conforming.

If the value returned is (sometimes) greater than the number
requested, it can sometimes save the need for a call to realloc().
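
To make the "save a realloc()" idea concrete, here is a hedged sketch of a growable array that asks how much room it really has after each reallocation. Everything here is illustrative: usable_size() is a hypothetical stand-in for the proposed bytes_allocated(), implemented in the minimal way the proposal would permit (it simply returns the requested size), and struct int_vec and int_vec_push are made-up names. Some platforms offer nonstandard extensions in this area, such as glibc's malloc_usable_size() or Microsoft's _msize(); whether the extra bytes may actually be used is up to that implementation's documentation.

```c
#include <stddef.h>   /* size_t, NULL */
#include <stdlib.h>   /* realloc */

/* Hypothetical stand-in for the proposed bytes_allocated(). Standard C
   has no such call, so this version just reports the requested size. */
static size_t usable_size(void *p, size_t requested)
{
    (void)p;
    return requested;
}

struct int_vec {
    int *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements the block is known to be able to hold */
};

/* Appends value, calling realloc() only when the known capacity is
   exhausted. Returns 1 on success, 0 on failure (vector unchanged). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t want = v->cap ? v->cap * 2 : 4;
        if (want > (size_t)-1 / sizeof *v->data)
            return 0;                       /* byte count would overflow */
        int *p = realloc(v->data, want * sizeof *v->data);
        if (p == NULL)
            return 0;
        v->data = p;
        /* If the allocator reports more room than requested, the
           next realloc() is postponed accordingly. */
        v->cap = usable_size(p, want * sizeof *v->data) / sizeof *v->data;
    }
    v->data[v->len++] = value;
    return 1;
}
```

With the stand-in shown, the vector behaves exactly like an ordinary doubling array; the benefit only appears if usable_size() is backed by an allocator that genuinely reports, and guarantees, extra room.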

This should be easy to implement (though impossible to implement
portably). If there are implementations for which it isn't easy to
implement this function, that would be a good argument against adding
it to the standard.

On the other hand, it wouldn't be unreasonable *not* to add this
function to the standard. For most purposes, the existing interface
is good enough. Creeping featurism is always a risk, and the burden
of proof is on anyone advocating an addition. I don't pretend that
I've met that burden.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #12

Jack Klein

On Sat, 18 Dec 2004 01:24:57 GMT, Keith Thompson <ks***@mib.org> wrote
in comp.lang.c:

"Mike Wahler" <mk******@mkwahler.net> writes:
<we******@gmail.com> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...
> First, most obvious question is: Why do you need to know?

Performance reasons.
Generally malloc actually allocates somewhat more
memory that was requested.
It might, it might not. It's required to allocate
*at least* the requested size, and allowed to allocate
more. But your program is only allowed to legally access
the specific amount requested. Hands off the 'extra'
if you want your program's behavior to remain well-defined.

[...]

Certainly, given the current language definition.

IMHO, it wouldn't be unreasonable to add something like the following
to <stdlib.h>:

size_t bytes_allocated(void *ptr);

It invokes undefined behavior in the same circumstances as free(ptr),
except that bytes_allocated(NULL) also invokes undefined behavior.
Otherwise, ptr is a pointer earlier returned by malloc(), calloc(), or
realloc(), and the function returns a number of bytes that the
implementation guarantees the program is able to access, which is at
least the number of bytes requested in the *alloc() call.

An implementation that just returns the number of bytes requested
would be conforming.

If the value returned is (sometimes) greater than the number
requested, it can sometimes save the need for a call to realloc().

Now you've fallen into a very nasty trap. You're assuming not only
that the implementation tells you that memory is there, but also lets
you use more than you asked for with defined results. And that leads
to performance losses in some situations.

Let's just say that such a function exists. And let's say that you
allocate some number of bytes, indicated by the macro SIZE.

my_ptr = malloc(SIZE);

Now as your program continues, you realize that you could use a few
more bytes, let's say exactly three more.

if (bytes_allocated(my_ptr) < SIZE + 3)
{
    /* use realloc() to resize the block larger */
}

/* add three more bytes to the block */

Then a little further on, you need more memory still, and your
bytes_allocated() function indicates there is not enough, so you must
call realloc(). If realloc() has to move the block to extend it to the
new size, it must copy the contents of the original block. Under
today's standard, that would be SIZE bytes.

But since you can actually store data in bytes_allocated() bytes,
without bothering to inform the library that you are doing so, it must
copy all of those bytes into the newly allocated block.

So every program pays a potentially heavy price on every call to
realloc(), just so you can avoid a realloc() once in a while. This is
quite the opposite of the spirit of C, where you don't pay for what
you don't use.

> On the other hand, it wouldn't be unreasonable *not* to add this
> function to the standard. For most purposes, the existing interface
> is good enough. Creeping featurism is always a risk, and the burden
> of proof is on anyone advocating an addition. I don't pretend that
> I've met that burden.

It would be very reasonable *not* to add this functionality to the
standard library. I don't want to pay the price for the extra copying
when I call realloc().

The only other case it solves is that of the lazy programmer, who
can't be bothered to remember the size he/she asked for and pass it
around as necessary.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Nov 14 '05 #13

websnarf

> Now you've fallen into a very nasty trap. You're assuming

> not only that the implementation tells you that memory is
> there, but also lets you use more than you asked for with
> defined results. And that leads to performance losses in
> some situations.

Not necessarily. In the scenario you describe, the implementation can
track whether bytes_allocated() has been called on the block.
And remember, since the whole point is to reduce the *number* of
reallocs, we are gaining back that performance in ideal situations
anyhow. And even if the implementation situation I describe is not
common, you can back off to the current assumptions just by returning
the original memory size requested (which must be known to leverage the
realloc scenario you suggest).

But the value of such a function obviously includes debugging. So I
don't see the inclusion of such a function as either a trap or an
irrelevancy.
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #14

Keith Thompson

Jack Klein <ja*******@spamcop.net> writes:

> On Sat, 18 Dec 2004 01:24:57 GMT, Keith Thompson <ks***@mib.org> wrote
> in comp.lang.c:
[...]
>> IMHO, it wouldn't be unreasonable to add something like the following
>> to <stdlib.h>:
>>
>> size_t bytes_allocated(void *ptr);
>
[...]
> Now you've fallen into a very nasty trap. You're assuming not only
> that the implementation tells you that memory is there, but also lets
> you use more than you asked for with defined results.

Strictly speaking, I was suggesting giving the implementation an
opportunity to promise more useful memory than the user asked for.
The implementation isn't obligated to take advantage of this.

But ...

[...]
> But since you can actually store data in bytes_allocated() bytes,
> without bothering to inform the library that you are doing so, it must
> copy all of those bytes into the newly allocated block.
>
> So every program pays a potentially heavy price on every call to
> realloc(), just so you can avoid a realloc() once in a while. This is
> quite the opposite of the spirit of C, where you don't pay for what
> you don't use.

That's a very good point; I hadn't thought of that.

I've thought of two or three workarounds, but the results are ugly, so
I think I'll give up on the whole idea.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #15

Malcolm

"bilbothebagginsbab5 AT freenet DOT de" <"bilbothebagginsbab5 AT freenet DOT
de"> wrote

But still I do not understand why there is not standard method to "(...)
query the malloc package to find out how big an allocated block is". (
Question 7.27)

It's a decision by the original designers of the C language, and later by
ANSI.

The reason was probably that malloc() rounded up the size of the allocated
block to the nearest multiple of eight, and so an msize() function couldn't
return the exact size without rewriting the library, which was more trouble
than it was worth.

An msize() function for C is not necessarily a bad idea, though it
encourages functions of the form

/*
calculates the mean of an array of doubles
Notes: array must be allocated by malloc()
*/
double mean(double *x)

rather than

double mean(double *x, size_t N)

which is handier if the array happens to be on the stack.
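
A minimal sketch of the second form, with the element count passed explicitly. The body is an illustration only (the post gives just the prototypes), and returning 0.0 for an empty array is an assumption made here so the function never divides by zero.

```c
#include <stddef.h>   /* size_t */

/* Mean of the first n elements of x. Works for any array of doubles,
   whether it lives on the stack, in static storage, or in memory
   obtained from malloc(). Returns 0.0 when n is 0. */
double mean(const double *x, size_t n)
{
    double sum = 0.0;
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}
```

Because the caller supplies the count, the same function can also average just a prefix of a larger array, which a version that queried the allocation size could not do.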

If you think that msize() should be part of C2005 then no one will object to
you arguing this, and maybe your ideas will be incorporated into the
language. However the present situation is that it is not part of the
standard library.

Nov 14 '05 #16

Chris Croughton

On 18 Dec 2004 01:13:11 -0800, we******@gmail.com
<we******@gmail.com> wrote:

> Not necessarily. In the scenario you describe, the implementation can
> track whether or not bytes_allocated() has been called on it or not.
> And remember since the whole point is to reduce the *number* of
> reallocs, we are gaining back that performance in ideal situations
> anyhow. And even if the implementation situation I describe is not
> common you can back off to the current assumptions just by returning
> the original memory size requested (which must be known to leverage the
> realloc scenario you suggest.)

The implementation may well have to use more memory to store the length,
thus wasting resources in the more common cases (on some machines quite
possibly wasting 16 bytes or more in order to get the alignment for the
worst case). Or it might have to spend time looking for the length in
some implementations (for instance ones using garbage collection where
separate lists of pointers and lengths are kept). There's also one
common one where the last block allocated on a 'heap' has an allocated
size of "the rest of the heap" until another block has to be allocated
after it.

> But the value of such a function obviously includes debugging. So I
> don't see the inclusion of such a function as either a trap or an
> irrelevancy.

If you want it for debugging you can implement it on top of the existing
functions (as debugging libraries such as dmalloc do), or your debugger
can interface with the allocation libraries at low level (since
debuggers are inherently system dependent).

How do you think C programming has survived for many decades without it?
It obviously isn't essential, it isn't even in C++ (where it could have
been added easily if they had wanted to do so). Does any language
actually allow you to allocate something and find out how big it is
later?

Chris C

Nov 14 '05 #17

Malcolm

"Chris Croughton" <ch***@keristor.net> wrote

Does any language actually allow you to allocate something anf find out how big it is later?

In Java you can allocate an array of objects

eg
    int[] catches = new int[daysfishing];

when you want to use the array you can find the length

    for(i=0;i<catches.length;i++)
        total += catches[i];

this is handy since it means you don't have to bother keeping track of the
array size, and also means that the array and the size cannot get out of
synch.

The disadvantage is that you pay a price for carrying about bounds
information internally.

Nov 14 '05 #18

Chris Croughton

On Sat, 18 Dec 2004 14:14:53 -0000, Malcolm
<ma*****@55bank.freeserve.co.uk> wrote:

"Chris Croughton" <ch***@keristor.net> wrote

>> Does any language actually allow you to allocate something and find out how
>> big it is later?
>
> In Java you can allocate an array of objects
>
> eg
>     int[] catches = new int[daysfishing];
>
> when you want to use the array you can find the length
>
>     for(i=0;i<catches.length;i++)
>         total += catches[i];
>
> this is handy since it means you don't have to bother keeping track of the
> array size, and also means that the array and the size cannot get out of
> synch.

True, you can do it with vectors in C++ as well. C++ ones also allow
you to dynamically extend them (and get both the current length and the
current allocated size), but they are basically just structures with
dedicated functions to access them and are part of the library not of
the syntax.

> The disadvantage is that you pay a price for carrying about bounds
> information internally.

And probably speed penalties as well, if it uses it for length checking
on access (which Java does, I believe, and C++ STL vectors generally
don't).

C was designed to be lean & mean, if you don't know what you're doing
use some other language with more protection...

Chris C

Nov 14 '05 #19

Michael Mair

Lawrence Kirby wrote:

On Fri, 17 Dec 2004 00:43:52 +0100, Michael Mair wrote:

...

Have I missed something? What speaks against
struct my_malloc_entry {
void *data;
size_t size;
};
or the linked list equivalent and keeping track of address
and size? After handling size==0 and checking whether malloc()
was successful, you store the data you need. If you are asked
for the size, you go through the array/list/whatever. At freeing,
you either "invalidate" the memory or actually free it.
If you want, you can implement a test mode which does not free
the memory and enables you to ask for potentially invalid
pointers or whatever.
All in standard C.

That's fine because you're leaving malloc() to handle all alignment
issues.

Or you allocate a large chunk of memory and manage "dealing out" parts
of it by yourself. In standard C. Either way, the requested
functionality _can_ be provided.

This is problematic because to deal out parts of an allocated area you
must ensure that each part is correctly aligned. There is no portable way
of doing this.

You are of course right... I noticed it but was unable to amend it
myself (business trip).
Thanks for the correction :-)

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Nov 14 '05 #20

Keith Thompson

Chris Croughton <ch***@keristor.net> writes:
[...]

> How do you think C programming has survived for many decades without it?
> It obviously isn't essential, it isn't even in C++ (where it could have
> been added easily if they had wanted to do so).

That argument could be used against adding any new feature. (Not that
it's invalid; any new feature needs to overcome this argument.)

> Does any language actually allow you to allocate something and find
> out how big it is later?

In a language with a typed allocator (rather than one that returns an
untyped chunk of raw memory), you can generally get the size because
you know the type. Ada is one example. (That's the size of the
object, not the (possibly larger) size allocated by the underlying
system.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #21

websnarf

Ok, first, obviously, the length might not be explicitly stored -- it
might need to be derived. But the point of a bytes_allocated() function
is not to dictate an implementation, but rather to simply duplicate
whatever action free() or realloc() performs in implicitly working out
the length of the allocation. Most reasonable-performance
implementations just store the length, though.

The point is, since I am arguing that it doesn't really cost any more
to implement such a function, that it should always be there as an
assist to debugging. Writing wrapper functions, or using things like
dmalloc() *DOES* cost you something. I'm a big believer in built-in
error logging and recovery mechanisms such as are discussed here:
http://acmqueue.com/modules.php?name...owpage&pid=233 .
(There is this great delusion that developers have that they can root
out all problems before a project ships -- but any realistic appraisal
of real world software should tell us that this is not the case.) In
the case of programming in C, one of the big problems with late term
debugging (i.e., dealing with problems only seen after the product
ships) is that there are no language tools for auditing the heap that
don't negatively impact performance. But my own analysis of the
problem indicates to me that there is no reason for this. MSVC++, for
example, does come with a well instrumented heap, but it's tied
specifically to its debugger and thus is kind of useless for diagnosing
problems that show up in the field.

How do you think C programming has survived for many decades without
it?

Simple -- it *HASN'T*. That's why developers are leaving C behind.
The C language just doesn't scale with the complexity of applications,
and so even the simplest things, such as the implicit
constructor/destructor paradigm of C++, which mitigates a lot of memory
problems, are enough for people to abandon ship.

I would also like to point out that in fact there are *C* compilers
that *DO* implement a bytes_allocated() function. Namely, WATCOM C/C++
implements _msize() which does exactly this. WATCOM also implements a
number of other interesting heap analysis functions -- and they
remain available even for release builds.
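
For readers who want to experiment: several C libraries expose a comparable query under different names, namely _msize() on WATCOM and Microsoft's runtime, malloc_size() on macOS, and malloc_usable_size() on glibc and other Linux C libraries. The sketch below wraps them behind a bytes_allocated() helper. The helper's signature, its return convention and the HAVE_BLOCK_SIZE macro are assumptions made for illustration, not part of any standard. It only reads the size and never writes into slack space; at least the glibc documentation treats the reported value as a diagnostic aid rather than as usable capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Pick whichever non-standard query the platform offers, if any. */
#if defined(_WIN32) || defined(__WATCOMC__)
#include <malloc.h>                 /* _msize */
#define HAVE_BLOCK_SIZE 1
static size_t platform_block_size(void *p) { return _msize(p); }
#elif defined(__APPLE__)
#include <malloc/malloc.h>          /* malloc_size */
#define HAVE_BLOCK_SIZE 1
static size_t platform_block_size(void *p) { return malloc_size(p); }
#elif defined(__GLIBC__) || defined(__linux__)
#include <malloc.h>                 /* malloc_usable_size */
#define HAVE_BLOCK_SIZE 1
static size_t platform_block_size(void *p) { return malloc_usable_size(p); }
#else
#define HAVE_BLOCK_SIZE 0           /* no known query on this platform */
#endif

/* Hypothetical helper: returns 1 and stores the block size in *out when the
 * platform can report it; returns 0 and leaves *out untouched otherwise.
 * The value is the size actually allocated, which may exceed the request. */
static int bytes_allocated(void *p, size_t *out)
{
    assert(p != NULL && out != NULL);
#if HAVE_BLOCK_SIZE
    *out = platform_block_size(p);
    return 1;
#else
    return 0;
#endif
}
```

A caller would write `size_t n; if (bytes_allocated(p, &n)) ...` and fall back to its own bookkeeping whenever the helper returns 0.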
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #22

Howard Hinnant

Very interesting discussion. I wish I had noticed it earlier.

I also advocate something like _msize, and several other key additions
to the malloc interface.

I can speak as an author of a few commercial implementations of malloc:
Not one of the implementations I've done actually keeps track of the
size the user requested. And they all keep track of the size of the
block of memory returned to the client. When realloc is called, the
latter number of bytes is copied. Imho, it is faster to do that than to
track both the size the client requested and the size actually supplied.
Although the former is definitely useful in a debug build.

A few people have given the main motivation for the existence of this
functionality: To delay the realloc-ing of a dynamically sized array:
If the memory is sitting there anyway, locked out of otherwise being
utilized, one might as well use it.

Yes, I would like to see this functionality in C++ as well. If it
exists at the C level first, it will be easier to bring it to C++'s
allocator concept, and C++ containers can then use it.

There is a proposal in the post-Redmond C mailing which addresses this
subject:

http://www.open-std.org/jtc1/sc22/wg...docs/n1085.htm

This is all about being able to write potentially higher performance
code and yet remain portable. It also includes "expand-in-place"
functionality to potentially further delay the need to realloc a growing
block of memory to a new location.

The latest Metrowerks products (on some platforms) contain the proposed
interface, although with alternate names in the C library, and a "malloc
allocator" in the C++ library which the std::vector knows how to make
good use of.

-Howard

Nov 14 '05 #23

Keith Thompson

we******@gmail.com writes:
[...]

I would also like to point out that in fact there are *C* compilers
that *DO* implement a bytes_allocated() function. Namely, WATCOM C/C++
implements _msize() which does exactly this. WATCOM also implements a
number of other interesting heap analysis functions -- and they
remain available even for release builds.

Does _msize() return the number of bytes requested, or the number of
bytes actually allocated? If the latter, it would probably require
realloc() to copy more bytes than it really needs to.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 14 '05 #24

Howard Hinnant

In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:

we******@gmail.com writes:
[...]
I would also like to point out that in fact there are *C* compilers
that *DO* implement a bytes_allocated() function. Namely, WATCOM C/C++
implements _msize() which does exactly this. WATCOM also implements a
number of other interesting heap analysis functions -- and they
remain available even for release builds.

Does _msize() return the number of bytes requested, or the number of
bytes actually allocated? If the latter, it would probably require
realloc() to copy more bytes than it really needs to.

Copying more bytes than it needs to can be faster than not. Consider a
malloc on the PPC architecture which returns 16 byte aligned blocks (the
largest alignment required for PPC vector objects). If the client
requests 75 bytes and malloc returns 80, it is much faster to move the
80 bytes with 5 vector load-stores rather than move 75 bytes with 4
vector load-stores, 1 floating point register load-store, and 3 byte
load-stores.
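
The arithmetic in this example can be checked with a short sketch. It assumes the move sizes listed above (16-byte vector, 8-byte floating point register, single bytes) and simply counts transfers; round_up() and moves_needed() are hypothetical helper names introduced here, not part of any malloc implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Count the load-store pairs needed to move n bytes when the available
 * transfer sizes are 16 bytes, 8 bytes and 1 byte, using the largest first. */
static size_t moves_needed(size_t n)
{
    size_t vectors = n / 16;        /* 16-byte vector moves        */
    size_t rest    = n % 16;
    size_t doubles = rest / 8;      /* 8-byte FP register moves    */
    size_t bytes   = rest % 8;      /* single-byte moves           */
    return vectors + doubles + bytes;
}
```

Under these assumptions a 75-byte request rounds up to an 80-byte block, the 80 bytes take 5 moves, and the 75 bytes take 4 + 1 + 3 = 8.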

-Howard

Nov 14 '05 #25

Goran Larsson

In article <hi***************************@syrcnyrdrs-03-ge0.nyroc.rr.com>,
Howard Hinnant <hi*****@metrowerks.com> wrote:

A few people have given the main motivation for the existence of this
functionality: To delay the realloc-ing of a dynamically sized array:
If the memory is sitting there anyway, locked out of otherwise being
utilized, one might as well use it.

This motivation is very weak.

I am quite sure that implementors of malloc routines implement the
simple optimization of avoiding the copy and returning the original
pointer if the new size is smaller than the size of the memory area
malloc used.

Example:

    p = malloc(30);            /* Returns a pointer to an area       */
                               /* from the 64 byte memory pool.      */
    . . .
    p = realloc(p,40);         /* As the new size is less than 64    */
                               /* no copy and same ptr is returned.  */

vs.

    p = malloc(30);            /* Returns a pointer to an area       */
                               /* from the 64 byte memory pool.      */
    . . .
    if ( _msize(p) < 40 ) {    /* _msize returns 64.                 */
        p = realloc(p,40);
    }

The cost of calling _msize can't be any different than the cost of
calling a realloc that doesn't need to do a copy.

What can be done with _msize that isn't already provided automatically
by most (if not all) of today's malloc implementations?

--
Göran Larsson http://www.mitt-eget.com/

Nov 14 '05 #26

websnarf

> Does _msize() return the number of bytes requested, or the number of
> bytes actually allocated?

The bytes actually allocated.

> [...] If the latter, it would probably require
> realloc() to copy more bytes than it really needs to.

Depending on what you mean by "really needs to". I didn't implement
WATCOM C/C++ heap, and I can tell you, that they have far worse
problems than possible over-copying. The highest impact on realloc()
performance is actually the probability that it can consume adjacent
memory. Using _msize() you can deterministically avoid some small
increases, or just intrinsically avoid them because you've got the
extra space in the allocation anyway. The "extra space" should just be
a round up to the next alignment size -- implementations that try to
round up to powers of 2 or other oversized fixed block sizes (rather
than just alignment) end up paying a price in decreased locality (a
potentially far worse problem) and just plain worse memory utilization.
So I have no sympathy for these claims/concerns -- there are better
ways of designing heaps which are all around better and which don't
suffer any negative performance for exposing a bytes_allocated()
function.
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #27

Howard Hinnant

In article <I8********@approve.se>, ho*@invalid.invalid (Goran Larsson)
wrote:

In article <hi***************************@syrcnyrdrs-03-ge0.nyroc.rr.com>,
Howard Hinnant <hi*****@metrowerks.com> wrote:
A few people have given the main motivation for the existence of this
functionality: To delay the realloc-ing of a dynamically sized array:
If the memory is sitting there anyway, locked out of otherwise being
utilized, one might as well use it.
This motivation is very weak.

Not really, unless we are designing a new language from scratch (see
below).

I am quite sure that implementors of malloc routines implement the
simple optimization of avoiding the copy and returning the original
pointer if the new size is smaller than the size of the memory area
malloc used.

Sure, such as the example realloc implementation in the proposal:

http://www.open-std.org/jtc1/sc22/wg...docs/n1085.htm

Example:

    p = malloc(30);            /* Returns a pointer to an area       */
                               /* from the 64 byte memory pool.      */
    . . .
    p = realloc(p,40);         /* As the new size is less than 64    */
                               /* no copy and same ptr is returned.  */

vs.

    p = malloc(30);            /* Returns a pointer to an area       */
                               /* from the 64 byte memory pool.      */
    . . .
    if ( _msize(p) < 40 ) {    /* _msize returns 64.                 */
        p = realloc(p,40);
    }

The cost of calling _msize can't be any different than the cost of
calling a realloc that doesn't need to do a copy.

What can be done with _msize that isn't already provided automatically
by most (if not all) of today's malloc implementations?

Some clients do not want the expense of a copy and are willing to settle
for less memory than the requested size in a realloc. Other clients
cannot tolerate a copy, no matter what the expense. For example some
clients may have structs that are self-referencing:

    struct MyType
    {
        struct MyType* head; // head and tail may point to this
        struct MyType* tail;
    };
    ....
    p = realloc(p, newsize * sizeof(struct MyType));

Self-referencing structs can not be memcpy'd, and thus arrays of them
can not be realloc'd (at least not without post detection of address
change and fix up). And so it is beneficial to see if such arrays can
be expanded in place (the real thrust of the proposal).
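
The "post detection of address change and fix up" mentioned above might look like the following sketch. Rather than comparing old and new addresses after the fact, it records each pointer as an array index before calling realloc() and rebuilds the pointers afterwards. grow_array() is a hypothetical helper, and the sketch assumes that every head and tail points at an element of the same array and that the size multiplications do not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct MyType {
    struct MyType *head;   /* points at some element of the same array */
    struct MyType *tail;   /* likewise (assumption of this sketch)     */
};

/* Grow an array from count to newcount elements and repair the internal
 * pointers. Returns the new array, or NULL on failure, in which case the
 * old array is untouched and still valid. */
static struct MyType *grow_array(struct MyType *arr, size_t count,
                                 size_t newcount)
{
    assert(arr != NULL && count > 0 && newcount >= count);

    /* Save every pointer as an index while the old addresses are valid. */
    size_t *idx = malloc(2 * count * sizeof *idx);
    if (idx == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        idx[2 * i]     = (size_t)(arr[i].head - arr);
        idx[2 * i + 1] = (size_t)(arr[i].tail - arr);
    }

    struct MyType *grown = realloc(arr, newcount * sizeof *grown);
    if (grown != NULL) {
        /* The block may have moved: rebuild pointers from the indices. */
        for (size_t i = 0; i < count; i++) {
            grown[i].head = grown + idx[2 * i];
            grown[i].tail = grown + idx[2 * i + 1];
        }
    }
    free(idx);
    return grown;
}
```

The extra index array and the two passes over the data are the kind of overhead that expanding in place would avoid.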

The proposed sizeof_alloc(void*) is a minor player in this paper,
present mainly for transitional code. If you receive a malloc'd pointer
from a legacy server, you can find out the size of the allocated block
with sizeof_alloc. A non-legacy server would simply use request_malloc
instead of malloc, and thus receive the memory block and its allocated
size at the same time (if that information is useful). So in a newly
built system today, sizeof_alloc(void*) is somewhat redundant, and thus
would have weak motivation. But in today's context, where you might be
dealing with the transfer of memory ownership from legacy code, the
motivation for sizeof_alloc(void*) is significantly stronger.

But the above is really just a subtlety. To really answer your question
head on: Some clients can only expand their array in-place, and can not
use memcpy when placing the array into a larger block. Thus realloc is
inconvenient at best, and completely wrong at worst. To be able to
discover the true size of your allocated block of memory, and to also
discover if it can be expanded in-place, and by how much, can be a very
significant optimization.

A potential pitfall: clients must not (portably) assume that
sizeof_alloc(void*) is O(1) time complexity. While all malloc systems I
can think of can return this information, some of them may not do it in
constant time. Indeed, one of the malloc implementations I've written
for a very tight embedded system returns this information in O(log)
time. And so I advise clients that need this information to cache it,
instead of recomputing it on each need, unless it is known that the
underlying implementation can deliver in O(1) time.

-Howard

Nov 14 '05 #28

websnarf

Goran Larsson wrote:

    p = malloc (30);
    ...
    p = realloc (p, 40);

vs.

    p = malloc(30);
    . . .
    if ( _msize(p) < 40 ) {
        p = realloc(p,40);
    }

You misunderstand the purpose of the idea. Try the following:

    p = malloc (30);
    ...
    if (_msize (p) < 40) {
        t = realloc (p, 80); /* Double the allocation size */
        p = t ? t : realloc (p, 40);
        if (NULL == p) fail ("Out of memory");
    }

See, if there is enough space, the memory is not resized. If not, we
actively try to postpone future reallocs for other size increases by
*doubling* the memory. Even with linear increases in size, realloc
will only be called a logarithmic number of times. But in the more
common situation of just doing a few small resize attempts, knowing the
exact size can save a constant factor in the number of allocations.

I can't do this for http://bstring.sf.net/ because there is no portable
_msize(), so I use other strategies, which slightly increase the
average amount of memory used, and slightly increase the number of
calls to realloc(). Howard Hinnant's proposal would be fine for the
purposes of Bstrlib, however, if you go that far, there is a lot more
that can be done.
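
Without a portable _msize(), the doubling policy can still be sketched by having the caller remember the capacity it asked for. This is not Bstrlib's actual code: dynbuf, dynbuf_reserve() and dynbuf_append() are hypothetical names, and the starting capacity of 16 is an arbitrary choice. The capacity recorded here is the requested size, so any slack the allocator added on top stays invisible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;
    size_t len;   /* bytes in use                                   */
    size_t cap;   /* bytes we requested, remembered by the caller   */
} dynbuf;

/* Make room for at least need bytes. Returns 1 on success, 0 on failure
 * (the buffer is left unchanged and still valid on failure). */
static int dynbuf_reserve(dynbuf *b, size_t need)
{
    assert(b != NULL);
    if (need <= b->cap)
        return 1;                           /* enough room: no realloc */

    size_t want = b->cap ? b->cap : 16;
    while (want < need) {
        if (want > SIZE_MAX / 2) {          /* doubling would overflow */
            want = need;
            break;
        }
        want *= 2;
    }

    char *t = realloc(b->data, want);       /* try the doubled size    */
    if (t == NULL && want > need) {
        want = need;
        t = realloc(b->data, want);         /* fall back to exact size */
    }
    if (t == NULL)
        return 0;
    b->data = t;
    b->cap  = want;
    return 1;
}

/* Append n bytes from src. Returns 1 on success, 0 on failure. */
static int dynbuf_append(dynbuf *b, const void *src, size_t n)
{
    assert(b != NULL && (src != NULL || n == 0));
    if (n > SIZE_MAX - b->len)
        return 0;
    if (!dynbuf_reserve(b, b->len + n))
        return 0;
    if (n != 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}
```

A buffer starts out as `dynbuf b = {NULL, 0, 0};` and is released with `free(b.data)`.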
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 14 '05 #29

CBFalconer

we******@gmail.com wrote:

.... snip ...
I would also like to point out that in fact there are *C* compilers
that *DO* implement a bytes_allocated() function. Namely, WATCOM
C/C++ implements _msize() which does exactly this. WATCOM also
implements a number of other interesting heap analysis
functions -- and they remain available even for release builds.

No compiler provides this. Some library implementations do. The
approach I took in my nmalloc module for DJGPP (which should be
easily adapted to most POSIX like systems) is to supply a routine
in system name space which returns a record describing the internal
memory layout, together with a header describing the fields in that
record. As long as accessory routines are built using these
facilities, they can be implemented without worrying about the
actual malloc implementation details.

nmalloc.zip is available on my pages below, download section.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #30

Flash Gordon

On Sat, 18 Dec 2004 11:54:49 +0000
Chris Croughton <ch***@keristor.net> wrote:

On 18 Dec 2004 01:13:11 -0800, we******@gmail.com
<we******@gmail.com> wrote:
Not necessarily. In the scenario you describe, the implementation
can track whether or not bytes_allocated() has been called on it.
And remember since the whole point is to reduce the *number* of
reallocs, we are gaining back that performance in ideal situations
anyhow. And even if the implementation situation I describe is not
common you can back off to the current assumptions just by returning
the original memory size requested (which must be known to leverage
the realloc scenario you suggest.)
The implementation may well have to use more memory to store the
length, thus wasting resources in the more common cases (on some
machines quite possibly wasting 16 bytes or more in order to get the
alignment for the worst case). Or it might have to spend time looking
for the length in some implementations (for instance ones using
garbage collection where separate lists of pointers and lengths are
kept). There's also one common one where the last block allocated on
a 'heap' has an allocated size of "the rest of the heap" until another
block has to be allocated after it.

The implementation logically must have some internal track of how much
space is allocated so that it can:

a) Copy the data on realloc if required.
b) Allocate the next block so that it does not overlap with what the
user requested for the last block on the heap.

Therefore the bytes_allocated() function could be implemented to return
the smallest number it has that

a) will be copied by realloc if realloc has to move the block
b) does not extend into space that might be allocated by some
subsequent call to *alloc

So, as far as I can see the implementation must have some number which
it knows, which is only bigger than what you requested if realloc would
copy the extra space on moving the block, and is not so big that it will
lead you to use space that will get used elsewhere.

But the value of such a function obviously includes debugging. So I
don't see the inclusion of such a function as either a trap or an
irrelevancy.

If you want it for debugging you can implement it on top of the
existing functions (as debugging libraries such as dmalloc do), or your
debugger can interface with the allocation libraries at low level
(since debuggers are inherently system dependent).
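
As a rough illustration of building the query on top of the existing functions, the sketch below stores the requested size in a header placed in front of each block. This is not how dmalloc itself works; the dbg_* names are hypothetical, and the union is only an assumption about what gives adequate alignment on common platforms. It also makes the cost visible: every block grows by one header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header kept in front of each block. The extra members exist only to pad
 * the header so that the pointer handed to the user stays well aligned. */
typedef union {
    size_t      size;      /* size the caller asked for */
    long double pad_ld;
    long long   pad_ll;
    void       *pad_ptr;
} dbg_header;

static void *dbg_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(dbg_header))
        return NULL;
    dbg_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;                       /* user data follows the header */
}

/* Returns the size originally requested for a dbg_* block. */
static size_t dbg_size(const void *p)
{
    assert(p != NULL);
    return ((const dbg_header *)p - 1)->size;
}

static void *dbg_realloc(void *p, size_t n)
{
    if (p == NULL)
        return dbg_malloc(n);
    if (n > SIZE_MAX - sizeof(dbg_header))
        return NULL;
    dbg_header *h = realloc((dbg_header *)p - 1, sizeof *h + n);
    if (h == NULL)
        return NULL;                    /* old block is still valid */
    h->size = n;
    return h + 1;
}

static void dbg_free(void *p)
{
    if (p != NULL)
        free((dbg_header *)p - 1);
}
```

Pointers from dbg_malloc() must only ever be passed to the other dbg_* functions, never to plain free() or realloc().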

How do you think C programming has survived for many decades without
it? It obviously isn't essential, it isn't even in C++ (where it could
have been added easily if they had wanted to do so).

Obviously it is not essential. For loops are not essential since you can
implement them using labels, if and goto.

Does any
language actually allow you to allocate something and find out how big
it is later?

Before the first assembler was written did any language allow you to
call a function by name rather than address?

I can see that it would be useful for some code. By using such a
function if it was added to the standard a programmer would remove any
chance of failing to track the size of a buffer correctly. I know a
program *can* keep track itself correctly, since I deal with code that
does, however that does not mean it is always the best way to do it if
an alternative can be provided.

I find it hard to conceive of any allocation scheme that would not be
able to calculate an appropriate number to return when bytes_allocated()
was called, i.e. without adding overhead to the existing *alloc routines
or the data structures they use, although it might need a little code in
bytes_allocated() rather than just reading a number from the structure
and returning it.
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Nov 14 '05 #31

Dave Thompson

On Sat, 18 Dec 2004 18:06:16 +0000, Chris Croughton
<ch***@keristor.net> wrote:

On Sat, 18 Dec 2004 14:14:53 -0000, Malcolm
<ma*****@55bank.freeserve.co.uk> wrote:
"Chris Croughton" <ch***@keristor.net> wrote

Does any language actually allow you to allocate something and find out how
big it is later?

In Java you can allocate an array of objects [and use .length] <snip>

True, you can do it with vectors in C++ as well. C++ ones also allow
you to dynamically extend them (and get both the current length and the
current allocated size), but they are basically just structures with
dedicated functions to access them and are part of the library not of
the syntax.

Fortran 90 and up (F9X) for arrays (bounds), and for character strings
(length) which Fortran like COBOL and PL/I considers a separate type,
not just array of character as in C, Pascal, and Ada (sort of). But it
does so by having fat pointers. Very fat pointers. Obese, even.

IIRC PL/I CONTROLLED also allows DIMENSION (*) and character (or bit)
string length (*) meaning variable at ALLOCATE time, fetchable later
with builtins like HBOUND.

Ada somewhat more generally allows types, in particular subrange
types, to have bounds computed at run time; and thus also arrays
(including strings) using these index types.

As a related but distinct feature, in all three languages, and also
Pascal, you can (explicitly) declare a parameter aka dummy or formal
which accepts by-reference an array or string (where distinct) of
varying bounds/sizes, and obtain them.

The disadvantage is that you pay a price for carrying about bounds
information internally.

Exactly.
And probably speed penalties as well, if it uses it for length checking
on access (which Java does, I believe, and C++ STL vectors generally
don't).

C was designed to be lean & mean, if you don't know what you're doing
use some other language with more protection...

Chris C

- David.Thompson1 at worldnet.att.net

Nov 14 '05 #32
