
Storing the size of an array in the structure itself

I think every C programmer can relate to the frustrations that malloc
allocated arrays bring. In particular, I've always found the fact that
the size of an array must be stored separately to be a nightmare.

There are of course many solutions, but they all end up forcing you to
abandon the array syntax in favour of macros or functions.

Now I have two questions - one is historical, and the other practical.

1.) Surely malloc (and friends) must store the size allocated for a
particular memory allocation, if malloc is to know how much to
deallocate when a free() occurs? Thus, why was the C library designed
in such a fashion as not to make this information available? Or am I
seriously missing something here?

2.) Why not store the size of the array in its first four bytes (or
first sizeof( size_t ) bytes ), and then shift the pointer to the
array on by four bytes? Thus one has:

 first 4 bytes   everything else
[    size     ][      data      ]
               /\
void * blah ---'

Then it should behave as a "normal" array, with the added advantage of
knowing its size. The reason I have doubts here, is that if this was
such a good idea, I'm sure it would already have been widely used. Any
compelling reason for avoiding this? This is a bit of hackery, but the
hackery will be confined to the functions for allocating, resizing and
checking the size of the array.

The code could work as follows:

void *
malloc_array( size_t element_size, size_t items )
{
    size_t sz = element_size * items;
    /* allocate memory for the array and for the size chunk */
    void *result = malloc( sz + sizeof( size_t ) );
    if ( result == NULL )
        return NULL;
    /* store the item count in the first few bytes */
    *(size_t *) result = items;
    /* return a pointer just beyond the size chunk */
    return (char *) result + sizeof( size_t );
}

size_t
sizeof_array( void *array )
{
    return *(size_t *) ( (char *) array - sizeof( size_t ) );
}

This technique of course could also be used to store the byte size of
the elements in the array. Oh yes, and in order to detect whether the
size value was corrupted by accidentally writing over it, one could
use a magic number (which would again be added by the same technique),
which could be consulted with debugging code.
Nov 14 '05 #1
22 Replies


"Wynand Winterbach" <wy****@realtimerodeo.net> wrote in message
news:7b**************************@posting.google.com...
> I think every C programmer can relate to the frustrations that malloc
> allocated arrays bring.

I don't find them frustrating at all.

> In particular, I've always found the fact that
> the size of an array must be stored separately to be a nightmare.

Why? Why is it any more 'frustrating' than keeping track
of any other piece of information in a program? Also note
that memory obtained with 'malloc()' isn't inherently an 'array',
it's just a 'chunk' of memory, which you might or might not use
to store an array.

> There are of course many solutions, but they all end up forcing you to
> abandon the array syntax in favour of macros or functions.

Not at all. The 'best' solution imo is to simply save the
size you allocate. And again, allocated memory isn't required
to be used as an array.

> Now I have two questions - one is historical, and the other practical.
>
> 1.) Surely malloc (and friends) must store the size allocated for a
> particular memory allocation, if malloc is to know how much to
> deallocate when a free() occurs?

An implementation of 'malloc()' must of course keep 'housekeeping'
information. But each implementation is free to implement 'malloc()'
with whatever method is most appropriate for the target platform.
The language standard only dictates the *behavior* of 'malloc()',
not how it is to be implemented.

> Thus, why was the C library designed
> in such a fashion as not to make this information available?

Think about it. When you call 'malloc()', you *have* this information.
Otherwise you couldn't tell it how much to allocate.

Also, if you're willing to go nonstandard and platform-specific,
many implementations do provide a function to give the information
you're after. Check your documentation.
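
[Editor's sketch] As an illustration of such nonstandard functions: glibc and musl provide malloc_usable_size(), macOS provides malloc_size(), and the Microsoft C runtime provides _msize(). The wrapper name block_size() below is hypothetical, and the value returned is the size the allocator actually reserved, which may be larger than the size requested.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(__linux__)
#include <malloc.h>          /* malloc_usable_size (glibc, musl) */
#elif defined(__APPLE__)
#include <malloc/malloc.h>   /* malloc_size */
#elif defined(_MSC_VER)
#include <malloc.h>          /* _msize */
#endif

/* Hypothetical wrapper: returns the usable size of a malloc'd block,
   or 0 when no extension is known for this platform. */
size_t block_size(void *p)
{
#if defined(__linux__)
    return malloc_usable_size(p);
#elif defined(__APPLE__)
    return malloc_size(p);
#elif defined(_MSC_VER)
    return _msize(p);
#else
    (void) p;
    return 0;
#endif
}
```

Because the result can exceed the requested size, it cannot stand in for an element count; it only bounds how many bytes are safe to touch.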

> Or I am
> seriously missing something here?.

I think you're just being lazy. :-)

> 2.) Why not store the size of the array in its first four bytes (or
> first sizeof( size_t ) bytes ), and then shift the pointer to the
> array on by four bytes? Thus one has:
>
>  first 4 bytes   everything else
> [    size     ][      data      ]
>                /\
> void * blah ---'

This might indeed be the way it is done for some implementations,
but it's not required. Perhaps for a given architecture it's
simply not possible or too inefficient.

> Then it should behave as a "normal" array,

IMO you need to stop automatically thinking of allocated memory as
an 'array'. It's simply allocated memory, to be used as desired.

> with the added advantage of
> knowing its size.

You allocated it, you already know its size. Also note that
the requirement for 'malloc()' is that it allocate *at least*
the number of requested bytes, but it's allowed to allocate more
(would typically be done in the interest of meeting the target
platform's alignment requirements and/or of efficiency).

> The reason I have doubts here, is that if this was
> such a good idea, I'm sure it would already have been widely used.

It would unnecessarily restrict implementors, and possibly the
platforms for which the C standard library could be implemented.

> Any
> compelling reason for avoiding this? This is a bit of hackery, but the
> hackery will be confined to the functions for allocating, resizing and
> checking the size of the array.

Right, it's 'hackery'. Keep It Simple. Just Remember The Size.
(Pass it to any functions that need it).
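
[Editor's sketch] A minimal illustration of "just remember the size": the caller keeps the element count and passes it alongside the pointer. The function names sum_ints() and make_sequence() are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sums the first n elements; the caller supplies n explicitly. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Allocates n ints filled with 1..n; returns NULL on failure. */
int *make_sequence(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = (int) (i + 1);
    return a;
}
```

The array syntax a[i] is untouched; the only cost is one extra parameter per function that needs the length.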

> The code could work as follows:
>
> void *
> malloc_array( size_t element_size, size_t items )
> {
>     size_t sz = element_size * items;
>     /* allocate memory for the array and for the size chunk */
>     void *result = malloc( sz + sizeof( size_t ) );
>     if ( result == NULL )
>         return NULL;
>     /* store the item count in the first few bytes */
>     *(size_t *) result = items;
>     /* return a pointer just beyond the size chunk */
>     return (char *) result + sizeof( size_t );
> }
>
> size_t
> sizeof_array( void *array )
> {
>     return *(size_t *) ( (char *) array - sizeof( size_t ) );
> }

If you want to go to all that trouble, be my guest. But I wouldn't
bother.

> This technique of course could also be used to store the byte size of
> the elements in the array.

But the memory allocated by 'malloc()' needn't necessarily be
used as an array.

> Oh yes, and in order to detect whether the
> size value was corrupted by accidentally writing over it, one could
> use a magic number (which would again be added by the same technique),
> which could be consulted with debugging code.

Perhaps some implementations do this. But again, they're not
required to.

-Mike
Nov 14 '05 #2


"Wynand Winterbach" <wy****@realtimerodeo.net> wrote

> I think every C programmer can relate to the frustrations that malloc
> allocated arrays bring. In particular, I've always found the fact that
> the size of an array must be stored separately to be a nightmare.

"Nightmare" is way too strong. It is a slight inconvenience to have to keep
track of array size separately.

> There are of course many solutions, but they all end up forcing you to
> abandon the array syntax in favour of macros or functions.

So these are basically non-solutions. If you want a higher level language
that does array management for you, then use C++. Trying to use some sort of
hand-rolled definearray() macro just makes your C code harder to read and
to maintain.

> Now I have two questions - one is historical, and the other practical.
>
> 1.) Surely malloc (and friends) must store the size allocated for a
> particular memory allocation, if malloc is to know how much to
> deallocate when a free() occurs? Thus, why was the C library designed
> in such a fashion as not to make this information available? Or I am
> seriously missing something here?

I wouldn't say "seriously missing". ANSI C could easily have demanded that
the library provide an msize() function, and it could have been added with
minor overhead. However in their wisdom they decided against this, probably
to keep old implementations in business.

> 2.) Why not store the size of the array in its first four bytes (or
> first sizeof( size_t ) bytes ), and then shift the pointer to the
> array on by four bytes?

Internally a lot of libraries do this. The problem with doing it yourself is
that it is not the convention, so it will confuse anyone else reading your
code. You've also got to consider that, strictly, if you allocate an array
of structures alignment issues may preclude you from grabbing the first four
bytes. This problem can be solved, but it's another bit of fiddling and
ugliness.

malloc() and free() provide a clean, conceptually simple pair of routines
for memory allocation and deallocation. Once you start messing with them you
also begin to destroy the essential simplicity of the C language.
Nov 14 '05 #3

In article <7b**************************@posting.google.com>
Wynand Winterbach <wy****@realtimerodeo.net> writes:
> 1.) Surely malloc (and friends) must store the size allocated for a
> particular memory allocation, if malloc is to know how much to
> deallocate when a free() occurs?

Perhaps. Or perhaps the size is computed via a long, painstaking
process when you call free(), rather than being stored explicitly.

Moreover, which size do you suspect that different malloc()
implementations remember: the size you asked for, or the size
you got? (You may get more than you asked for -- some malloc()s
will round the size up in some cases. For instance, certain fast
but somewhat-space-wasteful malloc()s will give you 4096 bytes
when you ask for 2100. Indeed, almost all malloc()s probably
round up in many cases, if not quite so severely.)

None of this would prohibit a future Standard C from requiring
some kind of "mallocsize" function, but it would require some
debate as to whether mallocsize() must return n for all successful
malloc(n) calls, or whether it could return rounded_up(n). It
might also constrain future implementors (if mallocsize() were
"expected" to be fast, and/or if it must return n rather than
rounded_up(n)).

All of this adds up to: "It is certainly possible, and not necessarily
a bad idea, but it is not as simple as it looks at first either."

> 2.) Why not store the size of the array in its first four bytes (or
> first sizeof( size_t ) bytes ), and then shift the pointer to the
> array on by four bytes? Thus one has:
>
>  first 4 bytes   everything else
> [    size     ][      data      ]
>                /\
> void * blah ---'


If you try this on a Sun SPARCstation (in 32-bit "size_t" mode),
you will find that this technique works for "int"s, "longs", and
"floats", but fails for "long long"s and "double"s. (In 64-bit
mode it will work if size_t is itself a 64-bit type.) The reason
is that the hardware requires 8-byte alignment for 8-byte data
types loaded or stored via ldd/std/ldx/stx/lddf/stdf, and the
compiler tends to use those instructions for those datatypes (with
some exceptions -- function parameters of type "double" are misaligned
in some of the subroutine-call protocols).

Many other architectures have similar restrictions. Even the
otherwise-quite-liberal x86 architecture has strong alignment
constraints for its MMX and SSE instructions (at least if you
want to use the "fast" MOVAPS instruction). Here the required
alignment is not 8 but 16 bytes. (It gets even worse for special
SMP instructions, where "good performance" uses 128-byte alignment!)

There are nonportable ways to make this work -- basically you need
to prefix the allocated space with a union of a single size_t, and
the machine's most restrictive data type (whatever that is) or an
array of bytes of the size of the most restrictive type (whatever
that is, again). Unfortunately, the C Standard gives you no help
in finding this most-restrictive type or its size. Such a type/size
must in fact exist (because malloc() works), but the Standard does
not export it to user code.
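
[Editor's sketch] The union-prefix idea described above can be sketched as follows. This relies on an assumption that postdates the post: C11 added max_align_t to <stddef.h>, which names a type with the strictest fundamental alignment, so the "most restrictive type" no longer has to be guessed. The names array_header, aligned_malloc_array(), aligned_sizeof_array() and aligned_free_array() are hypothetical.

```c
#include <stddef.h>   /* size_t, max_align_t (C11) */
#include <stdlib.h>

/* Header padded to the strictest fundamental alignment, so the data
   that follows it is aligned for any ordinary object type. */
union array_header {
    size_t count;
    max_align_t strictest;
};

void *aligned_malloc_array(size_t element_size, size_t items)
{
    union array_header *h;

    /* refuse requests whose byte size would overflow size_t */
    if (element_size != 0 &&
        items > ((size_t) -1 - sizeof *h) / element_size)
        return NULL;
    h = malloc(sizeof *h + element_size * items);
    if (h == NULL)
        return NULL;
    h->count = items;
    return h + 1;            /* first byte past the header */
}

size_t aligned_sizeof_array(const void *array)
{
    return ((const union array_header *) array - 1)->count;
}

void aligned_free_array(void *array)
{
    if (array != NULL)
        free((union array_header *) array - 1);
}
```

The price is a header that may be 16 or 32 bytes instead of sizeof(size_t), and a pointer that must be released with the matching free function rather than plain free(). Over-aligned data (for example buffers for SSE instructions that need more than the fundamental alignment) is still not covered.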
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #4

I used an implementation of malloc that I wrote for 16-bit Windows
that was essentially what you propose: I stored a cookie (magic number)
and the size, and at the end I stored another cookie, to check whether
the block had been overwritten.
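
[Editor's sketch] A minimal version of the cookie scheme just described, not the poster's actual code. The cookie value and the names guarded_malloc(), guarded_check() and guarded_free() are hypothetical. To stay clear of the alignment problems discussed elsewhere in this thread, the sketch treats the block as plain bytes and moves the bookkeeping fields with memcpy(); the returned pointer is therefore only suitable for char data.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_COOKIE 0xC0FFEEu   /* arbitrary, hypothetical magic value */

/* Layout: [cookie][size][ user bytes ... ][cookie] */
void *guarded_malloc(size_t size)
{
    const unsigned cookie = GUARD_COOKIE;
    size_t header = sizeof cookie + sizeof size;
    unsigned char *raw;

    if (size > (size_t) -1 - header - sizeof cookie)
        return NULL;         /* total byte count would overflow */
    raw = malloc(header + size + sizeof cookie);
    if (raw == NULL)
        return NULL;
    /* memcpy avoids any alignment assumptions about the fields */
    memcpy(raw, &cookie, sizeof cookie);
    memcpy(raw + sizeof cookie, &size, sizeof size);
    memcpy(raw + header + size, &cookie, sizeof cookie);
    return raw + header;
}

/* Returns 1 if both cookies are intact, 0 if either was overwritten. */
int guarded_check(const void *p)
{
    const unsigned char *user = p;
    size_t size;
    unsigned front, back;

    memcpy(&size, user - sizeof size, sizeof size);
    memcpy(&front, user - sizeof size - sizeof front, sizeof front);
    memcpy(&back, user + size, sizeof back);
    return front == GUARD_COOKIE && back == GUARD_COOKIE;
}

void guarded_free(void *p)
{
    if (p != NULL)
        free((unsigned char *) p - sizeof(size_t) - sizeof(unsigned));
}
```

A check like this only detects an overrun after the fact, and only if the stored size itself survived; it is a debugging aid rather than a safety guarantee.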

But basically what you want is a bounded pointer.

A bounded pointer is a pointer that can move within a certain memory area
and not elsewhere.

Support for bounded pointers is nonexistent in standard C and you must figure
it out yourself. You do:

someFn(ptr,siz);

It is up to you to never make a mistake.

This is a hole in the language, and recently I wrote an article about this
in comp.lang.lcc. You can participate in that discussion if you wish.

jacob
Nov 14 '05 #5


"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:cc**********@news-reader1.wanadoo.fr...
> I used an implementation of malloc that I wrote for windows 16 bit
> that essentially was what you propose: I stored a cookie (magic number)
> the size, and at the end I stored another cookie, to check if the block was
> overwritten.
>
> But basically what you want is a bounded pointer.
>
> A bounded pointer is a pointer that can move within a certain memory area
> and not elsewhere.
>
> Support for bounded pointers is inexistent in standard C and you must figure
> it out yourself. You do:
>
> someFn(ptr,siz);
>
> It is up to you to never make a mistake.
>
> This is a hole in the language,


I wouldn't call it (lack of a memory access 'safety net')
a 'hole' in the language at all. Some might deem having it
a nicety, but it's certainly not needed. It would also impose
an unnecessary restriction on implementations, especially those
tailored for maximum efficiency (rather than 'safety').
Someone here recently said "C is a sharp tool". I agree, and
I like it that way. :-)

-Mike
Nov 14 '05 #6


"Mike Wahler" <mk******@mkwahler.net> wrote in message
news:b2****************@newsread1.news.pas.earthlink.net...
> Someone here recently said "C is a sharp tool". I agree, and
> I like it that way. :-)


Mike:

The prototype of a sharp tool is a knife. But you have surely noticed that
it has two sides:

A sharp side, the blade, that YOU DO NOT TOUCH.

A blunt side, the handle, that allows you to drive the blade SAFELY.

Without this blunt side, a sharp knife would be unusable because
you would always CUT YOUR FINGERS when using it,
unless you were extremely careful.

What I am missing in C is exactly this blunt side that would allow you to
use a sharp tool safely. A sharp tool without this is UNUSABLE.
You always end up with bleeding fingers. You are bound to
make a mistake.

jacob
Nov 14 '05 #7

On Wed, 7 Jul 2004, jacob navia wrote:

jn>
jn>"Mike Wahler" <mk******@mkwahler.net> wrote in message
jn>news:b2****************@newsread1.news.pas.earthlink.net...
jn>> Someone here recently said "C is a sharp tool". I agree, and
jn>> I like it that way. :-)
jn>
jn>Mike:
jn>
jn>The prototype of a sharp tool is a knife. But you have surely remarked that
jn>it as two sides:
jn>
jn>A sharp side, the blade, that YOU DO NOT TOUCH.
jn>
jn>A blunt side, the handle, that allows you to drive the blade SAFELY.
jn>
jn>Without this blunt side, a sharp knife would be unusable because
jn>you would always CUT YOURSELF THE FINGERS when using it,
jn>unless extremely careful.
jn>
jn>What I am missing in C is exactly this blunt side that would allow you to
jn>use safely a sharp tool. A sharp tool without this is UNUSABLE.
jn>You end always with bleeding fingers at the end. You are bound to
jn>make a mistake.

I fail to see why one should bloat the C standard just to allow
programmers not to think. In any language that allows you to do what C
allows you to do, programmers who are too lazy to think will shoot
themselves in the knee. snprintf() has been around for years, yet in
new code you'll find that people just use sprintf() in places where they
really shouldn't. On the other hand, someone has yet to show me why

void
foo(int i)
{
        char buf[100];

        sprintf(buf, "%d", i);

        ...
}

is unsafe (unless the implementation has an int with a size of thousands
of bits). sprintf() is not unsafe by itself, but the usage of it may be
unsafe.
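
[Editor's sketch] For the cases where the buffer bound is not obviously sufficient, the snprintf() mentioned above (standard since C99) reports how many characters the full result needs, so truncation can be detected. The helper name format_int() is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats i into buf (capacity cap). Returns 1 on success, 0 if the
   text did not fit or an encoding error occurred. */
int format_int(char *buf, size_t cap, int i)
{
    int needed = snprintf(buf, cap, "%d", i);
    return needed >= 0 && (size_t) needed < cap;
}
```

When the result does not fit and cap is nonzero, snprintf() still writes a truncated, NUL-terminated string, so the buffer is never overrun; the return value is what tells the caller that something was cut off.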

The only way to make people program more securely is to teach them
(and simply to avoid the unteachable ones) and, if appropriate, to layer
policies on top of the language. There are standards available
that restrict the usage of C (not C itself) to what seems 'secure' to the
writers of the standard. I've been told, for example, that for satellite
on-board control software the use of dynamic memory (malloc() and friends)
is excluded. But again, this is not a technical matter, but one of policy.

harti
Nov 14 '05 #8

jacob navia <ja***@jacob.remcomp.fr> wrote:
> "Mike Wahler" <mk******@mkwahler.net> wrote in message
> news:b2****************@newsread1.news.pas.earthlink.net...
>> Someone here recently said "C is a sharp tool". I agree, and
>> I like it that way. :-)
>
> Mike: The prototype of a sharp tool is a knife. But you have surely remarked that
> it has two sides:

Perhaps Mike was referring to a two-edged sword; knives you can even
put (more or less safely) in the hands of children ;-)

Regards, Jens
--
\ Jens Thoms Toerring ___ Je***********@physik.fu-berlin.de
\__________________________ http://www.toerring.de
Nov 14 '05 #9


<Je***********@physik.fu-berlin.de> wrote in message
news:2l************@uni-berlin.de...
> jacob navia <ja***@jacob.remcomp.fr> wrote:
> Perhaps Mike was referring to a two-edged sword, knives you can even
> put (more or less safely) in the hands of children;-)


Swords (as knives) have handles too. You do NOT touch the sharp edge
with your hands. All sharp tools have blunt edges. We need this for
C. It is not a matter of making C what it isn't, it is just making C
safer to use and KEEPING its qualities!

Nov 14 '05 #10

Hiho,
> 2.) Why not store the size of the array in its first four bytes (or
> first sizeof( size_t ) bytes ), and then shift the pointer to the
> array on by four bytes? Thus one has:
>
>  first 4 bytes   everything else
> [    size     ][      data      ]
>                /\
> void * blah ---'
>
> Then it should behave as a "normal" array, with the added advantage of
> knowing its size. The reason I have doubts here, is that if this was
> such a good idea, I'm sure it would already have been widely used. Any
> compelling reason for avoiding this? This is a bit of hackery, but the
> hackery will be confined to the functions for allocating, resizing and
> checking the size of the array.
>
> The code could work as follows:
>
> void *
> malloc_array( size_t element_size, size_t items )
> {
>     size_t sz = element_size * items;
>     /* allocate memory for the array and for the size chunk */
>     void *result = malloc( sz + sizeof( size_t ) );
>     if ( result == NULL )
>         return NULL;
>     /* store the item count in the first few bytes */
>     *(size_t *) result = items;
>     /* return a pointer just beyond the size chunk */
>     return (char *) result + sizeof( size_t );
> }
>
> size_t
> sizeof_array( void *array )
> {
>     return *(size_t *) ( (char *) array - sizeof( size_t ) );
> }
>
> This technique of course could also be used to store the byte size of
> the elements in the array. Oh yes, and in order to detect whether the
> size value was corrupted by accidentally writing over it, one could
> use a magic number (which would again be added by the same technique),
> which could be consulted with debugging code.


Other people have pointed out reasons why not to use this approach
for pointers and how it could go wrong.

However, as you seem to think "array" when you hear or say pointer,
you should perhaps have a look at the flexible array member (an array
of unspecified length as the last member of a structure). Just put the
size in as the first member of that structure. This, in essence, produces
the same thing you want in a clean way, and if you need a pointer,
you can generate it from the address of the array; however, you have to
do this for every element type if you do not want to run into memory
alignment issues.

Something like this:

struct arrayplussize_double {
    size_t size;
    double array[];
};

with sizeof(struct arrayplussize_double) *ignoring* the flexible
array member but extending the struct size such that the flexible
array member would be correctly aligned.
Allocation works like this

struct arrayplussize_double *myarray =
    malloc(sizeof(struct arrayplussize_double) + sizeof(double) * desiredsize);

where desiredsize is the desired size of the array.
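
[Editor's sketch] Put together, the flexible array member approach might look like the following. The struct is the one from the post; the helper new_double_array() and its overflow check are hypothetical additions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

struct arrayplussize_double {
    size_t size;
    double array[];          /* C99 flexible array member */
};

/* Hypothetical helper: allocates room for n doubles and records n. */
struct arrayplussize_double *new_double_array(size_t n)
{
    struct arrayplussize_double *a;

    if (n > ((size_t) -1 - sizeof *a) / sizeof(double))
        return NULL;         /* byte count would overflow */
    a = malloc(sizeof *a + sizeof(double) * n);
    if (a != NULL)
        a->size = n;
    return a;
}
```

A plain pointer for code that expects double * is then simply a->array, and the whole object is released with a single free(a).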
Cheers,
Michael

Nov 14 '05 #11

Hi Jacob,

>> Perhaps Mike was referring to a two-edged sword, knives you can even
>> put (more or less safely) in the hands of children;-)
>
> Swords (as knives) have handles too. You do NOT touch the sharp edge
> with your hands. All sharp tools have blunt edges. We need this for
> C. It is not a matter of making C what it isn't, it is just making C
> safer to use and KEEPING its qualities!


I have not had a look at lcc nor at your proposals, so
I cannot say anything about what you actually did.
I just wanted to point out that the others do not criticise
your wanting to fit a more convenient and safe handle to a sharp
tool but your remaking this sharp tool into a spoon (in their
opinion)... whether this keeps the qualities you need but
not theirs, I cannot say.

Back to something more on-topic: You are free to sell your bounded
pointers as compiler extensions. I certainly consider it safer to see
some people given kitchen knives and spoons, instead of swords and
daggers, when learning to use tools.

However, I am perhaps not up to the sword yet, but I certainly
appreciate having a dagger when I need one.
Apart from all these nice pictures:
None of us is unhappy when you point out possible *additional*
solutions or mention (together with a caveat) that there is
an incredibly foolproof extension in your compiler suite.
The point is rather that we do not want to get these things
*instead*, and do not want to discuss your compiler only.

Especially if you are handling things in a way which does not
conform to the standard, as you sometimes point out,
you are making your comments off-topic. Also, some people
react badly to a tone that sounds too much like heralding the only true
solution, one that is of course different from all the others.
I think that you have presented some nice ideas up to now, and
in time I will certainly have a look at lcc if I need a
compiler in an environment where lcc fits in.
Cheers
Michael

Nov 14 '05 #12


"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:cc**********@news-reader4.wanadoo.fr...
>
> "Mike Wahler" <mk******@mkwahler.net> wrote in message
> news:b2****************@newsread1.news.pas.earthlink.net...
>> Someone here recently said "C is a sharp tool". I agree, and
>> I like it that way. :-)
>
> Mike:
>
> The prototype of a sharp tool is a knife. But you have surely remarked
> that it has two sides:
>
> A sharp side, the blade, that YOU DO NOT TOUCH.

Right. Or only touch it very lightly.

> A blunt side,

Some knife blades have a blunt side, others are sharp on
both edges.

> the handle, that allows you to drive the blade SAFELY.

Right. My 'handle' is my mind, my judgement, my experience.

> Without this blunt side, a sharp knife would be unusable because
> you would always CUT YOURSELF THE FINGERS when using it,
> unless extremely careful.

Right. That's why when I write C, I think carefully while doing it.

> What I am missing in C is exactly this blunt side that would allow you to
> use safely a sharp tool.

I'm not missing it. It's part of me.

> A sharp tool without this is UNUSABLE.

So think while you code in C.

> You end always with bleeding fingers at the end. You are bound to
> make a mistake.


Everyone makes mistakes, with or without 'safety features'. How
many times have you heard of folks seriously injured or killed
in auto accidents because they failed to fasten their seatbelts?

-Mike
Nov 14 '05 #13

In article <cc**********@news-reader4.wanadoo.fr>,
jacob navia <ja***@jacob.remcomp.fr> wrote:
>
> "Mike Wahler" <mk******@mkwahler.net> wrote in message
> news:b2****************@newsread1.news.pas.earthlink.net...
>> Someone here recently said "C is a sharp tool". I agree, and
>> I like it that way. :-)
>
> <snip>
>
> What I am missing in C is exactly this blunt side that would allow you to
> use safely a sharp tool. A sharp tool without this is UNUSABLE.
> You end always with bleeding fingers at the end. You are bound to
> make a mistake.

Funny, I thought that being able to keep the size around so you knew
how big the array is was that blunt side.

There's nothing wrong with requiring programmers to be careful with
languages like C, any more than with requiring anybody else to be careful
using potentially dangerous tools. If you want Java, you know where to
find it.
dave

--
Dave Vandervies    dj******@csclub.uwaterloo.ca
The smartest people I know aren't programmers. What does that say?

Nothing surprising!
--Andrew Dalke and Coby Beck in comp.lang.scheme
Nov 14 '05 #14

On Wed, 7 Jul 2004 15:07:35 +0200, "jacob navia"
<ja***@jacob.remcomp.fr> wrote:

> <Je***********@physik.fu-berlin.de> wrote in message
> news:2l************@uni-berlin.de...
>> jacob navia <ja***@jacob.remcomp.fr> wrote:
>> Perhaps Mike was referring to a two-edged sword, knives you can even
>> put (more or less safely) in the hands of children;-)
>
> Swords (as knives) have handles too. You do NOT touch the sharp edge
> with your hands. All sharp tools have blunt edges. We need this for
> C. It is not a matter of making C what it isn't, it is just making C
> safer to use and KEEPING its qualities!


With C, you are free to design the hilt in any way you like. You may
encrust it with diamonds, you can wrap a towel around the blade's
base, or just use it bare and with bare hands, risking very nasty
cuts. You have done a great job with LCC (and I love using it), but
IMHO _the_standard_ is fine the way it is. But this HO is coming from
a person who doesn't like the fact that functions can return structs,
so feel free to discard it :).

--
aib

ISP e-mail accounts are good for receiving spam.
Nov 14 '05 #15

"Malcolm" <ma*****@55bank.freeserve.co.uk> wrote in message news:<cc**********@news8.svr.pol.co.uk>...
> "Wynand Winterbach" <wy****@realtimerodeo.net> wrote
>
>> I think every C programmer can relate to the frustrations that malloc
>> allocated arrays bring. In particular, I've always found the fact that
>> the size of an array must be stored separately to be a nightmare.
>
> "Nightmare" is way too strong. It is a slight inconvenience to have to keep
> track of array size separately.

Ok, well, it annoys the hell out of me. Really.

>> There are of course many solutions, but they all end up forcing you to
>> abandon the array syntax in favour of macros or functions.
>
> So these are basically non-solutions. If you want a higher level language
> that does array management for you, then use C++. Trying to use some sort of
> hand-rolled definearray() macro just makes your C code harder to read and
> to maintain.

I agree that macros make code harder to read, but this is no reason to
go for C++. C++ has the major drawback that one is not guaranteed that
&a[0], where a is a std::vector of some type, will point to a chunk of
memory containing your values. It happens to work for my g++ library's
implementation, but AFAIK, it is not required to work. This makes it
a real burden to ensure portable code that must also work with C functions.

So why not write a CArray type, which gives a function to do this? I could,
but I happen to want to port my code to Plan 9, and hence I want to keep it
in C. I don't find the GCC port to Plan 9 satisfactory, and I like C,
besides.

>> Now I have two questions - one is historical, and the other practical.
>>
>> 1.) Surely malloc (and friends) must store the size allocated for a
>> particular memory allocation, if malloc is to know how much to
>> deallocate when a free() occurs? Thus, why was the C library designed
>> in such a fashion as not to make this information available? Or I am
>> seriously missing something here?
>
> I wouldn't say "seriously missing". ANSI C could easily have demanded that
> the library provide an msize() function, and it could have been added with
> minor overhead. However in their wisdom they decided against this, probably
> to keep old implementations in business.
>
>> 2.) Why not store the size of the array in its first four bytes (or
>> first sizeof( size_t ) bytes ), and then shift the pointer to the
>> array on by four bytes?
>
> Internally a lot of libraries do this. The problem with doing it yourself is
> that it is not the convention, so it will confuse anyone else reading your
> code. You've also got to consider that, strictly, if you allocate an array
> of structures alignment issues may preclude you from grabbing the first four
> bytes. This problem can be solved, but it's another bit of fiddling and
> ugliness.
>
> malloc() and free() provide a clean, conceptually simple pair of routines
> for memory allocation and deallocation. Once you start messing with them you
> also begin to destroy the essential simplicity of the C language.

No, that is patently untrue. I cannot see how this would destroy the
simplicity of C. My method is also conceptually simple (I think).
Nov 14 '05 #16

"jacob navia" <ja***@jacob.remcomp.fr> wrote in message news:<cc**********@news-reader1.wanadoo.fr>...
> I used an implementation of malloc that I wrote for windows 16 bit
> that essentially was what you propose: I stored a cookie (magic number)
> the size, and at the end I stored another cookie, to check if the block was
> overwritten.
>
> But basically what you want is a bounded pointer.
>
> A bounded pointer is a pointer that can move within a certain memory area
> and not elsewhere.
>
> Support for bounded pointers is inexistent in standard C and you must figure
> it out yourself. You do:
>
> someFn(ptr,siz);
>
> It is up to you to never make a mistake.
>
> This is a hole in the language, and recently I wrote an article about this
> in comp.lang.lcc. You can participate in that discussion if you wish.
>
> jacob


Actually I don't want a bounded pointer, although one could certainly
provide a bounds checking indexing function. I only really want to be
able to forget about maintaining an array's size. However, I think
what you implemented was pretty much what I have in mind.

I don't know whether I consider the lack of bounded pointers to be a
hole in the language. I don't consider my criticism to really
indicate a hole in the language either. I just find it a pain to
maintain an array size separately.
Nov 14 '05 #17

Michael Mair <ma********************@ians.uni-stuttgart.de> wrote in message news:<cc**********@infosun2.rus.uni-stuttgart.de>...
Hiho,
2.) Why not store the size of the array in its first four bytes (or
first sizeof( size_t ) bytes ), and then shift the pointer to the
array on by four bytes? Thus one has:

first 4 bytes everything else
[ size ][ data ]
/\
void * blah ---'

Then it should behave as a "normal" array, with the added advantage of
knowing its size. The reason I have doubts here, is that if this was
such a good idea, I'm sure it would already have been widely used. Any
compelling reason for avoiding this? This is a bit of hackery, but the
hackery will be confined to the functions for allocating, resizing and
checking the size of the array.

The code could work as follows:

void*
malloc_array( size_t element_size, size_t items )
{
    size_t sz = element_size * items;
    /* allocate memory for the array and for the size chunk */
    void* result = malloc( sz + sizeof( size_t ) );
    if( result == NULL )
        return NULL;
    *((size_t*) result) = items; /* store the size in the first few bytes */
    /* return a pointer to the array, just beyond the size chunk */
    return (char*) result + sizeof( size_t );
}

size_t
sizeof_array( void *array )
{
    /* step back over the size chunk; arithmetic on void* is not standard C */
    return *( (size_t*) ((char*) array - sizeof( size_t )) );
}

This technique of course could also be used to store the byte size of
the elements in the array. Oh yes, and in order to detect whether the
size value was corrupted by accidentally writing over it, one could
use a magic number (which would again be added by the same technique),
which could be consulted with debugging code.
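
A rough sketch of the magic-number variant described above, under some assumptions of my own: the header is a union padded with C11's `max_align_t` so that the data behind it stays aligned for any element type, and the names `array_header`, `array_alloc`, `array_items`, `array_free` and the value of `ARRAY_MAGIC` are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARRAY_MAGIC 0xA55A1234u  /* arbitrary value chosen for this sketch */

/* Hypothetical header stored in front of the data. The union pads it to the
   strictest fundamental alignment so the data behind it stays aligned. */
union array_header {
    struct {
        unsigned magic;      /* corruption check */
        size_t   items;      /* number of elements */
        size_t   elem_size;  /* size of one element in bytes */
    } info;
    max_align_t align;       /* C11; only here to force size and alignment */
};

static void *array_alloc(size_t elem_size, size_t items)
{
    union array_header *h;

    /* refuse requests whose total byte count would overflow size_t */
    if (elem_size != 0 && items > (SIZE_MAX - sizeof *h) / elem_size)
        return NULL;
    h = malloc(sizeof *h + elem_size * items);
    if (h == NULL)
        return NULL;
    h->info.magic = ARRAY_MAGIC;
    h->info.items = items;
    h->info.elem_size = elem_size;
    return h + 1;            /* first byte after the header */
}

/* Step back from the data pointer to the header and verify the magic. */
static union array_header *array_header_of(void *array)
{
    union array_header *h = (union array_header *)array - 1;

    assert(h->info.magic == ARRAY_MAGIC);   /* debugging check only */
    return h;
}

static size_t array_items(void *array)
{
    return array_header_of(array)->info.items;
}

static void array_free(void *array)
{
    if (array != NULL)
        free(array_header_of(array));
}
```

The pointer returned by `array_alloc` can be indexed like any other array, but it must be released with `array_free`, never with plain `free()`. The magic number only catches an overwrite that happens to hit the header.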
Other people have pointed out reasons why not to use this approach
for pointers and how it could go wrong.

However, as you seem to think "array" when you hear or say pointer,


Err no. I don't know why everyone is so eager to assume that because I
was talking about arrays that I think all pointers point to arrays.
It's a bit patronising to make such assumptions.
you maybe should have a look at variable length arrays as last
entry of a structure (variable array member). Just put the size in as a
first element of that structure. This, in essence, produces the same
thing as you want to have in a clean way, and if you need a pointer,
you can generate it from the address of the array; however, you have to
do this for every type if you do not want to run into memory alignment
issues.

Something like that:

struct arrayplussize_double {
size_t size;
double array[];
};

with sizeof(struct arrayplussize_double) *ignoring* the flexible
array member but extending the struct size such that the flexible
array member would be correctly aligned.
Allocation works like this

struct arrayplussize_double *myarray = malloc(sizeof(struct
arrayplussize_double)+sizeof(double)*desiredsize);

where desiredsize is the desired size of the array.
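
To make the allocation above concrete, here is a small sketch built around the same struct. The helper names `apd_new` and `apd_sum` are hypothetical; the overflow check is an addition of mine and not part of the suggestion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct arrayplussize_double {
    size_t size;
    double array[];          /* flexible array member (C99) */
};

/* Hypothetical helper: allocate the struct plus room for n doubles. */
static struct arrayplussize_double *apd_new(size_t n)
{
    struct arrayplussize_double *p;

    /* refuse sizes whose byte count would overflow size_t */
    if (n > (SIZE_MAX - sizeof *p) / sizeof(double))
        return NULL;
    p = malloc(sizeof *p + n * sizeof p->array[0]);
    if (p != NULL)
        p->size = n;
    return p;
}

/* The size travels with the data, so no separate length argument is needed. */
static double apd_sum(const struct arrayplussize_double *p)
{
    double total = 0.0;
    size_t i;

    assert(p != NULL);
    for (i = 0; i < p->size; i++)
        total += p->array[i];
    return total;
}
```

Elements are reached as `v->array[i]` and the whole thing is released with a single `free(v)`.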
I don't like this approach. It's used by GLib, and it's exactly this
that made me think of another approach.

Cheers,
Michael

Nov 14 '05 #18

P: n/a
Harti Brandt <br****@dlr.de> wrote in message news:<20******************@beagle.kn.op.dlr.de>...
On Wed, 7 Jul 2004, jacob navia wrote:

jn>
jn>"Mike Wahler" <mk******@mkwahler.net> wrote in message
jn>news:b2****************@newsread1.news.pas.eart hlink.net...
jn>> Someone here recently said "C is a sharp tool". I agree, and
jn>> I like it that way. :-)
jn>
jn>Mike:
jn>
jn>The prototype of a sharp tool is a knife. But you have surely noticed that
jn>it has two sides:
jn>
jn>A sharp side, the blade, that YOU DO NOT TOUCH.
jn>
jn>A blunt side, the handle, that allows you to drive the blade SAFELY.
jn>
jn>Without this blunt side, a sharp knife would be unusable because
jn>you would always CUT YOUR FINGERS when using it,
jn>unless extremely careful.
jn>
jn>What I am missing in C is exactly this blunt side that would allow you to
jn>safely use a sharp tool. A sharp tool without this is UNUSABLE.
jn>You always end up with bleeding fingers. You are bound to
jn>make a mistake.

I fail to see why one should bloat the C standard just to allow
programmers not to think. In every language that allows you to do what C
allows, programmers who are too lazy to think will shoot
themselves in the knee. snprintf() has been around for years, yet in
new code you'll find that people just use sprintf() in places where they
really shouldn't. On the other hand, someone has still to show me why
Because even smart programmers make mistakes. If we took the argument
that programmers "have to think" to its fullest, then I see no reason
why we abandoned assembler. Sure, assembler isn't portable, but I hardly
think people made the shift for this reason. The less you have to keep
programming details in mind, the more you can concentrate on the problem
at hand.
void
foo(int i)
{
    char buf[100];

    sprintf(buf, "%d", i);

    ...
}

is unsafe (unless the implementation has an int with a size of thousands
of bits). sprintf() is not unsafe by itself, but the usage of it may be
unsafe.
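
For the cases where the buffer is not so obviously large enough, the snprintf() mentioned earlier can report truncation instead of overflowing. A minimal sketch, with `format_int` being a name invented for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format i into buf without ever writing past bufsize.
   Returns 0 on success, -1 if the text did not fit (or on an output error). */
static int format_int(char *buf, size_t bufsize, int i)
{
    int n;

    assert(buf != NULL && bufsize > 0);
    n = snprintf(buf, bufsize, "%d", i);
    /* snprintf returns the length it wanted to write, excluding the '\0' */
    return (n >= 0 && (size_t)n < bufsize) ? 0 : -1;
}
```

The usage is the same as with sprintf() except that the caller passes `sizeof buf` and looks at the return value.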

The only way to make people program more securely is to teach them
(and simply to avoid the unteachable ones) and, if appropriate, to layer
policies on top of the language. There are appropriate standards available
that cripple the usage of C (not C itself) down to what seems 'secure' to the
writers of the standard. I've been told, for example, that for satellite
on-board control software the use of dynamic memory (malloc() and friends)
is excluded. But again, this is not a technical matter, but one of policy.
This has more to do with the fact that such programs must meet
realtime demands. This isn't generally possible with dynamic allocation.

Proper usage of tools like splint and spin is far more beneficial
than the application of some Draconian policy on how to do secure
programming.
harti
--

Nov 14 '05 #19

P: n/a
"Mike Wahler" <mk******@mkwahler.net> wrote in message news:<dR*****************@newsread1.news.pas.earth link.net>...
"Wynand Winterbach" <wy****@realtimerodeo.net> wrote in message
news:7b**************************@posting.google.c om...
I think every C programmer can relate to the frustrations that malloc
allocated arrays bring.
I don't find them frustrating at all.

In particular, I've always found the fact that
the size of an array must be stored separately to be a nightmare.


Why? Why is it any more 'frustrating' than keeping track
of any other piece of information in a program?


Anything that decouples information in this way makes the life
of the programmer harder, and the code harder to read! Array size
is an integral part of the concept of an array. Would you have
an employee struct, only to store an employee's birth date separately?
I doubt it.

Also note that memory obtained with 'malloc()' isn't inherently an 'array',
it's just a 'chunk' of memory, which you might or might not use
to store an array.
All I'm going to say here is DUH. Geesh, I cannot think of a single
mammal capable of C programming that would assume a malloc chunk
is anything more than just that - a chunk of memory. I never once
suggested any such thing.
There are of course many solutions, but they all end up forcing you to
abandon the array syntax in favour of macros or functions.
Not at all. The 'best' solution imo is to simply save the
size you allocate. And again, allocated memory isn't required
to be used as an array.

Now I have two questions - one is historical, and the other practical.

1.) Surely malloc (and friends) must store the size allocated for a
particular memory allocation, if malloc is to know how much to
deallocate when a free() occurs?


An implementation of 'malloc()' must of course keep 'housekeeping'
information. But each implementation is free to implement 'malloc()'
with whatever method is most appropriate for the target platform.
The language standard only dictates the *behavior* of 'malloc()',
not how it is to be implemented.

Thus, why was the C library designed
in such a fashion as not to make this information available?


Think about it. When you call 'malloc()', you *have* this information.
Otherwise you couldn't tell it how much to allocate.


Yes, but again you're missing the point. Of course you have it! But
this is not my principal problem.
Also, if you're willing to go nonstandard and platform-specific,
many implementations do provide a function to give the information
you're after. Check your documentation.
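
As one example of such a nonstandard function, glibc provides `malloc_usable_size()` in `<malloc.h>`. The sketch below assumes a glibc system and will not build elsewhere; other platforms have similar calls under different names (for instance `_msize` in the Microsoft C runtime and `malloc_size` on macOS). The wrapper name `block_capacity` is made up for illustration.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific header; not part of standard C */
#include <stdlib.h>

/* Usable size in bytes of a block obtained from malloc(), as glibc sees it. */
static size_t block_capacity(void *p)
{
    assert(p != NULL);
    return malloc_usable_size(p);
}
```

Note that the result may be larger than the size that was requested, so it says how much room the block has, not how many items the program meant to store in it.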
Or am I
seriously missing something here?


I think you're just being lazy. :-)


There is no virtue in the Calvinistic discipline imposed on the
programmer who has to keep track of so many details of the language
at the expense of the problem s/he is solving.

This is why Python is so great - the ultimate language for the lazy
programmer. You also happen to produce pretty solid code with it,
due to the concentration on the problem at hand. And when you need
speed, use SWIG and C to build a module.

2.) Why not store the size of the array in its first four bytes (or
first sizeof( size_t ) bytes ), and then shift the pointer to the
array on by four bytes? Thus one has:

first 4 bytes everything else
[ size ][ data ]
/\
void * blah ---'


This might indeed be the way it is done for some implementations,
but it's not required. Perhaps for a given architecture it's
simply not possible or too inefficient.
Then it should behave as a "normal" array,


IMO you need to stop automatically thinking of allocated memory as
an 'array'. It's simply allocated memory, to be used as desired.

with the added advantage of
knowing its size.


You allocated it, you already know its size. Also note that
the requirement for 'malloc()' is that it allocate *at least*
the number of requested bytes, but it's allowed to allocate more
(would typically be done in the interest of meeting the target
platform's alignment requirements and/or of efficiency).
The reason I have doubts here, is that if this was
such a good idea, I'm sure it would already have been widely used.


It would unnecessarily restrict implementors and possibly which
platforms the C standard library could be implemented for.

Any
compelling reason for avoiding this? This is a bit of hackery, but the
hackery will be confined to the functions for allocating, resizing and
checking the size of the array.


Right, it's 'hackery'. Keep It Simple. Just Remember The Size.
(Pass it to any functions that need it).
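
Spelled out, "remember the size and pass it along" looks something like the following sketch. The function `average` is only an illustration; the point is the extra `count` parameter that every such function needs.

```c
#include <assert.h>
#include <stddef.h>

/* The callee cannot discover the length from the pointer, so the caller
   passes it explicitly alongside the data. */
static double average(const double *values, size_t count)
{
    double total = 0.0;
    size_t i;

    assert(values != NULL && count > 0);
    for (i = 0; i < count; i++)
        total += values[i];
    return total / (double)count;
}
```

The caller keeps `n` from the `malloc(n * sizeof *v)` call and hands the same `n` to `average(v, n)`.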

The code could work as follows:

void*
malloc_array( size_t element_size, size_t items )
{
    size_t sz = element_size * items;
    /* allocate memory for the array and for the size chunk */
    void* result = malloc( sz + sizeof( size_t ) );
    if( result == NULL )
        return NULL;
    *((size_t*) result) = items; /* store the size in the first few bytes */
    /* return a pointer to the array, just beyond the size chunk */
    return (char*) result + sizeof( size_t );
}

size_t
sizeof_array( void *array )
{
    /* step back over the size chunk; arithmetic on void* is not standard C */
    return *( (size_t*) ((char*) array - sizeof( size_t )) );
}


If you want to go to all that trouble, be my guest. But I wouldn't
bother.

This technique of course could also be used to store the byte size of
the elements in the array.


But the memory allocated by 'malloc()' needn't necessarily be
used as an array.

Oh yes, and in order to detect whether the
size value was corrupted by accidentally writing over it, one could
use a magic number (which would again be added by the same technique),
which could be consulted with debugging code.


Perhaps some implementations do this. But again, they're not
required to.

-Mike

Nov 14 '05 #20

P: n/a
"Mike Wahler" <mk******@mkwahler.net> wrote in message news:<b2****************@newsread1.news.pas.earthl ink.net>...
"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:cc**********@news-reader1.wanadoo.fr...
I used an implementation of malloc that I wrote for 16-bit Windows
that essentially was what you propose: I stored a cookie (magic number)
and the size, and at the end I stored another cookie, to check if the
block was overwritten.

But basically what you want is a bounded pointer.

A bounded pointer is a pointer that can move within a certain memory area
and not elsewhere.

Support for bounded pointers is nonexistent in standard C and you must
figure it out yourself. You do:

someFn(ptr,siz);

It is up to you to never make a mistake.

This is a hole in the language,


I wouldn't call it (lack of a memory access 'safety net')
a 'hole' in the language at all. Some might deem having it
a nicety, but it's certainly not needed. It would also impose
an unnecessary restriction on implementations, especially those
tailored for maximum efficiency (rather than 'safety').
Someone here recently said "C is a sharp tool". I agree, and
I like it that way. :-)

-Mike


Just out of curiosity, why do you like it that way?

I've always found it irritating, but tolerable.
Nov 14 '05 #21

P: n/a
In <1a**************************@posting.google.com > ro***********@antenova.com (Rob Thorpe) writes:
"Mike Wahler" <mk******@mkwahler.net> wrote in message news:<b2****************@newsread1.news.pas.earthl ink.net>...
"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:cc**********@news-reader1.wanadoo.fr...
> I used an implementation of malloc that I wrote for 16-bit Windows
> that essentially was what you propose: I stored a cookie (magic number)
> and the size, and at the end I stored another cookie, to check if the
> block was overwritten.
>
> But basically what you want is a bounded pointer.
>
> A bounded pointer is a pointer that can move within a certain memory area
> and not elsewhere.
>
> Support for bounded pointers is nonexistent in standard C and you must
> figure it out yourself. You do:
>
> someFn(ptr,siz);
>
> It is up to you to never make a mistake.
>
> This is a hole in the language,


I wouldn't call it (lack of a memory access 'safety net')
a 'hole' in the language at all. Some might deem having it
a nicety, but it's certainly not needed. It would also impose
an unnecessary restriction on implementations, especially those
tailored for maximum efficiency (rather than 'safety').
Someone here recently said "C is a sharp tool". I agree, and
I like it that way. :-)

-Mike


Just out of curiosity, why do you like it that way?


Because correct code doesn't have to suffer from the overhead related to
bound checking.

Because such checking is next to impossible to get right in a language
like C. I have seen a confused poster: his code was correct, but the
compiler got the bound checking wrong.

Because people tend to be less careful when they know a "safety net"
exists, no matter how large the holes in this safety net are.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #22

P: n/a
Da*****@cern.ch (Dan Pop) wrote in message news:<cd**********@sunnews.cern.ch>...
In <1a**************************@posting.google.com > ro***********@antenova.com (Rob Thorpe) writes:
"Mike Wahler" <mk******@mkwahler.net> wrote in message news:<b2****************@newsread1.news.pas.earthl ink.net>...
"jacob navia" <ja***@jacob.remcomp.fr> wrote in message
news:cc**********@news-reader1.wanadoo.fr...
> I used an implementation of malloc that I wrote for 16-bit Windows
> that essentially was what you propose: I stored a cookie (magic number)
> and the size, and at the end I stored another cookie, to check if the
> block was overwritten.
>
> But basically what you want is a bounded pointer.
>
> A bounded pointer is a pointer that can move within a certain memory area
> and not elsewhere.
>
> Support for bounded pointers is nonexistent in standard C and you must
> figure it out yourself. You do:
>
> someFn(ptr,siz);
>
> It is up to you to never make a mistake.
>
> This is a hole in the language,

I wouldn't call it (lack of a memory access 'safety net')
a 'hole' in the language at all. Some might deem having it
a nicety, but it's certainly not needed. It would also impose
an unnecessary restriction on implementations, especially those
tailored for maximum efficiency (rather than 'safety').
Someone here recently said "C is a sharp tool". I agree, and
I like it that way. :-)

-Mike
Just out of curiosity, why do you like it that way?


Because correct code doesn't have to suffer from the overhead related to
bound checking.


Why should it anyway?

Because such checking is next to impossible to get right in a language
like C. I have seen a confused poster: his code was correct, but the
compiler got the bound checking wrong.
No it isn't. See the link I sent last time we discussed this.
Because people tend to be less careful when they know a "safety net"
exists, no matter how large the holes in this safety net are


Maybe people tend to be less careful driving cars with airbags than
those without. That doesn't make airbags a bad idea.
Nov 14 '05 #23
