By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
449,079 Members | 926 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 449,079 IT Pros & Developers. It's quick & easy.

type of array index?

P: n/a

For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?

OT: near as I can tell my implementation of gcc doesn't have an
<stddef.h> file with ptrdiff_t defined. Am I overlooking something?

--
Nov 14 '05 #1
Share this Question
Share on Google+
29 Replies


P: n/a
> what should be the type of 'k'? Should it be ptrdiff_t?

It should be any type of integer.

Jon
----
Learn to program using Linux assembly language
http://www.cafeshops.com/bartlettpublish.8640017
Nov 14 '05 #2

P: n/a
If you really want ptrdiff_t, it appears to be in malloc.h. stdint.h
has its limits. Interestingly, obstack.h thinks that it appears in
stddef.h, too. Perhaps someone should report that to the maintainers.
Using gcc 3.3.1.

Jon
----
Learn to program using Linux assembly language
http://www.cafeshops.com/bartlettpublish.8640017
Nov 14 '05 #3

P: n/a
sh********@ticnet.com wrote:
For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?
Essentially, there are two choices:
1. Use int whenever sufficient, use a wider type when necessary.
Sufficient means sufficient with respect to the minimal requirements
from the standard.
2. Always use size_t.
size_t is sufficient for all arrays and for indexing all storage you
can allocate (bytewise).

Personally, I favor the second approach but I am not religious
about it. Drawbacks: You have to be more careful due to the inherent
nonnegativity of the index type, e.g.
for (i=MAX-1; i>=0; i--)
operation_on(array[i]);
either has to become
for (i=MAX; i>0; i--)
operation_on(array[i-1]);
or
for (i=MAX-1; i!=-1; i--)
operation_on(array[i]);
which I like better.
As for int: It is possible that int indices are "faster" as int usually
is the "natural" integer type of the system but if this is really
the case, one can still optimise where necessary. Undefined behaviour
due to forgetting about "old-fashioned" 16 bit ints and resulting
strange errors are less nice.

However, there have been many discussions about that.
Do as you like and look for range/sanity checks as necessary (in both
cases).

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?
Any integer type which suffices for your range requirements.
I would use ptrdiff_t only where appropriate.

If ssize_t had made it into the standard, I'd suggest that.

OT: near as I can tell my implementation of gcc doesn't have an
<stddef.h> file with ptrdiff_t defined. Am I overlooking something?


That you cannot find it in the header file does not mean that
it is not there by some magic (or simply comes in from another
included file).

Just try it out:
If

#include <stddef.h>
int main (void) { int a[3]; ptrdiff_t b = (a+2)-(a+1); return 0; }

does not compile, file a bug report.
If it does compile without the #include, do the same.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 14 '05 #4

P: n/a
Jonathan Bartlett wrote:
If you really want ptrdiff_t, it appears to be in malloc.h. stdint.h
malloc.h is not a C standard header, thus offtopic.
has its limits. Interestingly, obstack.h thinks that it appears in
dito for obstack.h
stddef.h, too. Perhaps someone should report that to the maintainers.
Using gcc 3.3.1.


If including <stddef.h> has the effect that size_t or ptrdiff_t
are available when they were not before, everything is fine:
The implementation may do each and everything behind the scenes.

I did not yet have any problems with gcc's compliance in _this_
respect.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 14 '05 #5

P: n/a
sh********@ticnet.com wrote:
For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?


It depends.

In concrete (non-generic) code the type is usually dictated by the
natural properties of the application area. You should normally have a
type that designates the total amount of objects of 'MYTYPE' already.
This type is an obvious candidate for the array index type. For example,
if this is an array of, say, file handles and you use 'unsigned short'
object to store the total number of files, then 'unsigned short' would
be a natural choice for the index type for this array. This, of course,
applies only if you don't care about negative indexing. For negative
indexing you'd have to use either 'short' or 'int', depending on the
required range.

In generic code 'ptrdiff_t' is the first candidate for array index type,
which also supports negative indices. If you don't care about negative
indices or you want to emphasize the fact that negative indices are not
allowed in some context, then you might want to go with 'size_t'. This
unsigned type is large enough for indexing of any array.

However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more related
to the concept of "container element count". These two concepts are not
related and using 'size_t' for array indexing is a conceptual error. It
might be more elegant to "hide" the 'size_t' behind a typedef name as
follows

typedef size_t pos_ptrdiff_t;

and use 'pos_ptrdiff_t' for generic non-negative array indexing.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #6

P: n/a
Andrey Tarasevich wrote:
sh********@ticnet.com wrote:
For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?

It depends.

In concrete (non-generic) code the type is usually dictated by the
natural properties of the application area. You should normally have a
type that designates the total amount of objects of 'MYTYPE' already.
This type is an obvious candidate for the array index type. For example,
if this is an array of, say, file handles and you use 'unsigned short'
object to store the total number of files, then 'unsigned short' would
be a natural choice for the index type for this array. This, of course,
applies only if you don't care about negative indexing. For negative
indexing you'd have to use either 'short' or 'int', depending on the
required range.

In generic code 'ptrdiff_t' is the first candidate for array index type,
which also supports negative indices. If you don't care about negative
indices or you want to emphasize the fact that negative indices are not
allowed in some context, then you might want to go with 'size_t'. This
unsigned type is large enough for indexing of any array.

However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more related
to the concept of "container element count". These two concepts are not
related and using 'size_t' for array indexing is a conceptual error. It
might be more elegant to "hide" the 'size_t' behind a typedef name as
follows

typedef size_t pos_ptrdiff_t;

and use 'pos_ptrdiff_t' for generic non-negative array indexing.


Nicely put.
Something for a confessing size_t-user like me to think about :-)
Grabbed and stored.

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 14 '05 #7

P: n/a
Jonathan Bartlett <jo*****@eskimo.com> writes:
If you really want ptrdiff_t, it appears to be in malloc.h. stdint.h
has its limits. Interestingly, obstack.h thinks that it appears in
stddef.h, too. Perhaps someone should report that to the
maintainers. Using gcc 3.3.1.


ptrdiff_t is declared in <stddef.h>. <malloc.h> is not a standard
header (the malloc() function is declared in <stdlib.h>), nor is
<obstack.h>. And <stdint.h> is a standard header in C99, but not in
C90, so not all current implementations will provide it (though it's
not too difficult to roll your own).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #8

P: n/a
in comp.lang.c i read:
OT: near as I can tell my implementation of gcc doesn't have an
<stddef.h> file with ptrdiff_t defined. Am I overlooking something?


there is no requirement by the standard that a file exist -- it's allowed
to be magical, perhaps entirely internal. though in fact it does exist,
it's just that it's not where you are looking.

--
a signature
Nov 14 '05 #9

P: n/a
"Andrey Tarasevich" <an**************@hotmail.com> wrote in message
news:11*************@news.supernews.com...
sh********@ticnet.com wrote:
For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?
It depends.

.... In generic code 'ptrdiff_t' is the first candidate for array index type,
which also supports negative indices. If you don't care about negative
indices or you want to emphasize the fact that negative indices are not
allowed in some context, then you might want to go with 'size_t'. This
unsigned type is large enough for indexing of any array.

However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more related
to the concept of "container element count". These two concepts are not
related and using 'size_t' for array indexing is a conceptual error.


Since an array is an object, and size_t is defined to be large enough to
represent the size of any object, size_t naturally must be able to represent
any possible array subscript.

In _The Standard C Library_, P. J. Plauger writes:

"Unlike ptrdiff_t, however, size_t is /very/ useful. It is the safest type
to represent any integer data object you use as an array subscript. You
don't have to worry if a small array evolves into a very large one as the
program changes. Subscript arithmetic will never overflow when performed in
type size_t." and "You should make a point of using type size_t /anywhere/
your program performs array subscripting or address arithmetic." (emphasis
in original)

The only reason one might prefer ptrdiff_t for array subscripts is where,
due to looping, you need the ability to count down past zero. In this case,
I would prefer ssize_t (where available) over ptrdiff_t, but first I'd
consider whether it was feasible to restructure the loop condition such that
a signed subscript wasn't needed.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin

Nov 14 '05 #10

P: n/a
Stephen Sprunk wrote:
> For maximum portability what should the type of an array index be? Can
> any integer type be used safely? Or should I only use an unsigned type?
> Or what?
>
> If I'm using pointers to access array elements as *(mptr+k) where I've
> declared
>
> MYTYPE *mptr;
>
> what should be the type of 'k'? Should it be ptrdiff_t?
It depends.

...
In generic code 'ptrdiff_t' is the first candidate for array index type,
which also supports negative indices. If you don't care about negative
indices or you want to emphasize the fact that negative indices are not
allowed in some context, then you might want to go with 'size_t'. This
unsigned type is large enough for indexing of any array.

However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more related
to the concept of "container element count". These two concepts are not
related and using 'size_t' for array indexing is a conceptual error.


Since an array is an object, and size_t is defined to be large enough to
represent the size of any object, size_t naturally must be able to represent
any possible array subscript.


That's the very reason why I suggest using 'size_t' for array indexing
in generic context. For example, in a generic array-support library.

At application level there's rarely a need to have an array just for the
sake of having an array. What is normally needed at application level is
a container with index-based random-access interface. This could be a
"traditional" array, this could be an associative array, this could be
something like a deque. Today it can be one, tomorrow - another. At this
conceptual level there's no relation between container element count and
object size. The fact that this relation holds for a "traditional" array
is noting more than an accident, a low level detail, which has
absolutely no reason to play any role in the process of choosing the
index type.
In _The Standard C Library_, P. J. Plauger writes:

"Unlike ptrdiff_t, however, size_t is /very/ useful. It is the safest type
to represent any integer data object you use as an array subscript. You
don't have to worry if a small array evolves into a very large one as the
program changes. Subscript arithmetic will never overflow when performed in
type size_t." and "You should make a point of using type size_t /anywhere/
your program performs array subscripting or address arithmetic." (emphasis
in original)
Great, but applies mostly to library-level (generic) code.

If in my program a have a dedicated typename for designating the day of
the week, say

typedef enum DayOfTheWeek { /* ... */ } DayOfTheWeek;

and in some place I need to iterate through an array indexed by the day
of the week, then I'd make a point of using 'DayOfTheWeek' to represent
the index, never the completely irrelevant 'size_t'.
The only reason one might prefer ptrdiff_t for array subscripts is where,
due to looping, you need the ability to count down past zero. In this case,
I would prefer ssize_t (where available) over ptrdiff_t, but first I'd
consider whether it was feasible to restructure the loop condition such that
a signed subscript wasn't needed.


I would do the latter.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #11

P: n/a
"Andrey Tarasevich" <an**************@hotmail.com> wrote in message
sh********@ticnet.com wrote:
For maximum portability what should the type of an array index be?
Can any integer type be used safely? Or should I only use an
unsigned type? Or what?
Any integer type is allowed. For example main() uses argc:

int main(int argc, char* argv[])

If I'm using pointers to access array elements as *(mptr+k) where
I've declared

[...]
However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more
related to the concept of "container element count". These two
concepts are not related and using 'size_t' for array indexing
is a conceptual error.


Not correct, an array is an object and

sizeof array

return the size of that object. Hence, size_t is a good type choice
for an array index . Note that number of array elements

size_t N = sizeof (array) / sizeof (array[0]);

will never overflow with size_t.

--
Tor <torust AT online DOT no>
Nov 14 '05 #12

P: n/a
Tor Rustad wrote:
...
However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more
related to the concept of "container element count". These two
concepts are not related and using 'size_t' for array indexing
is a conceptual error.


Not correct, an array is an object and

sizeof array

return the size of that object. Hence, size_t is a good type choice
for an array index . Note that number of array elements

size_t N = sizeof (array) / sizeof (array[0]);

will never overflow with size_t.
...


And? This is exactly what I said in my message before the quoted part.
But how is this relevant within the context of paragraph quoted above?

Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #13

P: n/a
Andrey Tarasevich <an**************@hotmail.com> writes:
[...]
Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.


Object size and container element count are the same for character
arrays, and since an array element can't be smaller than one byte,
you're guaranteed that size_t is at least big enough to hold any array
index.

If you're writing code that deals with arrays generically (such as the
standard qsort() function), it's probably safest to use size_t. If
you're using something more specific to a given problem domain, and
you happen to know that an array will never have more than INT_MAX
elements, you can use int.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #14

P: n/a

"Andrey Tarasevich" <an**************@hotmail.com> wrote in message
news:11*************@news.supernews.com...
Stephen Sprunk wrote:
However, from the pedantic point of view, 'size_t' is intended to
implement a concept of "object size", while array index is more
related to the concept of "container element count". These two
concepts are not related and using 'size_t' for array indexing is
a conceptual error.
Since an array is an object, and size_t is defined to be large
enough to represent the size of any object, size_t naturally
must be able to represent any possible array subscript.


That's the very reason why I suggest using 'size_t' for array indexing
in generic context. For example, in a generic array-support library.

At application level there's rarely a need to have an array just for
the sake of having an array.


Right, but if you get in the habit of writing code where the index is an
int, sooner or later you'll write code that (after it's been edited by a
dozen other coders) overflows that int but size_t would have worked.

Sure, size_t may be less efficient on some systems, but IMHO rarely enough
to offset the possibility of bugs in the future.
In _The Standard C Library_, P. J. Plauger writes:

"Unlike ptrdiff_t, however, size_t is /very/ useful. It is the
safest type to represent any integer data object you use as
an array subscript. You don't have to worry if a small array
evolves into a very large one as the program changes.
Subscript arithmetic will never overflow when performed in
type size_t." and "You should make a point of using type
size_t /anywhere/ your program performs array subscripting
or address arithmetic." (emphasis in original)


Great, but applies mostly to library-level (generic) code.


IMHO, it applies to nearly all application code as well.
If in my program a have a dedicated typename for designating
the day of the week, say
...


In that case, yes, I might declare the index as an int or even short, but
clear-cut cases like that are exceptions.

In most cases, the original coder has no idea what his code will eventually
evolve into, how other components of the same system will abuse it, or
whether the index will overflow ten years later and crash some
mission-critical system. Why take the chance unless you can prove it's
affecting performance?

S
Nov 14 '05 #15

P: n/a
Keith Thompson wrote:
[...]
Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.
Object size and container element count are the same for character
arrays, and since an array element can't be smaller than one byte,
you're guaranteed that size_t is at least big enough to hold any array
index.


That's pretty much what I said in the message quoted by the previous poster.
If you're writing code that deals with arrays generically (such as the
standard qsort() function), it's probably safest to use size_t.
Also, exactly what I said in that message.
If
you're using something more specific to a given problem domain, and
you happen to know that an array will never have more than INT_MAX
elements, you can use int.


Same here.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #16

P: n/a
Andrey Tarasevich <an**************@hotmail.com> writes:
Keith Thompson wrote:
[...]
Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.


Object size and container element count are the same for character
arrays, and since an array element can't be smaller than one byte,
you're guaranteed that size_t is at least big enough to hold any array
index.


That's pretty much what I said in the message quoted by the previous poster.


You said that "object size" and "container element count" are not
related. I showed how they are related. They're not identical, of
course, but the nature of their relationship is such that it's
(almost) always safe to use size_t as an array index.

(If you use a pointer into the middle of an array object to provide
negative indices, then size_t won't work, but that's fairly unusual.
If you have a data structure that implements bit arrays, size_t may
not be big enough, but such a data structure can't be implemented with
array syntax in standard C.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #17

P: n/a
Keith Thompson wrote:
[...]
Once again, the concept of "object size" is not related to the concept
of "container element count". The connection between the two for
built-in C arrays is purely parasitic. It is a mere coincidence, which
makes absolutely no difference at conceptual level.

Object size and container element count are the same for character
arrays, and since an array element can't be smaller than one byte,
you're guaranteed that size_t is at least big enough to hold any array
index.
That's pretty much what I said in the message quoted by the previous poster.


You said that "object size" and "container element count" are not
related.


I said that they are not related at conceptual level. These are two
different concepts.
I showed how they are related.
Not as concepts.
They're not identical, of
course, but the nature of their relationship is such that it's
(almost) always safe to use size_t as an array index.


As an array index - of course. As an index into a generic container -
no. As I said before, is most cases application does not specifically
need an array. It just needs a container with similar interface. In
situations like this an array can be easily "unplugged" from the client
code and another container can be "plugged-in" instead (assuming that
the interface specifications match). Hardcoding an array-specific index
type in this case will only introduce another unnecessary coupling
between the client code and the concrete type of the container.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #18

P: n/a
Stephen Sprunk wrote:
>> However, from the pedantic point of view, 'size_t' is intended to
>> implement a concept of "object size", while array index is more
>> related to the concept of "container element count". These two
>> concepts are not related and using 'size_t' for array indexing is
>> a conceptual error.
>
> Since an array is an object, and size_t is defined to be large
> enough to represent the size of any object, size_t naturally
> must be able to represent any possible array subscript.
That's the very reason why I suggest using 'size_t' for array indexing
in generic context. For example, in a generic array-support library.

At application level there's rarely a need to have an array just for
the sake of having an array.


Right, but if you get in the habit of writing code where the index is an
int, sooner or later you'll write code that (after it's been edited by a
dozen other coders) overflows that int but size_t would have worked.


Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works. One day one
of those other coders decides to replace the built-in array with a
custom implementation of some "disjoint" array, like 'deque', for
example. Size of that container is no longer limited by the range of
'size_t'.
Sure, size_t may be less efficient on some systems, but IMHO rarely enough
to offset the possibility of bugs in the future.
> In _The Standard C Library_, P. J. Plauger writes:
>
> "Unlike ptrdiff_t, however, size_t is /very/ useful. It is the
> safest type to represent any integer data object you use as
> an array subscript. You don't have to worry if a small array
> evolves into a very large one as the program changes.
> Subscript arithmetic will never overflow when performed in
> type size_t." and "You should make a point of using type
> size_t /anywhere/ your program performs array subscripting
> or address arithmetic." (emphasis in original)
Great, but applies mostly to library-level (generic) code.


IMHO, it applies to nearly all application code as well.
If in my program a have a dedicated typename for designating
the day of the week, say
...


In that case, yes, I might declare the index as an int or even short, but
clear-cut cases like that are exceptions.


In my opinion, a closer look at most practical cases usually reveals
that in essence they are indeed such clear-cut cases.
In most cases, the original coder has no idea what his code will eventually
evolve into, how other components of the same system will abuse it, or
whether the index will overflow ten years later and crash some
mission-critical system. Why take the chance unless you can prove it's
affecting performance?


That's exactly what I'm talking about. The code can easily evolve from
using an array to using a different type of container, with element
count is no longer restrained by the range of 'size_t'. In situations
like this one has to look through the entire code and carefully replace
'size_t' with some other type. This usually involves a lot of chance-taking.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #19

P: n/a
"Andrey Tarasevich" <an**************@hotmail.com> wrote in message
news:11*************@news.supernews.com...
Stephen Sprunk wrote:
>> However, from the pedantic point of view, 'size_t' is intended
>> to implement a concept of "object size", while array index is
>> more related to the concept of "container element count".
>> These two concepts are not related and using 'size_t' for
>> array indexing is a conceptual error.
>
> Since an array is an object, and size_t is defined to be large
> enough to represent the size of any object, size_t naturally
> must be able to represent any possible array subscript.

That's the very reason why I suggest using 'size_t' for array
indexing in generic context. For example, in a generic array-
support library.

At application level there's rarely a need to have an array just for
the sake of having an array.
Right, but if you get in the habit of writing code where the index
is an int, sooner or later you'll write code that (after it's been
edited by a dozen other coders) overflows that int but size_t
would have worked.


Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works.


Per the standard, size_t CANNOT overflow when used as an array index.
One day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array, like
'deque', for example. Size of that container is no longer limited by
the range of 'size_t'.


A 'deque' is not an array and therefore any problem indexing them with
size_t is moot.

[ Rest of post snipped because it doesn't apply to arrays. ]

The OP specifically asked about the type to use for an array index.
Hijacking the thread to generalize to other non-standard container types is
irrelevant at best.

S
Nov 14 '05 #20

P: n/a
Andrey Tarasevich wrote:
Stephen Sprunk wrote:

.... snip ...

Right, but if you get in the habit of writing code where the
index is an int, sooner or later you'll write code that (after
it's been edited by a dozen other coders) overflows that int but
size_t would have worked.


Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works. One
day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array,
like 'deque', for example. Size of that container is no longer
limited by the range of 'size_t'.


All of which leads to easy generation of buffer overflows. What is
really needed is an index type to match the array, i.e. a
subrange. Unfortunately C doesn't have such a thing. Even an
enumeration won't work. An enhancement of typedef would allow
stronger typing in future, but probably won't happen. I.E. let
typedef create a real type:

typedef 0U ... 10U myindex; /* reusing the ... varargs token */

T arraything[tmax(myindex)];

for (myindex i = 0; i < tmax(myindex); i++) ...

using tmax (and tmin) as operators, similar to sizeof.

crossed to comp.std.c in case it stirs some thoughts.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 14 '05 #21

P: n/a
Stephen Sprunk wrote:
>> >> However, from the pedantic point of view, 'size_t' is intended
>> >> to implement a concept of "object size", while array index is
>> >> more related to the concept of "container element count".
>> >> These two concepts are not related and using 'size_t' for
>> >> array indexing is a conceptual error.
>> >
>> > Since an array is an object, and size_t is defined to be large
>> > enough to represent the size of any object, size_t naturally
>> > must be able to represent any possible array subscript.
>>
>> That's the very reason why I suggest using 'size_t' for array
>> indexing in generic context. For example, in a generic array-
>> support library.
>>
>> At application level there's rarely a need to have an array just for
>> the sake of having an array.
>
> Right, but if you get in the habit of writing code where the index
> is an int, sooner or later you'll write code that (after it's been
> edited by a dozen other coders) overflows that int but size_t
> would have worked.
Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works.


Per the standard, size_t CANNOT overflow when used as an array index.


That's exactly what I said in my original message (the one you initially
replied to). You omitted that portion from the quote. What's the point
of repeating it here?
One day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array, like
'deque', for example. Size of that container is no longer limited by
the range of 'size_t'.


A 'deque' is not an array and therefore any problem indexing them with
size_t is moot.
[ Rest of post snipped because it doesn't apply to arrays. ]


Quite the opposite. Within this context anything that applies
specifically to indexing of arrays is moot. This whole branch of this
thread doesn't apply to arrays. This whole branch is started by your
reply to the "generic" portion of my message (quoted at the very top).
And I clearly and repeatedly explained that issue I'm talking about is
the _conceptual_ issue of using 'size_t' as an index type for a generic
indexed container.
The OP specifically asked about the type to use for an array index.
Hijacking the thread to generalize to other non-standard container types is
irrelevant at best.


You did not reply to the OP. You replied to me. You replied to something
that was noting more than a side note in my response to the OP. You
extracted this side note from my message, robbing it of its original
context, and tried to misrepresent it as thread hijacking? Sorry, but
that's you who hijacked this thread. If you did this unintentionally,
the I can only suggest that in the future, when you reply to a
particular portions of the people messages, make sure you understand
what they are talking about.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #22

P: n/a
CBFalconer wrote:
...
Right, but if you get in the habit of writing code where the
index is an int, sooner or later you'll write code that (after
it's been edited by a dozen other coders) overflows that int but
size_t would have worked.


Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works. One
day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array,
like 'deque', for example. Size of that container is no longer
limited by the range of 'size_t'.


All of which leads to easy generation of buffer overflows. What is
really needed is an index type to match the array, i.e. a
subrange. Unfortunately C doesn't have such a thing. Even an
enumeration won't work. An enhancement of typedef would allow
stronger typing in future, but probably won't happen. I.E. let
typedef create a real type:

typedef 0U ... 10U myindex; /* reusing the ... varargs token */

T arraything[tmax(myindex)];

for (myindex i = 0; i < tmax(myindex); i++) ...

using tmax (and tmin) as operators, similar to sizeof.

crossed to comp.std.c in case it stirs some thoughts.
...


That's true, but again, at application level the issue of choosing the
index type for an array is not really related to arrays in any way,
hoiwever strange it might sound. In other words, at application level
the original issue does not exist at all. The very fact that someone is
asking about the choice of type for array index already indicates that
there's a more generic problem with the way that someone designs its
code or with his way of thinking about the issue. Let me explain once again.

In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing). It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.

However, in specific (application-level) code the choice of index type
immediately follows from the choice of the corresponding "quantity"
type. Whenever there is a need to store an application-specific
"quantity" in the program (number of cars on the lot, number of
employees in the company, number of lines in the file etc.) there's an
issue of choosing a type to represent that quantity. Assuming that we
are choosing from the set of built-in types, there's will always be a
limitation on the maximum quantity we can handle in the program. This
limitation will become a part of the program's specification. It is the
program's responsibility to observe that this part of the specification
is met. Otherwise, the chosen quantity type will overflow with
disastrous results. Your suggestion concerning "range types" is directly
relevant to this particular issue - the issue of choosing (or creating)
a type for representing an application specific "quantity" (although the
need to watch for overflows will always be there, as long as we are
using a fixed-range type). Note, that so far I never mentioned any
arrays or any other containers. There might be no containers in the
program at all. The issue of choosing the "quantity" type, however,
still exists.

Now, let's assume that one needs an array in the program and needs to
choose the index type. The first question one should ask himself is what
this array is going to store, what is the number of elements in that
array and (!) what type is _currently_ used to represent that quantity.
Note, that by the time one needs an array the choice of the "quantity"
type has _already_ been made. _That_ type is the type that should be
used for array indexing, period. Someone here suggested that if array
index type might overflow, if it is not 'size_t'. Wrong. As long as the
same type is used to represent the number of elements in the array and
the index (regardless of the concrete type, could be 'unsigned char' for
example) no overflow can ever occur. More precisely, at that stage
there's no issue of index type overflow. What _can_ overflow is the
originally chosen "quantity" type, but this issue has absolutely nothing
to do with array indexing. The overflow will happen long before one even
gets to any array indexing. The only thing that overflow will mean is
that the author of the code made a bad choice of "quantity" type (not
index type) or failed to enforce the program specification.

The suggestion to _unconditionally_ use 'size_t' for array indexing
because it "can never overflow" is one of those "fake wisdoms" that
float around the net in great numbers. It is as correct in theory as it
is useless and irrelevant in practice. Something from the area of "Never
use division, because there's a danger of dividing something by zero".

As a disclaimer: I said that many times already, but just in case, for
"I'm not a reader, I'm a writer" type of posters - 'size_t' is a correct
choice of index type for generic non-negative array indexing. Generic
array processing functions should always use this particular type for
representing array sizes and array indices (note, BTW, that in this case
the latter follows form the former too, i.e. the index type is
determined by the corresponding "quantity" type).

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #23

P: n/a
Andrey Tarasevich wrote:
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.


What type do you like for counting list nodes?

--
pete
Nov 14 '05 #24

P: n/a
pete wrote:
...
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.


What type do you like for counting list nodes?
...


At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.

In application-level code the choice is dictated by the application
area, just like with arrays.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #25

P: n/a
Andrey Tarasevich <an**************@hotmail.com> writes:
pete wrote:
...
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.


What type do you like for counting list nodes?
...


At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.


And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)

Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.

On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #26

P: n/a
Keith Thompson wrote:
...
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.

What type do you like for counting list nodes?
...
At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.


And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)


If faced with a task of implementing a portable generic list library I'd
still never use 'size_t' to represent list element count. Platforms that
have 'size_t's range narrower than that of 'unsigned long' are also
possible (moreover, I think they are far less exotic than the one you
mention). Range of the type 'size_t' is too "volatile" to be used for
this purpose. And the conceptual reason for that "volatility" is the one
I already mentioned - 'size_t' is directly related to the concept of
"object size" but completely unrelated to the concept of "object count".

Also, consider the applications that would use this library. What are
they going to use to represent application-specific quantities on the
platform you describe? If they use 'int' or 'long' or 'long long' types,
then the issue doesn't exist - they should simply never have lists that
are longer than the ranges of those types. Otherwise, I simply don't see
any good solution for this situation (besides the one described in the
next paragraph).

In situations when providing the source code of the library to the
client is an option, a possible solution would be to implement it in
template-like manner, i.e. make the source code to depend on an
externally '#define'd parameter and give the client the responsibility
to '#define' it and "instantiate" the code with whatever count/index
type he/she prefers. This is, of course, not always an option.
Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.
I don't really see what guarantees you are talking about. 'size_t's
range can easily be narrower than that of, for example, 'unsigned long'.
There's no guarantees against that.
On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.


Surprised? Why? This is a typical situation for segmented-memory
platforms. Many of those not-so-old 16 bit platforms possessed this
property. Most (if not all) C compilers on DOS or Win16 platform were
like that. 'size_t' had 0-65535 range, but there was absolutely no
problem in allocating more than 65535 objects.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #27

P: n/a
In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
Andrey Tarasevich <an**************@hotmail.com> writes:
pete wrote:
...
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.

What type do you like for counting list nodes?
...


At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.


And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)

Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.

On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.


Consider old 8086 systems, where the size of on object was restricted to
64KB, so size_t = 16 bit, but the total size of all objects was
restricted to 1 MB. Should be quite possible to allocate more than 65535
very small objects in that situation. (80286 systems might have had
limits of 65535 bytes per object and 16 MB total, but I am not sure
about that).
Nov 14 '05 #28

P: n/a
Christian Bau <ch***********@cbau.freeserve.co.uk> writes:
In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:

[...]
On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.


Consider old 8086 systems, where the size of on object was restricted to
64KB, so size_t = 16 bit, but the total size of all objects was
restricted to 1 MB. Should be quite possible to allocate more than 65535
very small objects in that situation. (80286 systems might have had
limits of 65535 bytes per object and 16 MB total, but I am not sure
about that).


Ok, so color me surprised (and yes, I should have thought of that).

How about using a nest of #ifs to create a typedef "count_t" that's an
alias for the first of uintmax_t, unsigned long long, or unsigned long
that exists? (The second test caters to C90 implementations with
extensions, the third to unenhanced C90 implementations.) This should
get you the biggest available unsigned integer type. It might be
bigger than you need, but unless you're going to create huge arrays of
them, or unless it imposes a serious performance hit, that shouldn't
be too much of a problem.

Whatever count type you use, you should probably define it in a single
location, so even if users can't easily customize it, future
maintainers can.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #29

P: n/a
On Fri, 01 Apr 2005 01:05:37 GMT, Keith Thompson
<ks***@mib.org> wrote:
How about using a nest of #ifs to create a typedef "count_t" that's an
alias for the first of uintmax_t, unsigned long long, or unsigned long
that exists? (The second test caters to C90 implementations with
extensions, the third to unenhanced C90 implementations.) This should
get you the biggest available unsigned integer type. It might be
bigger than you need, but unless you're going to create huge arrays of
them, or unless it imposes a serious performance hit, that shouldn't
be too much of a problem.
That's basically what I do, except that I do it with a header file, and
that file is created by a program which includes limits.h for that
system (also includes float.h and spits out some of the limits as
#defines, because C89 says "Of the values in the <float.h> header,
FLT_RADIX shall be a constant expression suitable for use in #if
preprocessing directives; all other values need not be constant
expressions" which makes them not suitable to do such type
determination). (The header file could of course be edited by hand if
it is not possible to run that program on the target or with the target
header files to generate the header.)
Whatever count type you use, you should probably define it in a single
location, so even if users can't easily customize it, future
maintainers can.


That should be true for a lot of implementation determined things (like
byte order where that matters, sizes of system specific things like
filesnames, etc.).

Chris C
Nov 14 '05 #30

This discussion thread is closed

Replies have been disabled for this discussion.