Using size_t clearly (appropriately?)

Mark Odell

I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing. I also thought that size_t could be signed but
it seems I was wrong on that one.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing? Should I have
used an index of type int or unsigned int instead?

Thanks,
--
- Mark

Jun 28 '06 #1

Morris Dovey

Mark Odell (in 11**********************@y41g2000cwy.googlegroups.com)
said:

| I've always declared variables used as indexes into arrays to be of
| type 'size_t'. I have had it brought to my attention, recently, that
| size_t is used to indicate "a count of bytes" and that using it
| otherwise is confusing. I also thought that size_t could be signed
| but it seems I was wrong on that one.
|
| So if you were to see code iterating through a table of Foo objects
| using an index of size_t type, would it be confusing? Should I have
| used an index of type int or unsigned int instead?

Certainly not confusing. Perhaps confidence-building.

--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto

Jun 28 '06 #2

Michael Mair

Mark Odell wrote:

I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing. I also thought that size_t could be signed but
it seems I was wrong on that one.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing? Should I have
used an index of type int or unsigned int instead?

I would think "here is someone who thought about what an index is"...
:-)
If ssize_t were standard C, I'd accept that as well for the reason
that you can more easily deal with loops that count downwards.

Typedefs used to define certain roles, say
typedef .... Index;
inspire the same confidence.

int, long, size_t, and maybe unsigned long are perfectly fine
choices for array indices.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Jun 28 '06 #3

Andrew Poelstra

On 2006-06-28, Mark Odell <mr********@gmail.com> wrote:

I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing. I also thought that size_t could be signed but
it seems I was wrong on that one.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing? Should I have
used an index of type int or unsigned int instead?

Thanks,

It wouldn't be confusing at all. In fact, there are situations where
you would /want/ to have size_t as your type. For example, you could
be working with strings and be counting length.

I can't see why size_t would ever be signed. However, you shouldn't
be using negative numbers in most loops.

Now, if your coding guidelines tell you not to use "size_t" for
applications that are not "a count of bytes" (array indexing /is/
a count of bytes, IMHO), then go with that. Random people from
USENet don't trump your boss, even though we think we do. :-)

--
Andrew Poelstra < http://www.wpsoftware.net/blog >
To email me, use "apoelstra" at the above address.
I know that area of town like the back of my head.

Jun 28 '06 #4

Richard Heathfield

Mark Odell said:

I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes"

Who says?

Typical standard library functions use size_t in contexts where the value in
question is either the /size/ of an object, in bytes, or the /number/ of
objects that are relevant in the call. Look, for example, at calloc, fread,
fwrite.

and that using it otherwise is confusing.

It's a great type for an index, too. Someone said it's harder to use size_t
to count backwards, but it's not.

for(i = n; i-- > 0; )
{
    foo(bar + i);
}
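For anyone who wants to try the idiom above, here is a minimal sketch. The function name reverse_copy and its parameters are made up for illustration and are not from the post; the loop header is the one shown above.

```c
#include <stddef.h>

/* Copies src[0..n-1] into dst in reverse order.
   The test "i-- > 0" compares the old value of i and then decrements,
   so the body sees n-1, n-2, ..., 0 and the loop never depends on i
   becoming negative. With n == 0 the body does not run at all. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    size_t i;
    size_t out = 0;

    for (i = n; i-- > 0; ) {
        dst[out++] = src[i];
    }
}
```

After the loop ends, i has wrapped around to the largest size_t value, so it should not be used again without being reassigned.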
I also thought that size_t could be signed but it seems I was wrong on
that one.

Yes, you're right - you were wrong. :-) It must be an unsigned type.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing?

Not in the slightest.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jun 28 '06 #5

Default User

Andrew Poelstra wrote:

I can't see why size_t would ever be signed.

The Standard requires it to be unsigned.

Brian

Jun 28 '06 #6

Keith Thompson

Andrew Poelstra <ap*******@localhost.localdomain> writes:

On 2006-06-28, Mark Odell <mr********@gmail.com> wrote:
I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing. I also thought that size_t could be signed but
it seems I was wrong on that one.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing? Should I have
used an index of type int or unsigned int instead?

Thanks,

It wouldn't be confusing at all. In fact, there are situations where
you would /want/ to have size_t as your type. For example, you could
be working with strings and be counting length.

I can't see why size_t would ever be signed. However, you shouldn't
be using negative numbers in most loops.

size_t is guaranteed to be unsigned.

One possible drawback of using size_t (or any unsigned type) is that a
loop like this:

size_t i;
for (i = MAX; i >= 0; i --) {
    /* ... */
}

will never terminate, since i will *always* be >= 0. The same issue
applies to signed types:

int i;
for (i = whatever; i <= INT_MAX; i ++) {
    /* ... */
}

but it doesn't come up as often. (Also, decrementing a size_t with
the value 0 is well defined; incrementing an int with the value
INT_MAX, or decrementing an int with the value INT_MIN, invokes
undefined behavior.)

Both signed and unsigned integers behave like mathematical integers as
long as you stay away from the ends of their ranges. The difference
is that the ends of the range of a signed integer type are way out
there, and you're likely not to encounter them; the lower range of an
unsigned type is 0, and it's easy to run into that if you're not
careful.
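The wraparound at the low end can be checked directly. This is only a sketch; the helper name decrement is hypothetical, and (size_t)-1 is used as a portable spelling of the largest size_t value.

```c
#include <stddef.h>

/* Returns i - 1 computed in size_t. Unsigned arithmetic is done
   modulo 2^N, so decrementing 0 is well defined and yields the
   largest value size_t can hold, never a negative number. */
size_t decrement(size_t i)
{
    return i - 1;
}
```

Calling decrement(0) gives (size_t)-1, which is why a test such as i >= 0 can never become false for a size_t counter.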

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jun 28 '06 #7

pete

Mark Odell wrote:

I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing.

Then the nmemb parameter of qsort
must be more confusing to them, than it is to you.

#include <stdlib.h>
void qsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
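As a small illustration of the prototype above, here is a sketch in which nmemb is an element count and size is a byte count, both passed as size_t. The names compare_ints and sort_ints are invented for this example.

```c
#include <stdlib.h>

/* Comparison callback for qsort: orders ints ascending.
   Uses comparisons instead of subtraction to avoid overflow. */
static int compare_ints(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* Hypothetical helper: sorts n ints in place.
   n is a count of elements, sizeof values[0] is a count of bytes;
   qsort takes both as size_t. */
void sort_ints(int *values, size_t n)
{
    qsort(values, n, sizeof values[0], compare_ints);
}
```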
I also thought that size_t could be signed but
it seems I was wrong on that one.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing?

No.

The one and only problem that I have with size_t,
is the lack of a size_t format specifier for fprintf in C89.

--
pete

Jun 28 '06 #8

Al Balmer

On Wed, 28 Jun 2006 21:19:33 GMT, Andrew Poelstra
<ap*******@localhost.localdomain> wrote:

On 2006-06-28, Mark Odell <mr********@gmail.com> wrote:
I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing. I also thought that size_t could be signed but
it seems I was wrong on that one.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing? Should I have
used an index of type int or unsigned int instead?

Thanks,
It wouldn't be confusing at all. In fact, there are situations where
you would /want/ to have size_t as your type. For example, you could
be working with strings and be counting length.

I can't see why size_t would ever be signed. However, you shouldn't
be using negative numbers in most loops.

Posix puts their ssize_t (signed size_t) to use for functions that
return either a count or -1. I don't know of anything in standard C
that could use that feature.

Now, if your coding guidelines tell you not to use "size_t" for
applications that are not "a count of bytes" (array indexing /is/
a count of bytes, IMHO), then go with that. Random people from
USENet don't trump your boss, even though we think we do. :-)

The standard specifies size_t for some things that are not a count of
bytes.

--
Al Balmer
Sun City, AZ

Jun 28 '06 #9

William Ahern

On Wed, 28 Jun 2006 22:41:45 +0000, Al Balmer wrote:

On Wed, 28 Jun 2006 21:19:33 GMT, Andrew Poelstra
<ap*******@localhost.localdomain> wrote:

<snip>

I can't see why size_t would ever be signed. However, you shouldn't be
using negative numbers in most loops.

Posix puts their ssize_t (signed size_t) to use for functions that return
either a count or -1. I don't know of anything in standard C that could
use that feature.

snprintf. Or, basically, anything printf. Some, like snprintf(), even take
size_t lengths as arguments. Very awkward. Not that ssize_t is
particularly less awkward, but at least they provide a
greater range in practice, and in some scenarios ssize_t could even
solve the issue entirely:

typedef unsigned long size_t;
typedef long long ssize_t;

Where LLONG_MAX >= ULONG_MAX.
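The awkwardness with snprintf can be seen in a short sketch: the buffer size goes in as size_t, the result comes back as int, and the caller has to rule out a negative result before comparing the two. This assumes a C99 library; the function name format_fits and the "value=%d" format are made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats value into buf. Returns 1 if the whole output fit,
   0 on an encoding error or if the output was truncated. */
int format_fits(char *buf, size_t bufsize, int value)
{
    int n = snprintf(buf, bufsize, "value=%d", value);

    if (n < 0) {
        return 0;               /* error reported through the int result */
    }
    /* n is known to be non-negative here, so the cast is safe. */
    return (size_t)n < bufsize;
}
```

Without the n < 0 check, the cast would turn an error result into a very large size_t and the comparison would quietly give the wrong answer.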

Jun 28 '06 #10

Andrey Tarasevich

Mark Odell wrote:

I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing.

It might be. Not so much "confusing" as conceptually incorrect. The 'size_t' type
is intended to be used to represent a concept of 'size of an object'. Number of
elements in the array is described by a completely different concept of 'number
of elements in a container'. Note that in the case of a generic container these two
concepts are completely unrelated. In the particular case of an _array_ there's
certain "parasitic" relationship between the two: the latter cannot be greater
than the former. This is often used as a justification for using 'size_t' to
represent array indices. This is a false reasoning. In general case, once again,
using 'size_t' for this purpose is a conceptual error.

In certain particular cases though 'size_t' could be appropriate as an array
index type. For example, when one needs to iterate through an array of raw
memory bytes (i.e. array of 'unsigned char'). Another example would be generic
purpose functions that work with "generic" arrays, i.e. functions that are not
tied to a concrete application-specific area. String processing functions and
functions of 'memset'/'memcpy'/etc group, 'bsearch' and 'qsort' functions belong
to that category.

It is also worth noting (and looks like you know that already) that operator
'[]' accepts signed integral arguments, which indicates that in the most generic
case of using the '[]' ('<pointer>[<integer>]') the more appropriate integral
argument type might be 'ptrdiff_t', not 'size_t'.

I also thought that size_t could be signed but
it seems I was wrong on that one.

Yes, 'size_t' is always unsigned.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing?

The first question that has to be answered here is what exactly is 'Foo'. If
this is an application-specific type, then the use of 'size_t' for indexing
would be incorrect. Normally, regardless of whether there are any arrays of
'Foo' in the code, the programmer would have already made a choice of type that
should be used to represent the quantities of 'Foo'. For example, that could be
something like 'typedef unsigned TFooQuantity;' or simply 'unsigned' without any
extra 'typedef's. That type is the type that should be used as index type in
'Foo' arrays, not 'size_t'.

Should I have
used an index of type int or unsigned int instead?

See above. You should ask yourself: what type are you using to represent the
concept of 'quantity' of objects of type 'Foo' in your code. That's exactly the
type you should use for array indexing.
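A sketch of what this position looks like in code, assuming the 'typedef unsigned TFooQuantity;' mentioned above. The layout of struct Foo and the function count_active are hypothetical and only serve to show the quantity type doubling as the index type.

```c
/* Hypothetical application type. */
struct Foo {
    int active;
};

/* The application's type for "how many Foos", chosen before any
   array exists. */
typedef unsigned TFooQuantity;

/* Counts the active entries among the first n elements of foos.
   The same quantity type is used for the count and for the index. */
TFooQuantity count_active(const struct Foo *foos, TFooQuantity n)
{
    TFooQuantity count = 0;
    TFooQuantity i;

    for (i = 0; i < n; i++) {
        if (foos[i].active) {
            count++;
        }
    }
    return count;
}
```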

--
Best regards,
Andrey Tarasevich

Jun 28 '06 #11

Keith Thompson

pete <pf*****@mindspring.com> writes:

Mark Odell wrote:

[...]

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing?

No.

The one and only problem that I have with size_t,
is the lack of a size_t format specifier for fprintf in C89.

Which, of course, is easy to work around:

fprintf(some_file, "size = %lu\n", (unsigned long)sizeof whatever);

This isn't guaranteed to work in C99, but a #if test on
__STDC_VERSION__ will solve that.
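One way the #if test might look, as a sketch only. The function name print_size is invented; the %zu conversion is the C99 way to print a size_t, and the cast to unsigned long is the C89 workaround shown above.

```c
#include <stdio.h>
#include <stddef.h>

/* Prints n followed by a newline and returns fprintf's result. */
int print_size(FILE *out, size_t n)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    /* C99 and later: z is the length modifier for size_t. */
    return fprintf(out, "size = %zu\n", n);
#else
    /* C89: size_t cannot be wider than unsigned long, so the cast
       loses nothing. */
    return fprintf(out, "size = %lu\n", (unsigned long)n);
#endif
}
```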

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jun 28 '06 #12

pete

Andrey Tarasevich wrote:

Mark Odell wrote:
I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing.

It might be. Not so much "confusing" as conceptually incorrect. The 'size_t' type
is intended to be used to represent a concept of 'size of an object'. Number of
elements in the array is described by a completely different concept of 'number
of elements in a container'. Note that in the case of a generic container these two
concepts are completely unrelated. In the particular case of an _array_ there's
certain "parasitic" relationship between the two: the latter cannot be greater
than the former. This is often used as a justification for using 'size_t' to
represent array indices. This is a false reasoning. In general case, once again,
using 'size_t' for this purpose is a conceptual error.

In certain particular cases though 'size_t' could be appropriate as an array
index type. For example, when one needs to iterate through an array of raw
memory bytes (i.e. array of 'unsigned char'). Another example would be generic
purpose functions that work with "generic" arrays, i.e. functions that are not
tied to a concrete application-specific area. String processing functions and
functions of 'memset'/'memcpy'/etc group, 'bsearch' and 'qsort' functions belong
to that category.

That's not a bad explanation.

--
pete

Jun 28 '06 #13

pete

pete wrote:

Andrey Tarasevich wrote:

Mark Odell wrote:
I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing.

It might be. Not so much "confusing" as conceptually incorrect. The 'size_t' type
is intended to be used to represent a concept of 'size of an object'. Number of
elements in the array is described by a completely different concept of 'number
of elements in a container'. Note that in the case of a generic container these two
concepts are completely unrelated. In the particular case of an _array_ there's
certain "parasitic" relationship between the two: the latter cannot be greater
than the former. This is often used as a justification for using 'size_t' to
represent array indices. This is a false reasoning. In general case, once again,
using 'size_t' for this purpose is a conceptual error.

In certain particular cases though 'size_t' could be appropriate as an array
index type. For example, when one needs to iterate through an array of raw
memory bytes (i.e. array of 'unsigned char'). Another example would be generic
purpose functions that work with "generic" arrays, i.e. functions that are not
tied to a concrete application-specific area. String processing functions and
functions of 'memset'/'memcpy'/etc group, 'bsearch' and 'qsort' functions belong
to that category.

That's not a bad explanation.

But, if I were going to compare the array index
to a size_t expression or assign a size_t value
to an index variable, I would still probably use
a size_t type index variable.

--
pete

Jun 28 '06 #14

Richard Heathfield

Andrey Tarasevich said:

Mark Odell wrote:
I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing.

It might be. Not so much "confusing" as conceptually incorrect. The 'size_t'
type is intended to be used to represent a concept of 'size of an object'.

The calloc, qsort, bsearch, fread, and fwrite standard library functions all
use size_t to count a number of objects, and are therefore counter-examples
(insofar as the Standard is definitively correct).

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jun 29 '06 #15

Al Balmer

On Wed, 28 Jun 2006 15:59:15 -0700, William Ahern
<wi*****@25thandClement.com> wrote:

On Wed, 28 Jun 2006 22:41:45 +0000, Al Balmer wrote:
On Wed, 28 Jun 2006 21:19:33 GMT, Andrew Poelstra
<ap*******@localhost.localdomain> wrote:<snip>
I can't see why size_t would ever be signed. However, you shouldn't be
using negative numbers in most loops.

Posix puts their ssize_t (signed size_t) to use for functions that return
either a count or -1. I don't know of anything in standard C that could
use that feature.

snprintf. Or, basically, anything printf.

Of course. My mind apparently went blank for a bit :-)

Some, like snprintf(), even take
size_t lengths as arguments. Very awkward. Not that ssize_t is
particularly less awkward, but at least they provide a
greater range in practice, and in some scenarios ssize_t could even
solve the issue entirely:

typedef unsigned long size_t;
typedef long long ssize_t;

Where LLONG_MAX >= ULONG_MAX.

--
Al Balmer
Sun City, AZ

Jun 29 '06 #16

Keith Thompson

Andrey Tarasevich <an**************@hotmail.com> writes:

Mark Odell wrote:
I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing.

It might be. Not so much "confusing" as conceptually
incorrect. The 'size_t' type is intended to be used to represent a
concept of 'size of an object'. Number of elements in the array is
described by a completely different concept of 'number of elements
in a container'. Note that in the case of a generic container these two
concepts are completely unrelated. In the particular case of an
_array_ there's certain "parasitic" relationship between the two:
the latter cannot be greater than the former. This is often used as
a justification for using 'size_t' to represent array indices. This
is a false reasoning. In general case, once again, using 'size_t'
for this purpose is a conceptual error.

That's well argued, but I disagree.

We use what we have. We have a type size_t that's designed to count
sizes (in bytes) of objects. We don't have a similar type that's
designed to count the number of elements in an array of struct foobar.
If we had such a type, I'd advocate using it (for example, if
declaring "struct foobar" implicitly created an unsigned int typedef
called, say, "struct_foobar_count").

Using size_t to count objects isn't ideal, but it's what we have.
Since objects (other than bit fields, which we generally wouldn't be
interested in counting) are at least one byte each, we know that
size_t has *at least* enough range for the purpose. I don't believe
any other type would be any better, and size_t isn't sufficiently bad
that I'd recommend avoiding it.

If the language had a type to be used generically for counting
objects, surely it would be just an alias for size_t, since the
objects could be bytes in an array. I'm not greatly distressed by the
fact that it's called "size_t" rather than "object_count_t".

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jun 29 '06 #17

Andrey Tarasevich

Richard Heathfield wrote:

I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing.

It might be. Not so much "confusing" as conceptually incorrect. The 'size_t'
type is intended to be used to represent a concept of 'size of an object'.

The calloc, qsort, bsearch, fread, and fwrite standard library functions all
use size_t to count a number of objects, and are therefore counter-examples
(insofar as the Standard is definitively correct).

All these functions are excellent examples of generic array processing
functions, with which 'size_t' is perfectly appropriate. I explicitly
mentioned it in my message. I actually mentioned some of these functions
as well.

--
Best regards,
Andrey Tarasevich

Jun 29 '06 #18

Andrey Tarasevich

Keith Thompson wrote:

It might be. Not so much "confusing" as conceptually
incorrect. The 'size_t' type is intended to be used to represent a
concept of 'size of an object'. Number of elements in the array is
described by a completely different concept of 'number of elements
in a container'. Note that in the case of a generic container these two
concepts are completely unrelated. In the particular case of an
_array_ there's certain "parasitic" relationship between the two:
the latter cannot be greater than the former. This is often used as
a justification for using 'size_t' to represent array indices. This
is a false reasoning. In general case, once again, using 'size_t'
for this purpose is a conceptual error.

That's well argued, but I disagree.

I think we already had this discussion before.

We use what we have. We have a type size_t that's designed to count
sizes (in bytes) of objects. We don't have a similar type that's
designed to count the number of elements in an array of struct foobar.
If we had such a type, I'd advocate using it (for example, if
declaring "struct foobar" implicitly created an unsigned int typedef
called, say, "struct_foobar_count").

That only appears so. Whenever some type (say 'struct foobar') is given
some application-specific meaning (say, describing an employee in a
company) and represents a 'countable' object (say, we normally have many
employees in a company) there exists a need to choose a type that will
be used to represent these 'counts', these application-specific
quantities. Note, that we are not talking about any "arrays" yet, but
the need to have the type that represents the 'quantity' already exists.

Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

Once again, 'quantities' predate 'arrays'. By the time we get to
'arrays' (or any other containers, for that matter) we should have
already made all the necessary choices about 'quantity' types.

Using size_t to count objects isn't ideal, but it's what we have.
Since objects (other than bit fields, which we generally wouldn't be
interested in counting) are at least one byte each, we know that
size_t has *at least* enough range for the purpose.

In general case 'size_t' is not applicable for counting objects at all.
In general case its range is not sufficient (16-bit platform again).
Yes, 'size_t' is applicable for counting _array_ _elements_, but that's
nothing more than a language-specific parasitic relationship between the
byte-size of array and the number of elements in it. Letting this
parasitic relationship seep into the design of application-specific
code is not the right thing to do.

I don't believe
any other type would be any better, and size_t isn't sufficiently bad
that I'd recommend avoiding it.

If the language had a type to be used generically for counting
objects, surely it would be just an alias for size_t, since the
objects could be bytes in an array. I'm not greatly distressed by the
fact that it's called "size_t" rather than "object_count_t".

Once again, on a traditional 16-bit segmented platform with 16-bit
'size_t' the difference between the concept of 'object size' and 'object
count' is especially obvious. As is the inappropriateness of choosing
'size_t' as generic object count type.

--
Best regards,
Andrey Tarasevich

Jun 29 '06 #19

Richard Heathfield

Andrey Tarasevich said:

<snip>

All these functions are excellent examples of generic array processing
functions, with which 'size_t' is perfectly appropriate. I explicitly
mentioned it in my message. I actually mentioned some of these functions
as well.

True enough, but I fail to see why you consider them exceptions.

Okay, let's take a different tack. The canonical way to determine the number
of elements in an array (cf C89 3.3.3.4) is: sizeof array / sizeof array[0]

Now, sizeof yields size_t. What is the natural type to use for storing the
result of a division of size_t by size_t? I would argue that it's size_t.
Certainly the division will yield an unsigned type as its result. So it
makes perfect sense to do this:

size_t i;

for(i = 0; i < sizeof array / sizeof array[0]; i++)

Yes? Well, I doubt whether I've convinced you, but maybe some others here
will be swayed by this argument. :-)
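Spelled out as compilable code, the argument above might look like the following sketch. The struct Foo layout, the table contents, the ARRAY_LEN macro name, and both function names are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical element type and table. */
struct Foo {
    int id;
};

static const struct Foo foo_table[] = { {10}, {20}, {30} };

/* Element count of a true array (not a pointer). The result has
   type size_t because both sizeof operands do. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Returns the number of entries in foo_table. */
size_t foo_table_len(void)
{
    return ARRAY_LEN(foo_table);
}

/* Sums the id fields, indexing with the same type the bound has. */
int foo_id_sum(void)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < ARRAY_LEN(foo_table); i++) {
        sum += foo_table[i].id;
    }
    return sum;
}
```

Because the index and the bound are both size_t, the comparison in the loop needs no conversion and draws no signed/unsigned warning.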

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jun 29 '06 #20

Michael Mair

Richard Heathfield wrote:

Mark Odell said: <snip: size_t among other things good index type?>

It's a great type for an index, too. Someone said it's harder to use size_t
to count backwards, but it's not.

for(i = n; i-- > 0; )
{
    foo(bar + i);
}

True enough. I love it when that clashes with <insert
adjective here> company coding guidelines which prohibit
expressions with side effects for tests (exception:
function calls to functions returning a success status).
That makes for, while, do--while, if, and switch a little bit
safer but does not help in the above case.
In addition to "only the init part of a for loop may be
omitted", you get
i = n;
while (i != 0) {
    --i;
    ....
}
which is not the intuitive thing to write.
Worse yet, if people worked with a signed index type before,
they just may not be aware of that one.
If "cleverly" hidden in a series of filters, even the wrong
for (i = n-1; i >= 0; --i) {
    ....
    /* break/return for some condition */
    ....
}
may "work" for a while...
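The guideline-friendly while form can be filled in as follows. This is a sketch under the assumption that the loop searches an int array from the end; the function name find_last and the convention of returning n for "not found" are invented for the example.

```c
#include <stddef.h>

/* Returns the index of the last element equal to key, or n if no
   element matches. The loop test has no side effects: the decrement
   happens as the first statement of the body, so the body sees
   n-1, n-2, ..., 0. */
size_t find_last(const int *a, size_t n, int key)
{
    size_t i = n;

    while (i != 0) {
        --i;
        if (a[i] == key) {
            return i;
        }
    }
    return n;
}
```

Returning n as the "not found" value avoids needing a negative sentinel, which an unsigned index cannot represent.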
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Jun 29 '06 #21

Marc Thrun

Andrey Tarasevich wrote:
[...]

Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

size_t will always have the needed range by definition, as it has to be
able to represent the size of the largest possible object. So even on a
16-bit platform with a 16-bit size_t you will not be able to create an
array, which is an object, with a total size of more than (size_t)-1.

--
Marc Thrun
http://www.tekwarrior.de/

Jun 29 '06 #22

Keith Thompson

Marc Thrun <Te********@gmx.de> writes:

Andrey Tarasevich wrote:
[...]
Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

size_t will always have the needed range by definition, as it has to
be able to represent the size of the largest possible object. So even
on a 16-bit platform with a 16-bit size_t you will not be able to
create an array, which is an object, with a total size of more than
(size_t)-1.

In some (fairly odd) circumstances, the maximum size of a single
object might be 65535, but you might be able to allocate a greater
number of individual objects.

On the other hand, I don't know of any such systems, and it's easy
enough for the implementation to make size_t 32 bits. (Perhaps Andrey
knows more about the practical aspects of this.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jun 29 '06 #23

Stephen Sprunk

"Michael Mair" <Mi**********@invalid.invalid> wrote in message
news:4g*************@individual.net...

Mark Odell wrote:
I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing. I also thought that size_t could be signed but
it seems I was wrong on that one.

So if you were to see code iterating through a table of Foo objects
using an index of size_t type, would it be confusing? Should I have
used an index of type int or unsigned int instead?

I would think "here is someone who thought about what an index is"...
:-)
If ssize_t were standard C, I'd accept that as well for the reason
that you can more easily deal with loops that count downwards.

Typedefs used to define certain roles, say
typedef .... Index;
inspire the same confidence.

int, long, size_t, and maybe unsigned long are perfectly fine
choices for array indices.

int could be too small to hold a valid array index, and the same is true for
long, though less likely.

Unfortunately, if one is counting downwards in a loop, one may rely on being
able to get to -1, which makes size_t a worse choice than long in most
cases. ssize_t, where available, would be better.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin

Jun 29 '06 #24

Al Balmer

On Thu, 29 Jun 2006 01:19:35 +0000, Richard Heathfield
<in*****@invalid.invalid> wrote:

Andrey Tarasevich said:

<snip>
All these functions are excellent examples of generic array processing
functions, with which 'size_t' is perfectly appropriate. I explicitly
mentioned it in my message. I actually mentioned some of these functions
as well.

True enough, but I fail to see why you consider them exceptions.

Okay, let's take a different tack. The canonical way to determine the number
of elements in an array (cf C89 3.3.3.4) is: sizeof array / sizeof array[0]

Now, sizeof yields size_t. What is the natural type to use for storing the
result of a division of size_t by size_t? I would argue that it's size_t.
Certainly the division will yield an unsigned type as its result. So it
makes perfect sense to do this:

size_t i;

for(i = 0; i < sizeof array / sizeof array[0]; i++)

Yes? Well, I doubt whether I've convinced you, but maybe some others here
will be swayed by this argument. :-)

From the rationale:

"The type of sizeof, whatever it is, is published (in the library
header <stddef.h>) as size_t, since it is useful for the programmer to
be able to refer to this type. This requirement implicitly restricts
size_t to be a synonym for an existing unsigned integer type. Note
also that, although size_t is an unsigned type, sizeof does not
involve any arithmetic operations or conversions that would result in
modulus behavior if the size is too large to represent as a size_t,
thus quashing any notion that the largest declarable object might be
too big to span even with an unsigned long in C89 or uintmax_t in C9X.
This also restricts the maximum number of elements that may be
declared in an array, since for any array a of N elements,

N == sizeof(a)/sizeof(a[0])

Thus size_t is also a convenient type for array sizes, and is so used
in several library functions."
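
As an illustration of the identity N == sizeof(a)/sizeof(a[0]), here is a small sketch. The macro name ARRAY_LEN, the table contents, and the function names are assumptions made for the example, not anything from the Rationale:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer to its first element). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical table used only for illustration. */
static const int table[] = { 3, 1, 4, 1, 5, 9 };

/* Number of elements in `table`, stored in a size_t. */
size_t table_len(void)
{
    return ARRAY_LEN(table);
}

/* Returns the index of the largest of the n elements of a (n must be > 0). */
size_t index_of_max(const int *a, size_t n)
{
    size_t i, best = 0;

    assert(a != NULL && n > 0);
    for (i = 1; i < n; i++)
        if (a[i] > a[best])
            best = i;
    return best;
}
```

Because the count and the index share one unsigned type, the comparison i < n involves no signed/unsigned conversion.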

--
Al Balmer
Sun City, AZ

Jun 29 '06 #25

Andrey Tarasevich

Marc Thrun wrote:

Andrey Tarasevich schrieb:
[...]
Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

size_t will always have the needed range by definition, as it has to be
able to represent the size of the largest possible object. So even on a
16-bit platform with a 16-bit size_t you will not be able to create an
array, which is an object, with a total size of more than (size_t)-1.

What I'm trying to say in the quoted paragraph is that choosing 'size_t'
to represent _generic_ quantities (any quantities, not just 'number of
elements in an array') is a bad idea. And the fact that some quantity
might be somehow related to some array somewhere in the code is not an
argument for choosing 'size_t'. 'Quantities' predate 'containers'.
Deriving 'quantity' type from 'container' type is no different from
putting the horse behind the carriage.

The above applies to specific code. In generic code 'size_t' is an
excellent choice of 'quantity' and 'index' type, no argument here.

--
Best regards,
Andrey Tarasevich

Jun 29 '06 #26

Andrey Tarasevich

Richard Heathfield wrote:

...
All these functions are excellent examples of generic array processing
functions, with which 'size_t' is perfectly appropriate. I explicitly
mentioned it in my message. I actually mentioned some of these functions
as well.

True enough, but I fail to see why you consider them exceptions.

Okay, let's take a different tack. The canonical way to determine the number
of elements in an array (cf C89 3.3.3.4) is: sizeof array / sizeof array[0]

Now, sizeof yields size_t. What is the natural type to use for storing the
result of a division of size_t by size_t? I would argue that it's size_t.
Certainly the division will yield an unsigned type as its result. So it
makes perfect sense to do this:

size_t i;

for(i = 0; i < sizeof array / sizeof array[0]; i++)

Yes? Well, I doubt whether I've convinced you, but maybe some others here
will be swayed by this argument. :-)

What you are saying here applies to abstract, generic arrays. And I have
absolutely no problem with using 'size_t' for representing the number of
elements in an array as well as array index in _generic_ context, i.e.
when we are working with arrays that are just... well, abstract arrays
and nothing more.

Whatever I said against using 'size_t' applies to application-specific
(or should we say "application domain-specific") context, when an array
is not just an array, but one particular implementation of a linear
container, whose maximum size is dictated by the requirements of
application domain and designed limitations of the code, not by some
internal rules of C language. Today it is an array, tomorrow it might be
replaced with a linked list, then it's suddenly a tree, and then it
might be changed back to an array again. A hardcoded 'size_t' has no
place in such a context.
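
One way to picture the application-specific case is the sketch below. Every name in it (order_count, MAX_ORDERS, struct order_book, order_book_add) is hypothetical and the limit of 1000 is an assumed design requirement; it only illustrates a quantity type chosen by the application rather than by the container:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application-level quantity type: its range comes from the
   application's requirements, not from the container currently in use. */
typedef unsigned long order_count;

#define MAX_ORDERS 1000UL   /* assumed design limit */

struct order_book {
    double prices[MAX_ORDERS];  /* an array today; could become a list or tree */
    order_count used;           /* number of prices stored so far */
};

/* Appends a price; returns 1 on success, 0 if the book is full. */
int order_book_add(struct order_book *book, double price)
{
    assert(book != NULL);
    if (book->used >= MAX_ORDERS)
        return 0;
    book->prices[book->used++] = price;
    return 1;
}
```

If the array were later replaced by a linked list, callers that only see order_count would not need to change.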

--
Best regards,
Andrey Tarasevich

Jun 29 '06 #27

Ben Pfaff

Al Balmer <al******@att.net> writes:

Posix puts their ssize_t (signed size_t) to use for functions that
return either a count or -1. I don't know of anything in standard C
that could use that feature.

Do the types "ptrdiff_t" and "ssize_t" ever differ in practice?
--
"When in doubt, treat ``feature'' as a pejorative.
(Think of a hundred-bladed Swiss army knife.)"
--Kernighan and Plauger, _Software Tools_

Jun 29 '06 #28

Al Balmer

On Thu, 29 Jun 2006 11:43:45 -0700, Ben Pfaff <bl*@cs.stanford.edu>
wrote:

Al Balmer <al******@att.net> writes:
Posix puts their ssize_t (signed size_t) to use for functions that
return either a count or -1. I don't know of anything in standard C
that could use that feature.

Do the types "ptrdiff_t" and "ssize_t" ever differ in practice?

ptrdiff_t is signed.

--
Al Balmer
Sun City, AZ

Jun 29 '06 #29

pete

Al Balmer wrote:

On Thu, 29 Jun 2006 11:43:45 -0700, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
Al Balmer <al******@att.net> writes:
Posix puts their ssize_t (signed size_t) to use for functions that
return either a count or -1. I don't know of anything in standard C
that could use that feature.

Do the types "ptrdiff_t" and "ssize_t" ever differ in practice?

ptrdiff_t is signed.

Read the question again.

--
pete

Jun 29 '06 #30

Keith Thompson

Al Balmer <al******@att.net> writes:

On Thu, 29 Jun 2006 11:43:45 -0700, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
Al Balmer <al******@att.net> writes:
Posix puts their ssize_t (signed size_t) to use for functions that
return either a count or -1. I don't know of anything in standard C
that could use that feature.

Do the types "ptrdiff_t" and "ssize_t" ever differ in practice?

ptrdiff_t is signed.

So is ssize_t (as distinct from size_t).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jun 29 '06 #31

Al Balmer

On Thu, 29 Jun 2006 19:18:48 GMT, pete <pf*****@mindspring.com> wrote:

Al Balmer wrote:

On Thu, 29 Jun 2006 11:43:45 -0700, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
Al Balmer <al******@att.net> writes:

Posix puts their ssize_t (signed size_t) to use for functions that
return either a count or -1. I don't know of anything in standard C
that could use that feature.

Do the types "ptrdiff_t" and "ssize_t" ever differ in practice?

ptrdiff_t is signed.

Read the question again.

Sorry, I read size_t.

Don't know, but can't think of any implementation that would have a
reason to make them different.

--
Al Balmer
Sun City, AZ

Jun 29 '06 #32

Mark F. Haigh

Andrey Tarasevich wrote:

Mark Odell wrote:
I've always declared variables used as indexes into arrays to be of
type 'size_t'. I have had it brought to my attention, recently, that
size_t is used to indicate "a count of bytes" and that using it
otherwise is confusing.

It might be. Not so much "confusing" as conceptually incorrect. 'size_t' type
is intended to be used to represent a concept of 'size of an object'. Number of
elements in the array is described by a completely different concept of 'number
of elements in a container'. Note that in the case of a generic container these two
concepts are completely unrelated. In the particular case of an _array_ there's
a certain "parasitic" relationship between the two: the latter cannot be greater
than the former. This is often used as a justification for using 'size_t' to
represent array indices. This is false reasoning. In the general case, once again,
using 'size_t' for this purpose is a conceptual error.

In certain particular cases though 'size_t' could be appropriate as an array
index type. For example, when one needs to iterate through an array of raw
memory bytes (i.e. array of 'unsigned char'). Another example would be generic
purpose functions that work with "generic" arrays, i.e. functions that are not
tied to a concrete application-specific area. String processing functions and
functions of 'memset'/'memcpy'/etc group, 'bsearch' and 'qsort' functions belong
to that category.

<snip>

When indexing C arrays with the subscript operator ([]), you can't go
wrong with size_t, regardless of what you claim about its conceptual
status. On the other hand, when you're dealing with custom data
structures (judy arrays, search trees, etc), size_t may be too
restrictive, and you may want to use a wider type.

Custom data structures may be "arrays" in an abstract sense, but they
are not "C arrays". Any standard C array object can be exhaustively
indexed with a size_t. An array that size_t cannot index is not a
standard "C array" (ie it can't be sorted with qsort, exhaustively
indexed with [], etc).
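
A short sketch of the point about standard C arrays: qsort takes its element count as a size_t, and the same type indexes every element afterwards. The names compare_int, sort_ints, and is_sorted are hypothetical helpers for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: orders ints ascending without overflow. */
static int compare_int(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* Hypothetical wrapper: sorts n ints in place. */
void sort_ints(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n > 0)
        qsort(a, n, sizeof a[0], compare_int);
}

/* Returns 1 if a[0..n-1] is in ascending order, 0 otherwise. */
int is_sorted(const int *a, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}
```
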
Mark F. Haigh
mf*****@sbcglobal.net

Jun 30 '06 #33

Mark F. Haigh

Keith Thompson wrote:

Marc Thrun <Te********@gmx.de> writes:
Andrey Tarasevich schrieb:
[...]
Now, once we start using arrays, that 'quantity' type immediately
springs to mind as the best choice for index type. Note, that we indeed
"use what we have", as you said in your message. I just want to say that
by the time we get to arrays, we will already "have" the index type, and
it is not 'size_t'. 'size_t' is a bad choice to represent generic
'quantities' for obvious reasons (it might simply not have the range,
think of segmented 16-bit platform with 16-bit 'size_t').

size_t will always have the needed range by definition, as it has to
be able to represent the size of the largest possible object. So even
on a 16-bit platform with a 16-bit size_t you will not be able to
create an array, which is an object, with a total size of more than
(size_t)-1.

In some (fairly odd) circumstances, the maximum size of a single
object might be 65535, but you might be able to allocate a greater
number of individual objects.

IIRC the compact and large memory models with the Microsoft C compiler
for MSDOS. SIZE_MAX is 65535, but all data pointers are "far" and the
total memory one can allocate is somewhere under 640K.

When a far pointer like DEAD:FFFF is incremented (or indexed with the
subscript operator), only the offset portion wraps around (ie
DEAD:0000), which is why SIZE_MAX is 65535 on those particular
platforms. It's the theoretical maximum size of a C object.

To even get your hands on a chunk of memory larger than the maximum C
object size, you need to call a platform-specific huge allocator. It
may return an "array", but in this context, it's not a "C array".

On the other hand, I don't know of any such systems, and it's easy
enough for the implementation to make size_t 32 bits. (Perhaps Andrey
knows more about the practical aspects of this.)

You can compile it targeting a huge memory model (ie one where
segment-offset pairs are internally normalized by the compiler), or
litter the code with platform-specific magic to make it happen. The
former has the benefit of having bsearch, qsort, etc, work on your
large arrays.
Mark F. Haigh
mf*****@sbcglobal.net

Jun 30 '06 #34

Dietmar Schindler

Stephen Sprunk wrote:

int could be too small to hold a valid array index, and the same is true for
long, though less likely.

You can't possibly mean what you wrote. int can hold 0; 0 is a valid
array index; therefore int is not too small to hold a valid array index.
--
Dietmar Schindler

Jul 5 '06 #35

Harald van Dĳk

Dietmar Schindler wrote:

Stephen Sprunk wrote:
int could be too small to hold a valid array index, and the same is true for
long, though less likely.

You can't possibly mean what you wrote. int can hold 0; 0 is a valid
array index; therefore int is not too small to hold a valid array index.

int could be too small to hold *a* valid array index. In other words,
there may exist a valid array index which is outside of int's range
(regardless of other array indices which aren't). You read "a" as
"any", but that changes the meaning.

Jul 5 '06 #36

Stephen Sprunk

"Dietmar Schindler" <dS***@arcor.de> wrote in message
news:44***********@arcor.de...

Stephen Sprunk wrote:
int could be too small to hold a valid array index, and the same is true for
long, though less likely.

You can't possibly mean what you wrote. int can hold 0; 0 is a valid
array index; therefore int is not too small to hold a valid array index.

Depending on the implementation, it is possible for INT_MAX+1 to be a valid
array index. On such systems, my statement holds true.

In at least two common 64-bit systems, int is 32-bit yet malloc() can return
objects larger than 2^32 bytes. One of those also has a 32-bit long, which
means size_t is the only type you can safely use as an array index in
portable code (since long long isn't yet available on many implementations,
and even that may be one bit too small).
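
A minimal sketch of that check, with hypothetical function names. It asks whether every valid index of an n-element array can be represented as an int, and compares through uintmax_t so neither side is truncated:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every valid index of an n-element array fits in an int. */
int indices_fit_in_int(size_t n)
{
    /* The largest valid index is n - 1; an empty array has no indices. */
    return n == 0 || (uintmax_t)(n - 1) <= (uintmax_t)INT_MAX;
}

/* Converts an index to int; the caller must already know that it fits. */
int index_as_int(size_t i)
{
    assert((uintmax_t)i <= (uintmax_t)INT_MAX);
    return (int)i;
}
```

On an implementation with a 32-bit int and a 64-bit size_t, an array of INT_MAX + 2 elements has INT_MAX + 1 as a valid index, and indices_fit_in_int reports 0 for it.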

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin
--
Posted via a free Usenet account from http://www.teranews.com

Jul 5 '06 #37

Dietmar Schindler

Harald van Dĳk wrote:

int could be too small to hold *a* valid array index. In other words,
there may exist a valid array index which is outside of int's range
(regardless of other array indices which aren't). You read "a" as
"any", but that changes the meaning.

You read "a" as "every", but are you sure that "a" means "every" rather
than "any"?
(I'm asking what "a" truly means, not what it has to mean to make
the sentence "int could be too small to hold *a* valid array index"
meaningful.)

Jul 6 '06 #38

Dave Thompson

On Thu, 29 Jun 2006 01:19:35 +0000, Richard Heathfield
<in*****@invalid.invalid> wrote:
<snip>

Okay, let's take a different tack. The canonical way to determine the number
of elements in an array (cf C89 3.3.3.4) is: sizeof array / sizeof array[0]

Now, sizeof yields size_t. What is the natural type to use for storing the
result of a division of size_t by size_t? I would argue that it's size_t.
Certainly the division will yield an unsigned type as its result. So it

<pedantic> Not if size_t('s actual type) is lower in rank (= narrower)
than signed int. Then both operands promote to int and the division is
done in int. The result value is in range for size_t however.
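
The promotion can be observed with a stand-in type. The sketch below uses unsigned short in place of a narrow size_t, which is an assumption made purely for illustration, and C11 _Generic to report the type of the quotient; the macro and function names are hypothetical:

```c
#include <assert.h>

/* Evaluates to 1 if the expression has type int, 0 otherwise (C11). */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

/* Stand-in for a size_t narrower than int: where int can represent every
   unsigned short value, both operands promote to int before the division.
   _Generic inspects only the type; the division is never evaluated. */
int quotient_is_int(unsigned short total, unsigned short each)
{
    assert(each != 0);
    return IS_INT(total / each);
}
```

On the usual implementations, where unsigned short is narrower than int, quotient_is_int(40, 4) yields 1, while the mathematical result, 10, is still in range for the narrow unsigned type.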

makes perfect sense to do this:

size_t i;

for(i = 0; i < sizeof array / sizeof array[0]; i++)

Yes? Well, I doubt whether I've convinced you, but maybe some others here
will be swayed by this argument. :-)

Personally I usually use size_t for bound or subscript in routines
(esp libraries) that are intended to be fully generic. But in code
where I know* what sorts of bounds or (ranges of) subscripts will be
used I often just use unsigned int or long, and sometimes even signed.

* For values of know that depending on mood may include have some
evidence for, strongly suspect, guess, and have a vague hunch. <G>

- David.Thompson1 at worldnet.att.net

Jul 10 '06 #39

Stephen Sprunk

"Mark L Pappin" <ml*@acm.org> wrote in message
news:m3************@Claudio.Messina...

Dietmar Schindler <dS***@arcor.de> writes:

Harald van Dĳk wrote:
int could be too small to hold *a* valid array index. In other words,
there may exist a valid array index which is outside of int's range
(regardless of other array indices which aren't). You read "a" as
"any", but that changes the meaning.

You read "a" as "every", but are you sure that "a" means "every" rather
than "any"?
(I'm asking what "a" truly means, not what it has to mean to make
the sentence "int could be too small to hold *a* valid array index"
meaningful.)

English is strange.

"a" has many possible meanings, and which one is appropriate is
determined by context.

In the original statement (which I don't recall having seen other than
as the subject of dissection in this sub-thread), the writer might
have meant either "every" or "any". "any" makes the statement
trivially false (as you pointed out, 0 is a valid index and 'int' can
certainly hold that value); "every" makes the statement true.

As the writer of that statement, I meant it to mean "there exist potentially
valid array index values which will not fit in an int".

Now, if you know with absolute certainty that your program cannot generate
such values, then it may be safe to use int, but such assumptions often
prove untrue after the program evolves for several years.

Take, for example, the NASDAQ ECN protocol where trade numbers are
represented as six digits; that was considered far more than could possibly
happen within a day, but during the dot-com bubble there was a day where
over a million trades happened and every ECN and most broker systems crashed
and/or lost hundreds of thousands of transactions; it took days of manual
labor to sort out that and weeks to rebuild all the systems to handle a
rollover.

Most people are not creative enough to discern the difference between
"impossible" and "unlikely". Things which are "impossible" happen with
surprising regularity in the real world.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin
--
Posted via a free Usenet account from http://www.teranews.com

Jul 13 '06 #40
