473,769 Members | 2,143 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

type of array index?


For maximum portability what should the type of an array index be? Can
any integer type be used safely? Or should I only use an unsigned type?
Or what?

If I'm using pointers to access array elements as *(mptr+k) where I've
declared

MYTYPE *mptr;

what should be the type of 'k'? Should it be ptrdiff_t?

OT: near as I can tell my implementation of gcc doesn't have an
<stddef.h> file with ptrdiff_t defined. Am I overlooking something?

--
Nov 14 '05
29 5479
Andrey Tarasevich wrote:
Stephen Sprunk wrote:

.... snip ...

Right, but if you get in the habit of writing code where the
index is an int, sooner or later you'll write code that (after
it's been edited by a dozen other coders) overflows that int but
size_t would have worked.


Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works. One
day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array,
like 'deque', for example. Size of that container is no longer
limited by the range of 'size_t'.


All of which leads to easy generation of buffer overflows. What is
really needed is an index type to match the array, i.e. a
subrange. Unfortunately C doesn't have such a thing. Even an
enumeration won't work. An enhancement of typedef would allow
stronger typing in future, but probably won't happen. I.E. let
typedef create a real type:

typedef 0U ... 10U myindex; /* reusing the ... varargs token */

T arraything[tmax(myindex)];

for (myindex i = 0; i < tmax(myindex); i++) ...

using tmax (and tmin) as operators, similar to sizeof.

crossed to comp.std.c in case it stirs some thoughts.

--
"If you want to post a followup via groups.google.c om, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 14 '05 #21
Stephen Sprunk wrote:
>> >> However, from the pedantic point of view, 'size_t' is intended
>> >> to implement a concept of "object size", while array index is
>> >> more related to the concept of "container element count".
>> >> These two concepts are not related and using 'size_t' for
>> >> array indexing is a conceptual error.
>> >
>> > Since an array is an object, and size_t is defined to be large
>> > enough to represent the size of any object, size_t naturally
>> > must be able to represent any possible array subscript.
>>
>> That's the very reason why I suggest using 'size_t' for array
>> indexing in generic context. For example, in a generic array-
>> support library.
>>
>> At application level there's rarely a need to have an array just for
>> the sake of having an array.
>
> Right, but if you get in the habit of writing code where the index
> is an int, sooner or later you'll write code that (after it's been
> edited by a dozen other coders) overflows that int but size_t
> would have worked.
Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works.


Per the standard, size_t CANNOT overflow when used as an array index.


That's exactly what I said in my original message (the one you initially
replied to). You omitted that portion from the quote. What's the point
of repeating it here?
One day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array, like
'deque', for example. Size of that container is no longer limited by
the range of 'size_t'.


A 'deque' is not an array and therefore any problem indexing them with
size_t is moot.
[ Rest of post snipped because it doesn't apply to arrays. ]


Quite the opposite. Within this context anything that applies
specifically to indexing of arrays is moot. This whole branch of this
thread doesn't apply to arrays. This whole branch is started by your
reply to the "generic" portion of my message (quoted at the very top).
And I clearly and repeatedly explained that issue I'm talking about is
the _conceptual_ issue of using 'size_t' as an index type for a generic
indexed container.
The OP specifically asked about the type to use for an array index.
Hijacking the thread to generalize to other non-standard container types is
irrelevant at best.


You did not reply to the OP. You replied to me. You replied to something
that was noting more than a side note in my response to the OP. You
extracted this side note from my message, robbing it of its original
context, and tried to misrepresent it as thread hijacking? Sorry, but
that's you who hijacked this thread. If you did this unintentionally ,
the I can only suggest that in the future, when you reply to a
particular portions of the people messages, make sure you understand
what they are talking about.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #22
CBFalconer wrote:
...
Right, but if you get in the habit of writing code where the
index is an int, sooner or later you'll write code that (after
it's been edited by a dozen other coders) overflows that int but
size_t would have worked.


Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works. One
day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array,
like 'deque', for example. Size of that container is no longer
limited by the range of 'size_t'.


All of which leads to easy generation of buffer overflows. What is
really needed is an index type to match the array, i.e. a
subrange. Unfortunately C doesn't have such a thing. Even an
enumeration won't work. An enhancement of typedef would allow
stronger typing in future, but probably won't happen. I.E. let
typedef create a real type:

typedef 0U ... 10U myindex; /* reusing the ... varargs token */

T arraything[tmax(myindex)];

for (myindex i = 0; i < tmax(myindex); i++) ...

using tmax (and tmin) as operators, similar to sizeof.

crossed to comp.std.c in case it stirs some thoughts.
...


That's true, but again, at application level the issue of choosing the
index type for an array is not really related to arrays in any way,
hoiwever strange it might sound. In other words, at application level
the original issue does not exist at all. The very fact that someone is
asking about the choice of type for array index already indicates that
there's a more generic problem with the way that someone designs its
code or with his way of thinking about the issue. Let me explain once again.

In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing). It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.

However, in specific (application-level) code the choice of index type
immediately follows from the choice of the corresponding "quantity"
type. Whenever there is a need to store an application-specific
"quantity" in the program (number of cars on the lot, number of
employees in the company, number of lines in the file etc.) there's an
issue of choosing a type to represent that quantity. Assuming that we
are choosing from the set of built-in types, there's will always be a
limitation on the maximum quantity we can handle in the program. This
limitation will become a part of the program's specification. It is the
program's responsibility to observe that this part of the specification
is met. Otherwise, the chosen quantity type will overflow with
disastrous results. Your suggestion concerning "range types" is directly
relevant to this particular issue - the issue of choosing (or creating)
a type for representing an application specific "quantity" (although the
need to watch for overflows will always be there, as long as we are
using a fixed-range type). Note, that so far I never mentioned any
arrays or any other containers. There might be no containers in the
program at all. The issue of choosing the "quantity" type, however,
still exists.

Now, let's assume that one needs an array in the program and needs to
choose the index type. The first question one should ask himself is what
this array is going to store, what is the number of elements in that
array and (!) what type is _currently_ used to represent that quantity.
Note, that by the time one needs an array the choice of the "quantity"
type has _already_ been made. _That_ type is the type that should be
used for array indexing, period. Someone here suggested that if array
index type might overflow, if it is not 'size_t'. Wrong. As long as the
same type is used to represent the number of elements in the array and
the index (regardless of the concrete type, could be 'unsigned char' for
example) no overflow can ever occur. More precisely, at that stage
there's no issue of index type overflow. What _can_ overflow is the
originally chosen "quantity" type, but this issue has absolutely nothing
to do with array indexing. The overflow will happen long before one even
gets to any array indexing. The only thing that overflow will mean is
that the author of the code made a bad choice of "quantity" type (not
index type) or failed to enforce the program specification.

The suggestion to _unconditionall y_ use 'size_t' for array indexing
because it "can never overflow" is one of those "fake wisdoms" that
float around the net in great numbers. It is as correct in theory as it
is useless and irrelevant in practice. Something from the area of "Never
use division, because there's a danger of dividing something by zero".

As a disclaimer: I said that many times already, but just in case, for
"I'm not a reader, I'm a writer" type of posters - 'size_t' is a correct
choice of index type for generic non-negative array indexing. Generic
array processing functions should always use this particular type for
representing array sizes and array indices (note, BTW, that in this case
the latter follows form the former too, i.e. the index type is
determined by the corresponding "quantity" type).

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #23
Andrey Tarasevich wrote:
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.


What type do you like for counting list nodes?

--
pete
Nov 14 '05 #24
pete wrote:
...
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.


What type do you like for counting list nodes?
...


At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.

In application-level code the choice is dictated by the application
area, just like with arrays.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #25
Andrey Tarasevich <an************ **@hotmail.com> writes:
pete wrote:
...
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.


What type do you like for counting list nodes?
...


At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.


And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)

Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.

On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #26
Keith Thompson wrote:
...
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.

What type do you like for counting list nodes?
...
At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.


And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)


If faced with a task of implementing a portable generic list library I'd
still never use 'size_t' to represent list element count. Platforms that
have 'size_t's range narrower than that of 'unsigned long' are also
possible (moreover, I think they are far less exotic than the one you
mention). Range of the type 'size_t' is too "volatile" to be used for
this purpose. And the conceptual reason for that "volatility " is the one
I already mentioned - 'size_t' is directly related to the concept of
"object size" but completely unrelated to the concept of "object count".

Also, consider the applications that would use this library. What are
they going to use to represent application-specific quantities on the
platform you describe? If they use 'int' or 'long' or 'long long' types,
then the issue doesn't exist - they should simply never have lists that
are longer than the ranges of those types. Otherwise, I simply don't see
any good solution for this situation (besides the one described in the
next paragraph).

In situations when providing the source code of the library to the
client is an option, a possible solution would be to implement it in
template-like manner, i.e. make the source code to depend on an
externally '#define'd parameter and give the client the responsibility
to '#define' it and "instantiat e" the code with whatever count/index
type he/she prefers. This is, of course, not always an option.
Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.
I don't really see what guarantees you are talking about. 'size_t's
range can easily be narrower than that of, for example, 'unsigned long'.
There's no guarantees against that.
On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.


Surprised? Why? This is a typical situation for segmented-memory
platforms. Many of those not-so-old 16 bit platforms possessed this
property. Most (if not all) C compilers on DOS or Win16 platform were
like that. 'size_t' had 0-65535 range, but there was absolutely no
problem in allocating more than 65535 objects.

--
Best regards,
Andrey Tarasevich
Nov 14 '05 #27
In article <ln************ @nuthaus.mib.or g>,
Keith Thompson <ks***@mib.or g> wrote:
Andrey Tarasevich <an************ **@hotmail.com> writes:
pete wrote:
...
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.

What type do you like for counting list nodes?
...


At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.


And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)

Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.

On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.


Consider old 8086 systems, where the size of on object was restricted to
64KB, so size_t = 16 bit, but the total size of all objects was
restricted to 1 MB. Should be quite possible to allocate more than 65535
very small objects in that situation. (80286 systems might have had
limits of 65535 bytes per object and 16 MB total, but I am not sure
about that).
Nov 14 '05 #28
Christian Bau <ch***********@ cbau.freeserve. co.uk> writes:
In article <ln************ @nuthaus.mib.or g>,
Keith Thompson <ks***@mib.or g> wrote:

[...]
On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.


Consider old 8086 systems, where the size of on object was restricted to
64KB, so size_t = 16 bit, but the total size of all objects was
restricted to 1 MB. Should be quite possible to allocate more than 65535
very small objects in that situation. (80286 systems might have had
limits of 65535 bytes per object and 16 MB total, but I am not sure
about that).


Ok, so color me surprised (and yes, I should have thought of that).

How about using a nest of #ifs to create a typedef "count_t" that's an
alias for the first of uintmax_t, unsigned long long, or unsigned long
that exists? (The second test caters to C90 implementations with
extensions, the third to unenhanced C90 implementations .) This should
get you the biggest available unsigned integer type. It might be
bigger than you need, but unless you're going to create huge arrays of
them, or unless it imposes a serious performance hit, that shouldn't
be too much of a problem.

Whatever count type you use, you should probably define it in a single
location, so even if users can't easily customize it, future
maintainers can.

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #29
On Fri, 01 Apr 2005 01:05:37 GMT, Keith Thompson
<ks***@mib.or g> wrote:
How about using a nest of #ifs to create a typedef "count_t" that's an
alias for the first of uintmax_t, unsigned long long, or unsigned long
that exists? (The second test caters to C90 implementations with
extensions, the third to unenhanced C90 implementations .) This should
get you the biggest available unsigned integer type. It might be
bigger than you need, but unless you're going to create huge arrays of
them, or unless it imposes a serious performance hit, that shouldn't
be too much of a problem.
That's basically what I do, except that I do it with a header file, and
that file is created by a program which includes limits.h for that
system (also includes float.h and spits out some of the limits as
#defines, because C89 says "Of the values in the <float.h> header,
FLT_RADIX shall be a constant expression suitable for use in #if
preprocessing directives; all other values need not be constant
expressions" which makes them not suitable to do such type
determination). (The header file could of course be edited by hand if
it is not possible to run that program on the target or with the target
header files to generate the header.)
Whatever count type you use, you should probably define it in a single
location, so even if users can't easily customize it, future
maintainers can.


That should be true for a lot of implementation determined things (like
byte order where that matters, sizes of system specific things like
filesnames, etc.).

Chris C
Nov 14 '05 #30

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
10637
by: leslie_tighe | last post by:
Hello, I have a method on a com+ object that is returning an array of objects. I know the array is popluated as calls to check the ubound and lbound show valid values. However, any calls to get the value of a cell in the array results in a type mismatch error.
17
2451
by: benben | last post by:
Given a class template Vector<>, I would like to overload operator +. But I have a hard time deciding whether the return type should be Vector<U> or Vector<V>, as in: template <typename U, typename V> Vector<U_or_V> operator+ ( const Vector<U>&, const Vector<V>&);
1
1505
by: Allen Maki | last post by:
Why can I get the index of the item of the array when I use string*, but can not get the index of the array when I use any other type (such as Int32)? This code will compile perfectly, but if I replace String* with Int32, the code would not compile? The code using String* type: #include "stdafx.h"
51
23732
by: Pedro Graca | last post by:
I run into a strange warning (for me) today (I was trying to improve the score of the UVA #10018 Programming Challenge). $ gcc -W -Wall -std=c89 -pedantic -O2 10018-clc.c -o 10018-clc 10018-clc.c: In function `main': 10018-clc.c:22: warning: array subscript has type `char' I don't like warnings ... or casts.
669
26192
by: Xah Lee | last post by:
in March, i posted a essay “What is Expressiveness in a Computer Language”, archived at: http://xahlee.org/perl-python/what_is_expresiveness.html I was informed then that there is a academic paper written on this subject. On the Expressive Power of Programming Languages, by Matthias Felleisen, 1990. http://www.ccs.neu.edu/home/cobbe/pl-seminar-jr/notes/2003-sep-26/expressive-slides.pdf
7
9837
by: Harris | last post by:
Dear all, I have the following codes: ====== public enum Enum_Value { Value0 = 0, Value1 = 10,
28
1953
by: VK | last post by:
A while ago I wrote a "Vector data type" script using DOM interface to select.options. That was a (beautiful) mind game :-) rather than a practical thing to use. Here is another attempt open for criticism, this time dead serious. I really need an effective Vector emulator for a project (as much effective as something not implemeted in language but emulated by means of the language itself is: a
3
1728
by: Hkan Johansson | last post by:
Being new to C# I wonder if the index of an array can be some other ordinal type than int? Coming from Delphi, I can determine the type of the index as well as the type of the elements, but haven't had any success with C# so far. The following example yields the compiler error: error CS0118: 'DemoClass.Months' is a 'type' but is used like a 'variable'. class DemoClass { enum Months {
26
8385
by: Szabolcs Nagy | last post by:
what type can/should be used as array index? eg. can i use unsigned long? or should it be size_t? example: int sum(int *a, index_t n) { int s = 0; index_t i;
0
9423
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10210
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10039
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9990
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9860
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
7406
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupr who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5445
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3955
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3560
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.