array subscript type cannot be `char`?

I ran into a warning today that is strange (to me). I was trying to
improve my score on the UVA #10018 Programming Challenge.

$ gcc -W -Wall -std=c89 -pedantic -O2 10018-clc.c -o 10018-clc
10018-clc.c: In function `main':
10018-clc.c:22: warning: array subscript has type `char'

I don't like warnings ... or casts.
#include <stdio.h>

#define SIGNEDNESS
/* #define SIGNEDNESS signed */ /* either of these */
/* #define SIGNEDNESS unsigned */ /* defines "works" */

static int charval['9' + 1];
static unsigned long x;

int main(void) {
    SIGNEDNESS char test[] = "9012";
    SIGNEDNESS char *p = test;

    charval['1'] = 1;
    charval['2'] = 2;
    /* similarly for 3 to 8 */
    charval['9'] = 9;

    x = 0; /* redundant */
    while (*p) {
        x *= 10;
        x += charval[*p]; /* line 22 */

        /* casts to get rid of warning: all of them "work"! */
        /* x += charval[ (int) *p]; */
        /* x += charval[ (size_t) *p]; */
        /* x += charval[ (unsigned) *p]; */
        /* x += charval[ (long) *p]; */
        /* x += charval[ (wchar_t) *p]; */
        /* x += charval[ (signed char) *p]; */
        /* x += charval[ (unsigned char) *p]; */

        ++p;
    }

    printf("%lu\n", x);
    return 0;
}
Is this only a question of portability? (I realize the warning appears
only because of the -Wall option to gcc.)

What is the type of an array subscript?
I'd guess size_t, and other types would be promoted automatically.
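For reference, C has no dedicated "subscript type": E1[E2] is defined as (*((E1)+(E2))), where one operand must be a pointer to an object type and the other any integer type. A char subscript just undergoes the integer promotions (to int), so where plain char is signed, a character with a negative value gives a negative index. The sketch below illustrates both points; the function names are hypothetical, chosen only for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: a[i], *(a + i) and i[a] all denote the same
   element, because E1[E2] is defined as *((E1) + (E2)). */
int same_element(const int *a, int i)
{
    return a[i] == *(a + i) && a[i] == i[a];
}

/* Hypothetical helper: converting through unsigned char always yields
   an index in 0..UCHAR_MAX, whatever the signedness of plain char. */
size_t char_index(char c)
{
    return (unsigned char)c;
}
```

For example, char_index((char)-1) is UCHAR_MAX on every implementation, while using (char)-1 directly as a subscript would be -1 wherever plain char is signed.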

Should I make an effort to declare all char stuff as either signed or
unsigned? ... before it runs on a DS 9000 :)
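Here is a minimal sketch of two warning-free ways to write the digit loop, assuming (as in the original program) that the input contains only decimal digits. The first keeps the lookup table but sizes it for every unsigned char value and converts the subscript to unsigned char, which is what gcc's -Wchar-subscripts check (enabled by -Wall) asks for. The second drops the table, relying on the C guarantee that '0'..'9' are contiguous and increasing. Both function names are hypothetical.

```c
#include <limits.h>

/* Table covering every possible unsigned char value, so any character
   converted to unsigned char is a valid index. Entries that are never
   set stay 0 because the array has static storage duration. */
static int charval[UCHAR_MAX + 1];

/* Hypothetical helper: the loop from the original post, with the
   subscript converted to unsigned char. The index can never be
   negative, whatever the signedness of plain char. */
unsigned long digits_via_table(const char *p)
{
    unsigned long x = 0;
    int d;

    for (d = 0; d <= 9; d++)
        charval['0' + d] = d;    /* '0'..'9' are contiguous in C */

    while (*p) {
        x = x * 10 + charval[(unsigned char)*p];
        ++p;
    }
    return x;
}

/* Hypothetical helper: no table at all. Because the decimal digits
   are contiguous and increasing in every conforming character set
   (EBCDIC included), *p - '0' is the digit's value. */
unsigned long digits_via_offset(const char *p)
{
    unsigned long x = 0;

    while (*p) {
        x = x * 10 + (unsigned long)(*p - '0');
        ++p;
    }
    return x;
}
```

Both return 9012 for the test string "9012" and need no casts beyond the one conversion that documents the intent.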

--
If you're posting through Google read <http://cfaj.freeshell.org/google>
Mar 22 '06
>> > What about this then?
>
> #include <stdio.h>
>
> int main(void)
> {
> int x[][3] =
> {
> {1, 2, 3},
> {4, 5, 6},
> {7, 8, 9}
> };
>
> printf("%d\n", x[1][-1]);
>
> return 0;
> }
>
> x is an array, not a pointer. I believe there is nothing "undefined"
> here.


I think that's actually a matter of some dispute.


It might have been in 1992, but I think that DR #17 made it pretty clear
that this is undefined behavior. To quote the response to question #16
[...]:


Most interesting. Thank you, I stand corrected!
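The pointer example that started this sub-thread (int x[10]; int *y = x + 5; y[-1] = 100;) is the uncontroversial case: y points into the middle of x, so y[-1] is x[4], which is still inside the same array object. A minimal sketch (the function name is hypothetical):

```c
/* Hypothetical helper reproducing the pointer example: y points at
   x[5], so y[-1] is x[4]. The subscript is negative, but the access
   stays inside the array x. */
int negative_subscript_demo(void)
{
    int x[10] = {0};
    int *y = x + 5;

    y[-1] = 100;        /* same as x[4] = 100 */
    return x[4];
}
```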
Mar 23 '06 #21
en******@yahoo.com wrote:
Robert Gamble wrote:
Keith Thompson wrote:
Ben C <sp******@spam.eggs> writes:
> On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
>> Ben C wrote:
>>> On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
>>> > [...] a negative subscript to an array will cause undefined
>>> > behaviour.
>
>>> [...] Are you sure?
>
>>> int x[10];
>>> int *y = x + 5;
>>> y[-1] = 100;
>>> ...
>
>> y is not an array, it is a pointer.
>
> What about this then?
>
> #include <stdio.h>
>
> int main(void)
> {
> int x[][3] =
> {
> {1, 2, 3},
> {4, 5, 6},
> {7, 8, 9}
> };
>
> printf("%d\n", x[1][-1]);
>
> return 0;
> }
>
> x is an array, not a pointer. I believe there is nothing "undefined"
> here.

I think that's actually a matter of some dispute.
It might have been in 1992, but I think that DR #17 made it pretty clear
that this is undefined behavior. To quote the response to question #16:

"For an array of arrays, the permitted pointer arithmetic in subclause
6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
of the word ``object'' as denoting the specific object determined
directly by the pointer's type and value, not other objects related to
that one by contiguity. Therefore, if an expression exceeds these
permissions, the behavior is undefined. For example, the following code
has undefined behavior:
int a[4][5];

a[1][7] = 0; /* undefined */
Some conforming implementations may choose to diagnose an ``array
bounds violation,'' while others may choose to interpret such attempted
accesses successfully with the ``obvious'' extended semantics."

The result of this question was to add the following to the
(informative) section G.2 which documents examples of undefined
behavior:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.3.6)."
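If what you actually want is the element that a[1][7] "obviously" refers to, you can compute it without overrunning any row: with 5 columns, the flat position is 1*5 + 7 = 12, which is row 12/5 = 2, column 12%5 = 2, i.e. a[2][2]. A minimal sketch; the function name and the ROWS/COLS constants are hypothetical.

```c
#include <stddef.h>

#define ROWS 4
#define COLS 5

/* Hypothetical helper: return a pointer to the element at position
   `flat` when the rows of `a` are laid end to end. Both subscripts are
   always in range, so no row is overrun. Returns NULL when `flat` lies
   outside the whole array. */
int *flat_element(int a[ROWS][COLS], long flat)
{
    if (flat < 0 || flat >= (long)ROWS * COLS)
        return NULL;
    return &a[flat / COLS][flat % COLS];
}
```

For int a[ROWS][COLS], flat_element(a, 1L * COLS + 7) yields &a[2][2]. The same arithmetic applied to the 3-column x[1][-1] example gives flat position 1*3 - 1 = 2, i.e. x[0][2].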


An easy/lazy/stupid response to the DR, resulting in an
easy/lazy/stupid statement in the standard.

void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;


Looks okay.
/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;
Looks okay.
/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;
Looks okay.
/*4*/ p = a[1];
p[7] = 0;
p is a pointer to int, not an array, so this should be okay.
/*5*/ (p = a[1])[7] = 0;
Same as #4 as far as I can tell.
/*6*/ (a[1])[7] = 0;
/*7*/ a[1][7] = 0;
I don't see how these two are different, but in both cases a[1] is an
array of 5 ints and 7 is an out-of-range subscript. Definitely
not okay.
At what point in 1-7 does the behavior become undefined?
I would say at #6.
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."


In addition to this, the only rationale provided for the response to
the Defect Report is that implementations may choose to diagnose
out-of-bounds conditions. In your example, 1-5 use the subscript
operator on a pointer, not an array, so there should not be an issue,
as the elements are still guaranteed to be contiguous. In 6 and 7 you
are using an invalid subscript on an array object, which is clearly
undefined.

Robert Gamble

Mar 23 '06 #22
In article <11**********************@j33g2000cwa.googlegroups.com>
<en******@yahoo.com> wrote:
void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;

/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;

/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;

/*4*/ p = a[1];
p[7] = 0;

/*5*/ (p = a[1])[7] = 0;

/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;

At what point in 1-7 does the behavior become undefined?
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."


Note: I am not sure whether I *agree* with the argument I am about
to present. I merely *present* it.

Someone -- I think Doug Gwyn -- said that the (or "an") aim of the
rules here is to allow a compiler to "cheat" by, internally, tagging
pointers with additional information about the size(s) of the
underlying object(s) from which the pointers are derived.

Imagine for a moment a machine in which "add integer value that
does not exceed 10 to pointer" is one million times faster than
"add any integer to pointer". Suppose that sizeof(int) is 2, so
that sizeof(a[0]) is 10. Suppose that the fast add gives the
"wrong" result if the integer is greater than 10 (presumably,
produces a sum that is smaller than the desired result).

Now, for (1) through (3) above, the compiler is probably forced to
use the million-times-slower addition to compute p[7]. In (4),
it probably still uses this. In (5), the compiler may have a little
more information, and by assignment 6, the compiler *definitely*
has more information.

In particular, in assignment 6, the compiler is allowed to "know"
that a[1] is an "array 5 of int" so that sizeof a[1] is 10. It
can therefore use the "fast add" to add 14 to the pointer, on the
assumption that the integer (14) must be 10-or-less (even though
it is not). This, of course, gives the "wrong" result, addressing
a[1][2] or a[1][3] or some such (maybe even halfway between the two).

According to this argument, the behavior becomes undefined
somewhere around assignment 5 or 6, and is definitely undefined
by assignment 7.

Obviously, real machines do not have instructions like "add restricted
integer not exceeding 10 to pointer" -- but they *do* have
range-restricted instructions. A compiler for a CPU like the 80186
might want to use 16-bit addition to address "small" arrays (with
segment:offset addressing), and 32-bit addition to address "large"
arrays (still with segment:offset but now using the full 20-bit
addressing mode, doing normalization and so on, instead of assuming
that the entire array fits in a single segment).
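To make the "tagged pointer" idea concrete, here is a hypothetical model in plain C of what such an implementation might carry around internally: an address plus the bounds of the object it was derived from, with every subscript checked against those bounds. It is not how any particular compiler works; the struct and function names are invented purely to illustrate the argument.

```c
#include <stddef.h>

/* Hypothetical model of a "fat" pointer: the address of the first
   element plus the element count of the object it was derived from
   (for example, the row a[1] of int a[4][5]). */
struct fat_ptr {
    int   *base;   /* first element of the object */
    size_t len;    /* number of elements in that object */
};

/* Checked subscript: returns NULL (standing in for a diagnostic) when
   the index falls outside the object, even though memory belonging to
   a neighbouring object may exist at that address. */
int *fat_index(struct fat_ptr fp, long i)
{
    if (i < 0 || (size_t)i >= fp.len)
        return NULL;
    return fp.base + i;
}
```

Under this model, a pointer derived from a[1] carries len 5, so index 7 is rejected even though a[2][2] physically lives there.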
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Mar 23 '06 #23
en******@yahoo.com writes:
Keith Thompson wrote:
Ben C <sp******@spam.eggs> writes:
[...]
> int x[][3] =
> {
> {1, 2, 3},
> {4, 5, 6},
> {7, 8, 9}
> };
[...]
> x is an array, not a pointer. I believe there is nothing "undefined"
> here.


I think that's actually a matter of some dispute. x[1] is a pointer
to an array of 3 ints,


You mean x[1] is an array of 3 ints. In context x[1] does turn
into a pointer, but it turns into a pointer to int.
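A small sketch of the distinction: sizeof sees x[1] as a whole row of 3 ints, while in most other expression contexts x[1] converts to a pointer to its first element, an int *. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper for int x[][3] = {{1,2,3},{4,5,6},{7,8,9}}:
   x[1] is an array of 3 ints, and when it decays it becomes a
   pointer to int (&x[1][0]), not a pointer to an array. */
int row_decay_demo(void)
{
    int x[][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int *first = x[1];                      /* decays to &x[1][0] */

    return sizeof x[1] == 3 * sizeof(int)   /* array of 3 ints */
        && first == &x[1][0]                /* pointer to int */
        && *first == 4;
}
```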


You're right, thanks.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 23 '06 #24
Chris Torek wrote:
Note: I am not sure whether I *agree* with the argument I am about
to present. I merely *present* it.


Is that an Alfred Hitchcock quote?

--
pete
Mar 23 '06 #25
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:
Old Wolf wrote:
Pedro Graca wrote:
Should I make an effort to declare all char stuff as either signed or
unsigned? ... before it runs on a DS 9000 :)


Just make sure your code does not rely on chars being either
signed or unsigned.
If you need to rely on unsignedness (e.g. an array of all possible
char values) then you should explicitly use unsigned chars.
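A minimal sketch of that advice, assuming you want a histogram of byte values: reading the string through an unsigned char pointer makes every value a valid, non-negative index into a table of UCHAR_MAX + 1 counters. The function name is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: count how often each byte value occurs in s.
   counts must have UCHAR_MAX + 1 elements. Reading through an
   unsigned char pointer keeps every index in 0..UCHAR_MAX, even where
   plain char is signed. */
void count_chars(const char *s, unsigned long counts[UCHAR_MAX + 1])
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i;

    for (i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;
    while (*p)
        counts[*p++]++;
}
```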


Thank you for your answers.

Is it guaranteed that all characters available on some implementation
for which there is a standards compliant compiler are positive?

AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No. It is up to the specific compiler whether char defaults to signed or
unsigned. Besides that, some compilers have a switch to set the default
signedness of char explicitly, some may have a #pragma, and some may
have both.

In any case, you can override the compiler default by explicitly
declaring any char object as 'unsigned char' or 'signed char'. I have
found it useful to force the compiler to treat 'char' as 'unsigned char'
whenever it has to hold characters, and to declare 'signed char'
explicitly when a char is used as a short short int rather than as a
holder for characters.
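A program can find out which choice its implementation made through <limits.h>: CHAR_MIN is 0 when plain char is unsigned and equals SCHAR_MIN (a negative value) when it is signed. GCC, for example, lets you change the default with -fsigned-char and -funsigned-char. A minimal sketch (the function name is hypothetical):

```c
#include <limits.h>

/* Hypothetical helper: report whether plain char is signed on this
   implementation. CHAR_MIN is 0 if plain char is unsigned and
   SCHAR_MIN (negative) if it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```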

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de, the home of German eComStation
eComStation 1.2 German edition is here!
Mar 24 '06 #26
"Herbert Rosenau" <os****@pc-rosenau.de> writes:
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:

[...]
Is it guaranteed that all characters available on some implementation
for which there is a standards compliant compiler are positive?

AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No.


You are mistaken.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 24 '06 #27
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:
AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No. It is up to the specific compiler whether char defaults to signed or
unsigned.


Wrong. All characters in the basic execution character set must have a
positive value. '0' is a member of the basic execution charset. It must
therefore be above zero. If it also has the value 0xF0 and CHAR_BIT is
8, there's only one way for all the requirements to fit: plain char must
be unsigned.
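The two guarantees this argument rests on can be spot-checked on a running implementation: every member of the basic execution character set stored in a plain char is non-negative (and, apart from the null character, therefore positive), and '0'..'9' are contiguous and increasing. A minimal sketch; the function name is hypothetical and the sample covers the letters, digits, space and the 29 graphic characters of the basic set, not the control characters.

```c
#include <stddef.h>

/* Hypothetical helper: spot-check the guarantees discussed above.
   Returns 1 if every sampled basic character is positive when stored
   in a plain char and the decimal digits are contiguous. */
int basic_charset_ok(void)
{
    static const char sample[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~ ";
    static const char digits[] = "0123456789";
    size_t i;

    /* sizeof sample - 1 skips the terminating null character */
    for (i = 0; i < sizeof sample - 1; i++)
        if (sample[i] <= 0)
            return 0;

    for (i = 0; i < 10; i++)
        if (digits[i] != '0' + (int)i)
            return 0;

    return 1;
}
```

On an implementation where CHAR_BIT is 8 and '0' is 0xF0, the first check can only pass if plain char is unsigned, which is exactly the conclusion above.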

Richard
Mar 24 '06 #28
On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:
> AFAICT, in EBCDIC the character '0' has value 0xF0.
> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> for conforming compilers?
>

No. It is up to the specific compiler whether char defaults to signed or
unsigned.


Wrong.


I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.

Mark McIntyre
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Mar 24 '06 #29
Mark McIntyre <ma**********@spamcop.net> writes:
On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:

> AFAICT, in EBCDIC the character '0' has value 0xF0.
> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> for conforming compilers?
>
No. It is up to the specific compiler whether char defaults to signed or
unsigned.


Wrong.


I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.


Yes, but nobody was asking about the general case.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 24 '06 #30
