array subscript type cannot be `char`?

I ran into a warning today that seemed strange to me (I was trying to
improve my score on the UVA #10018 Programming Challenge).

$ gcc -W -Wall -std=c89 -pedantic -O2 10018-clc.c -o 10018-clc
10018-clc.c: In function `main':
10018-clc.c:22: warning: array subscript has type `char'

I don't like warnings ... or casts.
#include <stdio.h>

#define SIGNEDNESS
/* #define SIGNEDNESS signed */ /* either of these */
/* #define SIGNEDNESS unsigned */ /* defines "works" */

static int charval['9' + 1];
static unsigned long x;

int main(void) {
    SIGNEDNESS char test[] = "9012";
    SIGNEDNESS char *p = test;

    charval['1'] = 1;
    charval['2'] = 2;
    /* similarly for 3 to 8 */
    charval['9'] = 9;

    x = 0; /* redundant */
    while (*p) {
        x *= 10;
        x += charval[*p]; /* line 22 */

        /* casts to get rid of warning: all of them "work"! */
        /* x += charval[ (int) *p]; */
        /* x += charval[ (size_t) *p]; */
        /* x += charval[ (unsigned) *p]; */
        /* x += charval[ (long) *p]; */
        /* x += charval[ (wchar_t) *p]; */
        /* x += charval[ (signed char) *p]; */
        /* x += charval[ (unsigned char) *p]; */

        ++p;
    }

    printf("%lu\n", x);
    return 0;
}
Is this only a question of portability? (I realize the warning appears
only because of the -Wall option to gcc)

What is the type of an array subscript?
I'd guess size_t, and other types would be promoted automatically.

Should I make an effort to declare all char stuff as either signed or
unsigned? ... before it runs on a DS 9000 :)

--
If you're posting through Google read <http://cfaj.freeshell.org/google>
Mar 22 '06
On 2006-03-24, Keith Thompson <ks***@mib.or g> wrote:
Mark McIntyre <ma**********@s pamcop.net> writes:
On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c ,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:

On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit .com>
wrote:

> AFAICT, in EBCDIC the character '0' has value 0xF0.
> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> for conforming compilers?
>
No. It is on the specific compiler if char defaults to signed or
unsigned.

Wrong.


I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.


Yes, but nobody was asking about the general case.


Many people interpreted Pedro Graca's question as talking about the
general case - i.e. "If '0' _can_ have the high bit set, then is it the
case that char _must_ be unsigned?"
Mar 25 '06 #31
Jordan Abel wrote:
Keith Thompson
Mark McIntyre
Richard Bos
Herbert Rosenau
> Pedro Graca
> > AFAICT, in EBCDIC the character '0' has value 0xF0.
> > Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> > for conforming compilers?


[snip different answers for different interpretations of the question
and a big mess of people telling each other everyone is wrong.]

Shortly after people started answering the question I realized it didn't
come out as specific as I meant. However the discussion was (has been)
very enjoyable and I didn't want to interrupt by making a clearer
question. As a wannabe "pedant" I apologize for my lack of pedantry.

I want to thank you all (again) for your comments and time spent on
this.
*Thank you*

Please carry on with the discussion :)

--
If you're posting through Google read <http://cfaj.freeshell.org/google>
Mar 25 '06 #32

Robert Gamble wrote:
en******@yahoo. com wrote:
Robert Gamble wrote:
Keith Thompson wrote:
> Ben C <sp******@spam. eggs> writes:
> > On 2006-03-22, Robert Gamble <rg*******@gmai l.com> wrote:
> >> Ben C wrote:
> >>> On 2006-03-22, Old Wolf <ol*****@inspir e.net.nz> wrote:
> >>> > [...] a negative subscript to an array will cause undefined
> >>> > behaviour.
> >
> >>> [...] Are you sure?
> >
> >>> int x[10];
> >>> int *y = x + 5;
> >>> y[-1] = 100;
> >>> ...
> >
> >> y is not an array, it is a pointer.
> >
> > What about this then?
> >
> > #include <stdio.h>
> >
> > int main(void)
> > {
> > int x[][3] =
> > {
> > {1, 2, 3},
> > {4, 5, 6},
> > {7, 8, 9}
> > };
> >
> > printf("%d\n", x[1][-1]);
> >
> > return 0;
> > }
> >
> > x is an array, not a pointer. I believe there is nothing "undefined"
> > here.
>
> I think that's actually a matter of some dispute.

It might have been in 1992, I think that DR #17 made it pretty clear
that this is undefined behavior. Quote the response to question #16:

"For an array of arrays, the permitted pointer arithmetic in subclause
6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
of the word ``object'' as denoting the specific object determined
directly by the pointer's type and value, not other objects related to
that one by contiguity. Therefore, if an expression exceeds these
permissions, the behavior is undefined. For example, the following code
has undefined behavior:
int a[4][5];

a[1][7] = 0; /* undefined */
Some conforming implementations may choose to diagnose an ``array
bounds violation,'' while others may choose to interpret such attempted
accesses successfully with the ``obvious'' extended semantics."

The result of this question was to add the following to the
(informative) section G.2 which documents examples of undefined
behavior:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.3.6)."


An easy/lazy/stupid response to the DR, resulting in an
easy/lazy/stupid statement in the standard.

void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;


Looks okay.
/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;


Looks okay.
/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;


Looks okay.
/*4*/ p = a[1];
p[7] = 0;


p is a pointer to int, not an array so this should be okay.
/*5*/ (p = a[1])[7] = 0;


Same as #4 as far as I can tell.
/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;


I don't see how these two are different but in both cases a[1] is an
array of 5 ints and 7 is an out of range subscript. Definitely
not okay.
At what point in 1-7 does the behavior become undefined?


I would say at #6.


The type and value at #6 is the same as the type and value
at #5, #4, #3, #2 and #1.
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."


In addition to this, the only rationale provided for the response to
the Defect Report is that implementations may choose to diagnose out of
bounds conditions. In your example, 1-5 uses the subscript operator on
a pointer, not an array so there should not be an issue as the elements
are still guaranteed to be contiguous. In 6 and 7 you are using an
invalid subscript on an array object which is clearly undefined.


I suggest you re-familiarize yourself with the rules for
array conversion and the Semantics paragraphs for subscript
operators. All the subscript operators above have pointer
operands. The [] operator doesn't work on array operands,
only on pointer operands.

Mar 25 '06 #33

Chris Torek wrote:
In article <11************ **********@j33g 2000cwa.googleg roups.com>
<en******@yahoo .com> wrote:
void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;

/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;

/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;

/*4*/ p = a[1];
p[7] = 0;

/*5*/ (p = a[1])[7] = 0;

/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;

At what point in 1-7 does the behavior become undefined?
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."


Note: I am not sure whether I *agree* with the argument I am about
to present. I merely *present* it.

Someone -- I think Doug Gwyn -- said that the (or "an") aim of the
rules here is to allow a compiler to "cheat" by, internally, tagging
pointers with additional information about the size(s) of the
underlying object(s) from which the pointers are derived.

Imagine for a moment a machine in which "add integer value that
does not exceed 10 to pointer" is one million times faster than
"add any integer to pointer". Suppose that sizeof(int) is 2, so
that sizeof(a[0]) is 10. Suppose that the fast add gives the
"wrong" result if the integer is greater than 10 (presumably,
produces a sum that is smaller than the desired result).

Now, for (1) through (3) above, the compiler is probably forced to
use the million-times-slower addition to compute p[7]. In (4),
it probably still uses this. In (5), the compiler may have a little
more information, and by assignment 6, the compiler *definitely*
has more information.

In particular, in assignment 6, the compiler is allowed to "know"
that a[1] is an "array 5 of int" so that sizeof a[1] is 10. It
can therefore use the "fast add" to add 14 to the pointer, on the
assumption that the integer (14) must be 10-or-less (even though
it is not). This, of course, gives the "wrong" result, addressing
a[1][2] or a[1][3] or some such (maybe even halfway between the two).

According to this argument, the behavior becomes undefined
somewhere around assignment 5 or 6, and is definitely undefined
by assignment 7. ...


My complaint is that it's unclear when there is undefined
behavior and when there isn't. It might be a good idea to
define only "in bounds" array accesses, or it might not,
but the question here is which accesses are defined and
which aren't. The "clarifying text" in appendix J states
informative information that is _not derivable_ from normative
information, and _still_ isn't particularly clarifying about
what is allowed and what isn't. I just don't think the
committee did a very good job (a) of deciding which accesses
they wanted to leave undefined, or (b) of communicating what
their decision for (a) was. My fear is that both (a) and (b)
apply.

The argument you present is reasonable, except for the
conclusion about where undefinedness happens. All of 1
through 7 have the same type and value. If which object
is relevant is "determined directly by the _pointer's_
type and value", and all the pointers have the same type
and value, how can some be defined and some be undefined?

Mar 25 '06 #34
en******@yahoo. com wrote:
Robert Gamble wrote:
en******@yahoo. com wrote:
Robert Gamble wrote:
> Keith Thompson wrote:
> > Ben C <sp******@spam. eggs> writes:
> > > On 2006-03-22, Robert Gamble <rg*******@gmai l.com> wrote:
> > >> Ben C wrote:
> > >>> On 2006-03-22, Old Wolf <ol*****@inspir e.net.nz> wrote:
> > >>> > [...] a negative subscript to an array will cause undefined
> > >>> > behaviour.
> > >
> > >>> [...] Are you sure?
> > >
> > >>> int x[10];
> > >>> int *y = x + 5;
> > >>> y[-1] = 100;
> > >>> ...
> > >
> > >> y is not an array, it is a pointer.
> > >
> > > What about this then?
> > >
> > > #include <stdio.h>
> > >
> > > int main(void)
> > > {
> > > int x[][3] =
> > > {
> > > {1, 2, 3},
> > > {4, 5, 6},
> > > {7, 8, 9}
> > > };
> > >
> > > printf("%d\n", x[1][-1]);
> > >
> > > return 0;
> > > }
> > >
> > > x is an array, not a pointer. I believe there is nothing "undefined"
> > > here.
> >
> > I think that's actually a matter of some dispute.
>
> It might have been in 1992, I think that DR #17 made it pretty clear
> that this is undefined behavior. Quote the response to question #16:
>
> "For an array of arrays, the permitted pointer arithmetic in subclause
> 6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
> of the word ``object'' as denoting the specific object determined
> directly by the pointer's type and value, not other objects related to
> that one by contiguity. Therefore, if an expression exceeds these
> permissions, the behavior is undefined. For example, the following code
> has undefined behavior:
> int a[4][5];
>
> a[1][7] = 0; /* undefined */
> Some conforming implementations may choose to diagnose an ``array
> bounds violation,'' while others may choose to interpret such attempted
> accesses successfully with the ``obvious'' extended semantics."
>
> The result of this question was to add the following to the
> (informative) section G.2 which documents examples of undefined
> behavior:
>
> "An array subscript is out of range, even if an object is apparently
> accessible with the given subscript (as in the lvalue expression
> a[1][7] given the declaration int a[4][5]) (6.3.6)."

An easy/lazy/stupid response to the DR, resulting in an
easy/lazy/stupid statement in the standard.

void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;


Looks okay.
/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;


Looks okay.
/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;


Looks okay.
/*4*/ p = a[1];
p[7] = 0;


p is a pointer to int, not an array so this should be okay.
/*5*/ (p = a[1])[7] = 0;


Same as #4 as far as I can tell.
/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;


I don't see how these two are different but in both cases a[1] is an
array of 5 ints and 7 is an out of range subscript. Definitely
not okay.
At what point in 1-7 does the behavior become undefined?


I would say at #6.


The type and value at #6 is the same as the type and value
at #5, #4, #3, #2 and #1.
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."


In addition to this, the only rationale provided for the response to
the Defect Report is that implementations may choose to diagnose out of
bounds conditions. In your example, 1-5 uses the subscript operator on
a pointer, not an array so there should not be an issue as the elements
are still guaranteed to be contiguous. In 6 and 7 you are using an
invalid subscript on an array object which is clearly undefined.


I suggest you re-familiarize yourself with the rules for
array conversion and the Semantics paragraphs for subscript
operators. All the subscript operators above have pointer
operands. The [] operator doesn't work on array operands,
only on pointer operands.


Right, the array decays into a pointer before the subscript operator is
applied. My point is that the object in #6 and #7 from which that
pointer was directly derived was an array. My interpretation is that
only in the case where the pointer for which the subscript operator is
being applied is immediately and directly the result of array decay is
the behavior undefined. I believe that this was the intent based on
the rationale but confess that the wording is not necessarily concrete
enough to clearly and concisely delineate the exact point in your
example in which the behavior becomes undefined. I can see a possible
argument for UB to be invoked in earlier examples. I would strongly
suggest that you present your original question and examples to the
folks at comp.std.c where you are much more likely to receive an answer
from someone like Doug Gwyn who was directly involved in the response
to that DR.

Robert Gamble

Mar 26 '06 #35
On Fri, 24 Mar 2006 06:58:16 UTC, Keith Thompson <ks***@mib.or g>
wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> writes:
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit .com>
wrote:

[...]
Is it guaranteed that all characters available on some implementation
for which there is a standards compliant compiler are positive?

AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No.


You are mistaken.

Whether the character set is EBCDIC, ASCII or something else is not even
defined by standard C. Assuming a specific character set and then talking
about signed char will fail in some cases, as the standard makes no
assumptions about anything other than the digits. Maybe your compiler
guarantees unsigned for a char containing 'E0', but char may always be
signed, and then 'E0' has a negative value when CHAR_BIT == 8.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 Deutsch ist da!
Mar 26 '06 #36
On Fri, 24 Mar 2006 19:56:13 UTC, Keith Thompson <ks***@mib.or g>
wrote:
Mark McIntyre <ma**********@s pamcop.net> writes:
On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c ,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:

On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit .com>
wrote:

> AFAICT, in EBCDIC the character '0' has value 0xF0.
> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> for conforming compilers?
>
No. It is on the specific compiler if char defaults to signed or
unsigned.

Wrong.


I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.


Yes, but nobody was asking about the general case.


We're discussing standard C here, not EBCDIC C. What is char 'E5'? Is
it a letter? Is it a number? It is some bit pattern. Without knowing
the signedness of the char holding it, it may be either signed or
unsigned.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 Deutsch ist da!
Mar 26 '06 #37
"Herbert Rosenau" <os****@pc-rosenau.de> writes:
On Fri, 24 Mar 2006 06:58:16 UTC, Keith Thompson <ks***@mib.or g>
wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> writes:
> On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit .com>
> wrote:

[...]
>> Is it guaranteed that all characters available on some implementation
>> for which there is a standards compliant compiler are positive?
>>
>> AFAICT, in EBCDIC the character '0' has value 0xF0.
>> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
>> for conforming compilers?
>>
> No.


You are mistaken.

Whether the character set is EBCDIC, ASCII or something else is not even
defined by standard C. Assuming a specific character set and then talking
about signed char will fail in some cases, as the standard makes no
assumptions about anything other than the digits. Maybe your compiler
guarantees unsigned for a char containing 'E0', but char may always be
signed, and then 'E0' has a negative value when CHAR_BIT == 8.


I read Pedro's question as being based on an assumption of CHAR_BIT==8
*and* an EBCDIC character set. I'm no longer sure that that was
Pedro's intent.

If CHAR_BIT==8, plain char may be either signed or unsigned.

If CHAR_BIT==8 and '0'==0xF0 (as it is in EBCDIC), then plain char
must be unsigned.

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Mar 26 '06 #38
Herbert Rosenau schrieb:
On Fri, 24 Mar 2006 19:56:13 UTC, Keith Thompson <ks***@mib.or g>
wrote:

Mark McIntyre <ma**********@s pamcop.net> writes:
On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c ,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:
>On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit .com>
>wrote:
>
>
>> AFAICT, in EBCDIC the character '0' has value 0xF0.
>> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
>> for conforming compilers?
>>
>
>No. It is on the specific compiler if char defaults to signed or
>unsigned.

Wrong.

I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.


Yes, but nobody was asking about the general case.


We're discussing standard C here, not EBCDIC C. What is char 'E5'? Is
it a letter? Is it a number? It is some bit pattern. Without knowing
the signedness of the char holding it, it may be either signed or
unsigned.


Nope. We are comparing an int constant ('0') with another integer
constant (0x..., which also has type int for any value up to 0xFF).
If we have CHAR_BIT == 8, then "knowledge about
the signedness" of the underlying char does not play any role, as
int has at least 16 bits, i.e. the values cannot compare equal if
they are not equal. In this case, if a value >= 0x80 compares equal
to a character from the basic source character set and there are no
conversions in between (such as casting both values to unsigned
char), then char must be an unsigned integer type.

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Mar 26 '06 #39
Keith Thompson wrote:
Herbert Rosenau
Keith Thompson
Herbert Rosenau
> Pedro Graca
>> Is it guaranteed that all characters available on some implementation
>> for which there is a standards compliant compiler are positive?
>>
>> AFAICT, in EBCDIC the character '0' has value 0xF0.
>> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
>> for conforming compilers?

I read Pedro's question as being based on an assumption of CHAR_BIT==8
*and* an EBCDIC character set. I'm no longer sure that that was
Pedro's intent.


Yes, that was my intent.

The "main question" was incomplete (better one follows)
: Is it guaranteed that all characters in the basic execution set
: defined by the Standard are positive?

and the example was poorly worded (better one follows)
: AFAICT, in EBCDIC the character '0' has value 0xF0.
: For an implementation with EBCDIC for basic character set
: *and* CHAR_BIT == 8 does it follow that plain char is unsigned
: for /that/ particular implementation?
I wanted to make sure the following program would not invoke UB (by
trying to access an array with a negative index in line 22), no matter
which implementation it runs on

#include <assert.h>
#include <limits.h> /* for INT_MAX */
#include <stdio.h>
int main(void) {
    char test[] = "9012"; /* string literal, not a character constant */
    char charval['9'+1];
    char *p = test; /* p must point at the string before the loop */
    int sum = 0;
    /* assumes INT_MAX >= 9012; but I think the standard mandates
     * int be at least 32767 */
    assert(INT_MAX >= 9012);

    charval['0'] = 0;
    charval['1'] = 1;
    charval['2'] = 2;
    charval['9'] = 9;
    while (*p) {
        sum *= 10;
        sum += charval[*p]; /* line 22 */
        p++;
    }
    printf("%d\n", sum); /* print 9012 */
    return 0;
}
If CHAR_BIT==8, plain char may be either signed or unsigned.

If CHAR_BIT==8 and '0'==0xF0 (as it is in EBCDIC), then plain char
must be unsigned.


I realize that no matter what character set the implementation defines
or what CHAR_BIT is for the implementation, *all* characters in the
basic set defined by the Standard (digits, lowercase and uppercase
letters, <tab>, <newline>, and a few signs, ...) must be positive.
There is no guarantee for characters outside this set:

char *p = "Pedro Graça";
int i=0;
while (*p) {
charval[*p] = i++; /* possible BANG! for 'ç' */
++p;
}

--
If you're posting through Google read <http://cfaj.freeshell.org/google>
Mar 27 '06 #40
