array subscript type cannot be `char`?

Pedro Graca

I ran into a strange warning (for me) today (I was trying to improve
the score of the UVA #10018 Programming Challenge).

$ gcc -W -Wall -std=c89 -pedantic -O2 10018-clc.c -o 10018-clc
10018-clc.c: In function `main':
10018-clc.c:22: warning: array subscript has type `char'

I don't like warnings ... or casts.
#include <stdio.h>

#define SIGNEDNESS
/* #define SIGNEDNESS signed */   /* either of these */
/* #define SIGNEDNESS unsigned */ /* defines "works" */

static int charval['9' + 1];
static unsigned long x;

int main(void) {
    SIGNEDNESS char test[] = "9012";
    SIGNEDNESS char *p = test;

    charval['1'] = 1;
    charval['2'] = 2;
    /* similarly for 3 to 8 */
    charval['9'] = 9;

    x = 0; /* redundant */
    while (*p) {
        x *= 10;
        x += charval[*p]; /* line 22 */

        /* casts to get rid of warning: all of them "work"! */
        /* x += charval[ (int) *p]; */
        /* x += charval[ (size_t) *p]; */
        /* x += charval[ (unsigned) *p]; */
        /* x += charval[ (long) *p]; */
        /* x += charval[ (wchar_t) *p]; */
        /* x += charval[ (signed char) *p]; */
        /* x += charval[ (unsigned char) *p]; */

        ++p;
    }

    printf("%lu\n", x);
    return 0;
}
Is this only a question of portability? (I realize the warning appears
only because of the -Wall option to gcc)

What is the type of an array subscript?
I'd guess size_t, and other types would be promoted automatically.

Should I make an effort to declare all char stuff as either signed or
unsigned? ... before it runs on a DS 9000 :)

--
If you're posting through Google read <http://cfaj.freeshell.org/google>

Mar 22 '06 #1

Robert Gamble

Pedro Graca wrote:

I run into a strange warning (for me) today (I was trying to improve
the score of the UVA #10018 Programming Challenge).

$ gcc -W -Wall -std=c89 -pedantic -O2 10018-clc.c -o 10018-clc
10018-clc.c: In function `main':
10018-clc.c:22: warning: array subscript has type `char'

[snip example program using char subscript]

There is technically nothing "wrong" about using char as an array
subscript; any integer type is legal as an array subscript.
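
To illustrate the point about integer types, here is a minimal sketch (the function name count_matching_subscripts is made up for this example, it is not from the thread). Because a[i] is defined as *((a) + (i)), the subscript may have any integer type:

```c
/* Hypothetical demo: reads the same element through subscripts of
   several integer types and returns how many reads matched. */
#include <stddef.h>

int count_matching_subscripts(void)
{
    static const int arr[4] = { 10, 20, 30, 40 };
    short         s  = 2;
    long          l  = 2L;
    size_t        z  = 2;
    unsigned char uc = 2;
    int matches = 0;

    if (arr[s]  == 30) matches++;
    if (arr[l]  == 30) matches++;
    if (arr[z]  == 30) matches++;
    if (arr[uc] == 30) matches++;
    /* a[i] means *(a + i), so the operands may even be swapped. */
    if (2[arr]  == 30) matches++;

    return matches;
}
```

Under these assumptions the function returns 5, one for each form of subscript.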

According to the rationale for this warning in the gcc documentation,
many programmers forget that char can be signed, which can
lead to unexpected problems when the char value is negative.
This warning is enabled with the -Wall option and can be disabled by
using -Wno-char-subscripts.

Robert Gamble

Mar 22 '06 #2

Old Wolf

Pedro Graca wrote:

int main(void) {
SIGNEDNESS char test[] = "9012";
SIGNEDNESS char *p = test;

charval['1'] = 1;
charval['2'] = 2;
/* similarly for 3 to 8 */
charval['9'] = 9;

x = 0; /* redundant */
while (*p) {
x *= 10;
x += charval[*p]; /* line 22 */

The warning is because chars can be negative, and a negative
subscript to an array will cause undefined behaviour. If you happen
to include some negative chars in test[], then you have UB.

This is not a required diagnostic; I guess the GCC developers feel
that this error is more likely to occur with a char than with other
signed integral types :)

Should I make an effort to declare all char stuff as either signed or
unsigned? ... before it runs on a DS 9000 :)

Just make sure your code does not rely on chars being either
signed or unsigned.
If you need to rely on unsignedness (eg. an array of all possible
char values) then you should explicitly use unsigned chars.
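
A minimal sketch of that advice, assuming a hypothetical helper named count_bytes (not from the original program): the table has UCHAR_MAX + 1 entries, and every character is converted to unsigned char before it is used as a subscript, so the index cannot be negative whether plain char is signed or not.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: counts how often each character value occurs
   in the string s.  counts must have room for UCHAR_MAX + 1 entries,
   one per possible unsigned char value. */
void count_bytes(const char *s, size_t counts[UCHAR_MAX + 1])
{
    size_t i;

    for (i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;

    while (*s != '\0') {
        /* Convert to unsigned char first, so the subscript is always
           in the range 0..UCHAR_MAX, even where plain char is signed. */
        counts[(unsigned char)*s]++;
        s++;
    }
}
```

With this conversion, characters whose value is above CHAR_MAX on a signed-char platform land in the upper part of the table instead of producing a negative subscript.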

Mar 22 '06 #3

Keith Thompson

Pedro Graca <he****@dodgeit.com> writes:

I run into a strange warning (for me) today (I was trying to improve
the score of the UVA #10018 Programming Challenge).

$ gcc -W -Wall -std=c89 -pedantic -O2 10018-clc.c -o 10018-clc
10018-clc.c: In function `main':
10018-clc.c:22: warning: array subscript has type `char'

I don't like warnings ... or casts.
[code snipped]
Is this only a question of portability? (I realize the warning appears
only because of the -Wall option to gcc)

I think the point is that char can be either signed or unsigned,
depending on the implementation. Code that works properly where plain
char is unsigned might fail on another platform where plain char is
signed:

int arr[256];
char index = 200;
... arr[index] ...

Presumably if you use "signed char" explicitly, the compiler assumes
you know what you're doing.

Using plain int as an array index doesn't present the same problem,
because plain int is always signed; any problems will show up on any
platform.

What is the type of an array subscript?
I'd guess size_t, and other types would be promoted automatically.

The index merely has to have some integer type.

Should I make an effort to declare all char stuff as either signed or
unsigned? ... before it runs on a DS 9000 :)

If the actual values are always going to be in the range 0..127, it
shouldn't matter. If they can exceed 127 (the minimum possible value
of CHAR_MAX), you might consider either declaring your variables as
unsigned char, or casting to unsigned char when indexing:

int arr[256];
char index = 200;
... arr[(unsigned char)index] ...

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Mar 22 '06 #4

Ben C

On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:

The warning is because chars can be negative, and a negative subscript
to an array will cause undefined behaviour.

Are you sure? There's nothing undefined about this:

#include <stdio.h>

int main(void)
{
int x[10];
int *y = x + 5;
y[-1] = 100;

printf("%d\n", y[-1]);

return 0;
}

Mar 22 '06 #5

Pedro Graca

Old Wolf wrote:

Pedro Graca wrote:
Should I make an effort to declare all char stuff as either signed or
unsigned? ... before it runs on a DS 9000 :)

Just make sure your code does not rely on chars being either
signed or unsigned.
If you need to rely on unsignedness (eg. an array of all possible
char values) then you should explicitly use unsigned chars.

Thank you for your answers.

Is it guaranteed that all characters available on some implementation
for which there is a standards compliant compiler are positive?

AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

Mar 22 '06 #6

CBFalconer

Pedro Graca wrote:

.... snip ...
Is it guaranteed that all characters available on some implementation
for which there is a standards compliant compiler are positive?

AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No. However all chars in the required char set, which includes
'0'..'9', 'a'..'z', 'A'..'Z', '+-*!#%^&(){}[]:;'"?\/<>.,' must be
positive.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Mar 22 '06 #7

Robert Gamble

Ben C wrote:

On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
The warning is because chars can be negative, and a negative subscript
to an array will cause undefined behaviour.

Are you sure? There's nothing undefined about this:

#include <stdio.h>

int main(void)
{
int x[10];
int *y = x + 5;
y[-1] = 100;

printf("%d\n", y[-1]);

return 0;
}

y is not an array, it is a pointer.

Robert Gamble

Mar 22 '06 #8

Richard G. Riley

On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:

Ben C wrote:
On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> The warning is because chars can be negative, and a negative subscript
> to an array will cause undefined behaviour.

Are you sure? There's nothing undefined about this:

#include <stdio.h>

int main(void)
{
int x[10];
int *y = x + 5;
y[-1] = 100;

printf("%d\n", y[-1]);

return 0;
}

y is not an array, it is a pointer.

Robert Gamble

Looks kind of ok to me : am I being too lax with pointers in thinking that?

It seems to my (becoming "more standard") eye, that y points to x[5],
and so y[-1] is perfectly valid since arr[-1] is the same as
*(arr-1). And since "arr" (or y) points into a valid data area then it
is defined. Is this wrong?

--
Debuggers : you know it makes sense.
http://heather.cs.ucdavis.edu/~matlo...g.html#tth_sEc

Mar 22 '06 #9

Richard Tobin

In article <s6************@fujitsu.mydomain.com>,

int x[10];
int *y = x + 5;
y[-1] = 100;

printf("%d\n", y[-1]);

y is not an array, it is a pointer.

Looks kind of ok to me : am I being too lax with pointers in thinking that?

It seems to my (becoming "more standard") eye, that y points to x[5],
and so y[-1] is perfectly valid since arr[-1] is the same as
*(arr-1). And since "arr" (or y) points into a valid data area then it
is defined. Is this wrong?

y[-1] is perfectly OK for the reason you give. Given an arbitrary
pointer y you can't tell whether y[-1] is valid or not. But if y were
declared as an array, you could be sure that it was wrong.
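
A small sketch of why the pointer case is fine (the function name is invented for illustration): y[-1] means *(y - 1), and y - 1 still points at an element of x.

```c
/* Hypothetical demo: returns 1 if y[-1] and x[4] name the same
   element and the store through y[-1] is visible in x[4]. */
int negative_subscript_is_same_element(void)
{
    int x[10] = { 0 };
    int *y = x + 5;          /* y points at x[5] */

    y[-1] = 100;             /* same as *(y - 1), i.e. x[4] */

    return &y[-1] == &x[4] && x[4] == 100;
}
```

Had the subscript been applied to x itself, x[-1] would designate an element before the start of the array, which is the case the warning is worried about.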

-- Richard

Mar 22 '06 #10

Kenneth Brody

Pedro Graca wrote:

I run into a strange warning (for me) today (I was trying to improve
the score of the UVA #10018 Programming Challenge).

$ gcc -W -Wall -std=c89 -pedantic -O2 10018-clc.c -o 10018-clc
10018-clc.c: In function `main':
10018-clc.c:22: warning: array subscript has type `char'

I don't like warnings ... or casts.

#include <stdio.h>

#define SIGNEDNESS
/* #define SIGNEDNESS signed */ /* either of these */
/* #define SIGNEDNESS unsigned */ /* defines "works" */
[...]
SIGNEDNESS char *p = test;
[...]
x += charval[*p]; /* line 22 */
[...]
/* casts to get rid of warning: all of them "work"! */
[...]
/* x += charval[ (signed char) *p]; */
/* x += charval[ (unsigned char) *p]; */

[...]

Given that explicitly using "unsigned char" or "signed char" will both
get rid of the warning, my guess is that it's your compiler's way of
pointing out "hey, char can be signed in some environments, and unsigned
in others, so using a plain 'char' as a subscript may not necessarily be
what you want to do here".

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Mar 22 '06 #11

Robert Gamble

Richard G. Riley wrote:

On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
Ben C wrote:
On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> The warning is because chars can be negative, and a negative subscript
> to an array will cause undefined behaviour.

Are you sure? There's nothing undefined about this:

#include <stdio.h>

int main(void)
{
int x[10];
int *y = x + 5;
y[-1] = 100;

printf("%d\n", y[-1]);

return 0;
}

y is not an array, it is a pointer.

Robert Gamble

Looks kind of ok to me : am I being too lax with pointers in thinking that?

It seems to my (becoming "more standard") eye, that y points to x[5],
and so y[-1] is perfectly valid since arr[-1] is the same as
*(arr-1). And since "arr" (or y) points into a valid data area then it
is defined. Is this wrong?

No, it's not wrong at all. Old Wolf stated that a negative subscript
is not valid for an array. Ben C produced his attempt at a
counter-example. My point was that since he was using a negative
subscript with a pointer, not an array, his example doesn't fall
under Old Wolf's assessment.

Robert Gamble

Mar 22 '06 #12

Ben C

On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:

Ben C wrote:
On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> [...] a negative subscript to an array will cause undefined
> behaviour.
[...] Are you sure?

int x[10];
int *y = x + 5;
y[-1] = 100;
...
y is not an array, it is a pointer.

What about this then?

#include <stdio.h>

int main(void)
{
int x[][3] =
{
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};

printf("%d\n", x[1][-1]);

return 0;
}

x is an array, not a pointer. I believe there is nothing "undefined"
here.

Mar 22 '06 #13

Keith Thompson

Ben C <sp******@spam.eggs> writes:

On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
Ben C wrote:
On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> [...] a negative subscript to an array will cause undefined
> behaviour.

[...] Are you sure?

int x[10];
int *y = x + 5;
y[-1] = 100;
...

y is not an array, it is a pointer.

What about this then?

#include <stdio.h>

int main(void)
{
int x[][3] =
{
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};

printf("%d\n", x[1][-1]);

return 0;
}

x is an array, not a pointer. I believe there is nothing "undefined"
here.

I think that's actually a matter of some dispute. x[1] is a pointer
to an array of 3 ints, and x[1][-1] indexes into that array.
Conceivably an implementation could do bounds-checking on all array
indexing operations; since x[1][-1] is outside the bounds of the
3-element array being indexed, the attempt to evaluate it could cause a
trap (or, more generally, undefined behavior).

The question is whether such an implementation would be conforming.
I offer no opinion on that question.

This is similar to the question of the "struct hack", which indexes
beyond the declared bounds of an array, but into memory that is known
to exist. I don't think the legality of the struct hack was ever
really settled (though it works on every implementation I've heard
of); C99 sidestepped the question by introducing flexible array
members.
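
For reference, a C99 flexible array member might be used roughly like this. The struct and function names are hypothetical, and the sketch needs a C99 compiler rather than the -std=c89 used earlier in the thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type using a C99 flexible array member. */
struct message {
    size_t length;
    char   text[];      /* flexible array member: must come last */
};

/* Allocates a message holding a copy of s; returns NULL on failure.
   The caller releases it with free(). */
struct message *message_create(const char *s)
{
    size_t len = strlen(s);
    struct message *m = malloc(sizeof *m + len + 1);

    if (m == NULL)
        return NULL;

    m->length = len;
    memcpy(m->text, s, len + 1);   /* copies the terminating '\0' too */
    return m;
}
```

Unlike the struct hack with a one-element array, the declared type here does not promise a fixed bound for text, so indexing up to the allocated size does not run past a declared array bound.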

(Of course we all know that we've gone far beyond the original
question; a strict bounds-checking implementation of comp.lang.c
would have required a new thread by now.)

Mar 23 '06 #14

Robert Gamble

Ben C wrote:

On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
Ben C wrote:
On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> [...] a negative subscript to an array will cause undefined
> behaviour.

[...] Are you sure?

int x[10];
int *y = x + 5;
y[-1] = 100;
...
y is not an array, it is a pointer.

What about this then?

#include <stdio.h>

int main(void)
{
int x[][3] =
{
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};

printf("%d\n", x[1][-1]);

Undefined behavior just as x[1][3] would be.

return 0;
}

x is an array, not a pointer. I believe there is nothing "undefined"
here.

Well, you believe wrong. In your example you are trying to access the
-1st element of an array (the array x[1]), and there is no such element.
Trying to access an array element using an out-of-bounds index is not
defined.

Robert Gamble

Mar 23 '06 #15

Robert Gamble

Keith Thompson wrote:

Ben C <sp******@spam.eggs> writes:
On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
Ben C wrote:
On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> [...] a negative subscript to an array will cause undefined
> behaviour.

[...] Are you sure?

int x[10];
int *y = x + 5;
y[-1] = 100;
...

y is not an array, it is a pointer.

What about this then?

#include <stdio.h>

int main(void)
{
int x[][3] =
{
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};

printf("%d\n", x[1][-1]);

return 0;
}

x is an array, not a pointer. I believe there is nothing "undefined"
here.

I think that's actually a matter of some dispute.

It might have been in 1992, but I think that DR #17 made it pretty clear
that this is undefined behavior. Quoting the response to question #16:

"For an array of arrays, the permitted pointer arithmetic in subclause
6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
of the word ``object'' as denoting the specific object determined
directly by the pointer's type and value, not other objects related to
that one by contiguity. Therefore, if an expression exceeds these
permissions, the behavior is undefined. For example, the following code
has undefined behavior:

int a[4][5];

a[1][7] = 0; /* undefined */

Some conforming implementations may choose to diagnose an ``array
bounds violation,'' while others may choose to interpret such attempted
accesses successfully with the ``obvious'' extended semantics."

The result of this question was to add the following to the
(informative) section G.2 which documents examples of undefined
behavior:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.3.6)."
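
If the intent really is "the int seven places after the start of row 1", one way to stay inside the DR's reading is to renormalize the position so that both subscripts are in range. A sketch, with a hypothetical helper element_at and made-up ROWS/COLS macros matching int a[4][5]:

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 5

/* Hypothetical helper: returns a pointer to the element that lies
   `offset` ints after the start of row `row`, using only in-range
   subscripts.  For row 1 and offset 7 this is a[2][2]. */
int *element_at(int a[ROWS][COLS], size_t row, size_t offset)
{
    size_t flat = row * COLS + offset;   /* position counted from a[0][0] */

    assert(flat < (size_t)ROWS * COLS);
    return &a[flat / COLS][flat % COLS];
}
```

The division and remainder cost a little, but each subscript stays within the bounds of the array it is applied to, so a bounds-checking implementation has nothing to object to.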

Robert Gamble

Mar 23 '06 #16

Keith Thompson

"Robert Gamble" <rg*******@gmail.com> writes:

Keith Thompson wrote:
Ben C <sp******@spam.eggs> writes: [...]
> What about this then?
>
> #include <stdio.h>
>
> int main(void)
> {
> int x[][3] =
> {
> {1, 2, 3},
> {4, 5, 6},
> {7, 8, 9}
> };
>
> printf("%d\n", x[1][-1]);
>
> return 0;
> }
>
> x is an array, not a pointer. I believe there is nothing "undefined"
> here.
I think that's actually a matter of some dispute.

It might have been in 1992, I think that DR #17 made it pretty clear
that this is undefined behavior. Quote the response to question #16:

[snip] The result of this question was to add the following to the
(informative) section G.2 which documents examples of undefined
behavior:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.3.6)."

Thanks. I'm sure I've read that, but I didn't remember the details.

The wording added to G.2 is in section J.2 in the C99 standard:

The behavior is undefined in the following circumstances:
[...]
-- An array subscript is out of range, even if an object is
apparently accessible with the given subscript (as in the
lvalue expression a[1][7] given the declaration int a[4][5])
(6.5.6).

I'm not entirely convinced that C99 6.5.6 couldn't be read to imply
that a[1][7] is valid, but J.2 makes the intent clear enough.

Mar 23 '06 #17

santosh

Ben C wrote:

On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
Ben C wrote:
On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> [...] a negative subscript to an array will cause undefined
> behaviour.

[...] Are you sure?

int x[10];
int *y = x + 5;
y[-1] = 100;
...

y is not an array, it is a pointer.

What about this then?

#include <stdio.h>

int main(void)
{
int x[][3] =
{
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};

printf("%d\n", x[1][-1]);

return 0;
}

x is an array, not a pointer. I believe there is nothing "undefined"
here.

Well the index value of -1 is out of bounds. It might well point to
memory within the array, but if you have bounds checking enabled, it
will cause an exception.

Mar 23 '06 #18

ena8t8si

Keith Thompson wrote:

Ben C <sp******@spam.eggs> writes:
On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
Ben C wrote:
On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> [...] a negative subscript to an array will cause undefined
> behaviour.

[...] Are you sure?

int x[10];
int *y = x + 5;
y[-1] = 100;
...

y is not an array, it is a pointer.

What about this then?

#include <stdio.h>

int main(void)
{
int x[][3] =
{
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};

printf("%d\n", x[1][-1]);

return 0;
}

x is an array, not a pointer. I believe there is nothing "undefined"
here.

I think that's actually a matter of some dispute. x[1] is a pointer
to an array of 3 ints,

You mean x[1] is an array of 3 ints. In context x[1] does turn
into a pointer, but it turns into a pointer to int.

Mar 23 '06 #19

ena8t8si

Robert Gamble wrote:

Keith Thompson wrote:
Ben C <sp******@spam.eggs> writes:
On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
> Ben C wrote:
>> On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
>> > [...] a negative subscript to an array will cause undefined
>> > behaviour.

>> [...] Are you sure?

>> int x[10];
>> int *y = x + 5;
>> y[-1] = 100;
>> ...

> y is not an array, it is a pointer.

What about this then?

#include <stdio.h>

int main(void)
{
int x[][3] =
{
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};

printf("%d\n", x[1][-1]);

return 0;
}

x is an array, not a pointer. I believe there is nothing "undefined"
here.

I think that's actually a matter of some dispute.

It might have been in 1992, I think that DR #17 made it pretty clear
that this is undefined behavior. Quote the response to question #16:

"For an array of arrays, the permitted pointer arithmetic in subclause
6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
of the word ``object'' as denoting the specific object determined
directly by the pointer's type and value, not other objects related to
that one by contiguity. Therefore, if an expression exceeds these
permissions, the behavior is undefined. For example, the following code
has undefined behavior:
int a[4][5];

a[1][7] = 0; /* undefined */
Some conforming implementations may choose to diagnose an ``array
bounds violation,'' while others may choose to interpret such attempted
accesses successfully with the ``obvious'' extended semantics."

The result of this question was to add the following to the
(informative) section G.2 which documents examples of undefined
behavior:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.3.6)."

An easy/lazy/stupid response to the DR, resulting in an
easy/lazy/stupid statement in the standard.

void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;

/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;

/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;

/*4*/ p = a[1];
p[7] = 0;

/*5*/ (p = a[1])[7] = 0;

/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;

At what point in 1-7 does the behavior become undefined?
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."

Mar 23 '06 #20

Ben C

> What about this then?
>
> #include <stdio.h>
>
> int main(void)
> {
> int x[][3] =
> {
> {1, 2, 3},
> {4, 5, 6},
> {7, 8, 9}
> };
>
> printf("%d\n", x[1][-1]);
>
> return 0;
> }
>
> x is an array, not a pointer. I believe there is nothing "undefined"
> here.

I think that's actually a matter of some dispute.

It might have been in 1992, I think that DR #17 made it pretty clear
that this is undefined behavior. Quote the response to question #16
[...]:

Most interesting. Thank you, I stand corrected!

Mar 23 '06 #21

Robert Gamble

en******@yahoo.com wrote:

Robert Gamble wrote:
Keith Thompson wrote:
Ben C <sp******@spam.eggs> writes:
> On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
>> Ben C wrote:
>>> On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
>>> > [...] a negative subscript to an array will cause undefined
>>> > behaviour.
>
>>> [...] Are you sure?
>
>>> int x[10];
>>> int *y = x + 5;
>>> y[-1] = 100;
>>> ...
>
>> y is not an array, it is a pointer.
>
> What about this then?
>
> #include <stdio.h>
>
> int main(void)
> {
> int x[][3] =
> {
> {1, 2, 3},
> {4, 5, 6},
> {7, 8, 9}
> };
>
> printf("%d\n", x[1][-1]);
>
> return 0;
> }
>
> x is an array, not a pointer. I believe there is nothing "undefined"
> here.

I think that's actually a matter of some dispute.
It might have been in 1992, I think that DR #17 made it pretty clear
that this is undefined behavior. Quote the response to question #16:

"For an array of arrays, the permitted pointer arithmetic in subclause
6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
of the word ``object'' as denoting the specific object determined
directly by the pointer's type and value, not other objects related to
that one by contiguity. Therefore, if an expression exceeds these
permissions, the behavior is undefined. For example, the following code
has undefined behavior:
int a[4][5];

a[1][7] = 0; /* undefined */
Some conforming implementations may choose to diagnose an ``array
bounds violation,'' while others may choose to interpret such attempted
accesses successfully with the ``obvious'' extended semantics."

The result of this question was to add the following to the
(informative) section G.2 which documents examples of undefined
behavior:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.3.6)."

An easy/lazy/stupid response to the DR, resulting in an
easy/lazy/stupid statement in the standard.

void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;

Looks okay.

/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;

Looks okay.

/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;

Looks okay.

/*4*/ p = a[1];
p[7] = 0;

p is a pointer to int, not an array, so this should be okay.

/*5*/ (p = a[1])[7] = 0;

Same as #4 as far as I can tell.

/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;

I don't see how these two are different, but in both cases a[1] is an
array of 5 ints and 7 is an out-of-range subscript. Definitely
not okay.

At what point in 1-7 does the behavior become undefined?

I would say at #6.

Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."

In addition to this, the only rationale provided for the response to
the Defect Report is that implementations may choose to diagnose out of
bounds conditions. In your example, 1-5 use the subscript operator on
a pointer, not an array, so there should not be an issue as the elements
are still guaranteed to be contiguous. In 6 and 7 you are using an
invalid subscript on an array object, which is clearly undefined.

Robert Gamble

Mar 23 '06 #22

Chris Torek

In article <11**********************@j33g2000cwa.googlegroups.com>
<en******@yahoo.com> wrote:

void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;

/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;

/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;

/*4*/ p = a[1];
p[7] = 0;

/*5*/ (p = a[1])[7] = 0;

/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;

At what point in 1-7 does the behavior become undefined?
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."

Note: I am not sure whether I *agree* with the argument I am about
to present. I merely *present* it.

Someone -- I think Doug Gwyn -- said that the (or "an") aim of the
rules here is to allow a compiler to "cheat" by, internally, tagging
pointers with additional information about the size(s) of the
underlying object(s) from which the pointers are derived.

Imagine for a moment a machine in which "add integer value that
does not exceed 10 to pointer" is one million times faster than
"add any integer to pointer". Suppose that sizeof(int) is 2, so
that sizeof(a[0]) is 10. Suppose that the fast add gives the
"wrong" result if the integer is greater than 10 (presumably,
produces a sum that is smaller than the desired result).

Now, for (1) through (3) above, the compiler is probably forced to
use the million-times-slower addition to compute p[7]. In (4),
it probably still uses this. In (5), the compiler may have a little
more information, and by assignment 6, the compiler *definitely*
has more information.

In particular, in assignment 6, the compiler is allowed to "know"
that a[1] is an "array 5 of int" so that sizeof a[1] is 10. It
can therefore use the "fast add" to add 14 to the pointer, on the
assumption that the integer (14) must be 10-or-less (even though
it is not). This, of course, gives the "wrong" result, addressing
a[1][2] or a[1][3] or some such (maybe even halfway between the two).

According to this argument, the behavior becomes undefined
somewhere around assignment 5 or 6, and is definitely undefined
by assignment 7.

Obviously, real machines do not have instructions like "add restricted
integer not exceeding 10 to pointer" -- but they *do* have
range-restricted instructions. A compiler for a CPU like the 80186
might want to use 16-bit addition to address "small" arrays (with
segment:offset addressing), and 32-bit addition to address "large"
arrays (still with segment:offset but now using the full 20-bit
addressing mode, doing normalization and so on, instead of assuming
that the entire array fits in a single segment).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Mar 23 '06 #23

Keith Thompson

en******@yahoo.com writes:

Keith Thompson wrote:
Ben C <sp******@spam.eggs> writes: [...]
> int x[][3] =
> {
> {1, 2, 3},
> {4, 5, 6},
> {7, 8, 9}
> };
[...]
> x is an array, not a pointer. I believe there is nothing "undefined"
> here.

I think that's actually a matter of some dispute. x[1] is a pointer
to an array of 3 ints,

You mean x[1] is an array of 3 ints. In context x[1] does turn
into a pointer, but it turns into a pointer to int.

You're right, thanks.

Mar 23 '06 #24

pete

Chris Torek wrote:

Note: I am not sure whether I *agree* with the argument I am about
to present. I merely *present* it.

Is that an Alfred Hitchcock quote?

--
pete

Mar 23 '06 #25

Herbert Rosenau

On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:

Old Wolf wrote:
Pedro Graca wrote:
Should I make an effort to declare all char stuff as either signed or
unsigned? ... before it runs on a DS 9000 :)

Just make sure your code does not rely on chars being either
signed or unsigned.
If you need to rely on unsignedness (eg. an array of all possible
char values) then you should explicitly use unsigned chars.

Thank you for your answers.

Is it guaranteed that all characters available on some implementation
for which there is a standards compliant compiler are positive?

AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No. It depends on the specific compiler whether char defaults to signed
or unsigned. Besides that, some compilers have a switch to set the
default signedness of char explicitly, some may have a #pragma, and some
may have both.

In any case you can override the compiler default by explicitly
defining any char object as 'unsigned char' or 'signed char'. I have
found it works well to force the compiler to interpret 'char' as
'unsigned char' when it has to hold characters, and to explicitly define
'signed char' when the type is used as a "short short int" rather than
as a holder for characters.
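
Whatever switch or #pragma a particular compiler offers (GCC, for instance, documents -fsigned-char and -funsigned-char), the resulting choice is visible through <limits.h>. A minimal sketch, with a hypothetical helper name:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 where plain char is signed on this
   implementation (CHAR_MIN is negative), 0 where it is unsigned
   (CHAR_MIN is 0). */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Code that has to behave the same either way is usually better off declaring unsigned char or signed char explicitly than branching on this result.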

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 Deutsch ist da!

Mar 24 '06 #26

Keith Thompson

"Herbert Rosenau" <os****@pc-rosenau.de> writes:

On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:

[...]

Is it guaranteed that all characters available on some implementation
for which there is a standards compliant compiler are positive?

AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No.

You are mistaken.

Mar 24 '06 #27

Richard Bos

"Herbert Rosenau" <os****@pc-rosenau.de> wrote:

On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:
AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No. It is on the specific compiler if char defaults to signed or
unsigned.

Wrong. All characters in the basic execution character set must have a
positive value. '0' is a member of the basic execution charset. It must
therefore be above zero. If it also has the value 0xF0 and CHAR_BIT is
8, there's only one way for all the requirements to fit: plain char must
be unsigned.

Richard

Mar 24 '06 #28

Mark McIntyre

On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c ,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:

"Herbert Rosenau" <os****@pc-rosenau.de> wrote:
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:
> AFAICT, in EBCDIC the character '0' has value 0xF0.
> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> for conforming compilers?
>

No. It is on the specific compiler if char defaults to signed or
unsigned.

Wrong.

I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.

Mark McIntyre
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Mar 24 '06 #29

Keith Thompson

Mark McIntyre <ma**********@spamcop.net> writes:

On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c ,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:

> AFAICT, in EBCDIC the character '0' has value 0xF0.
> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> for conforming compilers?
>
No. It is on the specific compiler if char defaults to signed or
unsigned.

Wrong.

I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.

Yes, but nobody was asking about the general case.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Mar 24 '06 #30

Jordan Abel

On 2006-03-24, Keith Thompson <ks***@mib.org> wrote:

Mark McIntyre <ma**********@spamcop.net> writes:
On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c ,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:

On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:

> AFAICT, in EBCDIC the character '0' has value 0xF0.
> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> for conforming compilers?
>
No. It is on the specific compiler if char defaults to signed or
unsigned.

Wrong.

I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.

Yes, but nobody was asking about the general case.

Many people interpreted Pedro Graca's question as talking about the
general case - i.e. "If 0 _can_ have the high bit set, then is it the
case that char _must_ be unsigned?"

Mar 25 '06 #31

Pedro Graca

Jordan Abel wrote:

Keith Thompson
Mark McIntyre
Richard Bos
Herbert Rosenau
> Pedro Graca
> > AFAICT, in EBCDIC the character '0' has value 0xF0.
> > Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> > for conforming compilers?

[snip different answers for different interpretations of the question
and a big mess of people telling each other everyone is wrong.]

Shortly after people started answering the question I realized it didn't
come out as specific as I meant. However the discussion was (has been)
very enjoyable and I didn't want to interrupt by making a clearer
question. As a wannabe "pedantic" I apologize for my lack of pedantism.

I want to thank you all (again) for your comments and time spent on
this.
*Thank you*

Please carry on with the discussion :)

--
If you're posting through Google read <http://cfaj.freeshell.org/google>

Mar 25 '06 #32

ena8t8si

Robert Gamble wrote:

en******@yahoo.com wrote:
Robert Gamble wrote:
Keith Thompson wrote:
> Ben C <sp******@spam.eggs> writes:
> > On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
> >> Ben C wrote:
> >>> On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> >>> > [...] a negative subscript to an array will cause undefined
> >>> > behaviour.
> >
> >>> [...] Are you sure?
> >
> >>> int x[10];
> >>> int *y = x + 5;
> >>> y[-1] = 100;
> >>> ...
> >
> >> y is not an array, it is a pointer.
> >
> > What about this then?
> >
> > #include <stdio.h>
> >
> > int main(void)
> > {
> > int x[][3] =
> > {
> > {1, 2, 3},
> > {4, 5, 6},
> > {7, 8, 9}
> > };
> >
> > printf("%d\n", x[1][-1]);
> >
> > return 0;
> > }
> >
> > x is an array, not a pointer. I believe there is nothing "undefined"
> > here.
>
> I think that's actually a matter of some dispute.

It might have been in 1992, I think that DR #17 made it pretty clear
that this is undefined behavior. Quote the response to question #16:

"For an array of arrays, the permitted pointer arithmetic in subclause
6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
of the word ``object'' as denoting the specific object determined
directly by the pointer's type and value, not other objects related to
that one by contiguity. Therefore, if an expression exceeds these
permissions, the behavior is undefined. For example, the following code
has undefined behavior:
int a[4][5];

a[1][7] = 0; /* undefined */
Some conforming implementations may choose to diagnose an ``array
bounds violation,'' while others may choose to interpret such attempted
accesses successfully with the ``obvious'' extended semantics."

The result of this question was to add the following to the
(informative) section G.2 which documents examples of undefined
behavior:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.3.6)."

An easy/lazy/stupid response to the DR, resulting in an
easy/lazy/stupid statement in the standard.

void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;

Looks okay.
/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;

Looks okay.
/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;

Looks okay.
/*4*/ p = a[1];
p[7] = 0;

p is a pointer to int, not an array so this should be okay.
/*5*/ (p = a[1])[7] = 0;

Same as #4 as far as I can tell.
/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;

I don't see how these two are different, but in both cases a[1] is an
array of 5 ints and 7 is an out-of-range subscript. Definitely
not okay.
At what point in 1-7 does the behavior become undefined?

I would say at #6.

The type and value at #6 is the same as the type and value
at #5, #4, #3, #2 and #1.

Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."

In addition to this, the only rationale provided for the response to
the Defect Report is that implementations may choose to diagnose out of
bounds conditions. In your example, 1-5 uses the subscript operator on
a pointer, not an array so there should not be an issue as the elements
are still guaranteed to be contiguous. In 6 and 7 you are using an
invalid subscript on an array object which is clearly undefined.

I suggest you re-familiarize yourself with the rules for
array conversion and the Semantics paragraphs for subscript
operators. All the subscript operators above have pointer
operands. The [] operator doesn't work on array operands,
only on pointer operands.

Mar 25 '06 #33

ena8t8si

Chris Torek wrote:

In article <11**********************@j33g2000cwa.googlegroups .com>
<en******@yahoo.com> wrote:
void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;

/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;

/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;

/*4*/ p = a[1];
p[7] = 0;

/*5*/ (p = a[1])[7] = 0;

/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;

At what point in 1-7 does the behavior become undefined?
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."

Note: I am not sure whether I *agree* with the argument I am about
to present. I merely *present* it.

Someone -- I think Doug Gwyn -- said that the (or "an") aim of the
rules here is to allow a compiler to "cheat" by, internally, tagging
pointers with additional information about the size(s) of the
underlying object(s) from which the pointers are derived.

Imagine for a moment a machine in which "add integer value that
does not exceed 10 to pointer" is one million times faster than
"add any integer to pointer". Suppose that sizeof(int) is 2, so
that sizeof(a[0]) is 10. Suppose that the fast add gives the
"wrong" result if the integer is greater than 10 (presumably,
produces a sum that is smaller than the desired result).

Now, for (1) through (3) above, the compiler is probably forced to
use the million-times-slower addition to compute p[7]. In (4),
it probably still uses this. In (5), the compiler may have a little
more information, and by assignment 6, the compiler *definitely*
has more information.

In particular, in assignment 6, the compiler is allowed to "know"
that a[1] is an "array 5 of int" so that sizeof a[1] is 10. It
can therefore use the "fast add" to add 14 to the pointer, on the
assumption that the integer (14) must be 10-or-less (even though
it is not). This, of course, gives the "wrong" result, addressing
a[1][2] or a[1][3] or some such (maybe even halfway between the two).

According to this argument, the behavior becomes undefined
somewhere around assignment 5 or 6, and is definitely undefined
by assignment 7. ...

My complaint is that it's unclear when there is undefined
behavior and when there isn't. It might be a good idea to
define only "in bounds" array accesses, or it might not,
but the question here is which accesses are defined and
which aren't. The "clarifying text" in appendix J states
informative information that is _not derivable_ from normative
information, and _still_ isn't particularly clarifying about
what is allowed and what isn't. I just don't think the
committee did a very good job (a) of deciding which accesses
they wanted to leave undefined, or (b) of communicating what
their decision for (a) was. My fear is that both (a) and (b)
apply.

The argument you present is reasonable, except for the
conclusion about where undefinedness happens. All of 1
through 7 have the same type and value. If which object
is relevant is "determined directly by the _pointer's_
type and value", and all the pointers have the same type
and value, how can some be defined and some be undefined?

Mar 25 '06 #34

Robert Gamble

en******@yahoo.com wrote:

Robert Gamble wrote:
en******@yahoo.com wrote:
Robert Gamble wrote:
> Keith Thompson wrote:
> > Ben C <sp******@spam.eggs> writes:
> > > On 2006-03-22, Robert Gamble <rg*******@gmail.com> wrote:
> > >> Ben C wrote:
> > >>> On 2006-03-22, Old Wolf <ol*****@inspire.net.nz> wrote:
> > >>> > [...] a negative subscript to an array will cause undefined
> > >>> > behaviour.
> > >
> > >>> [...] Are you sure?
> > >
> > >>> int x[10];
> > >>> int *y = x + 5;
> > >>> y[-1] = 100;
> > >>> ...
> > >
> > >> y is not an array, it is a pointer.
> > >
> > > What about this then?
> > >
> > > #include <stdio.h>
> > >
> > > int main(void)
> > > {
> > > int x[][3] =
> > > {
> > > {1, 2, 3},
> > > {4, 5, 6},
> > > {7, 8, 9}
> > > };
> > >
> > > printf("%d\n", x[1][-1]);
> > >
> > > return 0;
> > > }
> > >
> > > x is an array, not a pointer. I believe there is nothing "undefined"
> > > here.
> >
> > I think that's actually a matter of some dispute.
>
> It might have been in 1992, I think that DR #17 made it pretty clear
> that this is undefined behavior. Quote the response to question #16:
>
> "For an array of arrays, the permitted pointer arithmetic in subclause
> 6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
> of the word ``object'' as denoting the specific object determined
> directly by the pointer's type and value, not other objects related to
> that one by contiguity. Therefore, if an expression exceeds these
> permissions, the behavior is undefined. For example, the following code
> has undefined behavior:
> int a[4][5];
>
> a[1][7] = 0; /* undefined */
> Some conforming implementations may choose to diagnose an ``array
> bounds violation,'' while others may choose to interpret such attempted
> accesses successfully with the ``obvious'' extended semantics."
>
> The result of this question was to add the following to the
> (informative) section G.2 which documents examples of undefined
> behavior:
>
> "An array subscript is out of range, even if an object is apparently
> accessible with the given subscript (as in the lvalue expression
> a[1][7] given the declaration int a[4][5]) (6.3.6)."

An easy/lazy/stupid response to the DR, resulting in an
easy/lazy/stupid statement in the standard.

void *v;
int *p;
int a[4][5];

/*1*/ v = &a;
p = (int*)((char*)v + 5 * sizeof(int));
p[7] = 0;

Looks okay.
/*2*/ v = &a;
p = (int*)v + 5;
p[7] = 0;

Looks okay.
/*3*/ v = a;
p = (int*)v + 5;
p[7] = 0;

Looks okay.
/*4*/ p = a[1];
p[7] = 0;

p is a pointer to int, not an array so this should be okay.
/*5*/ (p = a[1])[7] = 0;

Same as #4 as far as I can tell.
/*6*/ (a[1])[7] = 0;

/*7*/ a[1][7] = 0;

I don't see how these two are different, but in both cases a[1] is an
array of 5 ints and 7 is an out-of-range subscript. Definitely
not okay.
At what point in 1-7 does the behavior become undefined?

I would say at #6.

The type and value at #6 is the same as the type and value
at #5, #4, #3, #2 and #1.
Remember, the DR says that "object" means "the specific object
determined directly by the pointer's _type_ and _value_."

In addition to this, the only rationale provided for the response to
the Defect Report is that implementations may choose to diagnose out of
bounds conditions. In your example, 1-5 uses the subscript operator on
a pointer, not an array so there should not be an issue as the elements
are still guaranteed to be contiguous. In 6 and 7 you are using an
invalid subscript on an array object which is clearly undefined.

I suggest you re-familiarize yourself with the rules for
array conversion and the Semantics paragraphs for subscript
operators. All the subscript operators above have pointer
operands. The [] operator doesn't work on array operands,
only on pointer operands.

Right, the array decays into a pointer before the subscript operator is
applied. My point is that the object in #6 and #7 from which that
pointer was directly derived was an array. My interpretation is that
only in the case where the pointer for which the subscript operator is
being applied is immediately and directly the result of array decay is
the behavior undefined. I believe that this was the intent based on
the rationale but confess that the wording is not necessarily concrete
enough to clearly and concisely delineate the exact point in your
example in which the behavior becomes undefined. I can see a possible
argument for UB to be invoked in earlier examples. I would strongly
suggest that you present your original question and examples to the
folks at comp.std.c where you are much more likely to receive an answer
from someone like Doug Gwyn who was directly involved in the response
to that DR.

Robert Gamble

Mar 26 '06 #35

Herbert Rosenau

On Fri, 24 Mar 2006 06:58:16 UTC, Keith Thompson <ks***@mib.org>
wrote:

"Herbert Rosenau" <os****@pc-rosenau.de> writes:
On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:

[...]
Is it guaranteed that all characters available on some implementation
for which there is a standards compliant compiler are positive?

AFAICT, in EBCDIC the character '0' has value 0xF0.
Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
for conforming compilers?

No.

You are mistaken.

EBCDIC, ASCII or whatever charset is not even defined by standard C.
Assuming a specific charset and then talking about signed char will
fail in some cases, as the standard makes no assumptions about anything
other than digits. Maybe your compiler guarantees unsigned for a char
containing 'E0', but char may always be signed, and 'E0' then has a
negative value when CHAR_BIT == 8.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 German edition is here!

Mar 26 '06 #36

Herbert Rosenau

On Fri, 24 Mar 2006 19:56:13 UTC, Keith Thompson <ks***@mib.org>
wrote:

Mark McIntyre <ma**********@spamcop.net> writes:
On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c ,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:

On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
wrote:

> AFAICT, in EBCDIC the character '0' has value 0xF0.
> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
> for conforming compilers?
>
No. It is on the specific compiler if char defaults to signed or
unsigned.

Wrong.

I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.

Yes, but nobody was asking about the general case.

We're discussing standard C here, not EBCDIC C. What is char 'E5'? Is
it a letter? Is it a number? It is some bit pattern. Without knowledge
of the signedness of the char holding it, it may be either signed or
unsigned.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 German edition is here!

Mar 26 '06 #37

Keith Thompson

"Herbert Rosenau" <os****@pc-rosenau.de> writes:

On Fri, 24 Mar 2006 06:58:16 UTC, Keith Thompson <ks***@mib.org>
wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> writes:
> On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
> wrote:

[...]
>> Is it guaranteed that all characters available on some implementation
>> for which there is a standards compliant compiler are positive?
>>
>> AFAICT, in EBCDIC the character '0' has value 0xF0.
>> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
>> for conforming compilers?
>>
> No.

You are mistaken.

EBCDIC, ASCII or whatever charset is not even defined by standard C.
Assuming a specific charset and then talking about signed char will
fail in some cases, as the standard makes no assumptions about anything
other than digits. Maybe your compiler guarantees unsigned for a char
containing 'E0', but char may always be signed, and 'E0' then has a
negative value when CHAR_BIT == 8.

I read Pedro's question as being based on an assumption of CHAR_BIT==8
*and* an EBCDIC character set. I'm no longer sure that that was
Pedro's intent.

If CHAR_BIT==8, plain char may be either signed or unsigned.

If CHAR_BIT==8 and '0'==0xF0 (as it is in EBCDIC), then plain char
must be unsigned.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Mar 26 '06 #38

Michael Mair

Herbert Rosenau wrote:

On Fri, 24 Mar 2006 19:56:13 UTC, Keith Thompson <ks***@mib.org>
wrote:

Mark McIntyre <ma**********@spamcop.net> writes:
On Fri, 24 Mar 2006 07:54:47 GMT, in comp.lang.c ,
rl*@hoekstra-uitgeverij.nl (Richard Bos) wrote:
"Herbert Rosenau" <os****@pc-rosenau.de> wrote:
>On Wed, 22 Mar 2006 11:17:02 UTC, Pedro Graca <he****@dodgeit.com>
>wrote:
>
>
>> AFAICT, in EBCDIC the character '0' has value 0xF0.
>> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
>> for conforming compilers?
>>
>
>No. It is on the specific compiler if char defaults to signed or
>unsigned.

Wrong.

I guess we're all answering parts of the question.
IF CHAR_BIT == 8
AND '0' == 0xF0
THEN for this particular implementation to conform, char must be
unsigned.

However, the general case is precisely as Herbert wrote.

Yes, but nobody was asking about the general case.

We're discussing standard C here, not EBCDIC C. What is char 'E5'? Is
it a letter? Is it a number? It is some bit pattern. Without knowledge
of the signedness of the char holding it, it may be either signed or
unsigned.

Nope. We are comparing an int constant ('0') with an unsigned int
constant (0x...). If we have CHAR_BIT == 8, then "knowledge about
the signedness" of the underlying char does not play any role, as
int has at least 16 bits, i.e. the values cannot compare equal if
they are not equal. In this case, if a value >= 0x80 compares equal
to a character from the basic source character set and there are no
conversions in between (such as casting both values to unsigned
char), then char must be an unsigned integer type.

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Mar 26 '06 #39

Pedro Graca

Keith Thompson wrote:

Herbert Rosenau
Keith Thompson
Herbert Rosenau
> Pedro Graca
>> Is it guaranteed that all characters available on some implementation
>> for which there is a standards compliant compiler are positive?
>>
>> AFAICT, in EBCDIC the character '0' has value 0xF0.
>> Assuming CHAR_BIT is 8 does it follow that plain char is unsigned
>> for conforming compilers?

I read Pedro's question as being based on an assumption of CHAR_BIT==8
*and* an EBCDIC character set. I'm no longer sure that that was
Pedro's intent.

Yes, that was my intent.

The "main question" was incomplete (better one follows)
: Is it guaranteed that all characters in the basic execution set
: defined by the Standard are positive?

and the example was poorly worded (better one follows)
: AFAICT, in EBCDIC the character '0' has value 0xF0.
: For an implementation with EBCDIC for basic character set
: *and* CHAR_BIT == 8 does it follow that plain char is unsigned
: for /that/ particular implementation?
I wanted to make sure the following program would not invoke UB (by
trying to access an array with a negative index in line 22), no matter
the implementation it runs on

#include <assert.h>
#include <limits.h> /* for INT_MAX */
#include <stdio.h>
int main(void) {
char test[] = "9012";
char charval['9'+1];
char *p = test;
int sum = 0;
/* assumes INT_MAX >= 9012; but I think the standard mandates
* INT_MAX be at least 32767 */
assert(INT_MAX >= 9012);

charval['0'] = 0;
charval['1'] = 1;
charval['2'] = 2;
charval['9'] = 9;
while (*p) {
sum *= 10;
sum += charval[*p]; /* line 22 */
p++;
}
printf("%d\n", sum); /* print 9012 */
return 0;
}
If CHAR_BIT==8, plain char may be either signed or unsigned.

If CHAR_BIT==8 and '0'==0xF0 (as it is in EBCDIC), then plain char
must be unsigned.

I realize that no matter what character set the implementation defines
or what CHAR_BIT is for the implementation, *all* characters in the
basic set defined by the Standard (digits, lowercase and uppercase
letters, <tab>, <newline>, and a few signs, ...) must be positive.
There is no guarantee for characters outside this set:

char *p = "Pedro Graça";
int i=0;
while (*p) {
charval[*p] = i++; /* possible BANG! for 'ç' */
++p;
}

--
If you're posting through Google read <http://cfaj.freeshell.org/google>

Mar 27 '06 #40

Keith Thompson

Pedro Graca <he****@dodgeit.com> writes:

Keith Thompson wrote:

[...]

If CHAR_BIT==8, plain char may be either signed or unsigned.

If CHAR_BIT==8 and '0'==0xF0 (as it is in EBCDIC), then plain char
must be unsigned.

I realize that no matter what character set the implementation defines
or what CHAR_BIT is for the implementation, *all* characters in the
basic set defined by the Standard (digits, lowercase and uppercase
letters, <tab>, <newline>, and a few signs, ...) must be positive.
There is no guarantee for characters outside this set:

char *p = "Pedro Graça";
int i=0;
while (*p) {
charval[*p] = i++; /* possible BANG! for 'ç' */
++p;
}

And even for characters within the basic set, it's not a guarantee
that I'd want to depend on. There's no problem in the above code if I
use my own name rather than yours, but I'd still want to make sure it
works for any possible character value, probably by casting to
unsigned char. It's too easy to change the code to use a different
string literal or to get its data from somewhere else, and it's easier
to avoid the situation altogether than to document the assumption.

Even seemingly innocuous characters like '$', I think, are outside the
basic character set; I'd be astonished to see an implementation where
'$'<0, but I'll probably run into one at the most inconvenient
possible moment.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Mar 27 '06 #41

Jordan Abel

Note: No context quoted because I'm replying to the actual issue the
thread brings up rather than to any particular post

What all this is missing is that it's silly to warn on an array
subscript of type char when you don't warn on one of type signed int.

Idea -- magic safe macro for isalpha:

#define ISALPHA(x) isalpha(sizeof(x)==1?(unsigned char)(x):(x))

Mar 27 '06 #42

Michael Mair

Jordan Abel wrote:

Note: No context quoted because I'm replying to the actual issue the
thread brings up rather than to any particular post

What all this is missing is that it's silly to warn on an array
subscript of type char when you don't warn on one of type signed int.

Idea -- magic safe macro for isalpha:

#define ISALPHA(x) isalpha(sizeof(x)==1?(unsigned char)(x):(x))

Hmmm. Nice until x is something with sideeffects like, say,
"c = getchar()".

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Mar 27 '06 #43

Mark McIntyre

On Sun, 26 Mar 2006 20:59:16 +0000 (UTC), in comp.lang.c , "Herbert
Rosenau" <os****@pc-rosenau.de> wrote:

On Fri, 24 Mar 2006 19:56:13 UTC, Keith Thompson <ks***@mib.org>
wrote:
Mark McIntyre <ma**********@spamcop.net> writes:
> However, the general case is precisely as Herbert wrote.

Yes, but nobody was asking about the general case.

And frequently, we correct people's spelling, and fix bugs they didn't
spot, even though they didn't ask about that either.
We're discussing standard C here, not EBCDIC C.

I agree. But IMHO, and ICBW, you (and initially I) misunderstood the
thread to be about the general case when it was actually about a
specific one and thus technically offtopic. No harm done, either way.
Mark McIntyre
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Mar 27 '06 #44

Jordan Abel

On 2006-03-27, Michael Mair <Mi**********@invalid.invalid> wrote:

Jordan Abel wrote:
Note: No context quoted because I'm replying to the actual issue the
thread brings up rather than to any particular post

What all this is missing is that it's silly to warn on an array
subscript of type char when you don't warn on one of type signed int.

Idea -- magic safe macro for isalpha:

#define ISALPHA(x) isalpha(sizeof(x)==1?(unsigned char)(x):(x))

Hmmm. Nice until x is something with sideeffects like, say,
"c = getchar()".

That's why it's uppercase. To warn you.

Mar 27 '06 #45

Michael Mair

Jordan Abel wrote:

On 2006-03-27, Michael Mair <Mi**********@invalid.invalid> wrote:
Jordan Abel wrote:
Note: No context quoted because I'm replying to the actual issue the
thread brings up rather than to any particular post

What all this is missing is that it's silly to warn on an array
subscript of type char when you don't warn on one of type signed int.

Idea -- magic safe macro for isalpha:

#define ISALPHA(x) isalpha(sizeof(x)==1?(unsigned char)(x):(x))

Hmmm. Nice until x is something with sideeffects like, say,
"c = getchar()".

That's why it's uppercase. To warn you.

For functionality as basic as this, I do not trust anyone
to use it consistently correctly -- including me. I remember
a then-colleague abusing a macro with the words "Oh, it's
from XY -- he surely did something clever"... ;-(

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Mar 28 '06 #46

Jordan Abel

On 2006-03-28, Michael Mair <Mi**********@invalid.invalid> wrote:

Jordan Abel wrote:
On 2006-03-27, Michael Mair <Mi**********@invalid.invalid> wrote:
Jordan Abel wrote:

Note: No context quoted because I'm replying to the actual issue the
thread brings up rather than to any particular post

What all this is missing is that it's silly to warn on an array
subscript of type char when you don't warn on one of type signed int.

Idea -- magic safe macro for isalpha:

#define ISALPHA(x) isalpha(sizeof(x)==1?(unsigned char)(x):(x))

Hmmm. Nice until x is something with sideeffects like, say,
"c = getchar()".

That's why it's uppercase. To warn you.

For functionality as basic as this, I do not trust anyone
to use it consistently correctly -- including me. I remember
a then-colleague abusing a macro with the words "Oh, it's
from XY -- he surely did something clever"... ;-(

as it turns out, i _did_ do something clever. x is evaluated only once.

--
look closer...

Mar 28 '06 #47

pete

Jordan Abel wrote:

Note: No context quoted because I'm replying to the actual issue the
thread brings up rather than to any particular post

What all this is missing is that it's silly to warn on an array
subscript of type char when you don't warn on one of type signed int.

Idea -- magic safe macro for isalpha:

#define ISALPHA(x) isalpha(sizeof(x)==1?(unsigned char)(x):(x))

I don't understand what that macro is for, or how you would use it.
If you would cast a byte sized type to unsigned char,
why wouldn't you cast a larger type to unsigned char?

--
pete

Mar 28 '06 #48

CBFalconer

Jordan Abel wrote:

On 2006-03-28, Michael Mair <Mi**********@invalid.invalid> wrote:
Jordan Abel wrote:
On 2006-03-27, Michael Mair <Mi**********@invalid.invalid> wrote:
Jordan Abel wrote:

> Note: No context quoted because I'm replying to the actual
> issue the thread brings up rather than to any particular post
>
> What all this is missing is that it's silly to warn on an
> array subscript of type char when you don't warn on one of type
> signed int.
>
> Idea -- magic safe macro for isalpha:
>
> #define ISALPHA(x) isalpha(sizeof(x)==1?(unsigned char)(x):(x))

Hmmm. Nice until x is something with sideeffects like, say,
"c = getchar()".

That's why it's uppercase. To warn you.

For functionality as basic as this, I do not trust anyone
to use it consistently correctly -- including me. I remember
a then-colleague abusing a macro with the words "Oh, it's
from XY -- he surely did something clever"... ;-(

as it turns out, i _did_ do something clever. x is evaluated
only once.

There you go - somebody finally noticed.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Mar 28 '06 #49

Jordan Abel

On 2006-03-28, pete <pf******@mindspring.com> wrote:

Jordan Abel wrote:

Note: No context quoted because I'm replying to the actual issue the
thread brings up rather than to any particular post

What all this is missing is that it's silly to warn on an array
subscript of type char when you don't warn on one of type signed int.

Idea -- magic safe macro for isalpha:

#define ISALPHA(x) isalpha(sizeof(x)==1?(unsigned char)(x):(x))

I don't understand what that macro is for, or how you would use it.
If you would cast a byte sized type to unsigned char,
why wouldn't you cast a larger type to unsigned char?

Because the larger type is likely to be an int, which is safe to use
since its value will be either EOF or between 0 and UCHAR_MAX [e.g. the
result from getc]. The issue is that a char value could be a negative
number other than EOF.

Mar 28 '06 #50