using character as array subscript

Hi,

What is the best syntax for using a char to index into an array?

///////////////////////////////////
For example

int data[256];

data['a'] = 1;
data['b'] = 1;
///////////////////////////////////

gcc is complaining about this syntax, so I am using static_cast on the
character literal. Is there a better way to do this?

Thanks,
Ivan
Jun 27 '08 #1
"Ivan" <iv**@novickmail.com> wrote in message
news:e1**********************************@i18g2000prn.googlegroups.com...
> int data[256];
>
> data['a'] = 1;
> data['b'] = 1;
>
> gcc is complaining about this syntax, so i am using static cast on the
> character literal. Is there a better way to do this?
MSVC++ 2008 Express doesn't complain; it compiles that code fine, without even
a warning. It is well-defined behavior as long as your native char type is
an unsigned 8-bit byte.

On my system if I
std::cout << typeid('a').name() << "\n";
I get the output of
char

Not unsigned char. That may produce undefined behavior for you if you
attempt to work with characters whose byte value is above 127: they
might show up as negative.
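A small sketch for checking the two facts discussed above on your own compiler (the helper names are hypothetical, and static_assert and std::to_string need C++11, which postdates this thread): a character literal such as 'a' has type char in C++, and whether plain char is signed is implementation-defined, which std::numeric_limits<char>::is_signed reports.

```cpp
#include <climits>
#include <limits>
#include <string>
#include <type_traits>

// In C++ a character literal such as 'a' has type char (in C it has type int).
static_assert(std::is_same<decltype('a'), char>::value,
              "a character literal should have type char");

// Whether plain char is signed is implementation-defined;
// numeric_limits reports what this compiler chose.
bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}

// Describe the range of plain char, e.g. "signed char range -128..127"
// on a typical x86 g++ build, or "unsigned char range 0..255" with
// -funsigned-char.
std::string describe_plain_char()
{
    std::string s = plain_char_is_signed() ? "signed" : "unsigned";
    s += " char range ";
    s += std::to_string(static_cast<int>(CHAR_MIN));
    s += "..";
    s += std::to_string(static_cast<int>(CHAR_MAX));
    return s;
}
```

Printing describe_plain_char() tells you whether the negative-index concern applies to your build at all.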
Jun 27 '08 #2
Jim Langston wrote:
> "Ivan" <iv**@novickmail.com> wrote in message
> news:e1**********************************@i18g2000prn.googlegroups.com...
>> gcc is complaining about this syntax, so i am using static cast on the
>> character literal. Is there a better way to do this?
>
> MSVC++ 2008 express isn't complaining and compiles that code fine, not even
> a warning. It is well defined behavior as long as the type of your native
> char is unsigned 8 bit byte.

Is it well defined? I thought it would depend on the character encoding
used, such as ASCII vs EBCDIC. Or does the standard actually specify
char encoding now?

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Jun 27 '08 #3
On Mon, 16 Jun 2008 18:53:05 -0700, Daniel Pitts
<ne******************@virtualinfinity.net> wrote in comp.lang.c++:
> Jim Langston wrote:
>> MSVC++ 2008 express isn't complaining and compiles that code fine, not even
>> a warning. It is well defined behavior as long as the type of your native
>> char is unsigned 8 bit byte.
> Is it well defined? I thought it would depend on the character encoding
> used, such as ASCII vs EBCDIC. Or does the standard actually specify
> char encoding now?
No, the standard does not specify the execution character set, or the
source character set, for that matter. That's exactly why it is more
portable to use the actual characters, rather than their numerical
value in a particular character set.

In fact, the OP's code could well be part of a beginner's assignment
to generate a histogram of characters in some input data.
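As an illustration of that kind of assignment, here is a minimal sketch (char_histogram, count_of and Histogram are hypothetical names, and the input is assumed to be a std::string): the table has UCHAR_MAX + 1 entries, every char goes through unsigned char before being used as an index, and lookups are written with the character literal, not its numeric code.

```cpp
#include <array>
#include <climits>
#include <string>

// One counter per possible unsigned char value.
using Histogram = std::array<int, UCHAR_MAX + 1>;

// Hypothetical helper: count each character in the input.
Histogram char_histogram(const std::string& text)
{
    Histogram counts{};  // value-initialized: all zeros
    for (char c : text) {
        // Convert first, so a negative plain char never becomes a negative index.
        ++counts[static_cast<unsigned char>(c)];
    }
    return counts;
}

// Look up by the character itself, not by its code in some character set.
int count_of(const Histogram& h, char c)
{
    return h[static_cast<unsigned char>(c)];
}
```

With this, count_of(char_histogram("banana"), 'a') is 3 whatever the execution character set is.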

This is guaranteed to produce the correct hex digit character for the
lowest nibble of an unsigned int regardless of the character set:

    char hex[] = "0123456789ABCDEF";

    char hex_digit(unsigned int x)
    {
        return hex[x & 0xf];
    }

...if you change the definition of the array to:

    char hex [17] = { 48, 49, /*... */ 69, 70, 0 };

...then you get exactly the same array and result on an ASCII
implementation, and gibberish on any other execution character set.
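The same idea works in the other direction; the following is my own sketch, not Jack's code. hex_value is a hypothetical helper that recovers a digit's value by searching the string literal with std::strchr, so it never assumes that 'A'..'F' are contiguous (only '0'..'9' are guaranteed to be). The array is renamed hex_chars here to avoid clashing with std::hex.

```cpp
#include <cstring>

const char hex_chars[] = "0123456789ABCDEF";

// Character-set independent: the digits come from a string literal.
char hex_digit(unsigned int x)
{
    return hex_chars[x & 0xf];
}

// The reverse mapping, also without assuming any particular encoding:
// find the character's position in the same string. Returns -1 if c is
// not an upper-case hex digit.
int hex_value(char c)
{
    if (c == '\0') return -1;  // strchr would otherwise match the terminator
    const char* p = std::strchr(hex_chars, c);
    return p ? static_cast<int>(p - hex_chars) : -1;
}
```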

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
Jun 27 '08 #4
On Jun 17, 12:48 am, Ivan <i...@novickmail.com> wrote:
> What is the best syntax to use a char to index into an array.
It depends.
> ///////////////////////////////////
> For example
> int data[256];
> data['a'] = 1;
> data['b'] = 1;
> ///////////////////////////////////
> gcc is complaining about this syntax, so i am using static
> cast on the character literal. Is there a better way to do
> this?
It depends on the context.

First, this is a warning; you can turn it off, or ignore it. In
fact, it is a legitimate warning unless you've taken adequate
precautions: a char may have negative values. (But then, so may
an int. Logically, g++ shouldn't warn unless the size of the
array is such that not all entries can be reached by a char, and
certainly not in the case of a character literal. But in
fact, it always warns, unless you turn that warning off.)

The first case is when the array will normally be indexed by an
int, and you're just using character literals during
initialization; if the only indexing by a char is with a
character literal, you can simply ignore the warning. (Note
that this is a more or less usual idiom: you index the array with
the return value of istream::get(), for example, after having
checked for EOF.)
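A minimal sketch of the idiom described in that parenthesis, assuming the input comes from any std::istream (count_stream and count_a are hypothetical names): istream::get() returns the character converted to unsigned char, as an int, or EOF, so once EOF has been ruled out the int is a valid index, and initialization-style code can index with a character literal.

```cpp
#include <climits>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Counts per character, indexed by int. istream::get() returns the
// character converted to unsigned char (as an int), or EOF at end of input.
std::vector<int> count_stream(std::istream& in)
{
    std::vector<int> counts(UCHAR_MAX + 1, 0);
    int ch;
    while ((ch = in.get()) != std::char_traits<char>::eof()) {
        ++counts[ch];  // ch is in 0...UCHAR_MAX here
    }
    return counts;
}

// Initialization-style use: a character literal as the index.
// 'a' is guaranteed non-negative, so this is safe even if char is signed.
int count_a(const std::vector<int>& counts)
{
    return counts['a'];
}
```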

If you really do want to index with arbitrary characters, there
are three solutions:

1. If portability isn't a large concern, you can just compile
with -funsigned-char. This should really be the default,
but there are historical reasons which mean that it isn't.
Other compilers also have such an option. (It's /J for
VC++, I think.) If you're certain that you'll never have to
port to a compiler without this option, you can just use it,
and be assured that plain char is unsigned.

In this case, you'll still have to turn off the warning from
g++. (IMHO, the warning, as it is currently implemented, is
stupid. If they want to warn, it would be more reasonable
to warn when the type of the index cannot encompass all of
the possible index values, and only if the value is not a
constant.)

2. Otherwise, you can cast to unsigned char anytime you use a
char as an index.

3. Or, you can rearrange the array, and use character -
CHAR_MIN as an index.
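Options 2 and 3 can be sketched as two small index functions (the names are hypothetical). Both map every char value into 0...UCHAR_MAX, so either fits an array of UCHAR_MAX + 1 entries; they differ only in where negative chars end up, and they coincide when plain char is unsigned.

```cpp
#include <climits>
#include <cstddef>

// Option 2: cast to unsigned char. Negative chars wrap to the top of the
// range (e.g. -1 becomes UCHAR_MAX); non-negative chars keep their value.
std::size_t index_by_cast(char c)
{
    return static_cast<unsigned char>(c);
}

// Option 3: shift by CHAR_MIN, so the smallest char maps to 0. When plain
// char is unsigned, CHAR_MIN is 0 and this is the same as option 2.
std::size_t index_by_offset(char c)
{
    return static_cast<std::size_t>(c - CHAR_MIN);
}
```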

In the latter two cases, I'd wrap the array in a class which
took care of the "correction" of the index.
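A sketch of such a wrapper, using option 2 for the correction; CharIndexedArray is a hypothetical name, and std::array requires C++11. Since the only way in is operator[](char), no caller can forget the cast.

```cpp
#include <array>
#include <climits>
#include <cstddef>

// A char-indexed array that does the index "correction" in one place
// (here option 2, the cast to unsigned char).
template <typename T>
class CharIndexedArray
{
public:
    CharIndexedArray() : data_{} {}

    T& operator[](char c) { return data_[slot(c)]; }
    const T& operator[](char c) const { return data_[slot(c)]; }

    std::size_t size() const { return data_.size(); }

private:
    static std::size_t slot(char c) { return static_cast<unsigned char>(c); }

    std::array<T, UCHAR_MAX + 1> data_;
};
```

Indexing with data['a'] then behaves like the original code, and a negative char such as static_cast<char>(0xE9) lands in the upper half of the table instead of before its start. Switching to option 3 only means changing slot().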

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #5
Ivan wrote:
> For example
> int data[256];
> data['a'] = 1;
> data['b'] = 1;
> ///////////////////////////////////
> gcc is complaining about this syntax, so i am using static cast on the
> character literal. Is there a better way to do this?
Which gcc? From your example, I assumed:

    int data[256];

    int main()
    {
        data['a'] = 1;
        data['b'] = 1;
        return 0;
    }

Compiled as C++, there was not a single warning with:
  g++-4.3 (-Wall -pedantic)
  mingw-gcc-3.4.1
  icpc (intel CC 10.1)

Maybe you made another mistake not
shown in your incomplete excerpt.

Regards

Mirco
Jun 27 '08 #6
Jack Klein wrote:
> On Mon, 16 Jun 2008 18:53:05 -0700, Daniel Pitts
> <ne******************@virtualinfinity.net> wrote in comp.lang.c++:
>> Is it well defined? I thought it would depend on the character encoding
>> used, such as ASCII vs EBCDIC. Or does the standard actually specify
>> char encoding now?
>
> No, the standard does not specify execution character set. Or source
> character set, for that matter. That's exactly why it is more
> portable to use the actual characters, rather than their numerical
> value in a particular character set.
>
> In fact, the OP's code could well be part of a beginner's assignment
> to generate a histogram of characters in some input data.
>
> This is guaranteed to produce the correct hex digit character for the
> lowest nibble of an unsigned int regardless of the character set:
>
> char hex[] = "0123456789ABCDEF";
>
> char hex_digit(unsigned int x)
> {
> return hex [x & 0xf];
> }
Your example only addresses the *converse* of my point, and therefore
has no bearing on its validity.
>
> ...if you change the definition of the array to:
>
> char hex [17] = { 48, 49, /*... */ 69, 70, 0 };
>
> ...then you get exactly the same array and result on an ASCII
> implementation, and gibberish on any other execution character set.
Right, but using 'a' as an index into an array could be a different
index on different compilers. Considering that char could be signed and
negative, you could have serious consequences.

Granted, this isn't a problem in practice, but it's not portable to expect
foo['a'] = 1 to do something specific.

Now, if you were to get specific with vendor/platform, that's a different
question.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Jun 27 '08 #7
In article <48***********************@newsrazor.net>,
ne******************@virtualinfinity.net says...

[ ... ]
> Right, but using 'a' as an index into an array could be a different
> index on different compilers. considering that char could be signed and
> negative, you could have serious consequences.
>
> Granted, this isn't a problem in practice, but its not portable that
> foo['a'] = 1 should do something specific.
That depends on what you mean by something specific. Basically, the
behavior is unspecified, but NOT undefined. In particular, the C++
standard specifies a basic execution character set that includes the
usual English letters, base-10 digits, etc., and requires that all those
characters have non-negative values. Since the 'a' in your expression
must be non-negative, it has defined results if (for example) foo has
been defined something like 'int foo[UCHAR_MAX + 1];'
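Jerry's point can be checked at compile time. A sketch, assuming C++11 for static_assert (mark_a is a hypothetical helper): the basic execution characters are non-negative whether plain char is signed or not, so 'a' is always a valid index into an array of UCHAR_MAX + 1 ints.

```cpp
#include <climits>

// Members of the basic execution character set are guaranteed to be
// non-negative, so this holds whether plain char is signed or not.
static_assert('a' >= 0 && 'z' >= 0 && 'A' >= 0 && '0' >= 0,
              "basic characters are non-negative");

// Large enough for every unsigned char value, hence for every
// basic character.
int foo[UCHAR_MAX + 1];

// Hypothetical helper: the literal 'a' is a valid index into foo.
void mark_a()
{
    foo['a'] = 1;
}
```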

It's certainly true that you could encounter characters whose encoding
is negative, but this isn't one of them.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #8
On Jun 17, 6:58 pm, Daniel Pitts
<newsgroup.spamfil...@virtualinfinity.net> wrote:

[...]
> Right, but using 'a' as an index into an array could be a
> different index on different compilers.
Which, presumably, is what is wanted. You don't want the entry
corresponding to 97 (or whatever); you want the entry
corresponding to the encoding for the character 'a' on the
platform in question.
> considering that char could be signed and negative, you could
> have serious consequences.
That's the real problem. The OP had an array "int x[ 256 ] ;";
indexing it with a char could definitely be a problem (and
logically, it probably should be "int x[ UCHAR_MAX + 1 ] ;").
But of course, we (and g++) don't know whether he intends to
index it with a char, or with a char cast to unsigned char, or
with an int returned by istream::get() or fgetc(). And
'a' *is* guaranteed to be positive, and in the range
0...UCHAR_MAX.
> Granted, this isn't a problem in practice, but its not
> portable that foo['a'] = 1 should do something specific.
Except that the language standard says that it does something
very specific, and very useful. Issuing a warning in this case
is simply brain damage.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Jun 27 '08 #9
On Jun 17, 7:27 pm, Jerry Coffin <jcof...@taeus.com> wrote:
> In article <4857ed5a$0$12713$7836c...@newsrazor.net>,
> newsgroup.spamfil...@virtualinfinity.net says...
> [ ... ]
>> Right, but using 'a' as an index into an array could be a
>> different index on different compilers. considering that
>> char could be signed and negative, you could have serious
>> consequences.
>> Granted, this isn't a problem in practice, but its not
>> portable that foo['a'] = 1 should do something specific.
> That depends on what you mean by something specific.
> Basically, the behavior is unspecified, but NOT undefined.
The behavior is exactly specified (or at least, as specified as
anything else in C++). You index the array with the value
corresponding to the encoding of a small a in the native
character encoding. If the goal is to index the entry
corresponding to the encoding of a small a, this is the only
correct and specified way of doing it.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Jun 27 '08 #10
On Jun 17, 12:00 pm, Mirco Wahab <wa...@chemie.uni-halle.de> wrote:
> Ivan wrote:
> [...]
> Which gcc? From your example, I assumed:
> int data[256];
> int main()
> {
> data['a'] = 1;
> data['b'] = 1;
> return 0;
> }
> Compiled as C++ There was not a single warning in:
> g++-4.3 (-Wall -pedantic)
g++ 4.1.0 (under Solaris) definitely warns in this case when
-Wall -pedantic is used.
> mingw-gcc-3.4.1
So does 3.4.0 under Solaris, and the CygWin version of 3.4.4
under Windows.
> icpc (intel CC 10.1)
> Maybe you made another mistake not shown in your incomplete
> excerpt.
I have no problem reproducing his warnings, with several
different versions of g++, as long as -Wall is used. The actual
warning is "char-subscripts", so adding -Wno-char-subscripts
*after* -Wall (or not using -Wall at all, but choosing
explicitly for each warning) will suppress it. Which you
probably should do---this is one of those brain dead warnings of
which every compiler seems to have a few.
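Besides -Wno-char-subscripts on the command line, the warning can be silenced for just one region of code. This is a sketch, assuming a g++ new enough to support diagnostic push/pop (4.6 or later, so newer than anything tested in this thread); init_data and flags are hypothetical names.

```cpp
// Build with e.g.: g++ -Wall -Wno-char-subscripts file.cpp
// or rely on the pragmas below to silence -Wchar-subscripts locally.

int flags[256];

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wchar-subscripts"
// Hypothetical helper: set the entries for 'a' and 'b'.
void init_data()
{
    flags['a'] = 1;
    flags['b'] = 1;
}
#pragma GCC diagnostic pop
```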

--
James Kanze (GABI Software) email:ja*********@gmail.com
Jun 27 '08 #11
James Kanze wrote:
> g++ 4.1.0 (under Solaris) definitely warns in this case when
> -Wall -pedantic is used.
> ...
> So does 3.4.0 under Solaris, and the CygWin version of 3.4.4
> under Windows.
> I have no problem reproducing his warnings, with several
> different versions of g++, as long as -Wall is used. The actual
> warning is "char-subscripts", so adding -Wno-char-subscripts
> *after* -Wall (or not using -Wall at all, but choosing
> explicitly for each warning) will suppress it. Which you
> probably should do---this is one of those brain dead warnings of
> which every compiler seems to have a few.
OK, I checked again (-Wall, -pedantic if possible):

1) gcc version 3.4.2 (mingw-special)
/s/misc/charsubscr/charsubscr.cxx:6: warning: array subscript has type `char'
/s/misc/charsubscr/charsubscr.cxx:7: warning: array subscript has type `char'

2) gcc version 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
charsubscr.cxx:6: warning: array subscript has type `char'
charsubscr.cxx:7: warning: array subscript has type `char'

3) gcc version 4.2.3 20071030 (Linux)
(no warning)

4) gcc version 4.3.1 20080507 [gcc-4_3-branch revision 135036] (Linux)
(no warning)

5) icpc Version 10.1 (Linux)
(no warning)

6) Visual C++ 6 (SP6), Warning Level 4 (XP/SP2)
(no warning)

7) Visual C++ 9 (SP0), Warning Level 4 (XP/SP2)
(no warning)
So gcc < 4.x seems to be the only tool
that emits this warning (?).

Thanks & Regards

Mirco
Jun 27 '08 #12
On Jun 18, 11:01 am, Mirco Wahab <wa...@chemie.uni-halle.de> wrote:
> James Kanze wrote:
>> I have no problem reproducing his warnings, with several
>> different versions of g++, as long as -Wall is used.
> OK, I checked again (-Wall, -pedantic if possible):
> [...]
> So the gcc < 4.x seems to be the only tool that emits this
> warning (?).
I get it with g++ 4.1. So maybe they realized how stupid it
was, and got rid of it (or at least dropped it from -Wall).

--
James Kanze (GABI Software) email:ja*********@gmail.com
Jun 27 '08 #13
On 17 Jun., 00:48, Ivan <i...@novickmail.com> wrote:
> gcc is complaining about this syntax, so i am using static cast on the
> character literal. Is there a better way to do this?
It would be helpful to also post the gcc warnings (complaints).

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
Jun 27 '08 #14
