Bytes IT Community

Implementations with CHAR_BIT=32

The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Just out of curiosity, does anyone know actual implementations that have
this?

S.
Nov 15 '05 #1
12 Replies


Skarmander <in*****@dontmailme.com> writes:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?
Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. Another problem is that declaring an array of
UCHAR_MAX elements is probably not possible; UCHAR_MAX + 1
elements is a constraint violation. I'm sure that other common
practices would fail as well.
Just out of curiosity, does anyone know actual implementations that have
this?


I have heard that some DSPs use this model, but not hosted
implementations.
--
"...Almost makes you wonder why Heisenberg didn't include postinc/dec operators
in the uncertainty principle. Which of course makes the above equivalent to
Schrodinger's pointer..."
--Anthony McDonald
Nov 15 '05 #2

Ben Pfaff wrote:
Skarmander <in*****@dontmailme.com> writes:

The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. Another problem is that declaring an array of
UCHAR_MAX elements is probably not possible; UCHAR_MAX + 1
elements is a constraint violation. I'm sure that other common
practices would fail as well.

<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set (which
fails), and that the character set is "small" for some reasonable value
of "small", which does not include "32 bits" (this will probably still
hold).

Either that or the application really wants 8-bit bytes, but is using
UCHAR_MAX because it looks neater (which could be considered a bug, not
just an assumption).

I don't quite see the EOF problem, though. It's probably just my lack of
imagination, but could you give a code snippet that fails?

S.
Nov 15 '05 #3

Skarmander <in*****@dontmailme.com> writes:
I don't quite see the EOF problem, though. It's probably just my lack of
imagination, but could you give a code snippet that fails?


Something like this is often used to detect end-of-file or error:
if (getc(file) == EOF) {
    /* ...handle error or end of file... */
}
If "int" and "char" have the same range, then a return value of
EOF doesn't necessarily mean that an error or end-of-file was
encountered.
--
"To get the best out of this book, I strongly recommend that you read it."
--Richard Heathfield
Nov 15 '05 #4

Skarmander wrote:
Ben Pfaff wrote:
Skarmander <in*****@dontmailme.com> writes:

The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?


Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. Another problem is that declaring an array of
UCHAR_MAX elements is probably not possible; UCHAR_MAX + 1
elements is a constraint violation. I'm sure that other common
practices would fail as well.

<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set (which
fails), and that the character set is "small" for some reasonable value
of "small", which does not include "32 bits" (this will probably still
hold).

Either that or the application really wants 8-bit bytes, but is using
UCHAR_MAX because it looks neater (which could be considered a bug, not
just an assumption).

I don't quite see the EOF problem, though. It's probably just my lack of
imagination, but could you give a code snippet that fails?


- Functions from <ctype.h> have an int parameter which is either
representable by unsigned char or equals the value of the macro
EOF (which is a negative integral constant expression).

- fgetc()/getc returns either the next character as unsigned char
converted to an int or EOF; with fputc(), you write an int which
is a char converted to unsigned char.

If the value range of unsigned char is not contained in int, the
conversion back to int is implementation-defined. If we, for one
moment, assume this conversion "wraps around" just as in the unsigned
integer case, then we still have the problem that we cannot
discern whether EOF is intended to be EOF or (int)((unsigned char) EOF).

So, character-based I/O and <ctype.h> give us some trouble.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 15 '05 #5

On 2005-11-07, Skarmander <in*****@dontmailme.com> wrote:
Ben Pfaff wrote:
Skarmander <in*****@dontmailme.com> writes:

The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. Another problem is that declaring an array of
UCHAR_MAX elements is probably not possible; UCHAR_MAX + 1
elements is a constraint violation. I'm sure that other common
practices would fail as well.

<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set
(which fails), and that the character set is "small" for some
reasonable value of "small", which does not include "32 bits" (this
will probably still hold).


eh - 20 bits [actually 20.0875-ish] is still pretty large, and
those will most likely need to be stored in 32 bits.
Either that or the application really wants 8-bit bytes, but is
using UCHAR_MAX because it looks neater (which could be considered a
bug, not just an assumption).

I don't quite see the EOF problem, though. It's probably just my
lack of imagination, but could you give a code snippet that fails?


It would be possible if there actually were a 32-bit character set, or
if getchar() read four bytes at a time from the file.
Nov 15 '05 #6

Jordan Abel <jm****@purdue.edu> writes:
On 2005-11-07, Skarmander <in*****@dontmailme.com> wrote:

[...]
I don't quite see the EOF problem, though. It's probably just my
lack of imagination, but could you give a code snippet that fails?


It would be possible if there actually were a 32-bit character set, or
if getchar() read four bytes at a time from the file.


getchar() by definition reads one byte at a time from the file.
A byte may be larger than 8 bits.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 15 '05 #7

Skarmander wrote:

Ben Pfaff wrote:
Skarmander <in*****@dontmailme.com> writes:

The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. Another problem is that declaring an array of
UCHAR_MAX elements is probably not possible; UCHAR_MAX + 1
elements is a constraint violation. I'm sure that other common
practices would fail as well.

<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set (which
fails), and that the character set is "small" for some reasonable value
of "small", which does not include "32 bits" (this will probably still
hold).

Either that or the application really wants 8-bit bytes, but is using
UCHAR_MAX because it looks neater (which could be considered a bug, not
just an assumption).

I don't quite see the EOF problem, though.
It's probably just my lack of
imagination, but could you give a code snippet that fails?


int putchar(int c);

putchar returns either ((int)(unsigned char)c) or EOF.

If sizeof(int) equals one, and c is negative,
then (unsigned char)c is greater than INT_MAX,
and that means that ((int)(unsigned char)c)
would be implementation-defined
and possibly negative, upon success.

--
pete
Nov 15 '05 #8

Skarmander wrote:
Ben Pfaff wrote:
Skarmander <in*****@dontmailme.com> writes:

The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. Another problem is that declaring an array of
UCHAR_MAX elements is probably not possible; UCHAR_MAX + 1
elements is a constraint violation. I'm sure that other common
practices would fail as well.

<snip>
I'd imagine that declaring an array of UCHAR_MAX elements is most
commonly done under the assumption that `char' is not significantly
larger than necessary to hold the characters in the character set (which
fails), and that the character set is "small" for some reasonable value
of "small", which does not include "32 bits" (this will probably still
hold).

Either that or the application really wants 8-bit bytes, but is using
UCHAR_MAX because it looks neater (which could be considered a bug, not
just an assumption).

I don't quite see the EOF problem, though. It's probably just my lack of
imagination, but could you give a code snippet that fails?

S.


See http://www.homebrewcpu.com/projects.htm
and scroll down to his last project "LCC retargeting for D16/M homebrew
computer".
The D16/M is a hobbyist-designed, homebrew CPU which is fully 16 bits.
It cannot even handle 8-bit quantities. So the architecture has 16-bit
chars, 16-bit shorts, 16-bit ints and 16-bit pointers.
And the compiler worked quite well...
If you are never going to process text files generated on other
systems, there's no reason for chars to be 8 bits.

Nov 15 '05 #9

On Mon, 07 Nov 2005 22:29:29 +0100, Skarmander
<in*****@dontmailme.com> wrote in comp.lang.c:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?

Just out of curiosity, does anyone know actual implementations that have
this?

S.


I used an Analog Devices 32-bit SHARC DSP a few years ago (I forget
the exact model) where CHAR_BIT was 32 and all the integer types (this
was before long long) were 32 bits.

I currently do a lot of work with a Texas Instruments DSP in the
TMS320F28xx family where CHAR_BIT is 16 and the char, short, and int
types all share the same representation and size.

I imagine other DSPs from these and other manufacturers are similar,
although Freescale (formerly Motorola) has a 16-bit DSP that they say
supports CHAR_BIT 8; I haven't used it myself.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 15 '05 #10

Ben Pfaff <bl*@cs.stanford.edu> writes:
Skarmander <in*****@dontmailme.com> writes:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?


Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. [...]


Just a reminder that CHAR_BIT == 32 and sizeof(int) == 1 both being
true doesn't automatically imply either that INT_MAX == CHAR_MAX or
that INT_MIN == CHAR_MIN. In particular,

INT_MAX 2147483647 CHAR_MAX 1073741823
INT_MIN -2147483648 CHAR_MIN -1073741824

are allowed, or even

INT_MAX 2147483647 CHAR_MAX 127
INT_MIN -2147483648 CHAR_MIN -128

are allowed.

What I would expect in a hosted implementation with CHAR_BIT == 32
and sizeof(int) == 1 is

INT_MAX 2147483647 CHAR_MAX 2147483647
INT_MIN -2147483648 CHAR_MIN -2147483647

So EOF could be -2147483648 and there would be no conflict with any
character value. Of course, on such a system, outputting binary data
would most likely be done with unsigned char rather than char.
Nov 15 '05 #11

Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
Ben Pfaff <bl*@cs.stanford.edu> writes:
Skarmander <in*****@dontmailme.com> writes:
The standard allows an implementation where CHAR_BIT = 32 and sizeof
char = sizeof short = sizeof int = sizeof long = 1, right?


Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. [...]


Just a reminder that CHAR_BIT == 32 and sizeof(int) == 1 both being
true doesn't automatically imply either that INT_MAX == CHAR_MAX or
that INT_MIN == CHAR_MIN. In particular,

INT_MAX 2147483647 CHAR_MAX 1073741823
INT_MIN -2147483648 CHAR_MIN -1073741824

are allowed, or even

INT_MAX 2147483647 CHAR_MAX 127
INT_MIN -2147483648 CHAR_MIN -128

are allowed.


Not for char, it isn't. Other types can have padding bits; char
(unsigned in any case, and AFAIK since a TC some time ago the other
kinds, too) cannot.

Richard
Nov 15 '05 #12

rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
Tim Rentsch <tx*@alumnus.caltech.edu> wrote:
Ben Pfaff <bl*@cs.stanford.edu> writes:
Skarmander <in*****@dontmailme.com> writes:

> The standard allows an implementation where CHAR_BIT = 32 and sizeof
> char = sizeof short = sizeof int = sizeof long = 1, right?

Yes. A lot of otherwise well-written code would malfunction on
such a (hosted) implementation, because EOF is now in the range
of a signed char. [...]


Just a reminder that CHAR_BIT == 32 and sizeof(int) == 1 both being
true doesn't automatically imply either that INT_MAX == CHAR_MAX or
that INT_MIN == CHAR_MIN. In particular,

INT_MAX 2147483647 CHAR_MAX 1073741823
INT_MIN -2147483648 CHAR_MIN -1073741824

are allowed, or even

INT_MAX 2147483647 CHAR_MAX 127
INT_MIN -2147483648 CHAR_MIN -128

are allowed.


Not for char, it isn't. Other types can have padding bits; char
(unsigned in any case, and AFAIK since a TC some time ago the other
kinds, too) can not.


I need to ask for a reference. It's true that unsigned char can't
have padding bits, but I can't find any evidence that signed char
can't have padding bits. Apparently it _was_ true in C89/C90 that the
committee didn't expect that signed chars would ever have padding
bits; however, it seems like this unstated assumption was cleared up
in a DR (DR 069, I believe). The language in 6.2.6.2 p1,p2 seems to
say fairly plainly that padding bits aren't allowed for unsigned char
but are allowed for all other integer types.

There was a posting from Doug Gwyn in comp.std.c on July 12 of this
year saying that signed chars could have padding bits. A search in
Google Groups

"signed char" "padding bits" numeration

should turn that up. The message was:
Doug Gwyn in comp.std.c:

Keith Thompson wrote:
> If it doesn't produce undefined behavior, are we to infer that it
> produces defined behavior? If so, where is the behavior defined?

Certainly the value is well defined for unsigned char
(no padding bits). For signed char there can be
padding bits, but they don't affect the value, AND
(apparently according to the spec) the object can be
accessed regardless of the contents of the padding
bits. The only conformance issues would thus appear
to be: which of the three allowed binary numeration
schemes is used, and how many value bits are present?


Also: even if signed chars have no padding bits, it's still
true that INT_MIN < SCHAR_MIN can hold, which lets INT_MIN
serve as a value for EOF.
Nov 15 '05 #13
