3-byte ints

Ed Morton

I have 2 counters - one is required to be a 2-byte variable while the
other is required to be 3 bytes (not my choice, but I'm stuck with it!).
I've declared them as:

unsigned short small;
unsigned long large: 24;

First question - is that the best way to declare the "large" one to
ensure it's 3 bytes? Another suggestion I got was "unsigned char
large[3];" but that would be a little tougher to do arithmetic
operations on.

Now I need a general-purpose macro to increment them and I need to be
able to produce a report if the counter would overflow when incremented.
So I don't need to have separate macros for each type of counter, I've
done this by introducing a temporary unsigned long (no counter will be
larger than that) to store the original and then doing the increment
using the real counter (to ensure that overflow occurs when expected for
that type of counter) then testing if it rolled over by seeing if the
result is less than the original as in the "incCntr()" macro below:

#include <stdio.h>

typedef struct {
unsigned short small;
unsigned long large: 24;
} CNTRS;

#define incCntr(cntr,incr) \
do { \
unsigned long _tmp = (unsigned long)(cntr); \
(cntr) = (cntr) + (incr); \
if ((cntr) < _tmp) { \
printf("Overflow at %lu + %d\n",_tmp,(incr));\
(cntr) = _tmp; \
} \
printf("%lu -> %lu\n",_tmp,(unsigned long)(cntr));\
} while(0)

int main(void)
{
CNTRS cntrs;
cntrs.small = 65529;
cntrs.large = 16777210;
incCntr(cntrs.small,5);
incCntr(cntrs.small,5);
incCntr(cntrs.large,5);
incCntr(cntrs.large,5);
return 1;
}
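The wrap-then-compare test in incCntr() relies on the counter itself rolling over. Below is a minimal sketch of the same idea without bit-fields. It assumes the counter is held in an unsigned long and its width is given as a mask (0xFFFF or 0xFFFFFF). The name `add_wrapped` and the mask parameter are hypothetical, not from the post. One caveat: the `result < original` test only detects a single wrap. An increment larger than the mask could go all the way round unnoticed, so the sketch asserts against that.

```c
#include <assert.h>

/* Hypothetical helper: add incr to old modulo (mask + 1), emulating a
   16-bit (mask 0xFFFF) or 24-bit (mask 0xFFFFFF) counter.
   Returns 1 if the addition wrapped, leaving *result at old (as incCntr()
   restores the original value); otherwise stores the sum and returns 0. */
int add_wrapped(unsigned long old, unsigned long incr,
                unsigned long mask, unsigned long *result)
{
    unsigned long sum;

    /* The sum < old test is only valid for at most one wrap. */
    assert(old <= mask && incr <= mask);

    sum = (old + incr) & mask;   /* keep only the counter's bits */
    if (sum < old) {             /* rolled over */
        *result = old;
        return 1;
    }
    *result = sum;
    return 0;
}
```

With the values from the program above, 16777210 + 5 gives 16777215 with no wrap, and 16777215 + 5 reports a wrap and keeps 16777215, matching the output shown later in the post.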

Second question - anyone see any problems with doing the overflow test
this way or can suggest a better alternative?

When compiling I get this warning:

gcc -Wall -otst tst.c
tst.c: In function `main':
tst.c:26: warning: long unsigned int format, unsigned int arg (arg 3)
tst.c:27: warning: long unsigned int format, unsigned int arg (arg 3)

It's complaining about the "cntr" argument in the line:

printf("%lu -> %lu\n",_tmp,(unsigned long)(cntr));

Third question - why is the compiler apparently ignoring my cast and
complaining that "(unsigned long)(cntr)" is an unsigned int?

Fourth question - would there be any reason not to declare my "small"
counter as an unsigned long bit-field too, i.e.:

unsigned long small: 16;

for consistency?

FWIW, the result of running the program is what I expected:

tst
65529 -> 65534
Overflow at 65534 + 5
65534 -> 65534
16777210 -> 16777215
Overflow at 16777215 + 5
16777215 -> 16777215

Regards,

Ed.

Nov 13 '05 #1

Mark A. Odell

Ed Morton <mo****************@lucent.com> wrote in
news:bk*******@netnews.proxy.lucent.com:

I have 2 counters - one is required to be a 2-byte variable while the
other is required to be 3 bytes (not my choice, but I'm stuck with it!).
I've declared them as:

unsigned short small;
unsigned long large: 24;

How does the bit field help you? Why not:

unsigned short small; // 16-bits on this platform
unsigned long large; // 32-bits on this platform

#define INCR_SMALL(s, i) do \
{ \
if (s + i <= USHRT_MAX) s += i; \
else printf("Overflow of " #s "+" #i "\n"); \
} while (0)

#define INCR_LARGE(l, i) do \
{ \
if (l + i <= 0x00FFFFFFUL) l += i; \
else printf("Overflow of " #l "+" #i "\n"); \
} while (0)

Note: above is untested.
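The `l + i <= max` form has a catch: the sum itself can wrap if the counter's type is no wider than the limit. Here is a sketch of the check-first approach that never forms an out-of-range sum. It assumes counters are handled as unsigned long and the caller supplies the limit (for example USHRT_MAX or 0xFFFFFFUL). `checked_add` is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: add incr to *cntr only if the result stays within
   max. Comparing incr with (max - *cntr) means the test itself can never
   overflow. Returns 0 on success, 1 (and leaves *cntr unchanged) on
   overflow. */
int checked_add(unsigned long *cntr, unsigned long incr, unsigned long max)
{
    assert(*cntr <= max);
    if (incr > max - *cntr) {
        fprintf(stderr, "Overflow of %lu + %lu\n", *cntr, incr);
        return 1;
    }
    *cntr += incr;
    return 0;
}
```

For an unsigned short counter, one option is to copy it into an unsigned long, call the helper with USHRT_MAX, and copy it back. That keeps a single helper for both counter sizes.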

--
- Mark ->
--

Nov 13 '05 #2

Mark A. Odell

"Mark A. Odell" <no****@embeddedfw.com> wrote in
news:Xn*******************************@130.133.1.4:

#define INCR_SMALL(s, i) do \
{ \
if ((unsigned long) s + i <= USHRT_MAX) s += i; \

^^^^^^^^^^^^^^^
Oops, need this.

--
- Mark ->
--

Nov 13 '05 #3

Kevin Bracey

In message <bk*******@netnews.proxy.lucent.com>
Ed Morton <mo****************@lucent.com> wrote:

I have 2 counters - one is required to be a 2-byte variable while the
other is required to be 3 bytes (not my choice, but I'm stuck with it!).
I've declared them as:

unsigned short small;
unsigned long large: 24;

First question - is that the best way to declare the "large" one to
ensure it's 3 bytes?
Pretty much, assuming it's in a structure. The only things I'd say are:

1) C90 doesn't allow anything other than "int" and "unsigned int" for
bitfield types. C99 does allow implementations to offer other types
like "unsigned long"; presumably your implementation does - it's
a common extension.

2) The size of a bitfield can't exceed that of its type - if you did
change it to "unsigned int", you'd then have a requirement that
int was at least 24 bits (but you'd get a diagnostic if it wasn't).
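Below is a sketch of the C90-portable form described in point 2. It assumes unsigned int has at least 24 bits on the target, and the `#if` turns that assumption into a compile-time error instead of a surprise. The struct and function names are hypothetical. This fixes the width of the field, not its storage: the compiler still chooses padding and bit order.

```c
#include <assert.h>
#include <limits.h>

#if UINT_MAX < 0xFFFFFFUL
#error "unsigned int is narrower than 24 bits; a 24-bit bit-field will not fit"
#endif

/* Hypothetical counter block using only unsigned int bit-fields,
   which C90 guarantees to support. */
struct counters {
    unsigned int small : 16;
    unsigned int large : 24;
};

/* An unsigned bit-field behaves like a 24-bit unsigned type: storing an
   out-of-range value reduces it modulo 2^24. */
unsigned long bump_large(struct counters *c, unsigned int incr)
{
    assert(incr <= 0xFFFFFFu);   /* keep the example to at most one wrap */
    c->large = c->large + incr;
    return c->large;
}
```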
[ snip code ]
Second question - anyone see any problems with doing the overflow test
this way or can suggest a better alternative?
Looks fine to me. If you're using C99 you could use uintmax_t rather than
unsigned long, just in case you end up with larger bitfields in future.

I wouldn't return 1 from main though - that'll probably be interpreted as
an error condition by the calling environment.
It's complaining about the "cntr" argument in the line:

printf("%lu -> %lu\n",_tmp,(unsigned long)(cntr));

Third question - why is the compiler apparently ignoring my cast and
complaining that "(unsigned long)(cntr)" is an unsigned int?
Because it's buggy? Your code looks fine to me.
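Whatever the cause of the diagnostic, one way to sidestep it is to copy the bit-field into an ordinary unsigned long object before passing it to printf(), so the argument's type is not in question. Below is a minimal sketch. The struct mirrors Ed's CNTRS, including the unsigned long bit-field, which is an extension rather than C90. `format_large` is a hypothetical name that writes into a buffer so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    unsigned short small;
    unsigned long large : 24;   /* unsigned long bit-field: a common extension, not C90 */
} CNTRS;

/* Copy the bit-field into a plain unsigned long first, so the argument
   given to the printf-family function is unambiguously unsigned long. */
int format_large(char *buf, const CNTRS *c)
{
    unsigned long v;

    assert(buf != NULL && c != NULL);
    v = c->large;
    return sprintf(buf, "%lu", v);
}
```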
Fourth question - would there be any reason not to declare my "small"
counter as an unsigned long bit-field too, i.e.:

unsigned long small: 16;

for consistency?

Not really, modulo the comments about types above. It might be more portable
as it would guarantee exactly 16 bits, which 'short' wouldn't. But in
practice, I've seen compilers generate significantly different code for a
16-bit bitfield versus a short; it's not unlikely that 'short' may be more
optimised, either in code generation terms or just alignment.

--
Kevin Bracey, Principal Software Engineer
Tematic Ltd Tel: +44 (0) 1223 503464
182-190 Newmarket Road Fax: +44 (0) 1223 503458
Cambridge, CB5 8HE, United Kingdom WWW: http://www.tematic.com/

Nov 13 '05 #4

Ed Morton

Mark A. Odell wrote:

Ed Morton <mo****************@lucent.com> wrote in
news:bk*******@netnews.proxy.lucent.com:

I have 2 counters - one is required to be a 2-byte variable while the
other is required to be 3 bytes (not my choice, but I'm stuck with it!).
I've declared them as:

unsigned short small;
unsigned long large: 24;

How does the bit field help you? Why not:

unsigned short small; // 16-bits on this platform
unsigned long large; // 32-bits on this platform

<snip>

I have to pass this structure to some other code that's expecting
several fields each to be exactly 3 bytes.
#define INCR_SMALL(s, i) do \
{ \
if (s + i <= USHRT_MAX) s += i; \
else printf("Overflow of " #s "+" #i "\n"); \
} while (0)

#define INCR_LARGE(l, i) do \
{ \
if (l + i <= 0x00FFFFFFUL) l += i; \
else printf("Overflow of " #l "+" #i "\n"); \
} while (0)

Note: above is untested.

To isolate the callers of the macro from the types of the counters, it'd
mean creating a separate macro for each counter (I actually have several
3-byte counters and several 2-byte counters), e.g.:

#define INCR_C1(cntrs,incr) INCR_SMALL(cntrs.small1,incr)
#define INCR_C2(cntrs,incr) INCR_SMALL(cntrs.small2,incr)
#define INCR_C3(cntrs,incr) INCR_LARGE(cntrs.large1,incr)
.....

I do prefer that to incrementing the counter first and then having to
reset it later.

Once I added in the cast to unsigned long for the "s + i", it worked.

Thanks.

Ed.

Nov 13 '05 #5

Simon Biber

"Ed Morton" <mo****************@lucent.com> wrote:

I have to pass this structure to some other code that's expecting
several fields each to be exactly 3 bytes.

Most C implementations do not support exact 3 byte integer types.

If it needs to be laid out exactly so in memory, you can create
an array of unsigned characters.

#include <assert.h>

void pack(unsigned char *three, unsigned long value)
{
assert(value < (1UL << 24));
three[0] = value & 0xFF;
three[1] = value >> 8 & 0xFF;
three[2] = value >> 16 & 0xFF;
}

unsigned long unpack(unsigned char *three)
{
return (unsigned long)three[0]
| (unsigned long)three[1] << 8
| (unsigned long)three[2] << 16;
}

These functions assume you will be packing 8 bits into each byte,
and using a little-endian packing layout.
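If the code on the other side expects the most significant byte first instead, only the index order changes. Below is a sketch of a big-endian counterpart. Like the original, it assumes 8 bits are used per byte. `pack_be` and `unpack_be` are hypothetical names.

```c
#include <assert.h>

/* Big-endian counterpart of pack(): most significant byte first. */
void pack_be(unsigned char *three, unsigned long value)
{
    assert(value < (1UL << 24));
    three[0] = (unsigned char)(value >> 16 & 0xFF);
    three[1] = (unsigned char)(value >> 8 & 0xFF);
    three[2] = (unsigned char)(value & 0xFF);
}

/* Inverse of pack_be(). */
unsigned long unpack_be(const unsigned char *three)
{
    return (unsigned long)three[0] << 16
         | (unsigned long)three[1] << 8
         | (unsigned long)three[2];
}
```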

--
Simon.

Nov 13 '05 #6

Ed Morton

On 9/23/2003 12:40 PM, Simon Biber wrote:

"Ed Morton" <mo****************@lucent.com> wrote:
I have to pass this structure to some other code that's expecting
several fields each to be exactly 3 bytes.

Most C implementations do not support exact 3 byte integer types.

So, if I use:

unsigned long large: 24;

then the code may not work on my original platform and, even if it does, it
isn't portable, right? What kind of problems could I expect to see? Is there any
way to test whether or not I actually have a problem?
If it needs to be laid out exactly so in memory, you can create
an array of unsigned characters.

<snip>

These functions assume you will be packing 8 bits into each byte,
and using a little-endian packing layout.

Sounds like I'll be needing those.

Thanks,

Ed.

Nov 13 '05 #7

Peter Nilsson

"Simon Biber" <sb****@optushome.com.au> wrote in message news:<3f**********************@news.optusnet.com.au>...

"Ed Morton" <mo****************@lucent.com> wrote:
I have to pass this structure to some other code that's expecting
several fields each to be exactly 3 bytes.
Most C implementations do not support exact 3 byte integer types.

Do any? :-)
If it needs to be laid out exactly so in memory, you can create
an array of unsigned characters.

void pack(unsigned char *three, unsigned long value)
{
assert(value < (1UL << 24));
three[0] = value & 0xFF;
three[1] = value >> 8 & 0xFF;
three[2] = value >> 16 & 0xFF;
}

unsigned long unpack(unsigned char *three)
{
return (unsigned long)three[0]
| (unsigned long)three[1] << 8
| (unsigned long)three[2] << 16;
}

These functions assume you will be packing 8 bits into each byte,
Why? I know we live in an octet world, but you can do this portably
with CHAR_BIT and UCHAR_MAX.
and using a little-endian packing layout.
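The CHAR_BIT/UCHAR_MAX suggestion can be sketched as a loop that stores CHAR_BIT bits per byte, least significant first. This illustrates the idea under stated assumptions and is not Simon's code. `pack_bytes` and `unpack_bytes` are hypothetical. Each shift is split in two so it stays defined even where unsigned long is exactly CHAR_BIT bits wide, as on a DSP with 32-bit char. The resulting layout matches an octet-based external format only when CHAR_BIT is 8.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Store value in n bytes, least significant byte first, using all
   CHAR_BIT bits of each unsigned char. */
void pack_bytes(unsigned char *dst, size_t n, unsigned long value)
{
    size_t k;

    for (k = 0; k < n; k++) {
        dst[k] = (unsigned char)(value & UCHAR_MAX);
        /* Two shifts: one shift by CHAR_BIT would be undefined if
           unsigned long had exactly CHAR_BIT bits. */
        value = value >> (CHAR_BIT - 1) >> 1;
    }
    assert(value == 0);   /* the value must fit in n bytes */
}

/* Inverse of pack_bytes(). */
unsigned long unpack_bytes(const unsigned char *src, size_t n)
{
    unsigned long value = 0;

    while (n-- > 0)
        value = (value << (CHAR_BIT - 1) << 1) | src[n];
    return value;
}
```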

--
Peter

Nov 13 '05 #8

Simon Biber

"Peter Nilsson" <ai***@acay.com.au> wrote:

"Simon Biber" <sb****@optushome.com.au> wrote:
Most C implementations do not support exact 3 byte integer types.

Do any? :-)

I don't know of any, but I've learnt not to make generalisations on
comp.lang.c as someone inevitably provides an example to the contrary.

--
Simon.

Nov 13 '05 #9

Simon Biber

"Ed Morton" <mo****************@Lucent.com> wrote:

So, if I use:

unsigned long large: 24;
It's not portable to use 'unsigned long' as the base type for a bitfield; the
only portable types are 'int' and 'unsigned int'.
then the code may not work on my original platform and, even if it does, it
isn't portable, right? What kind of problems could I expect to see? Is there
any way to test whether or not I actually have a problem?

You need to know the exact binary format expected, then conform to it.

A 24-bit bitfield will probably still take up four bytes, and you have no
control over exactly where and in what order the 24 bits are stored.

You can check it out on your computer by:
#include <stdio.h>

int main(void)
{
struct foo {unsigned long large : 24; } bar;
size_t i;

bar.large = 0xDEADBE;
printf("%lu\n", (long unsigned) sizeof bar);
for(i = 0; i < sizeof bar; i++)
{
printf("%02X ", ((unsigned char *)&bar)[i]);
}
putchar('\n');
return 0;
}

I get:
4
BE AD DE 61

Which indicates the bitfield is stored in the first three of four bytes, in
little-endian order, and that the fourth (padding) byte is uninitialised.
Your results may vary.

--
Simon.

Nov 13 '05 #10

Jack Klein

On Tue, 23 Sep 2003 09:19:37 -0500, Ed Morton
<mo****************@lucent.com> wrote in comp.lang.c:

I have 2 counters - one is required to be a 2-byte variable while the
That would be 32 bits on one compiler I use, and 64 bits on another.
other is required to be 3 bytes (not my choice, but I'm stuck with it!).

This would be 48 bits on one compiler I use, and 96 bits on another.

Since a byte in C must have at least 8 bits but may have more, and
does on some architectures, if you mean "16 bits" and "24 bits", say
that. 2 bytes means "at least 16 bits, but possibly more" and 3 bytes
means "at least 24 bits, but possibly more".

If you mean 16 bits and 24 bits, say so. There are architectures I
work with where both would fit into a byte.
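One way to follow this advice in code is to reason in bits and derive byte counts from CHAR_BIT instead of assuming 8. Below is a minimal sketch. `bytes_for_bits` is a hypothetical helper. It shows that a 24-bit quantity needs 3 bytes with 8-bit chars, 2 with 16-bit chars, and only 1 where CHAR_BIT is 24 or 32.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Smallest number of bytes (unsigned chars) that can hold nbits bits.
   With CHAR_BIT == 8, 24 bits needs 3 bytes; with CHAR_BIT == 16 it
   needs 2; with CHAR_BIT == 24 or 32 it fits in 1. */
size_t bytes_for_bits(size_t nbits)
{
    assert(nbits > 0);
    return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}
```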

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq

Nov 13 '05 #11

Jack Klein

On Wed, 24 Sep 2003 13:45:53 +1000, "Simon Biber"
<sb****@optushome.com.au> wrote in comp.lang.c:

"Ed Morton" <mo****************@Lucent.com> wrote:
So, if I use:

unsigned long large: 24;

It's not portable to use 'unsigned long' as the base type for a bitfield; the
only portable types are 'int' and 'unsigned int'.
then the code may not work on my original platform and, even if it does, it
isn't portable, right? What kind of problems could I expect to see? Is there
any way to test whether or not I actually have a problem?

You need to know the exact binary format expected, then conform to it.

A 24-bit bitfield will probably still take up four bytes, and you have no
control over exactly where and in what order the 24 bits are stored.

On a Motorola 56000 it will fit perfectly in 1 byte, which happens to
have exactly 24 bits.

--
Jack Klein

Nov 13 '05 #12

Jack Klein

On 23 Sep 2003 17:40:50 -0700, ai***@acay.com.au (Peter Nilsson) wrote
in comp.lang.c:

"Simon Biber" <sb****@optushome.com.au> wrote in message news:<3f**********************@news.optusnet.com.au>...
"Ed Morton" <mo****************@lucent.com> wrote:
I have to pass this structure to some other code that's expecting
several fields each to be exactly 3 bytes.

Most C implementations do not support exact 3 byte integer types.

Do any? :-)
If it needs to be laid out exactly so in memory, you can create
an array of unsigned characters.

void pack(unsigned char *three, unsigned long value)
{
assert(value < (1UL << 24));
three[0] = value & 0xFF;
three[1] = value >> 8 & 0xFF;
three[2] = value >> 16 & 0xFF;
}

unsigned long unpack(unsigned char *three)
{
return (unsigned long)three[0]
| (unsigned long)three[1] << 8
| (unsigned long)three[2] << 16;
}

These functions assume you will be packing 8 bits into each byte,

Why? I know we live in an octet world, but you can do this portably
with CHAR_BIT and UCHAR_MAX.
and using a little-endian packing layout.

Actually even when an implementation has CHAR_BIT > 8, it is quite
easy and useful to write code using just 8 bits in an unsigned char,
and far more portable. Also, living in an octet-oriented world of
communications standards, it is often necessary to handle individual 8
bit quantities as individual items.

--
Jack Klein

Nov 13 '05 #13

R Pradeep Chandran

On Wed, 24 Sep 2003 04:14:05 GMT, in comp.lang.c, Jack Klein wrote:
:On Tue, 23 Sep 2003 09:19:37 -0500, Ed Morton
:<mo****************@lucent.com> wrote in comp.lang.c:
:
:> I have 2 counters - one is required to be a 2-byte variable while the
:
:That would be 32 bits on one compiler I use, and 64 bits on another.
:
:> other is required to be 3 bytes (not my choice, but I'm stuck with it!).
:
:This would be 48 bits on one compiler I use, and 96 bits on another.

<snip>

Which are those compilers and their corresponding target platforms?
Could you please post some details of them? It is not that I don't
believe you. But a lot of my colleagues and friends don't believe in
CHAR_BIT != 8 and I would really like to point out these cases to them.

Have a nice day,
Pradeep
--
R Pradeep Chandran pradeep DOT chandran AT sisl.co.in
All opinions are mine and do not represent those of my employer.

Nov 13 '05 #14

Ed Morton

Based on the feedback, I'll declare my variables without bitfields and
test them for overflow as Mark suggested, and then pack them into an
array of unsigned chars as Simon suggested when I need to send them to
the code I interface with.

Unsigned short and unsigned long are guaranteed to be 16 bits and 32
bits respectively on this and any future platform this code will run on
as it's adding to a large existing base that depends on those sizes.

Thanks to all who replied.

Ed.

Nov 13 '05 #15

Mark Gordon

On Wed, 24 Sep 2003 13:05:29 +1000
"Simon Biber" <sb****@optushome.com.au> wrote:

"Peter Nilsson" <ai***@acay.com.au> wrote:
"Simon Biber" <sb****@optushome.com.au> wrote:
Most C implementations do not support exact 3 byte integer types.

Do any? :-)

I don't know of any, but I've learnt not to make generalisations on
comp.lang.c as someone inevitably provides an example to the contrary.

I know of (but have not used) a 24 bit processor which has a C compiler
available for it. Specifically the Motorola DSP56000. I would assume
that int is 24 bits and it is even possible that char could be 24 bits!
--
Mark Gordon
Paid to be a Geek & a Senior Software Developer
Although my email address says spamtrap, it is real and I read it.

Nov 13 '05 #16

Jack Klein

On Wed, 24 Sep 2003 16:20:13 +0530, R Pradeep Chandran <se*@sig.below>
wrote in comp.lang.c:

On Wed, 24 Sep 2003 04:14:05 GMT, in comp.lang.c, Jack Klein wrote:
:On Tue, 23 Sep 2003 09:19:37 -0500, Ed Morton
:<mo****************@lucent.com> wrote in comp.lang.c:
:
:> I have 2 counters - one is required to be a 2-byte variable while the
:
:That would be 32 bits on one compiler I use, and 64 bits on another.
:
:> other is required to be 3 bytes (not my choice, but I'm stuck with it!).
:
:This would be 48 bits on one compiler I use, and 96 bits on another.

<snip>

Which are those compilers and their corresponding target platforms?
Could you please post some details of them? It is not that I don't
believe you. But a lot of my colleagues and friends don't believe in
CHAR_BIT != 8 and I would really like to point out these cases to them.

Have a nice day,
Pradeep

I happen to have my laptop home with me, which has one of the
compilers installed, here is a copy and paste of a part of the
limits.h file...

========
/********************************************************************/
/* limits.h v3.09 */
/* Copyright (c) 1996-2003 Texas Instruments Incorporated */
/********************************************************************/

#ifndef _LIMITS
#define _LIMITS

#define CHAR_BIT 16 /* NUMBER OF BITS IN TYPE CHAR */
#define SCHAR_MAX 32767 /* MAX VALUE FOR SIGNED CHAR */
#define SCHAR_MIN (-SCHAR_MAX-1) /* MIN VALUE FOR SIGNED CHAR */
#define UCHAR_MAX 65535u /* MAX VALUE FOR UNSIGNED CHAR */
========

This is from Texas Instruments Code Composer Studio for the
TMS320C2810 and TMS320C2812 Digital Signal Processors.

I don't have a copy of the other compiler handy here at home to copy
and paste, so you will have to take my word for it. It is for an
Analog Devices SHARC 32-bit DSP. All the integer types are 32 bits,
and CHAR_BIT is 32.

Mind you, you won't find these sort of architectures anywhere else but
on DSPs anymore, but a lot of DSP programming is being done in C and
even C++ these days.

These are pretty much all free-standing environments, it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.

There are also the early members of the Motorola 56000 DSP family,
which had a 24-bit word size. char, short, and int were all 24 bits,
and long was 64 bits. Many of the new 56000 family members are either
16 bit or 32 bit, but I believe some of the 24 bit versions are still
produced today.

--
Jack Klein

Nov 13 '05 #18

Kevin Easton

Jack Klein <ja*******@spamcop.net> wrote:
[...]

Mind you, you won't find these sort of architectures anywhere else but
on DSPs anymore, but a lot of DSP programming is being done in C and
even C++ these days.

These are pretty much all free-standing environments, it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.

(It's actually an unsigned char converted to int, not plain char).

However, are you sure it has to be able to return all possible unsigned
chars? Isn't it possible for unsigned char to have 65536 possible
values, but there be only, say, 140 distinct _characters_ which the
string, input and output functions deal with? Does every possible
unsigned char value have to represent a character?

- Kevin.

Nov 13 '05 #19

Micah Cowan

Kevin Easton <kevin@-nospam-pcug.org.au> writes:

Jack Klein <ja*******@spamcop.net> wrote:
[...]
Mind you, you won't find these sort of architectures anywhere else but
on DSPs anymore, but a lot of DSP programming is being done in C and
even C++ these days.

These are pretty much all free-standing environments, it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.
(It's actually an unsigned char converted to int, not plain char).

I assume you mean the "all possible values" bit, not EOF.
However, are you sure it has to be able to return all possible unsigned
chars? Isn't it possible for unsigned char to have 65536 possible
values, but there be only, say, 140 distinct _characters_ which the
string, input and output functions deal with? Does every possible
unsigned char value have to represent a character?

Doesn't matter. Consider the case when you are reading binary files.

-Micah

Nov 13 '05 #20

Keith Thompson

"Simon Biber" <sb****@optushome.com.au> writes:

"Ed Morton" <mo****************@Lucent.com> wrote:
So, if I use:

unsigned long large: 24;

It's not portable to use 'unsigned long' as the base type for a bitfield; the
only portable types are 'int' and 'unsigned int'.

And 'signed int'. For a bit field, it's implementation-defined
whether plain 'int' is signed or unsigned.
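Because the signedness of a plain `int` bit-field is implementation-defined, spelling out `signed int` or `unsigned int` removes the ambiguity. Below is a small sketch. The struct and function names are hypothetical.

```c
#include <assert.h>

struct flags {
    unsigned int u : 3;   /* always unsigned: 0..7 */
    signed int   s : 3;   /* always signed: -4..3 on two's complement */
    int          p : 3;   /* signed or unsigned: implementation-defined */
};

/* Store the largest unsigned value and -1 in the explicitly typed
   fields; only 0 goes into the plain int field, which is safe either way. */
void set_fields(struct flags *f)
{
    f->u = 7;
    f->s = -1;
    f->p = 0;
}
```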

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Nov 13 '05 #21

Keith Thompson

Jack Klein <ja*******@spamcop.net> writes:
[...]

These are pretty much all free-standing environments, it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.

I don't see where the standard requires that EOF has to be different
from any possible char value.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {
putchar(c);
}
return 0;
}

The comparison to EOF could be omitted, but it might save the overhead
of some function calls.

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>

Nov 13 '05 #22

Barry Schwarz

On Fri, 26 Sep 2003 09:59:59 GMT, Kevin Easton
<kevin@-nospam-pcug.org.au> wrote:

Jack Klein <ja*******@spamcop.net> wrote:
[...]
Mind you, you won't find these sort of architectures anywhere else but
on DSPs anymore, but a lot of DSP programming is being done in C and
even C++ these days.

These are pretty much all free-standing environments, it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.

(It's actually an unsigned char converted to int, not plain char).

However, are you sure it has to be able to return all possible unsigned
chars? Isn't it possible for unsigned char to have 65536 possible
values, but there be only, say, 140 distinct _characters_ which the
string, input and output functions deal with? Does every possible
unsigned char value have to represent a character?

Obviously not since in the ASCII character set, values between 0x00
and 0x1f don't.
<<Remove the del for email>>

Nov 13 '05 #23

Kevin Easton

Keith Thompson <ks*@cts.com> wrote:

Jack Klein <ja*******@spamcop.net> writes:
[...]
These are pretty much all free-standing environments, it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.

I don't see where the standard requires that EOF has to be different
from any possible char value.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))
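Putting this correction into Keith's loop gives a copy that keeps going when a character happens to compare equal to EOF, and stops only on a real end-of-file or error. Below is a sketch generalised to arbitrary streams so it can be exercised with tmpfile(). `copy_stream` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Copy in to out and return the number of characters copied.
   A value equal to EOF is treated as end of input only if feof() or
   ferror() confirms it. */
unsigned long copy_stream(FILE *in, FILE *out)
{
    unsigned long n = 0;
    int c;

    assert(in != NULL && out != NULL);
    while (c = getc(in), c != EOF || (!feof(in) && !ferror(in))) {
        putc(c, out);
        n++;
    }
    return n;
}
```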

The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1
and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.

- kevin.

Nov 13 '05 #24

Keith Thompson

Kevin Easton <kevin@-nospam-pcug.org.au> writes:

Keith Thompson <ks*@cts.com> wrote: [...]
#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))

Right.
The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1
and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.

What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

Having sizeof(int)==1 breaks the common "while ((c=getchar()) != EOF)"
idiom, but I don't see that it breaks anything else -- which argues
that the common idiom is non-portable.
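The arithmetic behind the 60000 example assumes a 16-bit two's complement int. Under that assumption, an unsigned value above 32767 maps to value - 65536. The conversion in C is implementation-defined, so the sketch below does the mapping explicitly in long, where it is well defined on any host. `as_int16` is a hypothetical helper.

```c
#include <assert.h>

/* Interpret a 16-bit unsigned value as two's complement signed:
   0..32767 map to themselves, 32768..65535 map to value - 65536. */
long as_int16(unsigned long u)
{
    assert(u <= 65535UL);
    return u > 32767UL ? (long)u - 65536L : (long)u;
}
```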

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>

Nov 13 '05 #25

Peter Nilsson

"Keith Thompson" <ks*@cts.com> wrote in message
news:lz************@cts.com...

Kevin Easton <kevin@-nospam-pcug.org.au> writes:
Keith Thompson <ks*@cts.com> wrote:

[...]
#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) &&
!ferror(stdin)) {
ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))

Right.
The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1 and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.

What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

Having sizeof(int)==1 breaks the common "while ((c=getchar()) != EOF)"
idiom, but I don't see that it breaks anything else -- which argues
that the common idiom is non-portable.

And it always has been. [Under C99 it is even worse since the conversion of
an unsigned char to signed int can theoretically raise an
implementation-defined signal! Thus reducing getc to the level of gets.]

The unwritten assumption about hosted implementations is naturally that
UCHAR_MAX <= INT_MAX. Why the standards never made this normative seems a
mystery to lesser minds like my own.
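Since the standard makes no such guarantee, code that relies on the simple EOF comparison can at least state the assumption, so that it fails to compile where the assumption does not hold. Below is a minimal sketch. The helper name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Every unsigned char value must fit in a non-negative int; otherwise a
   character returned by getc() could be indistinguishable from EOF. */
#if UCHAR_MAX > INT_MAX
#error "UCHAR_MAX > INT_MAX: the simple EOF test is unreliable here"
#endif

/* Classify a value returned by getc(): 1 for a character, 0 for EOF.
   Relies on the #if check above. */
int is_char_result(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return c != EOF;
}
```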

--
Peter

Nov 13 '05 #26

Barry Schwarz

On Sat, 27 Sep 2003 01:56:02 GMT, Keith Thompson <ks*@cts.com> wrote:

Jack Klein <ja*******@spamcop.net> writes:
[...]
These are pretty much all freestanding environments; it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.
I don't see where the standard requires that EOF has to be different
from any possible char value.

EOF must have type int and be negative. On those systems where char
is unsigned, it obviously cannot be a char value.

It could be a valid char on a system where char is signed. But, as
explained below, none of the normal character I/O functions can return
any negative value other than for end of file or I/O error.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {
Coding problem here:

If c == EOF, then the remaining two expressions following the
first && will never be evaluated, because && short-circuits. The while
condition evaluates to false and the loop terminates immediately, regardless
of the status of feof and ferror. Consequently, you don't know if you
have hit the real EOF or merely a character that looks like it.

If c != EOF, you are pretty much guaranteed that !feof() and
!ferror() will both be true as well.

Therefore, the expression c != EOF defeats the purpose of what
you want the expression after the comma to do.

Logic problem also:

getchar "returns the next character of [stdin] as an unsigned
char (converted to an int), or an EOF if end of file or error occurs"
(from K&R2, B1.4). Since an unsigned char cannot be negative and EOF
has to be, getchar cannot return EOF for a normal character.
putchar(c);
}
return 0;
}

The comparison to EOF could be omitted, but it might save the overhead
of some function calls.

<<Remove the del for email>>

Nov 13 '05 #27

Kevin Easton

Keith Thompson <ks*@cts.com> wrote:

Kevin Easton <kevin@-nospam-pcug.org.au> writes:
Keith Thompson <ks*@cts.com> wrote: [...]
> #include <stdio.h>
> int main(void)
> {
> int c;
> while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))

Right.
The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1
and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.

What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

OK, you're right - it just has to be converted to an int.
Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

...but the conversion to int that takes place is in no way defined (it
just says "as an unsigned char converted to int") - so you don't know
how it'll be converted. It doesn't say it has to be a reversible
conversion, or even a stable one.

Perhaps you could read the requirement that anything written to a binary
stream will compare equal to the original value when it's read back as
meaning that the unsigned char / int conversions mentioned in the
character reading and writing functions have to be stable, reversible
and the inverse of each other.

You still break ungetc() if a valid character maps to EOF, since you
couldn't ungetc that character:

[ungetc, paragraph 4] If the value of c equals that of the macro EOF,
the operation fails and the input stream is unchanged.

- Kevin.

Nov 13 '05 #28

Keith Thompson

Kevin Easton <kevin@-nospam-pcug.org.au> writes:

Keith Thompson <ks*@cts.com> wrote:

[...]

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

...but the conversion to int that takes place is in no way defined (it
just says "as an unsigned char converted to int") - so you don't know
how it'll be converted. It doesn't say it has to be a reversible
conversion, or even a stable one.

Thank you, that's the point I was missing. I had assumed (because I
didn't bother to check) that the conversion from unsigned char to int
was well-defined.

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Nov 13 '05 #29

Dave Thompson

On Tue, 23 Sep 2003 15:56:31 +0100, Kevin Bracey
<ke**********@tematic.com> wrote:

In message <bk*******@netnews.proxy.lucent.com>
Ed Morton <mo****************@lucent.com> wrote:
I have 2 counters - one is required to be a 2-byte variable while the
other is required to be 3 bytes (not my choice, but I'm stuck with it!).
I've declared them as:

unsigned short small;
unsigned long large: 24;
(Within a struct, shown later.)
First question - is that the best way to declare the "large" one to
ensure it's 3 bytes?
Pretty much, assuming it's in a structure. The only things I'd say are:

It will do exactly 24-bit arithmetic, which is 3 bytes IF a byte is 8
bits, as is very common but not required. It, or rather the
"allocation unit" containing it, is very likely to occupy 32 bits or 4
usual-bytes/octets. This difference matters only if you write out
the/a containing struct to a file or over a network etc., since you
can't form (or use) a pointer to a bitfield member; or if you (need
to) care about the actual memory/bus accesses performed by the
compiled (object) form of your code when executed.
1) C90 doesn't allow anything other than "int" and "unsigned int" for
bitfield types. C99 does allow implementations to offer other types
like "unsigned long"; presumably your implementation does - it's
a common extension.
C90 allows (explicitly) signed int, unsigned int, or "plain" int, which,
unlike non-char integer types elsewhere, is not automatically signed: it is
implementation-defined as signed or unsigned. C99 also allows _Bool as a
standard bit-field type (or bool with <stdbool.h>).

<snip>
It's complaining about the "cntr" argument in the line:

printf("%lu -> %lu\n",_tmp,(unsigned long)(cntr));

Third question - why is the compiler apparently ignoring my cast and
complaining that "(unsigned long)(cntr)" is an unsigned int?

Plus _tmp already had type unsigned long.
Because it's buggy? Your code looks fine to me.

Unless perhaps the OP (or someone) did <GACK!> #define long int </>.
Since you are using gcc, check the preprocessor output with -E.
- David.Thompson1 at worldnet.att.net

Nov 13 '05 #30
