FAQ Related - why cast?

Martin

Two questions relating to FAQ answer 12.42.

(1) In the statement

s.i16 |= (unsigned)(getc(fp) << 8);

i16 is declared int. The reason for casting to (unsigned) is explained as
guarding against sign extension. But left-shifting will always fill vacated
bits with zero (assuming the right operand is nonnegative and less than the
number of bits in the left expression's type). So how is the cast useful?

(2) I am puzzled by the cast to unsigned in the following statement:

putc((unsigned)((s.i32 >> 24) & 0xff), fp);

i32 is declared long int.

As I understand it the usual arithmetic conversions will ensure the type of
the expression (s.i32 >> 24) & 0xff will be long int. That long int will be
cast to unsigned int, but what is the point? putc() expects its first
argument to be of type int. So at the moment it's going through

long int -> unsigned int -> int

whereas without the cast it would be

long int -> int

---
Martin

Nov 14 '05 #1

Jack Klein

On Tue, 4 Jan 2005 16:39:52 -0000, "Martin"
<martin.o_brien@[no-spam]which.net> wrote in comp.lang.c:

Two questions relating to FAQ answer 12.42.

You must have the book version of the FAQ, since 12.42 is not in the
online version.

(1) In the statement

s.i16 |= (unsigned)(getc(fp) << 8);

i16 is declared int. The reason for casting to (unsigned) is explained as
guarding against sign extension. But left-shifting will always fill vacated
bits with zero (assuming the right operand is nonnegative and less than the
number of bits in the left expression's type). So how is the cast useful?

If the int returned by getc() is negative, left shifting it produces
undefined behavior. If the int returned by getc() has a value greater
than 255, left shifting it produces undefined behavior. Converting
either of these out-of-range values to unsigned int avoids the
undefined behavior.

(2) I am puzzled by the cast to unsigned in the following statement:

putc((unsigned)((s.i32 >> 24) & 0xff), fp);

i32 is declared long int.

As I understand it the usual arithmetic conversions will ensure the type of
the expression (s.i32 >> 24) & 0xff will be long int. That long int will be
cast to unsigned int, but what is the point? putc() expects its first
argument to be of type int. So at the moment it's going through

long int -> unsigned int -> int

whereas without the cast it would be

long int -> int

This is somewhat sloppy coding. Generally, bit shifts should not be
used on signed integer types. There are too many potential surprises
(read defects, when the program does not do what the programmer
expected). If s.i32 is negative, the result of the shift is
implementation defined. It would actually make more sense to cast
s.i32 to unsigned long before the shift.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Nov 14 '05 #2
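
A minimal sketch of the suggestion to convert to unsigned long before shifting, so that every right shift operates on an unsigned value. The helper names store_be32 and put_be32 are illustrative assumptions and do not appear in the FAQ.

```c
#include <stdio.h>

/* Hypothetical helper (not from the FAQ): split the low 32 bits of a long
   into four bytes, most significant first. The conversion to unsigned long
   happens before any shift, so no signed value is ever shifted. */
void store_be32(long value, unsigned char out[4])
{
    /* Well defined for negative values: reduced modulo ULONG_MAX + 1. */
    unsigned long u = (unsigned long)value;

    out[0] = (unsigned char)((u >> 24) & 0xFFUL);
    out[1] = (unsigned char)((u >> 16) & 0xFFUL);
    out[2] = (unsigned char)((u >> 8) & 0xFFUL);
    out[3] = (unsigned char)(u & 0xFFUL);
}

/* Write the four bytes with putc(); returns 0 on success, EOF on a write error. */
int put_be32(long value, FILE *fp)
{
    unsigned char b[4];
    int i;

    store_be32(value, b);
    for (i = 0; i < 4; i++) {
        if (putc(b[i], fp) == EOF)
            return EOF;
    }
    return 0;
}
```

Because the signed-to-unsigned conversion is defined arithmetically, a negative s.i32 produces the two's-complement byte pattern here regardless of how the machine represents negative numbers.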

Derrick Coetzee

Jack Klein wrote:

s.i16 |= (unsigned)(getc(fp) << 8);

If the int returned by getc() is negative, left shifting it produces
undefined behavior. If the int returned by getc() has a value greater
than 255, left shifting it produces undefined behavior.

The shift is done before the cast, though. To avoid undefined behaviour
you would want to do:

s.i16 |= ((unsigned)getc(fp)) << 8;

Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is
1) or a negative value (EOF is negative).
--
Derrick Coetzee
I grant this newsgroup posting into the public domain. I disclaim all
express or implied warranty and all liability. I am not a professional.

Nov 14 '05 #3

Micah Cowan

Derrick Coetzee wrote:

Jack Klein wrote:
s.i16 |= (unsigned)(getc(fp) << 8);

If the int returned by getc() is negative, left shifting it produces
undefined behavior. If the int returned by getc() has a value greater
than 255, left shifting it produces undefined behavior.

The shift is done before the cast, though. To avoid undefined behaviour
you would want to do:

s.i16 |= ((unsigned)getc(fp)) << 8;

Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is
1) or a negative value (EOF is negative).

I'm not sure why Jack thought that a value of greater than 255
could not be left-shifted, but be assured that it is entirely
possible for getc() to return a value exceeding 255, on systems
with more than 8 bits to a byte. There are people here who have
worked on such implementations.

Nov 14 '05 #4
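
One way to make the 8-bit assumption explicit is to check each getc() result before using it. This is a sketch only; both function names are hypothetical.

```c
#include <limits.h>

/* Hypothetical check: does a getc() result fit the 8-bit bytes that the
   file format assumes? EOF is negative, so it is rejected as well. */
int is_octet(int c)
{
    return c >= 0 && c <= 0xFF;
}

/* Nonzero when unsigned char, and therefore getc(), can hold values
   above 255 on this implementation, i.e. when CHAR_BIT exceeds 8. */
int getc_may_exceed_255(void)
{
    return UCHAR_MAX > 255;
}
```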

Dietmar Schindler

Martin wrote:

Two questions relating to FAQ answer 12.42.

(1) In the statement

s.i16 |= (unsigned)(getc(fp) << 8);

i16 is declared int. The reason for casting to (unsigned) is explained as
guarding against sign extension. ...

Provided that you stated the FAQ answer correctly, the explanation is
nonsense (the left hand side of the assignment expression is of type
int, and without the cast, the right hand side is also of type int; so
there is no extension).

Nov 14 '05 #5

CBFalconer

Jack Klein wrote:

<martin.o_brien@[no-spam]which.net> wrote in comp.lang.c:
Two questions relating to FAQ answer 12.42.

You must have the book version of the FAQ, since 12.42 is not in
the online version.

(1) In the statement

s.i16 |= (unsigned)(getc(fp) << 8);

i16 is declared int. The reason for casting to (unsigned) is
explained as guarding against sign extension. But left-shifting
will always fill vacated bits with zero (assuming the right
operand is nonnegative and less than the number of bits in the
left expression's type). So how is the cast useful?

If the int returned by getc() is negative, left shifting it
produces undefined behavior. If the int returned by getc() has
a value greater than 255, left shifting it produces undefined
behavior. Converting either of these out-of-range values to
unsigned int avoids the undefined behavior.

Disagree. getc returns the integer value of an unsigned char
(positive) or EOF. The code is faulty since it doesn't handle EOF
anyway. That integer needs to be coerced into an unsigned to allow
the left shift. So the statement should be:

s.i16 |= ((unsigned)getc(fp)) << 8;

which may still not fit into an int, if the int is 16 bits. i16
should have been declared as unsigned.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #6
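
A sketch of reading the 16-bit field along the lines suggested above: the destination is unsigned, the cast precedes the shift, and EOF is checked. The name read_be16 and its return convention are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: read a 16-bit big-endian value into an unsigned
   object. Returns 0 on success, or EOF if either byte could not be read. */
int read_be16(FILE *fp, unsigned *out)
{
    int hi, lo;

    hi = getc(fp);
    if (hi == EOF)
        return EOF;
    lo = getc(fp);
    if (lo == EOF)
        return EOF;

    /* Cast before shifting; the masks assume the file holds 8-bit bytes. */
    *out = (((unsigned)hi & 0xFFu) << 8) | ((unsigned)lo & 0xFFu);
    return 0;
}
```

Since unsigned int is at least 16 bits wide, the combined value always fits, which is not guaranteed when the destination is a 16-bit int.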

Martin

"Dietmar Schindler" <dS***@arcor.de> wrote in message
news:41***********@arcor.de...

Provided that you stated the FAQ answer correctly, the explanation is
nonsense (the left hand side of the assignment expression is of type
int, and without the cast, the right hand side is also of type int; so
there is no extension).

To ensure the partial quote I gave in my initial post was not misleading,
this is the question and answer from the book (c)1996 by Addison-Wesley
Publishing Company, Inc.

Question: How can I write code to conform to these old, binary data file
formats?

Answer: It's difficult because of word size and byte-order differences,
floating-point formats, and structure padding. To get the control you need
over these particulars, you may have to read and write things a byte at a
time, shuffling and rearranging as you go. (This isn't always as bad as it
sounds and gives you both code portability and complete
control.) For example, suppose that you want to read a data structure,
consisting of a character, a 32-bit integer, and a 16-bit integer, from the
stream fp into the C structure

struct mystruct {
char c;
long int i32;
int i16;
};

You might use code like this:

s.c = getc(fp);

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (unsigned)(getc(fp) << 8);
s.i32 |= getc(fp);

s.i16 = getc(fp) << 8;
s.i16 |= getc(fp);

This code assumes that getc reads 8-bit characters and that the data is
stored most significant byte first ("big endian"). The casts to (long)
ensure that the 16- and 24-bit shifts operate on long values (see question
3.14), and the cast to (unsigned) guards against sign extension. (In
general, it's safer to use all unsigned types when writing code like this,
but see question 3.19.)

The corresponding code to write the structure might look like:

putc(s.c, fp);
putc((unsigned)((s.i32 >> 24) & 0xff), fp);
putc((unsigned)((s.i32 >> 16) & 0xff), fp);
putc((unsigned)((s.i32 >> 8) & 0xff), fp);
putc((unsigned)(s.i32 & 0xff), fp);
putc(s.i16 >> 8) & 0xff, fp);
putc(s.i16 & 0xff, fp);

See also questions 2.12, 12.38, 16.7, and 20.5.

--
Martin
http://martinobrien.co.uk/

Nov 14 '05 #7

Eric Sosman

Martin wrote:

To ensure the partial quote I gave in my initial post was not misleading,
this is the question and answer from the book (c)1996 by Addison-Wesley
Publishing Company, Inc.

Question: How can I write code to conform to these old, binary data file
formats?

Answer: It's difficult because of word size and byte-order differences,
floating-point formats, and structure padding. To get the control you need
over these particulars, you may have to read and write things a byte at a
time, shuffling and rearranging as you go. (This isn't always as bad as it
sounds and gives you both code portability and complete
control.) For example, suppose that you want to read a data structure,
consisting of a character, a 32-bit integer, and a 16-bit integer, from the
stream fp into the C structure

struct mystruct {
char c;
long int i32;
int i16;
};

You might use code like this:

s.c = getc(fp);

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (unsigned)(getc(fp) << 8);
s.i32 |= getc(fp);

s.i16 = getc(fp) << 8;
s.i16 |= getc(fp);

This code assumes that getc reads 8-bit characters and that the data is
stored most significant byte first ("big endian"). The casts to (long)
ensure that the 16- and 24-bit shifts operate on long values (see question
3.14), and the cast to (unsigned) guards against sign extension. (In
general, it's safer to use all unsigned types when writing code like this,
but see question 3.19.)

This code seems to arise from an odd combination of
caution, carelessness, and micro-optimization. The design
considerations may have evolved along these lines:

Caution: Since an `int' could be as narrow as 16 bits,
use `long' to store the final value, safe in the knowledge
that `long' is at least 32 bits wide. For the same reason,
convert the first two getc() results from `int' to `long'
before shifting, since the shifts might be too wide for a
narrow `int'.

Optimization: The third getc() result is shifted only
8 bits, so it will fit in an `int' even if `int' is only
16 bits wide. Doing arithmetic on an `int' may be a hair
faster than on a `long', so shift first and convert later.

Carelessness: If `int' is only 16 bits wide, this
shift may slide a high-order 1-bit from the getc() result
into the sign position of the `int'. This will cause no
harm on most machines, but the C language doesn't actually
specify what will happen. (The same carelessness afflicts
the shifting of the first byte, too.)

Caution: If the shift did in fact slide a 1-bit into
the sign position of a 16-bit `int' and thereby make it
negative, converting this `int' to `long' will propagate
the sign bit leftward and the subsequent `|' will clobber
the two bytes already processed. Hence the `unsigned' cast:
if `int' is 16 bits wide it will be zero-extended instead of
sign-extended, and if `int' is wider it won't be negative
anyhow.

Optimization: Since the fourth getc() result is non-
negative and doesn't get shifted, this sign bit is zero and
conversion to `long' will not "smear" the first three bytes.
The conversion can go straight from `int' to `long' safely.

Carelessness: Of course, all these getc() calls can fail,
and the results should be checked against EOF before being
used. I assume Mr. Summit omitted the checks for brevity.
(Alternatively, the individual checks could be omitted if
tests of feof() and ferror() followed the whole sequence.)
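
The sign-extension hazard described in the second "Caution" item can be illustrated with a small sketch. It uses int16_t as a stand-in for a 16-bit int, which is an assumption made so the effect is visible on machines where int is wider; the function names are hypothetical.

```c
#include <stdint.h>

/* int16_t stands in for a 16-bit int (an assumption for illustration). */

/* Widen through a signed type: a negative part is sign-extended, and its
   high-order 1-bits overwrite the upper half of acc. */
unsigned long merge_sign_extended(unsigned long acc, int16_t part)
{
    return (acc | (unsigned long)(long)part) & 0xFFFFFFFFUL;
}

/* Widen through an unsigned type: the part is zero-extended, so only the
   low 16 bits of acc are affected. */
unsigned long merge_zero_extended(unsigned long acc, int16_t part)
{
    return (acc | (unsigned long)(uint16_t)part) & 0xFFFFFFFFUL;
}
```

With acc equal to 0x12340000 and part holding the bit pattern 0x8000 (the value INT16_MIN), the first function returns 0xFFFF8000, losing the two bytes already read, while the second returns 0x12348000.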

The optimizations seem pointless to me. If there is any
speed advantage for shift-convert over convert-shift, that
advantage will be tiny compared to the I/O activity that
provides the incoming bytes. Suppose a disk read takes 10ms
to fetch 64KB of input: that's ~150ns per byte, or about 450
processing cycles on a 3GHz machine. If shift-then-convert
saves two cycles, say, you have saved a whopping four-tenths
of one percent -- it seems likely that almost any program you
can name presents more significant optimization opportunities
elsewhere. (The other way to think about this is to note that
64KB per 10ms means bytes arrive at a rate of 6.5MHz, which is
peanuts compared to even a 1GHz=1000MHz machine.)
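
The back-of-the-envelope figures can be reproduced with a short sketch. The inputs (10 ms, 64 KB taken as 65536 bytes, 3 GHz) come from the paragraph above; the function names are made up for this example.

```c
/* Time available per byte, in nanoseconds, when reading `bytes` bytes
   takes `seconds` seconds. */
double ns_per_byte(double seconds, double bytes)
{
    return seconds / bytes * 1e9;
}

/* CPU cycles that elapse in `ns` nanoseconds at `clock_hz` cycles per second. */
double cycles_in(double ns, double clock_hz)
{
    return ns * 1e-9 * clock_hz;
}
```

Calling ns_per_byte(0.010, 65536.0) gives roughly 152.6 ns, and cycles_in of that result at 3e9 Hz gives roughly 458 cycles, in line with the rounded figures of 150 ns and 450 cycles. A two-cycle saving is therefore well under one percent of the per-byte budget.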

If we throw out the pointless optimizations, we get
something like

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (long)getc(fp) << 8;
s.i32 |= (long)getc(fp) << 0;

... which, I submit, makes up in clarity what little it gives
away in efficiency.

The corresponding code to write the structure might look like:

putc(s.c, fp);
putc((unsigned)((s.i32 >> 24) & 0xff), fp);
putc((unsigned)((s.i32 >> 16) & 0xff), fp);
putc((unsigned)((s.i32 >> 8) & 0xff), fp);
putc((unsigned)(s.i32 & 0xff), fp);
putc(s.i16 >> 8) & 0xff, fp);
putc(s.i16 & 0xff, fp);

I'm afraid this baffles me. I could understand, e.g.

putc( ((unsigned)(s.i32 >> 24)) & 0xFF, fp);

on the grounds of avoiding the need for a `long' version of
0xFF, but as written I simply don't get it. (Besides, the
next-to-last line is missing a parenthesis.) You'd better
address your question to Mr. Summit directly.

--
Er*********@sun.com

Nov 14 '05 #8

Martin

"Eric Sosman" wrote:
(Besides, the next-to-last line is missing a parenthesis.)

You'd better address your question to Mr. Summit directly.

Thanks for that response. The penultimate line should be

putc((s.i16 >> 8) & 0xff, fp);

as you point out.

Martin

Nov 14 '05 #9

CBFalconer

Martin wrote:

.... snip ...
For example, suppose that you want to read a data structure,
consisting of a character, a 32-bit integer, and a 16-bit
integer, from the stream fp into the C structure

struct mystruct {
char c;
long int i32;
int i16;
};

You might use code like this:

s.c = getc(fp);

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (unsigned)(getc(fp) << 8);
s.i32 |= getc(fp);

I hope not. What if CHAR_BIT is greater than 8? What about EOF?
What you might do (assuming hi byte first in the stream) is:

#include <limits.h>
unsigned long u;
int i;

for (i = 0, u = 0; i < 4; i++) {
    /* you may want to include error traps for getc
       returning anything larger than 255 or EOF */
    u = u * 256 + (getc(fp) & 0xff);
}
if (u <= LONG_MAX) s.i32 = u; /* u == LONG_MAX still fits in a long */
else {
    /* take corrective action on overflow */
    /* creating a negative value is system dependent */
}

and if you really need the obfuscation you can use "<< 8" in place
of "* 256".

Notice how the standard network assumption of hi byte first eases
the translation of an incoming stream, and does not hamper
generation of an output stream. You can also settle the possible
negations etc. on the initial input byte, and make any following
code bulletproof.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #10
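
For the case left open in the else branch, one possible approach is to decode the accumulated value as a 32-bit two's-complement number using only arithmetic the language defines. This is a sketch under stated assumptions: the name s32_from_u32 is hypothetical, and the file format is assumed to store negative numbers in two's complement.

```c
/* Hypothetical helper: interpret the low 32 bits of u as a two's-complement
   value and return it as a long. Assumes long can represent -2147483648,
   which holds on two's-complement implementations. */
long s32_from_u32(unsigned long u)
{
    u &= 0xFFFFFFFFUL;
    if (u <= 0x7FFFFFFFUL)
        return (long)u;                 /* nonnegative: the value is u itself */
    /* negative: the value is u - 2^32, computed without signed overflow */
    return -(long)(0xFFFFFFFFUL - u) - 1;
}
```

This avoids relying on the implementation-defined result of converting an out-of-range unsigned long to long, and it also behaves the same where long is wider than 32 bits, a case in which a comparison against LONG_MAX never detects a negative 32-bit value.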

Peter Nilsson

> Jack Klein wrote:

s.i16 |= (unsigned)(getc(fp) << 8);

If the int returned by getc() is negative, left shifting it produces
undefined behavior.

In C90 perhaps, but not always in C99.

If the int returned by getc() has a value greater
than 255, left shifting it produces undefined behavior.

Actually, anything over 127 is problematic.

Derrick Coetzee wrote:

The shift is done before the cast, though. To avoid undefined behaviour
you would want to do:

s.i16 |= ((unsigned)getc(fp)) << 8;

Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is 1)

sizeof(unsigned char) == 1 does not limit the upper bound of unsigned
char. CHAR_BIT may be more than 8.

or a negative value (EOF is negative).

--
Peter

Nov 14 '05 #11

Derrick Coetzee

Micah Cowan wrote:

Derrick Coetzee wrote:
Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is
1) or a negative value (EOF is negative).

[ . . . ] be assured that it is entirely possible for getc()
to return a value exceeding 255, on systems with more than 8 bits to a
byte. There are people here who have worked on such implementations.

Oops, I should've said, "exceeding UCHAR_MAX" - I keep assuming CHAR_BIT
is 8.
--
Derrick Coetzee
I grant this newsgroup posting into the public domain. I disclaim all
express or implied warranty and all liability. I am not a professional.

Nov 14 '05 #12

Micah Cowan

Peter Nilsson wrote:

Jack Klein wrote:
s.i16 |= (unsigned)(getc(fp) << 8);

If the int returned by getc() is negative, left shifting it produces
undefined behavior.

In C90 perhaps, but not always in C99.

Yes, always. Read 6.5.7#4:

The result of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are filled with zeros.
If E1 has an unsigned type.... If E1 has a signed
type and nonnegative value... otherwise, the behavior is
undefined.

Nov 14 '05 #13
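
The conditions in the quoted paragraph can be turned into a guard around the shift. The following is a sketch, not a standard facility: checked_shl is a hypothetical name, and the width computation assumes int has no padding bits.

```c
#include <limits.h>

/* Hypothetical helper: perform value << count only when the result is
   defined (nonnegative value, count below the width of int, result
   representable in int). Returns 1 and stores the result on success,
   0 otherwise. Assumes int has no padding bits when computing its width. */
int checked_shl(int value, unsigned count, int *out)
{
    unsigned width = (unsigned)(sizeof(int) * CHAR_BIT);

    if (value < 0 || count >= width)
        return 0;
    if (value > (INT_MAX >> count))
        return 0;                 /* value * 2^count would not fit in int */
    *out = value << count;
    return 1;
}
```

For the FAQ's 8-bit shift, this guard rejects any getc() result above INT_MAX >> 8, which is 127 when int is 16 bits wide.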
