getc and "large" bytes - Page 2

vippstar

Assuming all the values of int are in the range of unsigned char, what
happens if getc returns EOF?
Is it possible that EOF was the value of the byte read?
Does that mean that code aiming for maximum portability needs to check
for both feof() and ferror()?
(for example, if both feof() and ferror() return 0 for the stream when
getc() returned EOF, consider EOF a valid byte read)
To me, that seems to be the case, but maybe the standard says this is
incorrect.
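
A minimal sketch of the check described above, assuming a hosted implementation. The name read_byte and its return convention are hypothetical, chosen only for illustration. One caveat: the end-of-file and error indicators are sticky, so the sketch is only meaningful if neither was already set on the stream before the call (clearerr() resets them).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores the byte in *out on success,
   0 at end-of-file, -1 on a read error. Assumes neither indicator was
   already set on fp before the call. */
int read_byte(FILE *fp, unsigned char *out)
{
    int c;

    assert(fp != NULL && out != NULL);
    c = getc(fp);
    if (c == EOF) {
        if (feof(fp))
            return 0;   /* genuine end-of-file */
        if (ferror(fp))
            return -1;  /* genuine read error */
        /* Neither indicator is set: this can only happen where some
           unsigned char value converts to EOF, so c is a byte read. */
    }
    *out = (unsigned char)c;
    return 1;
}
```

On ordinary implementations, where UCHAR_MAX <= INT_MAX, the fall-through branch is never reached and a plain comparison against EOF is enough.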

As always, all replies appreciated.

Jun 27 '08



Richard Heathfield

Bartc said:

> "Ben Pfaff" <bl*@cs.stanford.edu> wrote in message

<snip>

>> -1 is in the range of int.
>> -1 is not in the range of unsigned char.
>> Therefore it is not true that all the values of int are in the
>> range of unsigned char.
>
> The OP mentioned an example where both might be 16 bits.

That doesn't affect Ben's counter-example (or my range of counter-examples,
presented earlier).

> So -1 in one
> could be 0xFFFF in the other, causing ambiguity

There is no ambiguity here. 0xFFFF is an integer constant with the value
65535. This is not equal to -1.
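
The distinction between the two values, and the conversion that links them, can be sketched in a few asserts. This is only an illustration; check_values is a made-up name. The casts to long are deliberate: on an implementation with 16-bit int, 0xFFFF has type unsigned int, and a direct comparison with -1 would first convert -1 to unsigned int.

```c
#include <assert.h>
#include <limits.h>

/* Illustration only: check_values is a hypothetical name. */
void check_values(void)
{
    /* Compared as long (at least 32 bits wide), the values differ. */
    assert((long)0xFFFF == 65535L);
    assert((long)0xFFFF != -1L);

    /* Converting -1 to an unsigned type is well defined and yields that
       type's maximum: 65535 where unsigned short has 16 bits. */
    assert((unsigned short)-1 == USHRT_MAX);
}
```

So 65535 can be the result of converting -1, but the two remain different values.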

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Jun 27 '08 #11

Keith Thompson

"Bartc" <bc@freeuk.com> writes:

> "Ben Pfaff" <bl*@cs.stanford.edu> wrote in message
> news:87************@blp.benpfaff.org...
>> vi******@gmail.com writes:
>>> On May 23, 6:35 pm, Ben Pfaff <b...@cs.stanford.edu> wrote:
>>>> vipps...@gmail.com writes:
>>>>> Assuming all the values of int are in the range of unsigned char, what
>>>>> happens if getc returns EOF?
>>>>
>>>> Your assumption is false.
>>>
>>> Would you please elaborate?
>>
>> -1 is in the range of int.
>> -1 is not in the range of unsigned char.
>> Therefore it is not true that all the values of int are in the
>> range of unsigned char.
>
> The OP mentioned an example where both might be 16 bits. So -1 in one could
> be 0xFFFF in the other, causing ambiguity in the (I think unlikely) event of
> reading a 16-bit character 0xFFFF from a file with 16-bit encoding.

No, -1 and 0xFFFF are two different values. It's possible that one of
those values is the result of converting the other.

> (How would such a character size read standard 8-bit files? By
> zero-extending to 16?)

It would be implementation-defined, or perhaps undefined.

For such an implementation to see an 8-bit file, the file would have
to have been copied to the system, or at least made visible somehow.
Such copying might necessarily involve some sort of conversion. The
conversion is outside the scope of C.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Jun 27 '08 #12

Keith Thompson

Eric Sosman <Er*********@sun.com> writes:
[...]

> It seems to me that the behavior required of getc() places
> far-reaching requirements on implementations where `int' and
> `char' have the same width. Here are a few:
>
> 1) Since `unsigned char' can represent 2**N distinct values
> and all of these must be distinguishable when converted to `int',
> it follows that `int' must also have 2**N distinct values. Thus,
> signed-magnitude and ones' complement representations are ruled
> out, and INT_MIN must have its most negative possible value
> (that is, INT_MIN == -INT_MAX - 1, all-bits-set cannot be a trap
> representation).

[...]

How do you conclude that all 2**N distinct values of type unsigned
char must be distinguishable when converted to int? The result of the
conversion is implementation-defined. If, for example, int has the
range -32768 .. +32767, and unsigned char has the range 0 .. 65535, I
see nothing in the standard that forbids converting all unsigned char
values greater than 32767 to 32767 (saturation). It would break
stdio, but I'm not convinced that that would make it non-conforming
(particularly for a freestanding implementation that needn't provide
stdio).
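
To make the saturation hypothetical concrete, here is a small model of it. It does not describe any real implementation; saturate_to_int16 is a made-up name, and long stands in for both the 16-bit unsigned char and the 16-bit int so that the sketch runs anywhere.

```c
#include <assert.h>

/* Hypothetical model of the saturating conversion described above:
   maps a 16-bit unsigned char value (0 .. 65535) to a 16-bit int
   (-32768 .. +32767), clamping everything above 32767 to 32767. */
long saturate_to_int16(long uchar_value)
{
    assert(uchar_value >= 0 && uchar_value <= 65535L);
    return uchar_value > 32767L ? 32767L : uchar_value;
}
```

Under this model saturate_to_int16(40000L) and saturate_to_int16(50000L) both yield 32767, so a getc() built on it could not tell those two bytes apart.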

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Jun 27 '08 #13

Richard Tobin

In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:

> In my opinion, it would be reasonable for the standard to require
> INT_MAX >= UCHAR_MAX for all hosted implementations.

An implementation with, say, 16-bit ints and chars is still likely to
have 8-bit data on disk and most other input sources. In which case
fgetc() could read 8-bit values, and have no problem. At least, I
don't know of anything in the standard that prevents this.

One can imagine a future implementation that uses UTF-32-encoded
Unicode characters, and has 32-bit chars. In that case there is no
problem with text (because Unicode in fact only goes up to about
2^20), but binary data would still have the problem.

-- Richard
--
In the selection of the two characters immediately succeeding the numeral 9,
consideration shall be given to their replacement by the graphics 10 and 11 to
facilitate the adoption of the code in the sterling monetary area. (X3.4-1963)

Jun 27 '08 #14

Richard Tobin

In article <NH*****************@text.news.virginmedia.com>,
Bartc <bc@freeuk.com> wrote:

> The OP mentioned an example where both might be 16 bits. So -1 in one could
> be 0xFFFF in the other, causing ambiguity in the (I think unlikely) event of
> reading a 16-bit character 0xFFFF from a file with 16-bit encoding.

There is a problem with chars and ints of equal size, but the OP
expressed it wrongly: he talked about "all the values of int [being]
in the range of unsigned char" - which can't happen, because negative
ints aren't in the range of unsigned char. The right way to put it is
that some of the values of unsigned char are not representable as int.
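
Whether a given implementation has this problem can be checked from <limits.h>. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: nonzero when every unsigned char value is
   representable as int. In that case getc() returns a non-negative
   value for every byte, and the negative EOF cannot collide with one. */
int every_uchar_fits_in_int(void)
{
    return UCHAR_MAX <= INT_MAX;
}

/* Hypothetical guard for code that relies on a plain "c == EOF" test. */
void require_unambiguous_eof(void)
{
    assert(every_uchar_fits_in_int());
}
```

On the usual 8-bit-char implementations the helper returns 1; it would return 0 on the equal-width implementations discussed in this thread.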

-- Richard
--
In the selection of the two characters immediately succeeding the numeral 9,
consideration shall be given to their replacement by the graphics 10 and 11 to
facilitate the adoption of the code in the sterling monetary area. (X3.4-1963)

Jun 27 '08 #15

Eric Sosman

Keith Thompson wrote:

> Eric Sosman <Er*********@sun.com> writes:
> [...]
>> It seems to me that the behavior required of getc() places
>> far-reaching requirements on implementations where `int' and
>> `char' have the same width. Here are a few:
>>
>> 1) Since `unsigned char' can represent 2**N distinct values
>> and all of these must be distinguishable when converted to `int',
>> it follows that `int' must also have 2**N distinct values. Thus,
>> signed-magnitude and ones' complement representations are ruled
>> out, and INT_MIN must have its most negative possible value
>> (that is, INT_MIN == -INT_MAX - 1, all-bits-set cannot be a trap
>> representation).
> [...]
>
> How do you conclude that all 2**N distinct values of type unsigned
> char must be distinguishable when converted to int? The result of the
> conversion is implementation-defined. If, for example, int has the
> range -32768 .. +32767, and unsigned char has the range 0 .. 65535, I
> see nothing in the standard that forbids converting all unsigned char
> values greater than 32767 to 32767 (saturation). It would break
> stdio, but I'm not convinced that that would make it non-conforming
> (particularly for a freestanding implementation that needn't provide
> stdio).

My case for distinguishability was in the part you snipped,
labeled "1a)". It derives from the Standard's requirement that
bytes read back from a binary stream must compare equal to those
written to it (on the same implementation, not counting trailing
zeroes, et cetera). If there are fewer `int' values than there
are `unsigned char' values, then by the pigeonhole principle there
must be at least one collision where two distinct `unsigned char'
values V1 and V2 convert to the same `int' value. Then this
code fragment

    putc(V1, stream);
    putc(V2, stream);
    rewind(stream);
    assert(getc(stream) == V1);
    assert(getc(stream) == V2);

... cannot succeed. (Yes, I know, it's very bad to generate
side-effects in an assert(), but this is just for illustration.)

"Upon further review," as they say in American football, I
guess an implementation could choose to report an I/O error if
it ever encountered V2, say, on input. (If "helpful," it would
also report an error for any attempt to write V2.) That would
give an extremely low QoI, but the Standard does not forbid I/O
operations from failing "predictably." (Indeed, on many systems
fopen("/", "w") will fail predictably.) So perhaps a sufficiently
bad implementation could in fact claim conformance even if unable
to read and write all `unsigned char' values, and this would allow
signed magnitude and ones' complement (and two's complement with
one trap representation).

And, of course, no argument based on the behavior of getc()
has any force for freestanding implementations.

--
Er*********@sun.com

Jun 27 '08 #16

Bartc

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

> "Bartc" <bc@freeuk.com> writes:
>> "Ben Pfaff" <bl*@cs.stanford.edu> wrote in message
>> news:87************@blp.benpfaff.org...
>>> vi******@gmail.com writes:
>>>> On May 23, 6:35 pm, Ben Pfaff <b...@cs.stanford.edu> wrote:
>>>>> vipps...@gmail.com writes:
>>>>>> Assuming all the values of int are in the range of unsigned char,
>>>>>> what happens if getc returns EOF?
>>>>>
>>>>> Your assumption is false.
>>>>
>>>> Would you please elaborate?
>>>
>>> -1 is in the range of int.
>>> -1 is not in the range of unsigned char.
>>> Therefore it is not true that all the values of int are in the
>>> range of unsigned char.
>>
>> The OP mentioned an example where both might be 16 bits. So -1 in one
>> could be 0xFFFF in the other, causing ambiguity in the (I think
>> unlikely) event of reading a 16-bit character 0xFFFF from a file with
>> 16-bit encoding.
>
> No, -1 and 0xFFFF are two different values. It's possible that one of
> those values is the result of converting the other.

I don't understand. In a 16-bit system where all 65536 bit patterns might
represent characters, what bit pattern would you use to signal EOF?

(Reading Eric's first post:

> An implication of (1) for the programmer is that yes, there
> will be a legitimate `unsigned char' value that maps to EOF
> when converted to `int'.

this seems to suggest that yes an ambiguity can occur.)

--
bartc

Jun 27 '08 #17

Keith Thompson

"Bartc" <bc@freeuk.com> writes:

> "Keith Thompson" <ks***@mib.org> wrote in message
> news:ln************@nuthaus.mib.org...
>> "Bartc" <bc@freeuk.com> writes:

[...]

>>> The OP mentioned an example where both might be 16 bits. So -1 in
>>> one could be 0xFFFF in the other, causing ambiguity in the (I
>>> think unlikely) event of reading a 16-bit character 0xFFFF from a
>>> file with 16-bit encoding.
>>
>> No, -1 and 0xFFFF are two different values. It's possible that one of
>> those values is the result of converting the other.
>
> I don't understand. In a 16-bit system where all 65536 bit patterns might
> represent characters, what bit pattern would you use to signal EOF?

Bit patterns are not values. A value is an *interpretation* of a bit
pattern; the interpretation is done with respect to a specified type.

For example, an object of type float with the value 123.0 and an
object of type unsigned int with the value 0x42f60000 might happen to
contain the same bit pattern, but they have distinct values because
those bit patterns (representations) are interpreted as having
different types.
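
The float example can be tried directly. This is a sketch under assumptions the post does not state: that float is 32 bits wide and uses the IEEE 754 single-precision format, and that uint32_t exists. The name float_bits is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the representation of a float into a
   uint32_t so the bit pattern can be inspected as an integer value.
   Assumes float is 32 bits wide. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under those assumptions float_bits(123.0f) is 0x42f60000, while the integer 0x42f60000 itself has the value 1123418112: same bit pattern, different values.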

> (Reading Eric's first post:
>
>> An implication of (1) for the programmer is that yes, there
>> will be a legitimate `unsigned char' value that maps to EOF
>> when converted to `int'.
>
> this seems to suggest that yes an ambiguity can occur.)

Yes.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Jun 27 '08 #18

Walter Roberson

In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:

> Bit patterns are not values. A value is an *interpretation* of a bit
> pattern; the interpretation is done with respect to a specified type.

Not in C: in C, a bit pattern is a *representation* of a value.
A machine doesn't have to use real bits (binary digits) as long as
the operators produce the right -values-.

(Though, I'd want to have another look over the wording on
floating point representations, as I seem to recall that that
wording could be interpreted as requiring Real Bits (SM).)
--
"Walter is undoubtedly the country's and club's most popular player."
-- vitalfootball.co.uk

Jun 27 '08 #19

lawrence.jones

Richard Tobin <ri*****@cogsci.ed.ac.uk> wrote:

> An implementation with, say, 16-bit ints and chars is still likely to
> have 8-bit data on disk and most other input sources. In which case
> fgetc() could read 8-bit values, and have no problem. At least, I
> don't know of anything in the standard that prevents this.

Writing a byte with fputc() and then reading it back with fgetc() must
produce the same value. That won't happen if you only write or read
half the bits.
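
The round-trip requirement can be exercised for every byte value. A minimal sketch, assuming an empty binary stream open for update (for instance one returned by tmpfile()); round_trip_ok is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical check: returns 1 if every unsigned char value written
   with putc() reads back unchanged with getc(), 0 otherwise. fp must
   be an empty binary stream open for update. */
int round_trip_ok(FILE *fp)
{
    unsigned char v;

    assert(fp != NULL);

    /* Write 0 .. UCHAR_MAX; the do/while form terminates even if
       UCHAR_MAX equals UINT_MAX. */
    v = 0;
    do {
        if (putc(v, fp) == EOF && ferror(fp))
            return 0;
    } while (v++ != UCHAR_MAX);

    rewind(fp);

    /* Read the values back in the same order and compare. */
    v = 0;
    do {
        int c = getc(fp);
        if (c == EOF && (feof(fp) || ferror(fp)))
            return 0;
        if ((unsigned char)c != v)
            return 0;
    } while (v++ != UCHAR_MAX);

    return 1;
}
```

An implementation that stored only half the bits of each byte would make this return 0 as soon as UCHAR_MAX exceeded what the stored half can hold.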

-- Larry Jones

It must be sad being a species with so little imagination. -- Calvin

Jun 27 '08 #20
