integer overflow in scanf functions - Page 2

vid512

hi.

I wanted to know why the scanf functions don't check for overflow
when reading a number. For example, scanf("%d") on a 32-bit machine
considers "1" and "4294967297" to be the same.

I tracked the code to where the conversion itself happens. The code in
the scanf functions just ignores the return value from the conversion
procedures.

More info in case of glibc posted here:
http://board.flatassembler.net/topic.php?t=6359

AFAIK, the behavior in case of overflow isn't defined, so glibc could
treat this as an error and set errno to ERANGE.
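
For comparison, here is a minimal sketch of the kind of range check being asked for, done by hand with strtol instead of scanf. The helper name parse_int is hypothetical, and the sketch assumes the whole number is already in a string.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal int from s, rejecting overflow.
   Returns 1 on success and stores the value in *out, 0 otherwise. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)                    /* no digits at all */
        return 0;
    if (errno == ERANGE)             /* did not even fit in a long */
        return 0;
    if (v < INT_MIN || v > INT_MAX)  /* fits in a long but not in an int */
        return 0;
    *out = (int)v;
    return 1;
}
```

With this check, "1" is accepted and "4294967297" is rejected on a machine with a 32-bit int, whether long is 32 or 64 bits wide.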

Dec 15 '06


CBFalconer

jacob navia wrote:
>
.... snip ...
>
>In general functions like scanf are unusable. They are so
>problematic, that it is a wonder when they work at all.
>
>Use strtol, or a similar function that will give reasonable
>error returns...

No, that requires assigning a buffer of sufficient size, which is
unknown a-priori. Instead take a look at:

<http://cbfalconer.home.att.net/download/txtio.zip>

(which has been revised, but not posted) for a method of reading
values from a text stream without any buffer assignment needed. In
particular see txtinput.c.
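
As an illustration of the buffer-free idea only, here is a minimal sketch that reads an unsigned decimal value one character at a time and checks for overflow as it goes. It is not taken from txtinput.c; the name read_unsigned and its return convention are assumptions, and it does not skip leading whitespace.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper, not taken from txtinput.c: read an unsigned
   decimal value straight from a stream, one character at a time.
   Returns 1 on success, 0 if no digit was found or on overflow. */
int read_unsigned(FILE *f, unsigned *out)
{
    unsigned v = 0;
    int ch, ndigits = 0, overflow = 0;

    while ((ch = getc(f)) != EOF && isdigit(ch)) {
        unsigned d = (unsigned)(ch - '0');
        if (v > (UINT_MAX - d) / 10)
            overflow = 1;          /* keep consuming the rest of the field */
        else
            v = v * 10 + d;
        ndigits++;
    }
    if (ch != EOF)
        ungetc(ch, f);             /* one character of pushback is guaranteed */
    if (ndigits == 0 || overflow)
        return 0;
    *out = v;
    return 1;
}
```

Only the single terminating character is pushed back, so the sketch stays within the one level of ungetc that the standard guarantees.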

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Dec 16 '06 #11

Peter Nilsson

Eric Sosman wrote:
>Walter Roberson wrote:
>>In article <sl*******************@rlaptop.random.yi.org>,
>>Random832 <ra*******@gmail.com> wrote:
>>>
>>>It's undefined. Which means there _are_ no requirements. An
>>>implementation is free to treat it as 1, or as 429496729 with 7 still on
>>>the stream, or as such with 7 _not_ still on the stream, or as
>>>4294967295 (saturation), etc, etc
>>No, consumption of the maximum characters is -required-. It cannot
>>leave the other characters in the stream. The undefined part comes
>>in the valuation and storage of the overly-long result, not in
>>how many characters are consumed from input.

>Once undefined behavior strikes, the program has no way
>to tell how many characters were or were not consumed.
>All requirements lose their force in the face of U.B.

True, but suppose an implementation defines the usual non-trapping
two's-complement overflow or strtoxxx-style behaviour for the %d fscanf
case; then the behaviour is no longer undefined and the normal rules apply.

Of course, few implementations go so far as to actually define (i.e.
guarantee) such behaviour, let alone document it.
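
The two candidate behaviours can be sketched as follows. This is only an illustration, not what any particular scanf does: the function names are hypothetical, and the wrapping case is shown on an unsigned type because that conversion is well-defined in C.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Wrapping: reduce the mathematical value modulo 2^32. */
uint32_t wrap_u32(unsigned long long v)
{
    return (uint32_t)v;   /* well-defined for unsigned types */
}

/* Saturating, strtoul-style: strtoul clamps to ULONG_MAX and sets
   errno to ERANGE. Returns 1 if the value was in range, 0 if clamped. */
int saturating_parse(const char *s, unsigned long *out)
{
    errno = 0;
    *out = strtoul(s, NULL, 10);
    return errno != ERANGE;
}
```

Since 4294967297 is 2^32 + 1, wrap_u32(4294967297ULL) yields 1, which is the result the original post observed from scanf.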

--
Peter

Dec 16 '06 #12

Walter Roberson

In article <sl*******************@rlaptop.random.yi.org>,
Random832 <ra*******@gmail.com> wrote:

>2006-12-15 <el**********@canopus.cc.umanitoba.ca>,
>Walter Roberson wrote:
>>In article <sl*******************@rlaptop.random.yi.org>,
>>Random832 <ra*******@gmail.com> wrote:

>>>Unless assignment suppression was indicated by a *, the result
>>>of the conversion is placed in the object pointed to by the first
>>>argument following the format argument that has not
>>>already received a conversion result. If this object does not
>>>have an appropriate type, or if the result of the conversion cannot
>>>be represented in the space provided, the behaviour is undefined."

>No, I don't think you get it.

>In an undefined situation, the standard forbids nothing.

>Meaning the implementation gets to do whatever the f*** it wants to,
>regarding anything, once anything has happened that has been undefined.

The C90 standard defines a three-part operation, first reading
the characters, then converting the type of the value, and then
attempting to store the received value. The first two parts
do not allow for undefined behaviour: only the storage aspect does.

Therefore, in a conforming C90 implementation, the complete sequence
of decimal digits is certain to be read. Stopping reading the stream
at the maximum usable int length (for %d) is not one of the options.
The "undefined behaviour" might then go through the trouble of
"putting back" the extra characters somehow, but read them first it
must.

Ah, there's a simple way to tell: use assignment suppression. Then no
actual storage attempt takes place, so whether the receiving variable
is the right size or type is not at question, and undefined behaviour
cannot take place. If you then have another format element to read a
value, or use %n to find the number of characters read, you can
determine where the %d scan left off. C90 tells you where you
should be (i.e., after the sequence of decimal characters); if
your system does leave you in the middle then your system is wrong.
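
A minimal sketch of that test, using sscanf so it can be run without typing input. The helper name digits_consumed is hypothetical; "%*d" suppresses the store and "%n" records how many characters have been consumed so far.

```c
#include <stdio.h>

/* Returns how many characters "%*d" consumed from s, or -1 if the
   conversion did not match. No value is stored, so there is no
   object for the result to overflow. */
int digits_consumed(const char *s)
{
    int n = -1;
    sscanf(s, "%*d%n", &n);   /* %n is only reached if %*d matched */
    return n;
}
```

By the argument above, digits_consumed("4294967297 rest") should report 10, the full length of the digit sequence, even though the value does not fit in a 32-bit int.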
--
There are some ideas so wrong that only a very intelligent person
could believe in them. -- George Orwell

Dec 16 '06 #13

Chris Torek

In article <sl*******************@rlaptop.random.yi.org>
Random832 <ra*******@gmail.com> wrote:

>Anyway, I found a possible situation in which my scanf is
>non-conformant:
>
>Numerical strings are truncated to 512 characters; for example, %f
>and %d are implicitly %512f and %512d.
>
>So, if I send %f
>
>1.0000000000000000000000000000000000000000000000000000000000000
>000000000000000000000000000000000000000000000000000000000000000
>000000000000000000000000000000000000000000000000000000000000000
>000000000000000000000000000000000000000000000000000000000000000
>000000000000000000000000000000000000000000000000000000000000000
>000000000000000000000000000000000000000000000000000000000000000
>000000000000000000000000000000000000000000000000000000000000000
>000000000000000000000000000000000000000000000000000000000000000
>e1
>
>it converts to 1 instead of 10. Does the standard allow this?

Yes:

Environmental limits

[#7] An implementation shall support text files with lines
containing at least 254 characters, including the
terminating new-line character. The value of the macro
BUFSIZ shall be at least 256.

(under "7.13.2 Streams" in the draft .txt file I keep handy).

Most stdio implementations will have *some* convenient limit, as
they will read numerical input into a buffer and then use strtol(),
strtoll(), strtod(), etc., to perform the actual conversions. That
limit must be at least 254, but need not be as high as BUFSIZ (that
is, just because BUFSIZ is, say, 8192, does not mean that scanf()
must be able to eat 8192-digit numbers).
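
The effect of such a limit can be sketched with a deliberately tiny buffer. This is an illustration only, not the code of any real stdio: DEMO_LIMIT and convert_truncated are hypothetical names, and 8 stands in for the 512 mentioned in the question.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_LIMIT 8   /* hypothetical; stands in for a limit such as 512 */

/* Copy at most DEMO_LIMIT characters of s into a fixed buffer, then
   convert with strtod, as a buffer-based scanf might do internally. */
double convert_truncated(const char *s)
{
    char buf[DEMO_LIMIT + 1];
    size_t len = strlen(s);

    if (len > DEMO_LIMIT)
        len = DEMO_LIMIT;      /* the exponent may be cut off here */
    memcpy(buf, s, len);
    buf[len] = '\0';
    return strtod(buf, NULL);
}
```

With this limit, "1.0000000000e1" is cut to "1.000000" and converts to 1, while the untruncated string converts to 10.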
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Dec 16 '06 #14

Random832

2006-12-16 <em**********@canopus.cc.umanitoba.ca>,
Walter Roberson wrote:

>The C90 standard defines a three-part operation, first reading
>the characters, then converting the type of the value, and then
>attempting to store the received value. The first two parts
>do not allow for undefined behaviour: only the storage aspect does.

And once the storage aspect _does_ have undefined behavior, it can
then go backwards in time and change how the other two aspects operated
in the first place.

In an undefined situation, the C standard forbids nothing.

>Therefore, in a conforming C90 implementation, the complete sequence
>of decimal digits is certain to be read. Stopping reading the stream
>at the maximum usable int length (for %d) is not one of the options.
>The "undefined behaviour" might then go through the trouble of
>"putting back" the extra characters somehow, but read them first it
>must.

It's undefined, there's no rule against time paradoxes.

>Ah, there's a simple way to tell: use assignment suppression. Then no
>actual storage attempt takes place, so whether the receiving variable
>is the right size or type is not at question, and undefined behaviour
>cannot take place.

But since the behavior is undefined when assignment suppression is not
used, it's free to act differently than if it is used.

Dec 17 '06 #15

Random832

2006-12-16 <em*********@news1.newsguy.com>,
Chris Torek wrote:

>In article <sl*******************@rlaptop.random.yi.org>
>Random832 <ra*******@gmail.com> wrote:
>>Anyway, I found a possible situation in which my scanf is
>>non-conformant:
>>
>>Numerical strings are truncated to 512 characters; for example, %f
>>and %d are implicitly %512f and %512d.
>>
>>So, if I send %f
>>
>>1.0000000000000000000000000000000000000000000000000000000000000
>>000000000000000000000000000000000000000000000000000000000000000
>>000000000000000000000000000000000000000000000000000000000000000
>>000000000000000000000000000000000000000000000000000000000000000
>>000000000000000000000000000000000000000000000000000000000000000
>>000000000000000000000000000000000000000000000000000000000000000
>>000000000000000000000000000000000000000000000000000000000000000
>>000000000000000000000000000000000000000000000000000000000000000
>>e1
>>
>>it converts to 1 instead of 10. Does the standard allow this?
>
>Yes:
>
>Environmental limits
>
>[#7] An implementation shall support text files with lines
>containing at least 254 characters, including the
>terminating new-line character. The value of the macro
>BUFSIZ shall be at least 256.

And what about sscanf?

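#include <stdio.h>
#include <string.h>

int main(void) {
    char x[515];   /* an array of char, not of char pointers */
    double n;
    memset(x + 2, '0', 510);   /* 510 zeros after "1." */
    x[0] = '1'; x[1] = '.'; x[512] = 'e'; x[513] = '1'; x[514] = 0;
    sscanf(x, "%lf", &n); printf("%f\n", n);   /* print n, not x */
    return 0;
}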

prints 1 or 10?

Dec 17 '06 #16

Chris Torek

>>In article <sl*******************@rlaptop.random.yi.org>
>>Random832 <ra*******@gmail.com> wrote:
>>>Does the standard allow [scanf to place limits on the size of
>>>numbers converted with %d, %f, etc]

In article <em*********@news1.newsguy.com> I wrote:

>Yes:
>Environmental limits

[snippage]

In article <sl*******************@rlaptop.random.yi.org>
Random832 <ra*******@gmail.com> wrote:

>And what about sscanf?

As far as I can tell, the same rules apply.

Since there is no documentation requirement and no fixed upper
bound (just that "254" I quoted as a lower bound), each scanf
(either each call, or each member of the family, or both) could
use a different limit, too, as long as it is at least 254 each
time.

Practically speaking, I would expect either all the functions
(scanf, fscanf, and sscanf) would have the same limit because they
use the same internal engine; or the engine might "see" that sscanf
is working off a string in memory, hence there is no need to make
a copy of digit-sequences for strto*(), hence sort of "accidentally"
avoid upper limits there. (However, the ruling that conversion
of, e.g., "1.23e-x" must fail, instead of converting "1.23" and
leaving the e-x for the next directive, would make this harder than
one might think at first. If the implementor just used the endptr
parameter from strtod(), sscanf against "1.23e-x" with "%f%s" would
convert two items successfully, instead of failing as required.)
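
The strtod half of this can be checked directly. A minimal sketch follows; the wrapper name strtod_rest is hypothetical, and it only shows where strtod stops, not how any particular scanf is implemented.

```c
#include <stdlib.h>

/* Converts the longest valid prefix of s with strtod and returns a
   pointer to the first unconverted character. */
const char *strtod_rest(const char *s, double *out)
{
    char *end;
    *out = strtod(s, &end);
    return end;
}
```

For "1.23e-x", strtod converts "1.23" and leaves the returned pointer at "e-x", because "1.23e-" is not a complete number. A scanf built naively on that end pointer would then hand "e-x" to %s, which is the two-item result described above.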
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Dec 18 '06 #17

CBFalconer

Chris Torek wrote:
>
.... snip ...
>
>Practically speaking, I would expect either all the functions
>(scanf, fscanf, and sscanf) would have the same limit because they
>use the same internal engine; or the engine might "see" that sscanf
>is working off a string in memory, hence there is no need to make
>a copy of digit-sequences for strto*(), hence sort of "accidentally"
>avoid upper limits there. (However, the ruling that conversion
>of, e.g., "1.23e-x" must fail, instead of converting "1.23" and
>leaving the e-x for the next directive, would make this harder than
>one might think at first. If the implementor just used the endptr
>parameter from strtod(), sscanf against "1.23e-x" with "%f%s" would
>convert two items successfully, instead of failing as required.)

There is no necessity to have ANY string length limit affect these
textstream-to-number conversions. I have written code that avoids
the problem entirely. However the error condition for "1.2e-x"
sequences remains. This can obviously be handled easily when the
input is a string, and is otherwise limited by the guaranteed
lookback (ungetc) level.

I disagree that such an input must fail. The interpretation as a
number, followed by a string, seems perfectly reasonable to me.
The cure here is that the application must check the termination
char for the numeric field.

In addition, there should be no problem at the system level in
providing multi-level ungetc ability, provided that the system
never has to back up across line ends. Since a '\n' will always
terminate any numeric input field, this is no hardship. A short
time ago I wrote a small test program to detect this capability,
and found that DJGPP has it. I published the little test here at
the time. So this reduces to a quality of implementation issue.

In practice this all means that the scanf series of functions
should not be used to input numerics without limiting the call to a
single field.

Here is my test program for ungetc levels (tungetc.c):

#include <stdio.h>
#include <stdlib.h>
#define MAXLN 10

int main(void) {
   char line[MAXLN + 1];
   int  ix, ch;

   puts("Test ability to ungetc for multiple chars in one line");
   fputs("Enter no more than 10 chars:", stdout); fflush(stdout);
   ix = 0;
   while ((EOF != (ch = getchar())) && ('\n' != ch)) {
      if (MAXLN <= ix) break;
      line[ix++] = ch;
   }
   line[ix] = '\0';
   if ('\n' != ungetc('\n', stdin)) {
      puts("Can't unget a '\\n'");
      return(EXIT_FAILURE);
   }
   puts(line);
   puts("Trying to push back the whole line");
   while (ix > 0) {
      ch = ungetc(line[--ix], stdin);
      if (ch == line[ix]) putchar(ch);
      else {
         putchar(line[ix]);
         puts(" failed to push back");
         return(EXIT_FAILURE);
      }
   }
   puts("\nTrying to reread the whole line");
   while ((EOF != (ch = getchar())) && ('\n' != ch)) {
      if (ix++ == MAXLN) break;
      putchar(ch);
   }
   return 0;
} /* main */

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Dec 18 '06 #18

Random832

2006-12-18 <em*********@news4.newsguy.com>,
Chris Torek wrote:

>>>In article <sl*******************@rlaptop.random.yi.org>
>>>Random832 <ra*******@gmail.com> wrote:
>>>>Does the standard allow [scanf to place limits on the size of
>>>>numbers converted with %d, %f, etc]
>
>In article <em*********@news1.newsguy.com> I wrote:
>
>>Yes:
>>Environmental limits
>
>[snippage]
>
>In article <sl*******************@rlaptop.random.yi.org>
>Random832 <ra*******@gmail.com> wrote:
>>And what about sscanf?
>
>As far as I can tell, the same rules apply.

That rule does not permit a limit in any scanf function. It permits
limits on other things, which lets an implementation be written in
which no such case can arise for scanf or fscanf; that is not the
same thing.

>Since there is no documentation requirement and no fixed upper
>bound (just that "254" I quoted as a lower bound), each scanf
>(either each call, or each member of the family, or both) could
>use a different limit, too, as long as it is at least 254 each
>time.

The section you quoted has absolutely nothing to do with any *scanf
function, and even less to do with sscanf.

Dec 18 '06 #19

Random832

2006-12-18 <45***************@yahoo.com>,
CBFalconer wrote:

>Chris Torek wrote:
>>
>... snip ...
>>
>>Practically speaking, I would expect either all the functions
>>(scanf, fscanf, and sscanf) would have the same limit because they
>>use the same internal engine; or the engine might "see" that sscanf
>>is working off a string in memory, hence there is no need to make
>>a copy of digit-sequences for strto*(), hence sort of "accidentally"
>>avoid upper limits there. (However, the ruling that conversion
>>of, e.g., "1.23e-x" must fail, instead of converting "1.23" and
>>leaving the e-x for the next directive, would make this harder than
>>one might think at first. If the implementor just used the endptr
>>parameter from strtod(), sscanf against "1.23e-x" with "%f%s" would
>>convert two items successfully, instead of failing as required.)
>
>There is no necessity to have ANY string length limit affect these
>textstream-to-number conversions.

He was apparently saying, though, that it is _permitted_ for an
implementation to limit it to 512 characters, and quoted an unrelated
section of the standard that makes it difficult [but clearly not
impossible, as shown by my post] to construct a test case.

If I pass
1.00000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000e1 to
scanf, I expect it to come back with ten, not one, as the result value.

No-one has provided a convincing argument that an implementation which
stores 1. in the pointed-to argument is legal.

Dec 18 '06 #20
