Error in scanf implementation or error in example in standard?

The following Example 3 is given in the 1999 C standard for the function
fscanf:
EXAMPLE 3 To accept repeatedly from stdin a quantity, a unit of
measure, and an item name:

#include <stdio.h>
/* ... */
int count; float quant; char units[21], item[21];
do {
count = fscanf(stdin, "%f%20s of %20s", &quant, units, item);
fscanf(stdin,"%*[^\n]");
} while (!feof(stdin) && !ferror(stdin));

If the stdin stream contains the following lines:

2 quarts of oil
-12.8degrees Celsius
lots of luck
10.0LBS of
dirt
100ergs of energy

the execution of the above example will be analogous to the following
assignments:

quant = 2; strcpy(units, "quarts"); strcpy(item, "oil");
count = 3;
quant = -12.8; strcpy(units, "degrees");
count = 2; // "C" fails to match "o"
count = 0; // "l" fails to match "%f"
quant = 10.0; strcpy(units, "LBS"); strcpy(item, "dirt");
count = 3;
count = 0; // "100e" fails to match "%f"
count = EOF;

I have tested several implementations and none of them get the last case
right. In no case does fscanf return 0 indicating failure to match
"100ergs of energy" with "%f".

The actual behaviour varies. Some will match '100', leaving the 'e' unread:

quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
count = 3;

While others will match '100e', leaving the 'r' unread:

quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
count = 3;

But I am yet to come across an implementation that does what the example
in the Standard specifies. Is this a failure in the implementations or
in the Standard itself?
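
For anyone who wants to repeat the test, here is a minimal sketch of a probe. It is an assumption-laden stand-in for the Standard's loop: it uses sscanf on a string instead of fscanf on stdin, and the names scan_result, probe_line and report_line are made up for illustration. What it reports for "100ergs of energy" depends on the C library in use, so no expected output is shown for that line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: what one scan of a line produced. */
struct scan_result {
    int count;          /* value returned by sscanf */
    float quant;
    char units[21];
    char item[21];
};

/* Scan one line with the format string from Example 3. */
struct scan_result probe_line(const char *line)
{
    struct scan_result r;

    r.quant = 0.0f;
    strcpy(r.units, "");
    strcpy(r.item, "");
    r.count = sscanf(line, "%f%20s of %20s", &r.quant, r.units, r.item);
    return r;
}

/* Print the outcome so that different libraries can be compared. */
void report_line(const char *line)
{
    struct scan_result r = probe_line(line);

    printf("\"%s\": count=%d quant=%g units=\"%s\" item=\"%s\"\n",
           line, r.count, (double)r.quant, r.units, r.item);
}
```

Calling report_line("100ergs of energy") from main shows which of the behaviours described above a given library exhibits.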

--
Simon.
Nov 29 '06 #1
Simon Biber wrote:
<snip>
> But I am yet to come across an implementation that does what the example
> in the Standard specifies. Is this a failure in the implementations or
> in the Standard itself?

Footnote 245 in n1124 states:
"fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc.,
are unacceptable to fscanf."

This was added in response to Defect Report #22:
http://www.open-std.org/jtc1/sc22/wg...cs/dr_022.html.

In the case of 100ergs, fscanf reads up to the "r" before realizing that
the "e" is not part of the number. At that point, given the one-character
pushback limit, it can no longer push back both the "r" and the "e", so
it has to return with a failure, since "100e" is not a valid number.
Many implementations allow more than one character pushback and take
advantage of this fact in the fscanf function, hence the behavior you
have seen. Technically such implementations are in violation of the
Standard but the sentiment among many implementors is that the
requirement is unjustified and they just live with non-conformance.
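
The one-character rule can be modelled with a small sketch. This is not any library's actual fscanf code; it is a hypothetical simplification that walks a string, handles plain decimal numbers only (no leading whitespace, hex floats, INF or NAN), and uses an invented name, scan_decimal_one_pushback. The character at index *consumed is the single character that was read and pushed back.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Returns 1 if the consumed characters form a complete decimal number,
 * 0 if they are only a prefix of one (a matching failure).
 * *consumed receives the count of characters that cannot be given back.
 */
int scan_decimal_one_pushback(const char *s, size_t *consumed)
{
    size_t i = 0;
    int digits = 0;

    if (s[i] == '+' || s[i] == '-')
        i++;
    while (isdigit((unsigned char)s[i])) { i++; digits++; }
    if (s[i] == '.') {
        i++;
        while (isdigit((unsigned char)s[i])) { i++; digits++; }
    }
    if (digits == 0) {
        *consumed = i;
        return 0;                 /* no digits seen at all */
    }
    if (s[i] == 'e' || s[i] == 'E') {
        int exp_digits = 0;

        i++;                      /* the 'e' is consumed from here on */
        if (s[i] == '+' || s[i] == '-')
            i++;
        while (isdigit((unsigned char)s[i])) { i++; exp_digits++; }
        if (exp_digits == 0) {
            *consumed = i;
            return 0;             /* e.g. "100e": a prefix, not a number */
        }
    }
    *consumed = i;
    return 1;
}
```

With the input "100ergs of energy" the function returns 0 and sets *consumed to 4, which corresponds to consuming "100e", failing, and leaving "r" as the next character.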

Robert Gamble

Nov 29 '06 #2
Robert Gamble said:

<snip>
> Many implementations allow more than one character pushback and take
> advantage of this fact in the fscanf function, hence the behavior you
> have seen. Technically such implementations are in violation of the
> Standard

Why?

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Nov 29 '06 #3
"Robert Gamble" <rg*******@gmail.com> wrote:
<snip>
> Footnote 245 in n1124 states:
> "fscanf pushes back at most one input character onto the input stream.
> Therefore, some sequences that are acceptable to strtod, strtol, etc.,
> are unacceptable to fscanf."

True, but feetneet are not normative. Strictly speaking, there's a
conflict between two parts of the Standard; the footnote makes it clear
that in this case, the intent was that the part about a single character
pushback buffer for input streams overrides the part about parsing
numbers, but it would be better if that were made explicit in the
_normative_ text in the next TC.

Richard
Nov 29 '06 #4
Richard Heathfield wrote:
> Robert Gamble said:
>
> <snip>
>> Many implementations allow more than one character pushback and take
>> advantage of this fact in the fscanf function, hence the behavior you
>> have seen. Technically such implementations are in violation of the
>> Standard
>
> Why?

Why what? Why such implementations aren't technically conforming?
Because implementations that push back more than one character in the
fscanf family of functions do not behave as mandated by the Standard.
I am not sure I understand your point, perhaps you could clarify with a
multi-word response.

Robert Gamble

Nov 29 '06 #5
"Robert Gamble" <rg*******@gmail.com> writes:
> Many implementations allow more than one character pushback and take
> advantage of this fact in the fscanf function, hence the behavior you
> have seen. Technically such implementations are in violation of the
> Standard but the sentiment among many implementors is that the
> requirement is unjustified and they just live with non-conformance.

C99 says this in the description of the ungetc function:

One character of pushback is guaranteed. If the ungetc
function is called too many times on the same stream without
an intervening read or file positioning operation on that
stream, the operation may fail.

I don't see a requirement that *only* one character of pushback
be supported, only that *at least* one character of pushback be
supported.
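
How much pushback a particular stream really offers can be probed with a sketch like the one below. The function name count_pushbacks is invented, and the use of a temporary file is an assumption made only to get a readable stream. Only the first ungetc is guaranteed to succeed; anything beyond that is whatever the implementation happens to provide.

```c
#include <stdio.h>

/*
 * Returns -1 if the temporary file cannot be set up, otherwise the number
 * of consecutive ungetc calls (at most `attempts`) that succeeded.
 */
int count_pushbacks(int attempts)
{
    FILE *fp = tmpfile();
    int ok = 0;

    if (fp == NULL)
        return -1;
    /* Write some data, rewind, and read one character first. */
    if (fputs("abc", fp) == EOF || fseek(fp, 0L, SEEK_SET) != 0
        || getc(fp) != 'a') {
        fclose(fp);
        return -1;
    }
    /* Push characters back until ungetc refuses or the limit is hit. */
    while (ok < attempts && ungetc('x', fp) != EOF)
        ok++;
    fclose(fp);
    return ok;
}
```

A result of 1 for count_pushbacks(4) would indicate the guaranteed minimum; a larger result shows the stream offers more.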

On the other hand, perhaps you are talking about the following
text and footnote for the fscanf function; your article seems
ambiguous to me:

An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the
longest sequence of input characters which does not exceed
any specified field width and which is, or is a prefix of, a
matching input sequence.242)

242) fscanf pushes back at most one input character onto the
input stream. Therefore, some sequences that are
acceptable to strtod, strtol, etc., are unacceptable
to fscanf.
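
The contrast the footnote draws with strtod can be seen in a short sketch. strtod works on a string, so it can back up as far as it needs to; the wrapper name strtod_prefix is made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Converts the start of `text` with strtod and reports, through *used,
 * how many characters strtod treated as part of the number.
 */
double strtod_prefix(const char *text, size_t *used)
{
    char *end;
    double value = strtod(text, &end);

    *used = (size_t)(end - text);
    return value;
}
```

For "100ergs of energy" this yields 100.0 with *used equal to 3: strtod accepts "100" and leaves "ergs" alone, which is exactly the two-character back-up that the footnote says fscanf does not perform.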
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
Nov 29 '06 #6
Richard Bos wrote:
> "Robert Gamble" <rg*******@gmail.com> wrote:
<snip>
>> Footnote 245 in n1124 states:
>> "fscanf pushes back at most one input character onto the input stream.
>> Therefore, some sequences that are acceptable to strtod, strtol, etc.,
>> are unacceptable to fscanf."
>
> True, but feetneet are not normative.

And neither are the examples for that matter.

> Strictly speaking, there's a
> conflict between two parts of the Standard; the footnote makes it clear
> that in this case, the intent was that the part about a single character
> pushback buffer for input streams overrides the part about parsing
> numbers, but it would be better if that were made explicit in the
> _normative_ text in the next TC.

I certainly agree that it would have been nice if this footnote were
part of the normative text; I don't know why it isn't. The only
conflict I see is the one in the C90 Standard, which was addressed in DR
022. Although the footnote is non-normative, it, along with the example
and the fact that it was the result of a DR, makes it abundantly clear
what the intent was. If intent isn't enough, though, a careful reading
of the normative changes made in the DR (which were carried through to
C99) yields the same result, even if it is not as clearly spelled out.

Robert Gamble

Nov 29 '06 #7
Robert Gamble said:
> Richard Heathfield wrote:
>> Robert Gamble said:
>>
>> <snip>
>>> Many implementations allow more than one character pushback and take
>>> advantage of this fact in the fscanf function, hence the behavior you
>>> have seen. Technically such implementations are in violation of the
>>> Standard
>>
>> Why?
>
> Why what? Why such implementations aren't technically conforming?

Yes.

> Because implementations that push back more than one character in the
> fscanf family of functions do not behave as mandated by the Standard.

Why not?

> I am not sure I understand your point, perhaps you could clarify with a
> multi-word response.

<grin> Okay, let me see if I can make it clearer. Maybe you're right that
providing more than the minimum level of pushback is against the rules, and
maybe you're not. I can see why an implementation *must* provide at least
one character of pushback, but where is it *forbidden* from providing more?

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Nov 29 '06 #8
Ben Pfaff wrote:
<snip>
> I don't see a requirement that *only* one character of pushback
> be supported, only that *at least* one character of pushback be
> supported.

I was speaking specifically of the pushback used by the fscanf function,
which I thought was clear based on the footnote that I cited. I
certainly did not mean to imply that multi-character pushback was
itself incorrect, just its use in the fscanf function.

> On the other hand, perhaps you are talking about the following
> text and footnote for the fscanf function; your article seems
> ambiguous to me:
<snip>
> 242) fscanf pushes back at most one input character onto the
> input stream. Therefore, some sequences that are
> acceptable to strtod, strtol, etc., are unacceptable
> to fscanf.

Right, I cited this exact footnote at the beginning of my original
article; perhaps you missed it.

Robert Gamble

Nov 29 '06 #9
Richard Heathfield wrote:
<snip>
> <grin> Okay, let me see if I can make it clearer. Maybe you're right that
> providing more than the minimum level of pushback is against the rules, and
> maybe you're not. I can see why an implementation *must* provide at least
> one character of pushback, but where is it *forbidden* from providing more?

First let me make clear that I am speaking only of the pushback
functionality used within the fscanf function itself, not the pushback
capability of a stream in general (which can provide pushback for as
many characters as it desires); at least one person seems to have been
confused by my original statement. The Standard makes it clear through
the discussed footnote and example that the behavior shall be as if a
maximum of one character of pushback was used within the fscanf
function ("fscanf pushes back at most one input character onto the
input stream"). Although footnotes and examples are non-normative, the
same meaning is supported by the normative changes that were provoked
by DR 022:

In subclause 7.9.6.2, page 135, lines 31-33, change:

"An input item is defined as the longest matching sequence of input
characters, unless that exceeds a specified field width, in which case
it is the initial subsequence of that length in the sequence."

to:

"An input item is defined as the longest sequence of input characters
which does not exceed any specified field width and which is, or is a
prefix of, a matching input sequence."

Robert Gamble

Nov 29 '06 #10
Robert Gamble said:
> The Standard makes it clear through
> the discussed footnote and example that the behavior shall be as if a
> maximum of one character of pushback was used within the fscanf
> function ("fscanf pushes back at most one input character onto the
> input stream").

Thank you for clarifying. I know you know that footn...

> Although footnotes and examples are non-normative,

...er, quite so.

> the
> same meaning is supported by the normative changes that were provoked
> by DR 022:

I've found DRs 200 through 294. I can't find DR 022.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Nov 29 '06 #11
Richard Heathfield wrote:
<snip>
> I've found DRs 200 through 294. I can't find DR 022.

The link was in my original response:
http://www.open-std.org/jtc1/sc22/wg...cs/dr_022.html.

Robert Gamble

Nov 29 '06 #12
"Robert Gamble" <rg*******@gmail.com> writes:
>> On the other hand, perhaps you are talking about the following
>> text and footnote for the fscanf function; your article seems
>> ambiguous to me:
>> [...]
> Right, I cited this exact footnote at the beginning of my original
> article; perhaps you missed it.

I did miss it, sorry.
--
Ben Pfaff
email: bl*@cs.stanford.edu
web: http://benpfaff.org
Nov 29 '06 #13
Robert Gamble said:
> Richard Heathfield wrote:
> <snip>
>> I've found DRs 200 through 294. I can't find DR 022.
>
> The link was in my original response:
> http://www.open-std.org/jtc1/sc22/wg...cs/dr_022.html.

My apologies for missing that. It does appear that the text under
consideration is still non-normative. (It's footnote 245 in n1124, for
those who don't know).

Having said that, I accept that the intent of footnotes, despite their
non-normative status, is to clarify the meaning of the Standard, so I'll
shut up now.

(Like I care ***so much*** about fscanf! :-) )

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Nov 29 '06 #14
Robert Gamble wrote:
<snip>
> In the case of 100ergs, fscanf reads up to the r before realizing that
> the "e" is not part of the number but at that point, given the one
> character pushback limit, it can no longer push back both the r and the
> e so it has to return with a failure since 100e is not a valid number.

But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.

That's the real bug, not the quibble on how many characters are pushed back.

--
Simon.
Nov 30 '06 #15
Simon Biber wrote:
<snip>
> But none of the implementations I tested actually return with a failure!
>
> Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
> LCC-Win32 or Turbo C, none of them return with a failure. They interpret
> 100e as a valid number, with the value 100.
>
> That's the real bug, not the quibble on how many characters are pushed back.

There are 2 problems here. Implementations that convert 100 and leave
the "e" on the stream are probably realizing that the "e" is not part
of the number when they read the "r" and are pushing back too many
characters. Implementations that convert 100 and leave the "r" as the
first character on the stream are incorrectly accepting "100e" as
equivalent to "100e1". glibc is known to accept certain invalid
numeric sequences but they don't seem willing to acknowledge such
problems.

I tested a number of implementations a while ago and had the same
results that you have seen. I believe that at least the Solaris
and glibc folk are aware of this particular issue but they don't seem
to have any plans to change their behavior. I believe that uClibc
(http://uclibc.org/) handled this case correctly, but I'm not positive.

I haven't tried this on Dinkumware as I don't have access to it, but if
this was going to be handled correctly on any implementation it would
probably be the Dinkumware C99 library. Their library claims to be
certified by Perennial as C99-compliant and I believe the behavior in
question is tested in the certification process. If anyone has access
to this library it would be nice if they could confirm how it handles
this. Additionally, if it does handle this correctly, I would be
curious to know if the same string is handled the same way with the
sscanf function (I believe it should be, but some people do not; the
Standard isn't crystal clear in my opinion).

Robert Gamble

Nov 30 '06 #16
"Robert Gamble" <rg*******@gmail.com> wrote in message
news:11**********************@l39g2000cwd.googlegroups.com...
<snip>
> I haven't tried this on Dinkumware as I don't have access to it but if
> this was going to be handled correctly on any implementation it would
> probably be the Dinkumware C99 library. Their library claims to be
> certified by Perennial as C99-compliant and I believe the behavior in
> question is tested in the certification process. If anyone has access
> to this library it would be nice if they could confirm how it handles
> this. Additionally, if it does handle this correctly, I would be
> curious to know if the same string is handled the same way with the
> sscanf function (I believe it should but some people do not, the
> Standard isn't crystal clear in my opinion).

We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Nov 30 '06 #17
P.J. Plauger wrote:
<snip>
> We do it right (if only to score 100 per cent on the Perennial C99
> validation suite), where by "right" I mean what the DR tells us
> to do -- consume "100e", fail, and leave "r" in the input stream.
> We do the same for both scanf and sscanf, since the code is common.

Thanks very much for the input. I sense from you the same sentiment
that I have seen expressed from other implementors, that the one
character max pushback mandate isn't well-received. Although the
Rationale doesn't provide any insight as to why this decision was made
I would assume it would be to support implementations that only provide
a single character pushback while keeping results consistent among
implementations that could provide more. Do you feel that there is a
better way to handle this, has there been any discussion on changing
this behavior in the Standard, and is this a common sentiment in your
experience?

Robert Gamble

Nov 30 '06 #18
2006-11-30 <11**********************@l39g2000cwd.googlegroups.com>,
Robert Gamble wrote:
<snip>
> Implementations that convert 100 and leave the "r" as the
> first character on the stream are incorrectly accepting "100e" as
> equivalent to "100e1".

100e0, actually - which it's arguable* that it in fact is equivalent.

* Arguable. adj. That for which "one would be wrong, but one could argue it."
Nov 30 '06 #19
"P.J. Plauger" wrote:
>
.... snip about parsing "100ergs" as a real ...
>
We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.
Which makes sense, especially if you consider the spec as reading
"stop on the first character that cannot describe a real". It also
makes sense if you conceive of an empty field as describing zero.
This more or less agrees with the standard (at least N869):

[#4] If the subject sequence has the expected form for a
floating-point number, the sequence of characters starting
with the first digit or the decimal-point character
(whichever occurs first) is interpreted as a floating
constant according to the rules of 6.4.4.2, except that the
decimal-point character is used in place of a period, and |
that if neither an exponent part nor a decimal-point |
character appears in a decimal floating point number, or if |
a binary exponent part does not appear in a binary floating |
point number, an exponent part of the appropriate type with |
value zero is assumed to follow the last digit in the |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
string. If the subject sequence begins with a minus sign, |
the sequence is interpreted as negated.235) A character
sequence INF or INFINITY is interpreted as an infinity, if
representable in the return type, else like a floating
constant that is too large for the range of the return type.
A character sequence NAN or NAN(n-char-sequence-opt), is
interpreted as a quiet NaN, if supported in the return type,
else like a subject sequence part that does not have the
expected form; the meaning of the n-char sequences is
implementation-defined.236) A pointer to the final string *
is stored in the object pointed to by endptr, provided that
endptr is not a null pointer.
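
A minimal sketch of what the quoted strtod wording implies for the input from the original example, assuming a conforming strtod. The wrapper name leading_number is hypothetical; only strtod itself is standard. The subject sequence is the longest initial part of the string that has the expected form, so a dangling "e" is left unconsumed.

```c
#include <stdlib.h>

/* Hypothetical wrapper around strtod: returns the converted value and
   stores in *rest a pointer to the first character strtod did not use. */
double leading_number(const char *s, const char **rest)
{
    char *end;
    double value = strtod(s, &end);

    *rest = end;   /* strtod's endptr: start of the unconverted tail */
    return value;
}
```

Called on "100ergs of energy", this should yield 100 with rest pointing at "ergs of energy", because strtod can examine the whole string before deciding where the number ends. That freedom is exactly what fscanf lacks on a stream.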

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Nov 30 '06 #20
Robert Gamble wrote:
>
.... snip ...
>
Thanks very much for the input. I sense from you the same sentiment
that I have seen expressed from other implementors, that the one
character max pushback mandate isn't well-received. Although the
Rationale doesn't provide any insight as to why this decision was made
I would assume it would be to support implementations that only provide
a single character pushback while keeping results consistent among
implementations that could provide more. Do you feel that there is a
better way to handle this, has there been any discussion on changing
this behavior in the Standard, and is this a common sentiment in your
experience?
Consider handling pushing back two characters, the second of which
is a '\n'. The system buffer is holding the next line, so where do
you put the '\n'? Single char pushback can be handled simply by
diddling the internal pointer to the buffered line. Anything more
involves complications.
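
As a rough illustration of the "diddle the internal pointer" approach, here is a sketch of a read buffer with single-character pushback. The struct and function names are hypothetical and do not correspond to any real stdio implementation.

```c
#include <stddef.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical read buffer: buf holds len characters, pos is the read index. */
struct linebuf {
    const char *buf;
    size_t pos;
    size_t len;
};

/* Return the next character, or EOF when the buffer is exhausted. */
int lb_getc(struct linebuf *b)
{
    return b->pos < b->len ? (unsigned char)b->buf[b->pos++] : EOF;
}

/* Push back the character just read by stepping the index back.
   Fails at the start of the buffer: there is nowhere to put it. */
int lb_unget1(struct linebuf *b)
{
    if (b->pos == 0)
        return EOF;
    b->pos--;
    return (unsigned char)b->buf[b->pos];
}
```

The failure case at pos == 0 is the complication described above: once the buffer has been refilled with the next line, a character from the previous line has no slot to go back into.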

To misquote Dijkstra, "pity the poor implementor".

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Nov 30 '06 #21
Random832 wrote:
2006-11-30 <11**********************@l39g2000cwd.googlegroups.com>,
Robert Gamble wrote:
Simon Biber wrote:
Robert Gamble wrote:
Simon Biber wrote:
I have tested several implementations and none of them get the last case
right. In no case does fscanf return 0 indicating failure to match
"100ergs of energy" with "%f".

The actual behaviour varies. Some will match '100', leaving the 'e' unread:

quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
count = 3;

While others will match '100e', leaving the 'r' unread:

quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
count = 3;

But I am yet to come across an implementation that does what the example
in the Standard specifies. Is this a failure in the implementations or
in the Standard itself?

Footnote 245 in n1124 states:
"fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc.,
are unacceptable to fscanf."

This was added in response to Defect Report #22:
http://www.open-std.org/jtc1/sc22/wg...cs/dr_022.html.

In the case of 100ergs, fscanf reads up to the r before realizing that
the "e" is not part of the number but at that point, given the one
character pushback limit, it can no longer push back both the r and the
e so it has to return with a failure since 100e is not a valid number.

But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.

That's the real bug, not the quibble on how many characters are pushed back.
There are 2 problems here. Implementations that convert 100 and leave
the "e" on the stream are probably realizing that the "e" is not part
of the number when it reads the "r" and are pushing back too many
characters. Implementations that convert 100 and leave the "r" as the
first character on the stream are incorrectly accepting "100e" as
equivalent to "100e1".

100e0, actually
Yep, thanks for the correction.
which it's arguable* that it in fact is equivalent.

* Arguable. adj. That for which "one would be wrong, but one could argue it."
In other words, 100e is not valid.

Robert Gamble

Nov 30 '06 #22
"Robert Gamble" <rg*******@gmail.com> wrote in message
news:11*********************@h54g2000cwb.googlegroups.com...
.....
I tested a number of implementations a while ago and had the same
results that you have seen. I believe that at least the Solaris
and glibc folk are aware of this particular issue but they don't seem
to have any plans to change their behavior. I believe that uClibc
(http://uclibc.org/) handled this case correctly, but I'm not positive.

I haven't tried this on Dinkumware as I don't have access to it but if
this was going to be handled correctly on any implementation it would
probably be the Dinkumware C99 library. Their library claims to be
certified by Perennial as C99-compliant and I believe the behavior in
question is tested in the certification process. If anyone has access
to this library it would be nice if they could confirm how it handles
this. Additionally, if it does handle this correctly, I would be
curious to know if the same string is handled the same way with the
sscanf function (I believe it should but some people do not, the
Standard isn't crystal clear in my opinion).

We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.

Thanks very much for the input. I sense from you the same sentiment
that I have seen expressed from other implementors, that the one
character max pushback mandate isn't well-received. Although the
Rationale doesn't provide any insight as to why this decision was made
I would assume it would be to support implementations that only provide
a single character pushback while keeping results consistent among
implementations that could provide more. Do you feel that there is a
better way to handle this, has there been any discussion on changing
this behavior in the Standard, and is this a common sentiment in your
experience?
I was one of the people arguing for a maximum of one character
pushback, so I have no problem with that limitation. My only issue
is whether 100e can arguably be a valid field -- I tend to err on
the tolerant side when it comes to scanning input. But I certainly
dislike the thought that different implementations can get different
results depending on the amount of pushback they happen to tolerate
in a given context. (Yes, pushback can be very context dependent.)

So I guess, at the end of the day and despite my "if only" above,
I favor the DR resolution that we made a point of matching.
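
For anyone who wants to check a given library, a small probe along these lines may help. The function name probe_float is hypothetical; it only wraps the standard sscanf and uses the same input as the Standard's example.

```c
#include <stdio.h>

/* Hypothetical probe: applies "%f%20s" to input and returns sscanf's
   conversion count. rest must have room for at least 21 characters. */
int probe_float(const char *input, float *quant, char *rest)
{
    *quant = 0.0f;      /* known values in case nothing is converted */
    rest[0] = '\0';
    return sscanf(input, "%f%20s", quant, rest);
}
```

With "100ergs of energy" as input, a return of 0 corresponds to the DR behaviour described above. A return of 2 with rest set to "ergs" or "rgs" corresponds to the two lenient behaviours reported earlier in the thread.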

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Nov 30 '06 #23
On Thu, 30 Nov 2006 22:21:43 +1100, Simon Biber wrote:
>But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.
Wrong: 'my' little implementation of an sscanf-like function [sscan] seems
OK.

#include "winb.h"

int main(void)
{double a, b;
char inp[]=" 100eBUONEFESTE\n",
inp1[]=" 100e0BUONEFESTE", *pc=0;
int r;
// sscan_m (char** ove, char* input, char* fmt, ...);
r=sscan_m(&pc, inp, " %f", &a);
P("a=%f, ris=%d, resto=%s", a, r, pc);
r=sscan_m(&pc, inp1, " %f", &b);
P("a=%f, ris=%d, resto=%s", b, r, pc);
return 0;
}
C:>sscan
a=100.000000, ris=1, resto=eBUONEFESTE
a=100.000000, ris=1, resto=BUONEFESTEMEMORIA DINAMICA LIBERATA
Tot=+0.0 Mb
>That's the real bug, not the quibble on how many characters are pushed back.
but have you sscan in standard C? no
Nov 30 '06 #24
On Thu, 30 Nov 2006 19:31:59 +0100, ¬a\/b wrote:
>On Thu, 30 Nov 2006 22:21:43 +1100, Simon Biber wrote:
>>But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.

Wrong: 'my' little implementation of an sscanf-like function [sscan] seems
OK.
lalalalalla
Nov 30 '06 #25
In article <11**********************@j72g2000cwa.googlegroups.com>
Robert Gamble <rg*******@gmail.com> wrote:
>First let me make clear that I am speaking only of the pushback
functionality used within the fscanf function itself, not the pushback
capability of a stream in general (which can provide pushback for as
many characters as it desires), at least one person seems to have been
confused by my original statement. The Standard makes it clear through
the discussed footnote and example that the behavior shall be as if a
maximum of one character of pushback was used within the fscanf
function ("fscanf pushes back at most one input character onto the
input stream"). Although footnotes and examples are non-normative, the
same meaning is supported by the normative changes that were provoked
by DR 022:

In subclause 7.9.6.2, page 135, lines 31-33, change:

"An input item is defined as the longest matching sequence of input
characters, unless that exceeds a specified field width, in which case
it is the initial subsequence of that length in the sequence."

to:

"An input item is defined as the longest sequence of input characters
which does not exceed any specified field width and which is, or is a
prefix of, a matching input sequence."
I will note that when I wrote my stdio (in 1991 or so), which
internally guarantees at least four characters of pushback, the
wording in the standard was different (and in fact, the standard
itself was different :-) ).

I remember pointing out the problem somewhere -- possibly in
comp.std.c -- and the fact that correctly[%] matching "%f" against
input of the form "1.234e-x" required internally pushing back the
three characters 'x', '-', and 'e'. Add the guaranteed ungetc()
pushback and you get the four I provided.

It would have been nice if someone had taken notice of this back
in the 1990s, when I pointed it out, but I admit I did not use the
proper forum.

[% As defined at the time, "correct" appeared to mean "match 1.234,
leaving e-x in the input stream".]

Thus, given (e.g.):

double d;
char buf[100];
int ret;

ret = sscanf("1.234e-x", "%lf%s", &d, buf);

my original scanf engine sets ret to 2, d to 1.234, and buf[0]
through buf[3] to 'e', '-', 'x', and '\0' respectively.

Because C99 adds strings like "Inf" and "Infinity" (with or without
a leading sign), the amount of pushback required to make this all
work in C99 would have been larger.
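
The revised wording quoted above ("is, or is a prefix of, a matching input sequence") can be sketched roughly as follows. This is only an illustration over a string, not anyone's actual scanf engine: scan_float_item is a hypothetical name, and it covers plain decimal forms only (no hex floats, INF or NAN).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: takes characters from s for as long as they form
   a prefix of a decimal floating-point item. Stores the number of
   characters taken in *consumed. Returns 1 if what was taken is a
   complete number, 0 if it is only a prefix (a matching failure). */
int scan_float_item(const char *s, size_t *consumed)
{
    size_t i = 0;
    int digits = 0;        /* mantissa digits seen so far */
    int complete;

    if (s[i] == '+' || s[i] == '-')
        i++;
    while (isdigit((unsigned char)s[i])) { i++; digits++; }
    if (s[i] == '.') {
        i++;
        while (isdigit((unsigned char)s[i])) { i++; digits++; }
    }
    complete = digits > 0;

    if (digits > 0 && (s[i] == 'e' || s[i] == 'E')) {
        int exp_digits = 0;

        i++;                                   /* the 'e' is now consumed */
        if (s[i] == '+' || s[i] == '-')
            i++;                               /* so is an exponent sign  */
        while (isdigit((unsigned char)s[i])) { i++; exp_digits++; }
        complete = exp_digits > 0;             /* "100e", "1.234e-" fail  */
    }
    *consumed = i;
    return complete;
}
```

Under this rule "100ergs" takes four characters ("100e") and fails, and "1.234e-x" takes seven ("1.234e-") and fails. Nothing is ever given back except the single character that ended the scan, which is how the one-character limit shows up.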
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 30 '06 #26
CBFalconer wrote:
Robert Gamble wrote:
>>
... snip ...
>>
Thanks very much for the input. I sense from you the same sentiment
that I have seen expressed from other implementors, that the one
character max pushback mandate isn't well-received. Although the
Rationale doesn't provide any insight as to why this decision was made
I would assume it would be to support implementations that only provide
a single character pushback while keeping results consistent among
implementations that could provide more. Do you feel that there is a
better way to handle this, has there been any discussion on changing
this behavior in the Standard, and is this a common sentiment in your
experience?

Consider handling pushing back two characters, the second of which
is a '\n'. The system buffer is holding the next line, so where do
you put the '\n'? Single char pushback can be handled simply by
diddling the internal pointer to the buffered line. Anything more
involves complications.

To misquote Dijkstra, "pity the poor implementor".
Thinking about it, it seems quite reasonable to implement multiple
pushback in a line buffered stream, provided the pushed back
material does not cross line boundaries. So I wrote a quick test,
which follows. Lo and behold, DJGPP succeeds at it. DJ is
careful. This means (not tested) that strtod should also be able
to handle such faulty input as "100e-x" and leave the stream
pointing at the e, since strtod knows the library capability.

[1] c:\c\junk>cat tungetc.c
#include <stdio.h>
#include <stdlib.h>
#define MAXLN 10

int main(void) {
   char line[MAXLN + 1];
   int ix, ch;

   puts("Test ability to ungetc for multiple chars in one line");
   fputs("Enter no more than 10 chars:", stdout); fflush(stdout);
   ix = 0;
   while ((EOF != (ch = getchar())) && ('\n' != ch)) {
      if (MAXLN <= ix) break;
      line[ix++] = ch;
   }
   line[ix] = '\0';
   /* The standard guarantees only this first character of pushback */
   if ('\n' != ungetc('\n', stdin)) {
      puts("Can't unget a '\\n'");
      return(EXIT_FAILURE);
   }
   puts(line);
   puts("Trying to push back the whole line");
   /* Push the line back in reverse order, last character first */
   while (ix > 0) {
      ch = ungetc(line[--ix], stdin);
      if (ch == line[ix]) putchar(ch);
      else {
         putchar(line[ix]);
         puts(" failed to push back");
         return(EXIT_FAILURE);
      }
   }
   puts("\nTrying to reread the whole line");
   while ((EOF != (ch = getchar())) && ('\n' != ch)) {
      if (ix++ == MAXLN) break;
      putchar(ch);
   }
   return 0;
} /* main */

[1] c:\c\junk>.\a
Test ability to ungetc for multiple chars in one line
Enter no more than 10 chars:12345
12345
Trying to push back the whole line
54321
Trying to reread the whole line
12345
[1] c:\c\junk>

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Dec 1 '06 #27
On Thu, 30 Nov 2006 19:31:59 +0100, ¬a\/b wrote:
>On Thu, 30 Nov 2006 22:21:43 +1100, Simon Biber wrote:
>>But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.

Wrong: 'my' little implementation of an sscanf-like function [sscan] seems
OK.
It ungets the sign, too...

#include "winb.h"

int main(void)
{double a, b;
char inp[] =" 100eBUONEFESTE",
inp1[]=" 100e0BUONEFESTE",
inp2[]=" 100e-0BUONEFESTE",
inp3[]=" 100e-BUONEFESTE", *pc=0;
int r;
// sscan_m (char** ove, char* input, char* fmt, ...);
r=sscan_m(&pc, inp, " %f", &a);
P("[%s] a=%f, ris=%d, resto=%s\n", inp, a, r, pc);
r=sscan_m(&pc, inp1, " %f", &b);
P("[%s] a=%f, ris=%d, resto=%s\n", inp1, b, r, pc);
r=sscan_m(&pc, inp2, " %f", &b);
P("[%s] a=%f, ris=%d, resto=%s\n", inp2, b, r, pc);
r=sscan_m(&pc, inp3, " %f", &b);
P("[%s] a=%f, ris=%d, resto=%s\n", inp3, b, r, pc);
return 0;
}

C:>sscan
[ 100eBUONEFESTE] a=100.000000, ris=1, resto=eBUONEFESTE
[ 100e0BUONEFESTE] a=100.000000, ris=1, resto=BUONEFESTE
[ 100e-0BUONEFESTE] a=100.000000, ris=1, resto=BUONEFESTE
[ 100e-BUONEFESTE] a=100.000000, ris=1, resto=e-BUONEFESTE
MEMORIA DINAMICA LIBERATA Tot=+0.0 Mb

Dec 1 '06 #28
