How to force fscanf to find only data on a single input line?

Apologies if this is in the FAQ. I looked, but didn't find it.

In a particular program the input read from a file is supposed to be:

+ 100 200 name1
- 101 201 name2

It is parsed by reading the + character, and then sending the
remainder into fscanf() like

count = fscanf(fp, "%d %d %s", &first_int, &second_int, &string);

This works fine unless the input is bogus. In particular, if
"name1" is left off, fscanf happily reads past the EOL of the
first line and comes back with "-" from the second line
stored in the string. Effectively it sees the bogus line as:

+ 100 200 - 101 201 name2

since it makes no distinction between EOL and other white space.
So count is 3 but the wrong characters are stored in string.

What I want is for count to be 2 and string's contents to be
undefined. Is there some magic format specifier that tells fscanf()
not to go past the EOL when looking for data? Sure, it can be done by
reading a whole line into a buffer, and then using sscanf() on that. It
just seems that there should be a way to make fscanf() "line aware".

Possible?

Thanks,

David Mathog
Aug 28 '07
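
The buffer approach mentioned in the question (read a whole line, then run sscanf() on it) can be sketched as follows. This is only a minimal illustration, not code from the thread: the names `struct record`, `parse_record` and `read_record`, the 256-byte line buffer and the 63-character name limit are all assumptions. Unlike the original call, it parses the leading '+' or '-' too, so a complete line converts 4 fields and a line with the name missing converts 3.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for lines such as "+ 100 200 name1". */
struct record {
    char op;            /* leading '+' or '-' */
    int first_int;
    int second_int;
    char name[64];
};

/*
 * Parse one line that is already in memory. Returns the number of fields
 * converted (0 to 4); name is only meaningful when the result is 4.
 * sscanf() stops at the end of the string, so it cannot wander into the
 * next input line.
 */
int parse_record(const char *line, struct record *rec)
{
    int n = sscanf(line, " %c %d %d %63s",
                   &rec->op, &rec->first_int, &rec->second_int, rec->name);
    return n < 0 ? 0 : n;   /* sscanf returns EOF for an empty line */
}

/*
 * Read the next line from fp and parse it. Returns -1 at end of file,
 * otherwise the field count from parse_record(). An overlong line is
 * parsed from its first 255 characters and the rest is discarded.
 */
int read_record(FILE *fp, struct record *rec)
{
    char line[256];
    int c;

    if (fgets(line, sizeof line, fp) == NULL)
        return -1;
    if (strchr(line, '\n') == NULL) {
        /* no newline stored: skip the remainder of this line */
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }
    return parse_record(line, rec);
}
```

A caller would loop on read_record() until it returns -1 and treat any result other than 4 as a malformed line.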

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:0N******************************@bt.com...
> Malcolm McLean said:
>> Am I the only one who has realised this?
>
> I don't know. You're the one who says it's a recognised design flaw, so
> it's up to you to come up with some recognisers.

We used to have regular discussions about how to use the fscanf() format
string to do amazing things with the function. If I remember rightly these
were in the days of Dan Pop (anyone know what became of him after he left
CERN? He is sorely missed.) One thing that came out of this was that the
treatment of a newline as matching whitespace meant that there was no nice
way of doing line-based formatting.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 29 '07 #11
Richard Heathfield wrote:
> CBFalconer said:
>
> <snip>
>> [ggets has] been out there and used for about 5 years now, and nobody
>> worried about the possible infinite string until now.
>
> Not so. Pat Foley raised the issue, here in comp.lang.c, on 25 June
> 2002. He was the first, as far as I can make out, but he is certainly
> not the last.

Well, I certainly never saw it, and I have given my reasons for
rejecting any change to the function's header.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 29 '07 #12
Malcolm McLean said:
>
> "Richard Heathfield" <rj*@see.sig.invalid> wrote in message
> news:0N******************************@bt.com...
>> Malcolm McLean said:
>>> Am I the only one who has realised this?
>>
>> I don't know. You're the one who says it's a recognised design flaw,
>> so it's up to you to come up with some recognisers.
>
> We used to have regular discussions about how to use the fscanf()
> format string to do amazing things with the function. If I remember
> rightly these were in the days of Dan Pop (anyone know what became of
> him after he left CERN? He is sorely missed.) One thing that came out
> of this was that the treatment of a newline as matching whitespace
> meant that there was no nice way of doing line-based formatting.

Never forget the (non-rhyming, non-scanning) fscanf limerick:

The ability to process information
That is spread arbitrarily
Over a number of lines
Might reasonably be seen
As a feature instead of a flaw.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 29 '07 #13
CBFalconer said:
> Richard Heathfield wrote:
>> CBFalconer said:
>>
>> <snip>
>>> [ggets has] been out there and used for about 5 years now, and
>>> nobody worried about the possible infinite string until now.
>>
>> Not so. Pat Foley raised the issue, here in comp.lang.c, on 25 June
>> 2002. He was the first, as far as I can make out, but he is certainly
>> not the last.
>
> Well, I certainly never saw it,

Oh, I see. There must be two CBFalconers then, since CBFalconer did in
fact post a prompt reply to Pat Foley.

> and I have given my reasons for
> rejecting any change to the function's header.

That's fine - but it makes your function less useful than it could be.
For example, it oughtn't to be used in environments that are open to
accidental or malicious data abuse, or in low memory situations
(because of its leak-encouraging design).

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 29 '07 #14
pete wrote:
> David Mathog wrote:
>> a way to make fscanf() "line aware".
>
> If the case is that it is acceptable
> to truncate any lines longer than LENGTH number of characters,
> then you can make fscanf() "line aware" this way:
>
> http://www.mindspring.com/~pfilandr/...fscanf_input.c

This example includes the line:

rc = fscanf(stdin, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);

I don't see that as being an improvement over using fgetc() and storing
the characters one by one into array, checking for \n and LENGTH as it
goes. If the data is to be read into a buffer, then sscanf() can be
employed instead of fscanf(), and the problem goes away.
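
For readers without access to the linked file, the quoted call can be wrapped roughly as below. This is a sketch under assumptions, not pete's actual code: the value of LENGTH, the `str` helper macro and the `read_line` wrapper are made up for illustration. The two-level macro is what turns LENGTH into the field width inside the format string.

```c
#include <stdio.h>

#define LENGTH 80           /* assumed maximum line length */
#define str(x)  #x
#define xstr(x) str(x)      /* expands LENGTH first, giving "80" */

/*
 * Read one line of at most LENGTH characters into array (which must hold
 * LENGTH + 1 bytes). Excess characters and the newline are discarded.
 * Returns 1 if a line was read (possibly empty), 0 at end of file.
 */
int read_line(FILE *fp, char array[LENGTH + 1])
{
    /* "%80[^\n]" stores up to 80 non-newline characters;
       "%*[^\n]" then skips whatever is left of the line. */
    int rc = fscanf(fp, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);

    if (rc == EOF)
        return 0;
    if (rc == 0)
        array[0] = '\0';    /* empty line: the scanset matched nothing */
    getc(fp);               /* next character is the '\n' itself, or EOF */
    return 1;
}
```

Once the line is in array, sscanf() can pick it apart, which is the same two-step approach as reading with fgets().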

I already looked at the [] notation as a possible solution for this
but couldn't figure out how to force it into shape. For instance:

rc = fscanf(fp,"%d[ \t]%d[ \t]%s[\n]",&int1,&int2,&string);

and the input is (missing name1 at the end of the first line):

+ 100 200 \n- 300 400 name2\n

and fscanf is called after the "+" is read, then string will be
"\n-300 400 name2", which is not at all the desired result.
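
A scanset only takes effect when written as a conversion, for example `%*[ \t]` to skip blanks and tabs without storing them. A minimal sketch of the format rewritten that way follows; the helper name `scan_fields` and the 63-character name limit are assumptions. It only protects the gaps after each number: a `%d` still skips leading white space, newlines included, so a line with a trailing blank and no second integer can still run on into the next line.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse "<int> <int> <name>" from fp without letting
 * the separators between fields cross a newline.
 *   "%*[ \t]"     matches one or more blanks or tabs and fails on '\n'
 *   "%63[^ \t\n]" stores a name of up to 63 characters and fails on '\n'
 * Returns the number of fields stored (0 to 3), or EOF.
 */
int scan_fields(FILE *fp, int *first, int *second, char name[64])
{
    return fscanf(fp, "%d%*[ \t]%d%*[ \t]%63[^ \t\n]", first, second, name);
}
```

With the input shown above, called after the "+" has been read, this returns 2 and leaves the newline unread, so the caller can skip to the end of the line and carry on with the next record.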

Seems like to solve this cleanly one would need to amend the spec to either:

1. Add a new format specifier which tells fscanf to STOP at the first \n.
2. Or more generally, %[\n.:] - terminate input at any of the specified
characters. I believe the %[] syntax would generate an error now, so
extending that way should not break any current code, but you folks are
the experts.

Anyway, I guess the answer to my question is that there is no simple way
to make fscanf() treat an EOL as an input terminator. It seems slightly
bizarre to me that fscanf() has no concept of "end of input", other than
EOF!

Regards,

David Mathog
Aug 29 '07 #15
Al Balmer wrote, On 28/08/07 23:50:
> On Tue, 28 Aug 2007 22:36:44 +0100, "Malcolm McLean"
> <re*******@btinternet.com> wrote:
>> The fact that newline is treated as whitespace is a recognised design flaw in
>> fscanf().
>
> Newline *is* whitespace. N1124 6.4.3.

Even outside discussion of C I would consider newline to be whitespace.
See for example
http://www.google.co.uk/search?hl=en...G=Search&meta=

After all, if your paper is white then when printing it causes the print
position to move whilst leaving the intervening paper white. I suspect
that is where the term whitespace originates.
--
Flash Gordon
Aug 29 '07 #16
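
The point is easy to check from code. The sketch below is only an illustration (the function name `count_two_ints` is made up): isspace() classifies '\n' as white space, and a blank in a scanf-family format matches any run of white space, so two integers on separate lines convert exactly as if they shared a line.

```c
#include <ctype.h>
#include <stdio.h>

/* Returns the number of integers that "%d %d" converts from text. */
int count_two_ints(const char *text, int *a, int *b)
{
    /* The blank in the format matches any amount of white space,
       including '\n'; "%d" also skips leading white space by itself. */
    return sscanf(text, "%d %d", a, b);
}

/* Returns nonzero when c is white space in the "C" locale sense. */
int is_white(char c)
{
    return isspace((unsigned char)c);
}
```

Both `count_two_ints("100 200", &a, &b)` and `count_two_ints("100\n200", &a, &b)` return 2, and `is_white('\n')` is nonzero.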
Flash Gordon wrote:
>
> Al Balmer wrote, On 28/08/07 23:50:
>> On Tue, 28 Aug 2007 22:36:44 +0100, "Malcolm McLean"
>> <re*******@btinternet.com> wrote:
>>> The fact that newline is treated as whitespace is a recognised design flaw in
>>> fscanf().
>> Newline *is* whitespace. N1124 6.4.3.
>
> Even outside discussion of C I would consider newline to be whitespace.
> See for example
> http://www.google.co.uk/search?hl=en...G=Search&meta=

Even Whitespace considers a newline to be whitespace:

http://compsoc.dur.ac.uk/whitespace/

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     |    <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>
Aug 29 '07 #17
Kenneth Brody wrote:
> Flash Gordon wrote:
>> Al Balmer wrote, On 28/08/07 23:50:
>>> On Tue, 28 Aug 2007 22:36:44 +0100, "Malcolm McLean"
>>> <re*******@btinternet.com> wrote:
>>>> The fact that newline is treated as whitespace is a recognised design flaw in
>>>> fscanf().
>>> Newline *is* whitespace. N1124 6.4.3.
>> Even outside discussion of C I would consider newline to be whitespace.
>> See for example
>> http://www.google.co.uk/search?hl=en...G=Search&meta=
>
> Even Whitespace considers a newline to be whitespace:
>
> http://compsoc.dur.ac.uk/whitespace/

The problem is not so much that fscanf() normally considers EOL to be
whitespace, but rather that fscanf()'s only concept of
"end of input" within the scope of an fscanf() call is either when
it sees an EOF or "all parts of the format string have been used up".
Using the [] method in the format string one can make EOL whitespace
or not (effectively), but it doesn't resolve the primary issue. As
I posted elsewhere in this thread, a more general "end of input"
specifier would allow much better control of parsing, for instance,
letting a colon, dash, or other normal character indicate the end of a
region of data.

Sadly a lot of real world data is organized in lines of text which are
terminated by an EOL. Since there's no way to tell fscanf() that the
EOL character (or any other character) is an input terminator, there's
no simple way to handle improperly formatted data using only fscanf().
It can certainly be done other ways, just not solely with this function.

Regards,

David Mathog
Aug 29 '07 #18
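
Outside fscanf(), the "end of input" idea described above is a few lines of code. The helper below is a hypothetical sketch (the name `read_until` and its interface are assumptions, not a standard function): it copies characters until it meets any character from a caller-supplied terminator set and leaves that terminator unread. A negated scanset such as `%15[^:\n]` stops at chosen characters in a similar way, but it fails outright on an empty field.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy characters from fp into buf until one of the
 * characters in terminators (or EOF) is seen, or until size - 1 characters
 * have been stored. The terminator is pushed back onto the stream.
 * Returns the number of characters stored; when size > 0, buf is always
 * NUL-terminated. A NUL byte in the input also counts as a terminator.
 */
size_t read_until(FILE *fp, const char *terminators, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return 0;
    while (n + 1 < size && (c = getc(fp)) != EOF) {
        if (strchr(terminators, c) != NULL) {
            ungetc(c, fp);  /* let the caller see which terminator it was */
            break;
        }
        buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```

Calling `read_until(fp, ":-\n", buf, sizeof buf)` repeatedly, with a getc() in between to consume each terminator, splits a line on colons and dashes without ever reading past its newline.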
Richard Heathfield wrote:
> CBFalconer said:
>> Richard Heathfield wrote:
>>>
>>> <snip>
>>>
>>>> [ggets has] been out there and used for about 5 years now, and
>>>> nobody worried about the possible infinite string until now.
>>>
>>> Not so. Pat Foley raised the issue, here in comp.lang.c, on 25
>>> June 2002. He was the first, as far as I can make out, but he is
>>> certainly not the last.
>>
>> Well, I certainly never saw it,
>
> Oh, I see. There must be two CBFalconers then, since CBFalconer
> did in fact post a prompt reply to Pat Foley.

Well, maybe I should modify my answer to 'I don't remember'. This
also indicates how seriously I took any such objection at the time.

>> and I have given my reasons for rejecting any change to the
>> function's header.
>
> That's fine - but it makes your function less useful than it
> could be. For example, it oughtn't to be used in environments
> that are open to accidental or malicious data abuse, or in low
> memory situations (because of its leak-encouraging design).

That's ridiculous. Similarly, you can say anything that uses
malloc to collect and store information is dangerous. Systems have
better methods of limiting overuse, such as memory maxima. Nor
should any recursive code be let out into the wild, since overuse
can crash. Ptui.

After all, it is just one more choice. You can use gets, ggets,
fgets, getline (I think that is your routine's name), getc, fscanf,
etc. as you wish. Scratch gets from that list. You pays your money
and takes your choice. Or write your own.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>


Aug 30 '07 #19
CBFalconer <cb********@yahoo.com> writes:
> Richard Heathfield wrote:
> [...]
>> That's fine - but it makes your function less useful than it
>> could be. For example, it oughtn't to be used in environments
>> that are open to accidental or malicious data abuse, or in low
>> memory situations (because of its leak-encouraging design).
>
> That's ridiculous. Similarly, you can say anything that uses
> malloc to collect and store information is dangerous. Systems have
> better methods of limiting overuse, such as memory maxima. Nor
> should any recursive code be let out into the wild, since overuse
> can crash. Ptui.

A program can use malloc reasonably safely as long as the program can
control how much memory is allocated. Similarly for recursion, if the
program can control the depth of recursion.

gets() is dangerous because its misbehavior (buffer overflow) can be
triggered by factors that the program cannot control, namely the
contents of stdin.

ggets() is less dangerous, but nevertheless its misbehavior
(attempting to allocate more memory than it should) can likewise be
triggered by the contents of stdin. Once my program calls ggets(), it
has *no control* over how much memory may be allocated.

If you consider that to be an acceptable price to pay for the relative
simplicity of ggets(), that's your call, but it's something that
anyone thinking about using ggets() should consider.

[...]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 30 '07 #20
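
One way to give the caller the control discussed above is to pass a length limit to the line reader. The sketch below is hypothetical and is not the ggets() interface: the name `read_line_capped`, its parameters and its return convention are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical line reader with a caller-supplied limit. Returns a
 * malloc'ed copy of the next line (without the '\n'), or NULL at end of
 * file, on allocation failure, or when the line holds more than max_len
 * characters. An overlong line is consumed to its end, so the next call
 * starts on a fresh line. The caller frees the result.
 */
char *read_line_capped(FILE *fp, size_t max_len)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len == max_len) {           /* limit reached: reject this line */
            while ((c = getc(fp)) != EOF && c != '\n')
                ;
            free(buf);
            return NULL;
        }
        if (len + 1 == cap) {           /* keep room for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Because the buffer doubles only while the line is within the limit, the function never holds much more than twice max_len bytes, whatever arrives on the stream. The price is an extra parameter and an ambiguous NULL return; a caller can use feof() to tell end of file from a rejected line.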
