How to force fscanf to find data only on a single input line?

David Mathog

Apologies if this is in the FAQ. I looked, but didn't find it.

In a particular program the input read from a file is supposed to be:

+ 100 200 name1
- 101 201 name2

It is parsed by reading the + character, and then sending the
remainder into fscanf() like

count = fscanf(fp,"%d %d %s",&first_int,&second_int,string);

This works fine unless the input is bogus. In particular, if
"name1" is left off, fscanf happily reads past the EOL of the
first line and comes back with "-" from the second line
stored in the string. Effectively it sees the bogus line as:

+ 100 200 - 101 201 name2

since it makes no distinction between EOL and other white space.
So count is 3 but the wrong characters are stored in string.
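The behaviour is easy to reproduce. Below is a minimal sketch that assumes a 64-byte name buffer, which is why it uses a %63s width that the original format lacks. The name `demo_parse` is hypothetical.

```c
#include <stdio.h>

/* Illustrative only: consume the leading '+' or '-', then hand the
   rest of the record to fscanf(). Because %d, %s and the blank
   directives all skip newlines, a missing name is "found" on the
   next line. Returns fscanf's count (or EOF). */
int demo_parse(FILE *fp, int *first, int *second, char name[64])
{
    if (getc(fp) == EOF)          /* the sign character */
        return EOF;
    return fscanf(fp, "%d %d %63s", first, second, name);
}
```

Given "+ 100 200\n- 101 201 name2\n", it returns 3 and leaves name set to "-", exactly as described.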

What I want is for count to be 2 and string's contents to be
undefined. Is there some magic format specifier that tells fscanf()
not to go past the EOL when looking for data? Sure, it can be done by
reading a whole line into a buffer, and then using sscanf() on that. It
just seems that there should be a way to make fscanf() "line aware".

Possible?

Thanks,

David Mathog

Aug 28 '07 #1

Malcolm McLean

"David Mathog" <ma****@caltech.edu> wrote in message
news:fb**********@naig.caltech.edu...

<snip>

What I want is for count to be 2 and string's contents to be
undefined. Is there some magic format specifier that tells fscanf()
not to go past the EOL when looking for data? Sure, it can be done by
reading a whole line into a buffer, and then using sscanf() on that. It
just seems that there should be a way to make fscanf() "line aware".

Use fgets() or Chuck Falconer's ggets() (Google his name and ggets to find
it) to read in a line, and then parse it with sscanf().
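Here is a minimal sketch of the fgets()/sscanf() approach. It assumes lines fit in 256 bytes; a longer line would be split across calls. It also reads the sign character as part of the same record. The name `parse_record` is hypothetical.

```c
#include <stdio.h>

/* Read exactly one line with fgets(), then parse only that line with
   sscanf(). A missing field now makes the count short instead of
   being filled from the next line. Returns the number of fields
   converted (4 for a complete record), or EOF at end of input. */
int parse_record(FILE *fp, char *sign, int *first, int *second,
                 char name[64])
{
    char line[256];               /* assumed upper bound on line length */

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;
    return sscanf(line, " %c %d %d %63s", sign, first, second, name);
}
```

On the bogus input from the original post, the first call returns 3 (the sign plus two integers). It does not consume the "-" from the following line.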

The fact that newline is treated as whitespace is a recognised design flaw in
fscanf().

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 28 '07 #2

Richard Heathfield

Malcolm McLean said:

<snip>

Use fgets() or Chuck Falconer's ggets() (Google his name and ggets to
find it) to read in a line, and then parse it with sscanf().

Chuck's ggets function suffers from at least two problems, one being
that every call creates a new buffer that must be managed, and another
being the absence of any way to specify an upper limit on memory
consumption.

The fact that newline is treated as whitespace is a recognised design
flaw in fscanf().

Recognised by whom?

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 28 '07 #3

Malcolm McLean

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:n6******************************@bt.com...

Malcolm McLean said:

<snip>

>Use fgets() or Chuck Falconer's ggets() (Google his name and ggets to
find it) to read in a line, and then parse it with sscanf().

Chuck's ggets function suffers from at least two problems, one being
that every call creates a new buffer that must be managed, and another
being the absence of any way to specify an upper limit on memory
consumption.

It's a big improvement on fgets(). No one's going to try to crash David
Mathog's program by feeding it a 4 billion character line, now, are they?

>The fact that newline is treated as whitespace is a recognised design
flaw in fscanf().

Recognised by whom?

Am I the only one who has realised this? I don't think so, it has been
discussed before, though I'm afraid I couldn't reference the threads.

Aug 28 '07 #4

Al Balmer

On Tue, 28 Aug 2007 22:36:44 +0100, "Malcolm McLean"
<re*******@btinternet.com> wrote:

>The fact that newline is treated as whitespace is a recognised design flaw in
fscanf().

Newline *is* whitespace. N1124 6.4.3.
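fscanf() follows the same classification. Before every conversion except %[, %c and %n, it skips characters for which isspace() is true. Below is a small check that assumes the default "C" locale. The helper `count_space_chars` is hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Count the values of unsigned char for which isspace() is true in
   the current locale. fscanf() skips these characters before
   conversions other than %[, %c and %n, so '\n' is skipped just
   like ' ' or '\t'. */
int count_space_chars(void)
{
    int n = 0;

    for (int c = 0; c <= UCHAR_MAX; c++)
        if (isspace(c))
            n++;
    return n;
}
```

In the "C" locale it returns 6: space, '\t', '\n', '\v', '\f' and '\r'. That is why %d and %s step over a line break as readily as over a blank.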

--
Al Balmer
Sun City, AZ

Aug 28 '07 #5

CBFalconer

Malcolm McLean wrote:

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
>Malcolm McLean said:

<snip>

>>Use fgets() or Chuck Falconer's ggets() (Google his name and ggets
to find it) to read in a line, and then parse it with sscanf().

You find it on my page. See sig.

Chuck's ggets function suffers from at least two problems, one
being that every call creates a new buffer that must be managed,
and another being the absence of any way to specify an upper limit
on memory consumption.

It's a big improvement on fgets(). No one's going to try to crash
David Mathog's program by feeding it a 4 billion character line,
now, are they?

.... snip ...

Am I the only one who has realised this? I don't think so, it has
been discussed before, though I'm afraid I couldn't reference the
threads.

It's been out there and used for about 5 years now, and nobody
worried about the possible infinite string until now.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 29 '07 #6

CBFalconer

Malcolm McLean wrote:

.... snip ...

The fact that newline is treated as whitespace is a recognised
design flaw in fscanf().

is a recognized feature, which may be helpful or a flaw,
depending on the usage desired.

Aug 29 '07 #7

pete

David Mathog wrote:

a way to make fscanf() "line aware".

If the case is that it is acceptable
to truncate any lines longer than LENGTH number of characters,
then you can make fscanf() "line aware" this way:

http://www.mindspring.com/~pfilandr/...fscanf_input.c

If the case is that you can rewind() the text file,
then you can make fscanf() "line aware" this way:

http://www.mindspring.com/~pfilandr/..._input/type_.c
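pete's linked code is not reproduced here because the URLs above are truncated. The format string David quotes later in the thread suggests a technique along the lines below. Several parts of this sketch are assumptions: LENGTH = 31, the STRINGIZE helper, the function name, and the explicit handling of empty lines.

```c
#include <stdio.h>

#define LENGTH 31                     /* assumed: characters kept per line */
#define STRINGIZE(x) #x
#define xstr(x) STRINGIZE(x)          /* expand LENGTH, then stringize it */

/* Read one line into array (LENGTH + 1 bytes). Characters beyond
   LENGTH are read and discarded by %*[^\n]; the newline itself is
   consumed with getc(). Returns 1 if a line was read, EOF at end. */
int read_line_truncated(FILE *fp, char array[LENGTH + 1])
{
    int rc = fscanf(fp, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);

    if (rc == EOF)
        return EOF;
    if (rc == 0)                      /* empty line: %[ matched nothing */
        array[0] = '\0';
    getc(fp);                         /* eat the '\n' (or see EOF) */
    return 1;
}
```

The two-level macro is needed so that `xstr(LENGTH)` yields "31" rather than "LENGTH". Without the `rc == 0` branch, an empty line would leave `array` unchanged.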

--
pete

Aug 29 '07 #8

Richard Heathfield

Malcolm McLean said:

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:n6******************************@bt.com...
>Malcolm McLean said:

<snip>

>>Use fgets() or Chuck Falconer's ggets() (Google his name and ggets
to find it) to read in a line, and then parse it with sscanf().

Chuck's ggets function suffers from at least two problems, one being
that every call creates a new buffer that must be managed, and
another being the absence of any way to specify an upper limit on
memory consumption.

It's a big improvement on fgets().

I'm not convinced of that. Convince me.

No one's going to try to crash
David Mathog's program by feeding it a 4 billion character line, now,
are they?

I have no idea what David Mathog's threat model is. I do know, however,
that he will find buffer management under ggets either inconvenient,
inefficient, or both.

>>The fact that newline is treated as whitespace is a recognised design
flaw in fscanf().

Recognised by whom?

Am I the only one who has realised this?

I don't know. You're the one who says it's a recognised design flaw, so
it's up to you to come up with some recognisers.

Aug 29 '07 #9

Richard Heathfield

CBFalconer said:

<snip>

[ggets has] been out there and used for about 5 years now, and nobody
worried about the possible infinite string until now.

Not so. Pat Foley raised the issue, here in comp.lang.c, on 25 June
2002. He was the first, as far as I can make out, but he is certainly
not the last.

Aug 29 '07 #10

Malcolm McLean

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:0N******************************@bt.com...

Malcolm McLean said:

>Am I the only one who has realised this?

I don't know. You're the one who says it's a recognised design flaw, so
it's up to you to come up with some recognisers.

We used to have regular discussions about how to use the fscanf() format
string to do amazing things with the function. If I remember rightly these
were in the days of Dan Pop (anyone know what became of him after he left
CERN? He is sorely missed.) One thing that came out of this was that the
treatment of a newline as matching whitespace meant that there was no nice
way of doing line-based formatting.

Aug 29 '07 #11

CBFalconer

Richard Heathfield wrote:

CBFalconer said:

<snip>

>[ggets has] been out there and used for about 5 years now, and nobody
worried about the possible infinite string until now.

Not so. Pat Foley raised the issue, here in comp.lang.c, on 25 June
2002. He was the first, as far as I can make out, but he is certainly
not the last.

Well, I certainly never saw it, and I have given my reasons for
rejecting any change to the function's header.

Aug 29 '07 #12

Richard Heathfield

Malcolm McLean said:

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:0N******************************@bt.com...
>Malcolm McLean said:

>>Am I the only one who has realised this?

I don't know. You're the one who says it's a recognised design flaw,
so it's up to you to come up with some recognisers.

We used to have regular discussions about how to use the fscanf()
format string to do amazing things with the function. If I remember
rightly these were in the days of Dan Pop (anyone know what became of
him after he left CERN? He is sorely missed.) One thing that came out
of this was that the treatment of a newline as matching whitespace
meant that there was no nice way of doing line-based formatting.

Never forget the (non-rhyming, non-scanning) fscanf limerick:

The ability to process information
That is spread arbitrarily
Over a number of lines
Might reasonably be seen
As a feature instead of a flaw.

Aug 29 '07 #13

Richard Heathfield

CBFalconer said:

Richard Heathfield wrote:
>CBFalconer said:

<snip>

>>[ggets has] been out there and used for about 5 years now, and
nobody worried about the possible infinite string until now.

Not so. Pat Foley raised the issue, here in comp.lang.c, on 25 June
2002. He was the first, as far as I can make out, but he is certainly
not the last.

Well, I certainly never saw it,

Oh, I see. There must be two CBFalconers then, since CBFalconer did in
fact post a prompt reply to Pat Foley.

and I have given my reasons for
rejecting any change to the function's header.

That's fine - but it makes your function less useful than it could be.
For example, it oughtn't to be used in environments that are open to
accidental or malicious data abuse, or in low memory situations
(because of its leak-encouraging design).

Aug 29 '07 #14

David Mathog

pete wrote:

David Mathog wrote:

>a way to make fscanf() "line aware".

If the case is that it is acceptable
to truncate any lines longer than LENGTH number of characters,
then you can make fscanf() "line aware" this way:

http://www.mindspring.com/~pfilandr/...fscanf_input.c

This example includes the line:

rc = fscanf(stdin, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);

I don't see that as being an improvement over using fgetc() and storing
the characters one by one into array, checking for \n and LENGTH as it
goes. If the data is to be read into a buffer, then sscanf() can be
employed instead of fscanf(), and the problem goes away.
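The fgetc() approach David describes might look like the sketch below. The name `read_line_getc` is hypothetical, and over-long lines are silently truncated rather than reported.

```c
#include <stdio.h>

/* Read one line character by character: stop at '\n' or EOF, store
   at most size - 1 characters and discard the rest of an over-long
   line. Returns 1 if a line (possibly empty) was read, EOF if the
   stream was already exhausted. */
int read_line_getc(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c, seen = 0;

    while ((c = fgetc(fp)) != EOF) {
        seen = 1;
        if (c == '\n')
            break;
        if (n + 1 < size)
            buf[n++] = (char)c;   /* excess characters are dropped */
    }
    if (size > 0)
        buf[n] = '\0';
    return seen ? 1 : EOF;
}
```

The filled buffer can then be handed to sscanf(), for example with " %c %d %d %63s". That call cannot run past the end of the line.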

I already looked at the [] notation as a possible solution for this
but couldn't figure out how to force it into shape. For instance:

rc = fscanf(fp,"%d[ \t]%d[ \t]%s[\n]",&int1,&int2,string);

and the input is (missing name1 the end of the first line):

+ 100 200 \n- 300 400 name2\n

and fscanf is called after the "+" is read, then string will be
"\n-300 400 name2", which is not at all the desired result.

Seems like to solve this cleanly one would need to amend the spec to either:

1. Add a new format specifier which tells fscanf to STOP at the first \n.
2. Or more generally, %[\n.:] - terminate input at any of the specified
characters. I believe the %[] syntax would generate an error now, so
extending that way should not break any current code, but you folks are
the experts.
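Standard C has no such "stop here" conversion. Once a line is in a buffer, however, the effect of a terminator set can be approximated with strcspn(). The helper `cut_at_any` below is hypothetical and is not part of any proposal in this thread.

```c
#include <string.h>

/* Cut the string s at the first character that appears in stops
   (for example "\n.:"), so that a later sscanf() on s cannot read
   past that point. Returns the length that remains. */
size_t cut_at_any(char *s, const char *stops)
{
    size_t len = strcspn(s, stops);

    s[len] = '\0';
    return len;
}
```

For example, cutting "100 200: name" at "\n.:" leaves "100 200". A following sscanf(s, "%d %d %15s", ...) then returns 2.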

Anyway, I guess the answer to my question is that there is no simple way
to make fscanf() treat an EOL as an input terminator. It seems slightly
bizarre to me that fscanf() has no concept of "end of input", other than
EOF!

Regards,

David Mathog

Aug 29 '07 #15

Flash Gordon

Al Balmer wrote, On 28/08/07 23:50:

On Tue, 28 Aug 2007 22:36:44 +0100, "Malcolm McLean"
<re*******@btinternet.com> wrote:

>The fact that newline is treated as whitespace is a recognised design flaw in
fscanf().

Newline *is* whitespace. N1124 6.4.3.

Even outside discussion of C I would consider newline to be whitespace.
See for example
http://www.google.co.uk/search?hl=en...G=Search&meta=

After all, if your paper is white, then printing a newline moves the
print position while leaving the intervening paper white. I suspect
that is where the term whitespace originates.
--
Flash Gordon

Aug 29 '07 #16

Kenneth Brody

Flash Gordon wrote:

Al Balmer wrote, On 28/08/07 23:50:
On Tue, 28 Aug 2007 22:36:44 +0100, "Malcolm McLean"
<re*******@btinternet.com> wrote:

The fact that newline is treated as whitespace is a recognised design flaw in
fscanf().
Newline *is* whitespace. N1124 6.4.3.

Even outside discussion of C I would consider newline to be whitespace.
See for example
http://www.google.co.uk/search?hl=en...G=Search&meta=

Even Whitespace considers a newline to be whitespace:

http://compsoc.dur.ac.uk/whitespace/

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h|
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Aug 29 '07 #17

David Mathog

Kenneth Brody wrote:

Flash Gordon wrote:
>Al Balmer wrote, On 28/08/07 23:50:
>>On Tue, 28 Aug 2007 22:36:44 +0100, "Malcolm McLean"
<re*******@btinternet.com> wrote:

The fact that newline is treated as whitespace is a recognised design flaw in
fscanf().
Newline *is* whitespace. N1124 6.4.3.
Even outside discussion of C I would consider newline to be whitespace.
See for example
http://www.google.co.uk/search?hl=en...G=Search&meta=

Even Whitespace considers a newline to be whitespace:

http://compsoc.dur.ac.uk/whitespace/

The problem is not so much that fscanf() normally considers EOL to be
whitespace, but rather that fscanf()'s only concept of
"end of input" within the scope of an fscanf() call is either when
it sees an EOF or "all parts of the format string have been used up".
Using the [] method in the format string one can make EOL whitespace
or not (effectively), but it doesn't resolve the primary issue. As
I posted elsewhere in this thread, a more general "end of input"
specifier would allow much better control of parsing, for instance,
letting a colon, dash, or other normal character indicate the end of a
region of data.

Sadly a lot of real world data is organized in lines of text which are
terminated by an EOL. Since there's no way to tell fscanf() that the
EOL character (or any other character) is an input terminator, there's
no simple way to handle improperly formatted data using only fscanf().
It can certainly be done other ways, just not solely with this function.

Regards,

David Mathog

Aug 29 '07 #18

CBFalconer

Richard Heathfield wrote:

CBFalconer said:
>Richard Heathfield wrote:
<snip>

[ggets has] been out there and used for about 5 years now, and
nobody worried about the possible infinite string until now.

Not so. Pat Foley raised the issue, here in comp.lang.c, on 25
June 2002. He was the first, as far as I can make out, but he is
certainly not the last.

Well, I certainly never saw it,

Oh, I see. There must be two CBFalconers then, since CBFalconer
did in fact post a prompt reply to Pat Foley.

Well, maybe I should modify my answer to 'I don't remember'. This
also indicates how seriously I took any such objection at the time.

>and I have given my reasons for rejecting any change to the
function's header.

That's fine - but it makes your function less useful than it
could be. For example, it oughtn't to be used in environments
that are open to accidental or malicious data abuse, or in low
memory situations (because of its leak-encouraging design).

That's ridiculous. Similarly, you can say anything that uses
malloc to collect and store information is dangerous. Systems have
better methods of limiting overuse, such as memory maxima. Nor
should any recursive code be let out into the wild, since overuse
can crash. Ptui.

After all, it is just one more choice. You can use gets, ggets,
fgets, getline (I think that is your routine's name), getc, fscanf,
etc. as you wish. Scratch gets from that list. You pays your money
and takes your choice. Or write your own.

Aug 30 '07 #19

Keith Thompson

CBFalconer <cb********@yahoo.com> writes:

Richard Heathfield wrote:

[...]

>That's fine - but it makes your function less useful than it
could be. For example, it oughtn't to be used in environments
that are open to accidental or malicious data abuse, or in low
memory situations (because of its leak-encouraging design).

That's ridiculous. Similarly, you can say anything that uses
malloc to collect and store information is dangerous. Systems have
better methods of limiting overuse, such as memory maxima. Nor
should any recursive code be let out into the wild, since overuse
can crash. Ptui.

A program can use malloc reasonably safely as long as the program can
control how much memory is allocated. Similarly for recursion, if the
program can control the depth of recursion.

gets() is dangerous because its misbehavior (buffer overflow) can be
triggered by factors that the program cannot control, namely the
contents of stdin.

ggets() is less dangerous, but nevertheless its misbehavior
(attempting to allocate more memory than it should) can likewise be
triggered by the contents of stdin. Once my program calls ggets(), it
has *no control* over how much memory may be allocated.

If you consider that to be an acceptable price to pay for the relative
simplicity of ggets(), that's your call, but it's something that
anyone thinking about using ggets() should consider.

[...]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Aug 30 '07 #20

Richard Heathfield

CBFalconer said:

Richard Heathfield wrote:
>CBFalconer said:
Well, I certainly never saw [Pat Foley's ggets objection],

Oh, I see. There must be two CBFalconers then, since CBFalconer
did in fact post a prompt reply to Pat Foley.

Well, maybe I should modify my answer to 'I don't remember'. This
also indicates how seriously I took any such objection at the time.

If you had taken it more seriously, ggets would be a better function.

>>and I have given my reasons for rejecting any change to the
function's header.

That's fine - but it makes your function less useful than it
could be. For example, it oughtn't to be used in environments
that are open to accidental or malicious data abuse, or in low
memory situations (because of its leak-encouraging design).

That's ridiculous. Similarly, you can say anything that uses
malloc to collect and store information is dangerous.

Well, I didn't say it was dangerous. Nor do I agree that my claim is
ridiculous. This is what people want to be able to do:

1) initialise
2) main loop
2a) gather input
2b) process input
3) possibly do post-processing on intermediate results
4) produce output
5) clean up
6) quit

Okay, that doesn't quite cover all eventualities, but it gives a general
model for batch code. (The problem still remains for interactive code,
but let's keep our example simple.) The problem with ggets is that it
mandates an additional step within the main loop - effectively moving
part of the cleanup into the loop itself. Requiring people to do that
is a weakness. When they forget - and they will - the result can hardly
be called a leak, because it's more like a firehose.

After all, it is just one more choice. You can use gets, ggets,
fgets, getline (I think that is your routine's name),

You think wrong.

getc, fscanf,
etc. as you wish. Scratch gets from that list.

And scratch ggets too, until it does what it ought.

Aug 30 '07 #21

Richard

Keith Thompson <ks***@mib.org> writes:

CBFalconer <cb********@yahoo.com> writes:
>Richard Heathfield wrote:
[...]

>>That's fine - but it makes your function less useful than it
could be. For example, it oughtn't to be used in environments
that are open to accidental or malicious data abuse, or in low
memory situations (because of its leak-encouraging design).

That's ridiculous. Similarly, you can say anything that uses
malloc to collect and store information is dangerous. Systems have
better methods of limiting overuse, such as memory maxima. Nor
should any recursive code be let out into the wild, since overuse
can crash. Ptui.

A program can use malloc reasonably safely as long as the program can
control how much memory is allocated. Similarly for recursion, if the
program can control the depth of recursion.

gets() is dangerous because its misbehavior (buffer overflow) can be
triggered by factors that the program cannot control, namely the
contents of stdin.

ggets() is less dangerous, but nevertheless its misbehavior
(attempting to allocate more memory than it should) can likewise be
triggered by the contents of stdin. Once my program calls ggets(), it
has *no control* over how much memory may be allocated.

If you consider that to be an acceptable price to pay for the relative
simplicity of ggets(), that's your call, but it's something that
anyone thinking about using ggets() should consider.

[...]

See previous post on the matter. Without the necessary "limit"
parameters then ggets is positively dangerous.

Aug 30 '07 #22

Malcolm McLean

"CBFalconer" <cb********@yahoo.com> wrote in message
news:46***************@yahoo.com...

That's ridiculous. Similarly, you can say anything that uses
malloc to collect and store information is dangerous. Systems have
better methods of limiting overuse, such as memory maxima. Nor
should any recursive code be let out into the wild, since overuse
can crash. Ptui.

What you should do is extend the buffer by size/10 + 128 on every call, or
every call after the first one or two.
Growing by 10% each time means that the total number of calls to realloc()
before you run out of memory will be rather small.
Richard Heathfield will probably insist on shrinking the request on failure.

Obviously you should make size a size_t, even though I personally hate that
type, and check for nextsize > prevsize, for termination. Again, you could
shrink so that an array of pretty much exactly size_t max can be allocated,
but it becomes increasingly futile.
Then run it on /dev/zero and see if it comes back with "out of memory"
reasonably sharpish.
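Malcolm's growth rule and its termination check (the new size must exceed the old one) might be sketched as follows. `next_capacity` is a hypothetical helper, and using 0 as the "give up" signal is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Next buffer size under the "size/10 + 128" growth rule. Unsigned
   arithmetic wraps on overflow, so a result that is not larger than
   the old size means the request cannot be represented; 0 is
   returned to tell the caller to give up. */
size_t next_capacity(size_t size)
{
    size_t next = size + (size / 10 + 128);

    return next > size ? next : 0;
}
```

A caller would stop growing the buffer when this returns 0, and treat a NULL from realloc() as out-of-memory.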

Aug 30 '07 #23

CBFalconer

Richard Heathfield wrote:

CBFalconer said:

.... snip ...

>After all, it is just one more choice. You can use gets, ggets,
fgets, getline (I think that is your routine's name),

You think wrong.

>getc, fscanf, etc. as you wish. Scratch gets from that list.

And scratch ggets too, until it does what it ought.

To be consistent you have to add 'scratch malloc' to your list.
BTW, if you add Navia's favorite malloc clean-up to your system, it
will apply to ggets also. This illustrates that the 'fault' is not
within ggets, but within malloc.

Aug 30 '07 #24

Richard Heathfield

CBFalconer said:

Richard Heathfield wrote:
>CBFalconer said:

... snip ...
>>After all, it is just one more choice. You can use gets, ggets,
fgets, getline (I think that is your routine's name),

You think wrong.

>>getc, fscanf, etc. as you wish. Scratch gets from that list.

And scratch ggets too, until it does what it ought.

To be consistent you have to add 'scratch malloc' to your list.

If you think so, you have missed my point completely.

Here is how most people use fgets:

code to set up buffer

while(fgets(buffer, sizeof buffer, stream) != NULL)
{
process the line
}

cleanup

And that's how they will tend to use ggets too, whereas what ggets needs
(normally) is:

setup buffer pointer

while whatever the ggets syntax is
{
process the line
free the buffer
}

In my opinion, that's bad design. It should be possible to write:

setup buffer pointer

while whatever the ggets syntax is
{
process the line
}

cleanup

without going to extravagant lengths. I can see that the ggets function
isn't quite so broken if you want the line data to persist beyond a
single loop iteration - but very often you don't.
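For comparison, the fgets() pattern Richard describes (setup, loop, cleanup) can be made concrete with a fixed buffer, and it needs no per-iteration cleanup. In this sketch, `count_lines` and the 128-byte buffer are illustrative assumptions. A line longer than the buffer arrives in several pieces, which is the complication ggets is meant to avoid.

```c
#include <stdio.h>
#include <string.h>

/* Counts lines using setup / loop / cleanup with a fixed buffer.
   A line longer than the buffer arrives in several pieces, so only a
   piece containing '\n' (or a final unterminated piece) ends a line. */
long count_lines(FILE *stream)
{
    char buffer[128];                 /* setup: fixed buffer */
    long lines = 0;
    int pending = 0;                  /* partial line not yet counted */

    while (fgets(buffer, sizeof buffer, stream) != NULL) {
        /* process the line (here: just look for its end) */
        if (strchr(buffer, '\n') != NULL) {
            lines++;
            pending = 0;
        } else {
            pending = 1;
        }
    }
    if (pending)                      /* last line had no newline */
        lines++;
    return lines;                     /* cleanup: nothing to free */
}
```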

Aug 30 '07 #25

Bart van Ingen Schenau

On 29 aug, 19:13, David Mathog <mat...@caltech.edu> wrote:

I already looked at the [] notation as a possible solution for this
but couldn't figure out how to force it into shape. For instance:

rc = fscanf(fp,"%d[ \t]%d[ \t]%s[\n]",&int1,&int2,string);

Try this instead:
rc = fscanf(fp,"%d%*[ \t]%d%*[ \t]%[^\n]%*[\n]",&int1,&int2,string);

This should only have problems with input lines like
+ 100 \n-300 400 name2\n

because the %d format specifier eats leading whitespace
unconditionally.
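Bart's suggestion can be wrapped and exercised as shown below. The 63-character width on %[^\n] is an addition, since the original leaves the name unbounded. `parse_tail` is a hypothetical name. After a short count the newline is still unread, so a real caller would also discard the rest of the line before parsing the next record.

```c
#include <stdio.h>

/* Parse "<int> <int> <name>" from the rest of the current line:
   %*[ \t] accepts only blanks and tabs between fields, and %[^\n]
   cannot start on a newline, so a missing name gives a count of 2.
   The leading whitespace skip of %d can still cross a line break. */
int parse_tail(FILE *fp, int *int1, int *int2, char name[64])
{
    return fscanf(fp, "%d%*[ \t]%d%*[ \t]%63[^\n]%*[\n]",
                  int1, int2, name);
}
```

On "+ 100 200 \n- 300 400 name2\n" it returns 2 as desired. On "+ 100 \n-300 400 name2\n" it still returns 3, with int2 set to -300, which is the limitation Bart points out.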

<snip>

Anyway, I guess the answer to my question is that there is no simple way
to make fscanf() treat an EOL as an input terminator. It seems slightly
bizarre to me that fscanf() has no concept of "end of input", other than
EOF!

The fact is that it is non-trivial to use (f)scanf in situations where
the input stream may contain errors.

Regards,

David Mathog

Bart v Ingen Schenau

Aug 30 '07 #26

CBFalconer

Richard Heathfield wrote:

.... snip ...

Here is how most people use fgets:

code to set up buffer
while(fgets(buffer, sizeof buffer, stream) != NULL) {
process the line
}
cleanup

And that's how they will tend to use ggets too, whereas what ggets
needs (normally) is:

setup buffer pointer /* which is "char *ptr;" */
while whatever the ggets syntax is {
process the line
free the buffer /* which may be "free(buf);" */
}

In my opinion, that's bad design. It should be possible to write:

setup buffer pointer /* which is more complex */
while whatever the ggets syntax is {
process the line /* which has to handle parts or sizes */
}
cleanup /* not actually needed */

without going to extravagant lengths. I can see that the ggets
function isn't quite so broken if you want the line data to
persist beyond a single loop iteration - but very often you don't.

Note that the ggets syntax is "while (0 == ggets(&ptr)) {". If you
save the error return you can discriminate between memory
exhaustion and i/o errors (including EOF). If you save the
returned pointer (rather than freeing it immediately) you can tuck
the lines away for future use. Remember the design objective was
the simplicity of gets without the penalties.
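To make the calling convention concrete, here is a hypothetical stand-in with the contract described above: 0 on success, EOF at end of input, and a positive value on memory exhaustion. It is not the ggets source from the URL below. `demo_fggets` and `total_length` are invented names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in (not the real ggets): on success *ln points
   to a freshly malloc'ed line without its '\n' and 0 is returned;
   EOF means nothing was read (end of input or read error); 1 means
   memory ran out. No length limit is imposed. */
int demo_fggets(char **ln, FILE *f)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    *ln = NULL;
    if (buf == NULL)
        return 1;
    while ((c = getc(f)) != EOF && c != '\n') {
        if (len + 1 == cap) {                 /* keep room for '\0' */
            char *tmp = cap > SIZE_MAX / 2 ? NULL : realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return 1;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {
        free(buf);
        return EOF;
    }
    buf[len] = '\0';
    *ln = buf;
    return 0;
}

/* The calling pattern: each line arrives in a new buffer, so the loop
   body must free it (or keep the pointer for later use). */
size_t total_length(FILE *f)
{
    char *line;
    size_t total = 0;

    while (demo_fggets(&line, f) == 0) {
        total += strlen(line);
        free(line);                           /* per-iteration cleanup */
    }
    return total;
}
```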

The ggets source code (and usage examples) is available at:

<http://cbfalconer.home.att.net/download/>

Aug 30 '07 #27

Richard Heathfield

CBFalconer said:

Richard Heathfield wrote:
... snip ...
Here is how most people use fgets:

code to set up buffer
while(fgets(buffer, sizeof buffer, stream) != NULL) {
process the line
}
cleanup

And that's how they will tend to use ggets too, whereas what ggets
needs (normally) is:

setup buffer pointer /* which is "char *ptr;" */
while whatever the ggets syntax is {
process the line
free the buffer /* which may be "free(buf);" */
}

In my opinion, that's bad design. It should be possible to write:

setup buffer pointer /* which is more complex */
while whatever the ggets syntax is {
process the line /* which has to handle parts or sizes */
}
cleanup /* not actually needed */

Yeah, but the cost of not needing the cleanup is that you have to do the
cleanup inside the loop, which means that either you forget (which,
let's face it, is what most newbies will do) or you are doomed to
free/alloc/free/alloc/free/alloc/free/ad loopeam.

Note that the ggets syntax is "while (0 == ggets(&ptr)) {".

Great, so you're most of the way there. All you need now is a size_t *
to save you from having to reallocate a fresh buffer each time, and a
size_t to indicate the maximum allowable allocation. So near!
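The interface suggested here uses a `size_t *` for the current capacity, so the buffer can be reused, and a `size_t` upper limit. It could look roughly like the sketch below, which also gives the program the control over allocation that Keith asks for. Names and return codes are assumptions, and this is not the ggets API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Grow *buf (capacity *cap) to at least need bytes, never beyond max.
   Returns 1 on success, 0 if need exceeds max or realloc fails. */
static int ensure(char **buf, size_t *cap, size_t need, size_t max)
{
    size_t want = *cap ? *cap : 64;
    char *tmp;

    if (need <= *cap)
        return 1;
    if (need > max)
        return 0;
    while (want < need)
        want = want > max / 2 ? max : want * 2;
    if (want > max)
        want = max;
    tmp = realloc(*buf, want);
    if (tmp == NULL)
        return 0;
    *buf = tmp;
    *cap = want;
    return 1;
}

/* Hypothetical interface: reuses *buf across calls and never lets it
   grow past max bytes. Returns 0 on success, EOF at end of input, 1
   if the line is too long for max or memory runs out (the remainder
   of that line is then left in the stream). */
int read_line_max(char **buf, size_t *cap, size_t max, FILE *f)
{
    size_t len = 0;
    int c;

    while ((c = getc(f)) != EOF && c != '\n') {
        if (!ensure(buf, cap, len + 2, max))  /* c plus '\0' */
            return 1;
        (*buf)[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return EOF;
    if (!ensure(buf, cap, len + 1, max))
        return 1;
    (*buf)[len] = '\0';
    return 0;
}

/* Setup / loop / single cleanup, as in the fgets pattern. */
size_t longest_line(FILE *f, size_t max)
{
    char *buf = NULL;
    size_t cap = 0, best = 0;

    while (read_line_max(&buf, &cap, max, f) == 0)
        if (strlen(buf) > best)
            best = strlen(buf);
    free(buf);
    return best;
}
```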

If you
save the error return you can discriminate between memory
exhaustion and i/o errors (including EOF). If you save the
returned pointer (rather than freeing it immediately) you can tuck
the lines away for future use. Remember the design objective was
the simplicity of gets without the penalties.

You haven't got the simplicity of gets - you lost that when you took
char ** instead of char * - and you still have the downsides of
unnecessarily complex memory management and exposure to a
denial-of-memory attack. Worst of both worlds.

The ggets source code (and usage examples) is available at:

Yes, I know. What I don't know is why you think it's worth promulgating.

Aug 30 '07 #28

Anand Hariharan

On Aug 29, 11:05 pm, Richard Heathfield <r...@see.sig.invalid> wrote:

This is what people want to be able to do:

1) initialise
2) main loop
2a) gather input
2b) process input
3) possibly do post-processing on intermediate results
4) produce output
5) clean up
6) quit

Okay, that doesn't quite cover all eventualities, but it gives a general
model for batch code. (The problem still remains for interactive code,
but let's keep our example simple.)

In terms of a routine that reads a line of arbitrary length, I have
had this wish-list for some time:

It should be possible for this routine to gracefully handle text files
created on a platform whose EOL convention is different from the one
on the target binary's platform.

E.g., if I use Cygwin or MSYS to create a text file (by say
redirecting some command's output), it is a text file on a Windows
file system, but with UNIX EOL conventions. Even if this file is only
a few hundred lines long, trying to read such a file line-by-line
causes most routines (including the fgets approach where the buffer's
memory is doubled each time its capacity is reached) to read the
entire file as a single line. This usually lands up causing the
system to become unstable due to memory exhaustion.
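This wish can be approximated by a reader that accepts all three common conventions as a line end: "\n", "\r\n" and a lone "\r". The sketch below assumes the file is opened in binary mode ("rb"), so no platform newline translation happens and the behaviour is the same everywhere. The name `read_any_eol` and its snprintf-like return value are hypothetical choices. The fixed-size buffer also keeps a runaway "line" from exhausting memory.

```c
#include <stdio.h>

/*
 * Read one line into buf (size must be at least 1), accepting "\n",
 * "\r\n" or a lone "\r" as the end of the line; the terminator is not
 * stored. Characters that do not fit are read and dropped. Returns the
 * full length of the line (a value >= size means buf was truncated),
 * or -1 if end of file or an error came before any character.
 */
long read_any_eol(char *buf, size_t size, FILE *fp)
{
    size_t len = 0;
    int ch;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (ch == '\r') {               /* CR LF or a lone CR */
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);       /* lone CR: push back next char */
            break;
        }
        if (len + 1 < size)
            buf[len] = (char)ch;
        len++;
    }
    if (ch == EOF && len == 0)
        return -1;
    buf[len < size ? len : size - 1] = '\0';
    return (long)len;
}
```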

- Anand

PS: I wish to stay clear of "virtues and follies of ggets' design"
debate.

Aug 30 '07 #29

Keith Thompson

CBFalconer <cb********@yahoo.com> writes:

Richard Heathfield wrote:
>CBFalconer said:
... snip ...
>>
>>After all, it is just one more choice. You can use gets, ggets,
fgets, getline (I think that is your routine's name),

You think wrong.

>>getc, fscanf, etc. as you wish. Scratch gets from that list.

And scratch ggets too, until it does what it ought.

To be consistent you have to add 'scratch malloc' to your list.

[...]

Not at all. A program that calls malloc can have complete control
over the maximum amount of memory that can be allocated. Even if the
amount of memory required is determined externally, the program itself
can always do a sanity check and reject a huge allocation. With
ggets, the amount of memory it attempts to allocate is entirely
controlled by the contents of stdin, something over which the program
has no control.

Do you not see the difference?

Maybe in some environments this is ok; maybe there's no harm in
allocating just as much memory as the system will allow me. But I'd
like to make that decision myself.
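The check described above, where the program rejects an externally determined size before calling malloc, can be sketched as follows. The function name `alloc_checked` and the 16 MiB `ALLOC_LIMIT` are hypothetical policy choices. The single division test rejects both a `count * elem` that would overflow `size_t` and one that exceeds the limit.

```c
#include <stdlib.h>

/* Hypothetical policy limit: the program, not its input, decides this. */
#define ALLOC_LIMIT ((size_t)16 * 1024 * 1024)   /* 16 MiB */

/*
 * Allocate room for `count` elements of `elem` bytes each, where `count`
 * came from outside the program (a header field, an argument, ...).
 * The request is refused without calling malloc() if it is empty, if
 * count * elem would overflow size_t, or if it exceeds ALLOC_LIMIT.
 */
void *alloc_checked(size_t count, size_t elem)
{
    if (count == 0 || elem == 0)
        return NULL;
    if (count > ALLOC_LIMIT / elem)     /* covers both overflow and the cap */
        return NULL;
    return malloc(count * elem);
}
```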

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Aug 30 '07 #30

CBFalconer

Richard Heathfield wrote:

>

... snip ...

>
Great, so you're most of the way there. All you need now is a size_t *
to save you from having to reallocate a fresh buffer each time, and a
size_t to indicate the maximum allowable allocation. So near!

Well, nobody is budging, so I think I'll drop this thread. My (and
other) opinions are buried in there somewhere for anyone to review.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 30 '07 #31

Francine.Neary

On 30 Aug 2007 at 21:59, CBFalconer wrote:

Richard Heathfield wrote:
>>
... snip ...
>>
Great, so you're most of the way there. All you need now is a size_t *
to save you from having to reallocate a fresh buffer each time, and a
size_t to indicate the maximum allowable allocation. So near!

Well, nobody is budging, so I think I'll drop this thread. My (and
other) opinions are buried in there somewhere for anyone to review.

What doesn't seem to be somewhere for anyone to review is any attempt
by you to actually engage with the criticisms people have made of
ggets...


Aug 30 '07 #32

CBFalconer

Fr************@googlemail.com wrote:

On 30 Aug 2007 at 21:59, CBFalconer wrote:
>Richard Heathfield wrote:
>>>
... snip ...
>>>
Great, so you're most of the way there. All you need now is a size_t *
to save you from having to reallocate a fresh buffer each time, and a
size_t to indicate the maximum allowable allocation. So near!

Well, nobody is budging, so I think I'll drop this thread. My (and
other) opinions are buried in there somewhere for anyone to review.

What doesn't seem to be somewhere for anyone to review is any attempt
by you to actually engage with the criticisms people have made of
ggets...

I answered all I saw, with reasons for the existing prototype.


Aug 31 '07 #33

Malcolm McLean

"CBFalconer" <cb********@yahoo.com> wrote in message
news:46***************@yahoo.com...

Fr************@googlemail.com wrote:
>On 30 Aug 2007 at 21:59, CBFalconer wrote:
>>Richard Heathfield wrote:

... snip ...

Great, so you're most of the way there. All you need now is a size_t *
to save you from having to reallocate a fresh buffer each time, and a
size_t to indicate the maximum allowable allocation. So near!

Well, nobody is budging, so I think I'll drop this thread. My (and
other) opinions are buried in there somewhere for anyone to review.

What doesn't seem to be somewhere for anyone to review is any attempt
by you to actually engage with the criticisms people have made of
ggets...

I answered all I saw, with reasons for the existing prototype.

My fix doesn't involve any changes to the prototype.
Whilst it won't exactly return quickly after being called on /dev/zero, and
there's no particular reason it should, it might not tie up resources for
too long either.

Problem if not solved, at least substantially alleviated.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 31 '07 #34

Keith Thompson

"Malcolm McLean" <re*******@btinternet.com> writes:

"CBFalconer" <cb********@yahoo.com> wrote in message
news:46***************@yahoo.com...
>Fr************@googlemail.com wrote:
>>On 30 Aug 2007 at 21:59, CBFalconer wrote:
Richard Heathfield wrote:
>
... snip ...
>
Great, so you're most of the way there. All you need now is a size_t *
to save you from having to reallocate a fresh buffer each time, and a
size_t to indicate the maximum allowable allocation. So near!

Well, nobody is budging, so I think I'll drop this thread. My (and
other) opinions are buried in there somewhere for anyone to review.

What doesn't seem to be somewhere for anyone to review is any attempt
by you to actually engage with the criticisms people have made of
ggets...

I answered all I saw, with reasons for the existing prototype.

My fix doesn't involve any changes to the prototype.
Whilst it won't exactly return quickly after being called on
/dev/zero, and there's no particular reason it should, it might not
tie up resources for too long either.

Problem if not solved, at least substantially alleviated.

With your change, ggets would perform fewer allocations while
attempting to read an infinitely long input line, but it would still
attempt to allocate an arbitrarily large amount of memory (until an
allocation fails). In an environment where that's a problem, it will
misbehave more quickly, but it will still misbehave.


Aug 31 '07 #35

CBFalconer

Malcolm McLean wrote:

>

... snip discussion about ggets() ...

>
My fix doesn't involve any changes to the prototype. Whilst it
won't exactly return quickly after being called on /dev/zero, and
there's no particular reason it should, it might not tie up
resources for too long either.

Problem if not solved, at least substantially alleviated.

I see no 'fix'. What are you talking about?


Aug 31 '07 #36

Malcolm McLean

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

"Malcolm McLean" <re*******@btinternet.com> writes:
With your change, ggets would perform fewer allocations while
attempting to read an infinitely long input line, but it would still
attempt to allocate an arbitrarily large amount of memory (until an
allocation fails). In an environment where that's a problem, it will
misbehave more quickly, but it will still misbehave.

As it stands ggets() calls realloc repeatedly with increments of 128 bytes.
realloc() generally performs internal copying - I read somewhere that it is
rare for actual implementations to extend the block, though I suppose with
an increment as small as 128 bytes that might happen more often than not.

Anyway, because of the copying, we have an O(N^2) algorithm, where N is the
total amount of available memory / 128, which could well be over a million.
So it's not surprising that the whole thing crawls.

Replace the 128-byte increment with an increment of 10%, and the buffer
grows exponentially. This is the standard schoolboy question, if you
invested a penny at 10% interest, applied annually, what would it be worth
in 100 years' time? The answer is rather a large sum.

So within a hundred or so allocations the system will run out of memory.
We've now got an O(N logN) algorithm. It is still gobbling lots of memory,
but it releases it in relatively short time. That's much more acceptable.


Aug 31 '07 #37

Richard Heathfield

Malcolm McLean said:

<snip>

This is the standard schoolboy question,
if you invested a penny at 10% interest, applied annually, what would
it be worth in 100 years' time? The answer is rather a large sum.

137.80 isn't really all that large. In any case, after a hundred years,
you're likely to have lost the account book.

So within a hundred or so allocations the system will run out of
memory. We've now got an O(N logN) algorithm. It is still gobbling
lots of memory, but it releases it in relatively short time. That's
much more acceptable.

Not as acceptable as placing an upper limit on tolerable consumption.


Aug 31 '07 #38

Richard Tobin

In article <3p*********************@bt.com>,
Malcolm McLean <re*******@btinternet.com> wrote:

>As it stands ggets() calls realloc repeatedly with increments of 128 bytes.
realloc() generally performs internal copying - I read somewhere that it is
rare for actual implementations to extend the block, though I suppose with
an increment as small as 128 bytes that might happen more often than not.

>Anyway, because of the copying, we have an O(N^2) algorithm,

Probably not. realloc() will only copy when it has to reallocate, and
it almost certainly has an algorithm that doesn't do constant
increments.

Try timing the following program with arguments suitable to the amount
of real memory you have. It doesn't show any sign of super-linearity
on my system.

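#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int m, i;
    void *buf = 0;

    if (argc < 2) {
        fprintf(stderr, "usage: %s bytes\n", argv[0]);
        return 1;
    }
    m = atoi(argv[1]);

    fprintf(stderr, "please wait\n");

    /* grow the block one byte at a time, up to m bytes */
    for (i = 1; i <= m; i++) {
        void *tmp = realloc(buf, i);  /* keep old block if this fails */
        if (tmp == NULL) {
            fprintf(stderr, "realloc(%d) failed\n", i);
            free(buf);
            return 1;
        }
        buf = tmp;
    }

    free(buf);
    return 0;
}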

By the way, what's the right way to convert a decimal string to a
size_t?
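One common answer to the question above, offered as a sketch: `atoi()` cannot report bad input and returns an `int`. `strtoul()` (C90) reports where parsing stopped and sets `errno` to `ERANGE` on overflow. The function name `parse_size` and its 0 / -1 return convention are hypothetical. On platforms where `size_t` is wider than `unsigned long` (64-bit Windows, for example), C99's `strtoull()` would be needed to reach the full range.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Convert a decimal string to size_t. Returns 0 on success, -1 if the
 * text is empty, negative, has trailing characters (including a '\n'
 * left by fgets), or does not fit.
 */
int parse_size(const char *s, size_t *out)
{
    char *end;
    unsigned long v;

    while (isspace((unsigned char)*s))
        s++;
    /* strtoul() accepts a leading '-' and negates the result: reject it */
    if (*s == '-' || *s == '\0')
        return -1;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return -1;
    if (v > (size_t)-1)              /* size_t narrower than unsigned long */
        return -1;
    *out = (size_t)v;
    return 0;
}
```

In the test program above, `size_t m; if (argc < 2 || parse_size(argv[1], &m) != 0) { /* report usage */ }` would replace the `atoi()` call, with the loop counter changed to `size_t` as well.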

>Replace the 128-byte increment with an increment of 10%, and the buffer
grows exponentially. This is the standard schoolboy question, if you
invested a penny at 10% interest, applied annually, what would it be worth
in 100 years' time? The answer is rather a large sum.

So within a hundred or so allocations the system will run out of memory.
We've now got an O(N logN) algorithm. It is still gobbling lots of memory,
but it releases it in relatively short time. That's much more acceptable.

Actually no (though my first thought was the same as yours). It's
O(N). It does O(log N) copies, but they aren't of an average amount
proportional to N. Consider the easy case of doubling the allocation
each time: the copy sizes are

1 2 4 8 ... 2^k

where 2^k < N <= 2^(k+1), and the sum will be 2^(k+1) - 1, which is
O(N). For your 10% increase it's a geometric progression with ratio
1.1, so the total bytes copied will be (1.1^(k+1) -1) / (1.1 - 1), or
roughly 10N.
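A small model can check the difference between a constant increment and a geometric one. It assumes the worst case Malcolm describes, where every growth step moves the block and copies everything in it. (Richard's point is that real allocators often avoid this.) For simplicity the model starts at 128 bytes. The names `copied_fixed` and `copied_geometric` are hypothetical.

```c
/*
 * Worst-case model: every growth step copies all bytes the buffer
 * currently holds. Each function returns the total number of bytes
 * copied while growing from `start` bytes until the size reaches `n`.
 */
unsigned long long copied_fixed(unsigned long long start,
                                unsigned long long step,
                                unsigned long long n)
{
    unsigned long long size = start, total = 0;

    while (size < n) {
        total += size;              /* copy the old contents */
        size += step;               /* constant increment, e.g. 128 bytes */
    }
    return total;
}

unsigned long long copied_geometric(unsigned long long start, double factor,
                                    unsigned long long n)
{
    unsigned long long size = start, total = 0;

    while (size < n) {
        unsigned long long next = (unsigned long long)(size * factor);
        total += size;              /* copy the old contents */
        size = next > size ? next : size + 1;   /* guarantee progress */
    }
    return total;
}
```

Under this model, growing to a 1,000,000-byte line in 128-byte steps copies 3,906,249,984 bytes, roughly 3,900 times the line length. The 10% schedule stays below 12 times the line length, consistent with the estimate above.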

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

Aug 31 '07 #39

Malcolm McLean

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:Cv******************************@bt.com...

Malcolm McLean said:

Not as acceptable as placing an upper limit on tolerable consumption.

What matters is the time the memory is held, as well as the amount. If a
process hogs all available memory, but releases it after a few cycles,
you've got to be pretty unlucky for anything else to notice. If it holds it
for a few seconds you may cause a few glitches, or make badly-written
programs exit. If it takes it for a few minutes you've got a very impatient
user, for a few hours and effectively you've lost the system.

Of course if a ggets() caller processes the input line, he is using all the
memory in the machine to process some data. You can't do anything about
that, nor should you, that's what it's there for.


Aug 31 '07 #40

Malcolm McLean

"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:fb***********@pc-news.cogsci.ed.ac.uk...

In article <3p*********************@bt.com>,
Malcolm McLean <re*******@btinternet.com> wrote:

Try timing the following program with arguments suitable to the amount
of real memory you have. It doesn't show any sign of super-linearity
on my system.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int m = atoi(argv[1]);
    int i;
    void *buf = 0;

    fprintf(stderr, "please wait\n");

    for(i=1; i<=m; i++)
        if(!(buf = realloc(buf, i)))
        {
            fprintf(stderr, "realloc(%d) failed\n", i);
            return 1;
        }

    return 0;
}

But what happens once you fragment the heap a little, by performing other
allocations? You can count the copies, incidentally, simply by testing if
the return pointer equals the input pointer.
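The moves can be counted as suggested above, with one caveat. Once realloc() has moved a block, the old pointer value is indeterminate in C, so comparing it afterwards is not strictly portable. This sketch records the old address as a `uintptr_t` before the call instead. `uintptr_t` is optional in C99, though almost universally available. The name `count_moves` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Grow a buffer one byte at a time up to n bytes and count how many
 * realloc() calls returned a different address (i.e. moved the block).
 * Returns the move count, or -1 if an allocation fails.
 */
long count_moves(size_t n)
{
    void *buf = NULL;
    long moves = 0;
    size_t i;

    for (i = 1; i <= n; i++) {
        uintptr_t before = (uintptr_t)buf;  /* saved while still valid */
        void *tmp = realloc(buf, i);
        if (tmp == NULL) {
            free(buf);
            return -1;
        }
        if (i > 1 && (uintptr_t)tmp != before)
            moves++;                        /* first call had no old block */
        buf = tmp;
    }
    free(buf);
    return moves;
}
```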


Aug 31 '07 #41

Richard Tobin

In article <ud*********************@bt.com>,
Malcolm McLean <re*******@btinternet.com> wrote:

>But what happens once you fragment the heap a little, by performing other
allocations?

What do you think happens?

Possibly each reallocation takes a little longer, depending on the
allocator. I suppose it's possible that there's some first-fit
allocator out there where it changes the number of reallocations, but
I very much doubt it. Can you think of a plausible implementation
that makes it not O(N)?

-- Richard

Aug 31 '07 #42

Malcolm McLean

"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:fb***********@pc-news.cogsci.ed.ac.uk...

In article <ud*********************@bt.com>,
Malcolm McLean <re*******@btinternet.com> wrote:

>>But what happens once you fragment the heap a little, by performing other
allocations?

What do you think happens?

Possibly each reallocation takes a little longer, depending on the
allocator. I suppose it's possible that there's some first-fit
allocator out there where it changes the number of reallocations, but
I very much doubt it. Can you think of a plausible implementation
that makes it not O(N)?

Best fit allocator. Imagine we've got a block of one, two, three, four, five
... bytes. When we ask for one byte it slots us into the one, to conserve
the bigger blocks. Then into the two, recopying, the three, and so on.
If we've just got one block of 100 then it gives us the start of that, and
extends.

Mine makes 134 reallocations for N = 1 million, and 2335 for N = 10 million.
I've got about 1 GB installed. So mine's showing superlinear speedup on
moderate loading, at least. A run on 1 GB hasn't finished yet. At about 20
million bytes and 4000 reallocations it is visibly slowing. I suspect your
system has a nice uniform block for realloc() to play at whilst mine is
slotting memory in with other programs.

Aug 31 '07 #43

Malcolm McLean

"Malcolm McLean" <re*******@btinternet.com> wrote in message
news:eq*********************@bt.com...

>
"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:fb***********@pc-news.cogsci.ed.ac.uk...
>In article <ud*********************@bt.com>,
Malcolm McLean <re*******@btinternet.com> wrote:

>>>But what happens once you fragment the heap a little, by performing other
allocations?

What do you think happens?

Possibly each reallocation takes a little longer, depending on the
allocator. I suppose it's possible that there's some first-fit
allocator out there where it changes the number of reallocations, but
I very much doubt it. Can you think of a plausible implementation
that makes it not O(N)?

Best fit allocator. Imagine we've got a block of one, two, three, four,
five ... bytes. When we ask for one byte it slots us into the one, to
conserve the bigger blocks. Then into the two, recopying, the three, and
so on.
If we've just got one block of 100 then it gives us the start of that, and
extends.

Mine makes 134 reallocations for N = 1 million, and 2335 for N = 10
million. I've got about 1 GB installed. So mine's showing superlinear
speedup on moderate loading, at least. A run on 1 GB hasn't finished yet.
At about 20 million bytes and 4000 reallocations it is visibly slowing. I
suspect your system has a nice uniform block for realloc() to play at
whilst mine is slotting memory in with other programs.

Duh. By "reallocations" I mean "moves" of course. The program makes N calls
to realloc().


Aug 31 '07 #44

Richard Tobin

In article <eq*********************@bt.com>,
Malcolm McLean <re*******@btinternet.com> wrote:

>Best fit allocator. Imagine we've got a block of one, two, three, four, five
... bytes. When we ask for one byte it slots us into the one, to conserve
the bigger blocks. Then into the two, recopying, the three, and so on.
If we've just got one block of 100 then it gives us the start of that, and
extends.

This seems like a very poor way to implement realloc(). If you call
realloc(), you are probably growing the buffer, so best fit is
inappropriate. Expanding by a factor and then doing best fit would be
more reasonable.

>Mine makes 134 reallocations for N = 1 million, and 2335 for N = 10 million.

How do you come to have 2335-134=2201 differently-sized blocks
of between 1 and 10 million bytes lying around? Are you splitting
bits of them for other allocations? Again, this seems like a very
poor allocator if it does that many reallocations for a growing
buffer.

What system is this?

>I've got about 1 GB installed. So mine's showing superlinear speedup on
moderate loading, at least. A run on 1 GB hasn't finished yet. At about 20
million bytes and 4000 reallocations it is visibly slowing. I suspect your
system has a nice uniform block for realloc() to play at whilst mine is
slotting memory in with other programs.

Other programs? Are you not using a system with virtual memory?

-- Richard

Aug 31 '07 #45

Malcolm McLean

"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:fb***********@pc-news.cogsci.ed.ac.uk...

In article <eq*********************@bt.com>,
Malcolm McLean <re*******@btinternet.com> wrote:

>>Mine makes 134 reallocations for N = 1 million, and 2335 for N = 10
million.

How do you come to have 2335-134=2201 differently-sized blocks
of between 1 and 10 million bytes lying around? Are you splitting
bits of them for other allocations? Again, this seems like a very
poor allocator if it does that many reallocations for a growing
buffer.

What system is this?

This is the Windows Vista freebie C Express compiler.
I am not such an inferior being, really. I have a nice Beowulf cluster at
university. However currently I have a week off.

>
>>I've got about 1 GB installed. So mine's showing superlinear speedup on
moderate loading, at least. A run on 1 GB hasn't finished yet. At about 20
million bytes and 4000 reallocations it is visibly slowing. I suspect your
system has a nice uniform block for realloc() to play at whilst mine is
slotting memory in with other programs.

Other programs? Are you not using a system with virtual memory?

I would hope that pointers are not raw chip addresses. However exactly how
Vista chops up memory between processes I couldn't tell you. Obviously there
must be something else in the system, or if you have only one allocated
block on the heap there wouldn't seem to be much call to move it. Unless it
puts blocks at the top, which I wouldn't put past it. It's a horrid
compiler. Nothing works.


Aug 31 '07 #46

CBFalconer

Malcolm McLean wrote:

>

... snip ...

>
As it stands ggets() calls realloc repeatedly with increments of
128 bytes. realloc() generally performs internal copying - I read
somewhere that it is rare for actual implementations to extend the
block, though I suppose with an increment as small as 128 bytes
that might happen more often than not.

And that is so arranged because the _normal_ usage is to acquire
inter-active lines. These are not expected to be especially large,
and will probably normally not exceed the initial 112 byte
assignment, and almost never exceed the 1st reallocation of 240
bytes. However the system CAN cope with very large requirements,
although not especially quickly. Again, such speed is totally
useless in interactive work. This was all selected quite
deliberately, it didn't just happen.


Aug 31 '07 #47

Malcolm McLean

"CBFalconer" <cb********@yahoo.com> wrote in message
news:46***************@yahoo.com...

Malcolm McLean wrote:
>>
... snip ...
>>
As it stands ggets() calls realloc repeatedly with increments of
128 bytes. realloc() generally performs internal copying - I read
somewhere that it is rare for actual implementations to extend the
block, though I suppose with an increment as small as 128 bytes
that might happen more often than not.

And that is so arranged because the _normal_ usage is to acquire
inter-active lines. These are not expected to be especially large,
and will probably normally not exceed the initial 112 byte
assignment, and almost never exceed the 1st reallocation of 240
bytes. However the system CAN cope with very large requirements,
although not especially quickly. Again, such speed is totally
useless in interactive work. This was all selected quite
deliberately, it didn't just happen.

Sure. But this is production code, not teaching code.
Someone could tie up system resources for a long period, on my system if not
on Richard Tobin's far superior one, by piping a maliciously long line to
stdin. If you can fix that without any other implications, such as limiting
line length or complicating the interface, you should do it. The penalty is
only a couple of extra lines of code, to increment delta by 10% on each
pass, maybe after the first two or three expansions, if you want to maintain
the 128-size block for speed of normal lines.
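Here is a sketch of the schedule proposed above. It uses the 112-byte initial buffer and 128-byte step described earlier in the thread. The choice of three fixed-size steps and the name `size_after` are assumptions for illustration, not taken from the ggets source.

```c
#include <stddef.h>

#define INITIAL     112   /* initial buffer, per the thread */
#define DELTA0      128   /* first increments, per the thread */
#define FIXED_STEPS 3     /* assumed: keep 128-byte steps this many times */

/*
 * Buffer size after `expansions` growth steps: the first FIXED_STEPS
 * expansions add DELTA0 bytes each, so typical short lines behave as
 * before; after that the increment itself grows by 10% on every pass,
 * so the size grows geometrically. No overflow or ceiling check here.
 */
size_t size_after(unsigned expansions)
{
    size_t size = INITIAL, delta = DELTA0;
    unsigned i;

    for (i = 0; i < expansions; i++) {
        if (i >= FIXED_STEPS)
            delta += delta / 10;    /* next increment is 10% larger */
        size += delta;
    }
    return size;
}
```

`size_after(1)` is 240, matching the first reallocation described above. From the fourth expansion on, the step grows (140, 154, ...), so in the long run the size increases by about 10% per step.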


Aug 31 '07 #48

Richard Tobin

In article <46***************@yahoo.com>,
CBFalconer <cb********@maineline.net> wrote:

>And that is so arranged because the _normal_ usage is to acquire
inter-active lines. These are not expected to be especially large,
and will probably normally not exceed the initial 112 byte
assignment, and almost never exceed the 1st reallocation of 240
bytes.

I would have thought it was just as useful for non-interactive reading
of files. I often encounter text files with absurdly long lines
(megabytes) - but these are XML files, and are normally read with an
XML parser, so I suppose it's not a compelling example.

-- Richard

Aug 31 '07 #49

Ian Collins

Malcolm McLean wrote:

>
"CBFalconer" <cb********@yahoo.com> wrote in message
news:46***************@yahoo.com...
>Malcolm McLean wrote:
>>>
... snip ...
>>>
As it stands ggets() calls realloc repeatedly with increments of
128 bytes. realloc() generally performs internal copying - I read
somewhere that it is rare for actual implementations to extend the
block, though I suppose with an increment as small as 128 bytes
that might happen more often than not.

And that is so arranged because the _normal_ usage is to acquire
inter-active lines. These are not expected to be especially large,
and will probably normally not exceed the initial 112 byte
assignment, and almost never exceed the 1st reallocation of 240
bytes. However the system CAN cope with very large requirements,
although not especially quickly. Again, such speed is totally
useless in interactive work. This was all selected quite
deliberately, it didn't just happen.

Sure. But this is production code, not teaching code.

Doesn't matter, it is a good choice for most situations, libraries are
full of compromises.

Someone could tie up system resources for a long period, on my system if
not on Richard Tobin's far superior one, by piping a maliciously long
line to stdin.

Most systems would be I/O bound (where do you keep the long line?), not
CPU bound in this case.

--
Ian Collins.

Aug 31 '07 #50
