Bytes IT Community

Reading long lines from a file

Hello,

I suspect this comes up quite often, but I haven't found an exact
solution in the FAQ. I have to read and parse a file with arbitrarily
long lines and have come up with the following plan:

1. start with a statically allocated buffer and a pointer of equal size
2. read into the buffer using fgets and append to the pointer
3. if buffer does not contain '\n', reallocate buffer and jump to 2
4. return the pointer

Do you see anything wrong with this? If so, how can I improve it?

Thanks in advance,
Vlad Dogaru

--
Number one reason to date an engineer:
The world does revolve around us; we pick the coordinate system.
Aug 14 '07 #1


Vlad Dogaru said:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this? If so, how can I improve it?

To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a
pointer.

Once we get those impossibilities out of the way, we can dispense with
the unnecessary fgets call - your input is already buffered, so why
buffer it again through fgets?

Here's the plan:

Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
at this allocation with P. Set U to 0. Have a temporary pointer T
kicking about the place.

While you can read a character successfully that isn't a newline:
    If U == C - 1
        You're about to run out of space, so get some more:
        T = realloc(P, C * 2)
        If that didn't work, you might want to try lower multipliers
        (1.5, 1.25 maybe) or even use add instead of multiply - and
        warn the caller that you're running low on RAM.
        Eventually, either you give up (in which case tell the user
        you failed), or you succeed, in which case set P = T.
        Increase C to describe the new allocation amount accurately.
    Endif

    If all is well
        P[U++] = the character you read
    Endif
Endwhile
If all is well
    P[U] = '\0'
Endif
P now contains the line.
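A minimal C sketch of the plan above might look like the following. The function names read_line and grow, the initial size of 16 bytes, and the exact fallback sequence (2x, 1.5x, 1.25x, then +16 bytes) are illustrative assumptions, not part of the original post. It returns NULL both at end of input and on allocation failure, so a caller would use feof/ferror to tell the two apart.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: try to enlarge block p, which currently holds *cap
   bytes. Tries 2x, then 1.5x, then 1.25x, then +16 bytes; sizes that wrap
   around (become smaller) are skipped. On success *cap is updated and the
   new pointer returned; on failure NULL is returned and p is untouched. */
static char *grow(char *p, size_t *cap)
{
    size_t c = *cap;
    size_t tries[4];
    size_t i;

    tries[0] = c + c;       /* double */
    tries[1] = c + c / 2;   /* 1.5x */
    tries[2] = c + c / 4;   /* 1.25x */
    tries[3] = c + 16;      /* additive fallback */

    for (i = 0; i < sizeof tries / sizeof tries[0]; i++) {
        if (tries[i] > c) {
            char *t = realloc(p, tries[i]);   /* T in the plan */
            if (t != NULL) {
                *cap = tries[i];
                return t;
            }
        }
    }
    return NULL;
}

/* Hypothetical reader: stores one line (without its '\n') in a new
   null-terminated buffer. Returns NULL at end of input or if memory
   runs out; otherwise the caller must free() the result. */
char *read_line(FILE *fp, size_t *len)
{
    size_t cap = 16;          /* C: bytes allocated, must be greater than 1 */
    size_t used = 0;          /* U: characters stored so far */
    char *p = malloc(cap);    /* P */
    int ch;

    if (p == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == cap - 1) {            /* keep one byte for the '\0' */
            char *t = grow(p, &cap);
            if (t == NULL) {              /* give up and tell the caller */
                free(p);
                return NULL;
            }
            p = t;
        }
        p[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {         /* no line left to read */
        free(p);
        return NULL;
    }

    p[used] = '\0';
    if (len != NULL)
        *len = used;
    return p;
}
```

Each call returns a separate allocation, so a loop such as `while ((s = read_line(fp, &n)) != NULL) { ...; free(s); }` processes a whole file one line at a time.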

For a discussion of long-line issues, an implementation of a full line
capture function, and links to other such implementations, see
http://www.cpax.org.uk/prg/writings/fgetdata.php

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 14 '07 #2

Vlad Dogaru wrote:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this?

Possibly with the phrase "statically allocated".
There are three kinds of storage duration:
1 automatic
2 static
3 allocated

Only allocated memory can be reallocated.
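To illustrate the three storage durations, here is a small sketch; the names static_buf, automatic_buf and demo_durations are hypothetical. Passing automatic_buf or static_buf to realloc would be undefined behaviour, so those calls appear only in a comment; the malloc'd buffer and a null pointer are valid arguments.

```c
#include <stdlib.h>

static char static_buf[100];            /* static storage duration */

/* Hypothetical demo: returns 1 if the valid realloc calls succeed. */
int demo_durations(void)
{
    char automatic_buf[100];            /* automatic storage duration */
    char *allocated = malloc(100);      /* allocated storage duration */
    char *bigger;
    char *fresh;

    /* realloc(automatic_buf, 200) or realloc(static_buf, 200) would be
       undefined behaviour: neither came from malloc, calloc or realloc. */
    (void)automatic_buf;
    (void)static_buf;

    if (allocated == NULL)
        return 0;
    bigger = realloc(allocated, 200);   /* fine: allocated came from malloc */
    if (bigger == NULL) {
        free(allocated);                /* on failure the old block survives */
        return 0;
    }
    free(bigger);

    fresh = realloc(NULL, 50);          /* realloc(NULL, n) acts like malloc(n) */
    if (fresh == NULL)
        return 0;
    free(fresh);
    return 1;
}
```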

> If so, how can I improve it?

A few of the regulars here
have written their own getline functions:
http://www.cpax.org.uk/prg/writings/...ta.php#related

--
pete
Aug 14 '07 #3

Richard Heathfield wrote:
> Vlad Dogaru said:
>> Hello,
>>
>> I suspect this comes up quite often, but I haven't found an exact
>> solution in the FAQ. I have to read and parse a file with arbitrarily
>> long lines and have come up with the following plan:
>>
>> 1. start with a statically allocated buffer and a pointer of equal size
>> 2. read into the buffer using fgets and append to the pointer
>> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
>> 4. return the pointer
>>
>> Do you see anything wrong with this? If so, how can I improve it?
>
> To start with, you can't reallocate a statically allocated buffer! Nor
> can you have a pointer of equal size to a buffer except by sizing the
> buffer to be the same size as a pointer. Nor can you append to a
> pointer.
>
> Once we get those impossibilities out of the way, we can dispense with
> the unnecessary fgets call - your input is already buffered, so why
> buffer it again through fgets?

If anything, my lack of English skills has contributed to the
misunderstanding. I was talking about:
    char b[100], *p;
Reading into b with fgets, then reallocating p as necessary to do a
strcat(p, b).

But your solution is much more elegant and now I see why fgets is
unnecessary.

> Here's the plan:
>
> [...]
>
> For a discussion of long-line issues, an implementation of a full line
> capture function, and links to other such implementations, see
> http://www.cpax.org.uk/prg/writings/fgetdata.php

Thank you for the clarification and the link. I will look into it and I
am confident that I can write a similar function.

Vlad
--
Number one reason to date an engineer:
The world does revolve around us; we pick the coordinate system.
Aug 14 '07 #4

Vlad Dogaru wrote:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this? If so, how can I improve it?

This may not apply to your particular case, but in some cases of
"arbitrarily long lines" that I have encountered, one can just read a
character at a time, examine it, perform some action, and then continue.
This removes the need for a huge buffer, which, in the worst case, might
not even fit into the computer's memory. Obviously this won't work if any
modification to the front of the line depends on a value near the end of
the line.
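As a sketch of the read-a-character-at-a-time approach, the hypothetical function below counts lines and measures the longest one in constant memory, however long the lines are. The statistic chosen is only an example of "perform some action"; a final line without a trailing newline is still counted.

```c
#include <stdio.h>

/* Hypothetical example of processing input one character at a time:
   counts lines and finds the longest one without buffering any line. */
void line_stats(FILE *fp, unsigned long *lines, unsigned long *longest)
{
    unsigned long cur = 0;   /* length of the line being scanned */
    int in_line = 0;         /* nonzero if the current line has characters */
    int ch;

    *lines = 0;
    *longest = 0;
    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {
            ++*lines;
            if (cur > *longest)
                *longest = cur;
            cur = 0;
            in_line = 0;
        } else {
            ++cur;
            in_line = 1;
        }
    }
    if (in_line) {           /* final line with no trailing newline */
        ++*lines;
        if (cur > *longest)
            *longest = cur;
    }
}
```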

If you do go with the expanding buffer method, be sure that you do
NOT use strcat() to append each new chunk of text. Doing so will result
in each such addition scanning from the front of the buffer for the
terminal '\0' in the string, so the total scanning work grows roughly
quadratically with the line length. I've seen this bug many, many times,
and it can cause a huge performance hit. Instead, keep track of the
length of the string in the buffer and just copy the new string directly
to the appropriate position, then adjust the length variable, and repeat.
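Applied to the fgets-based plan described earlier in the thread (a char b[100] chunk plus a growing p), the tracked-length approach could look like this sketch. The name read_line_fgets and the doubling growth policy are assumptions. It keeps the trailing newline, as fgets does, and strlen here only scans the current chunk, never the whole accumulated line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fgets-based line reader: each chunk read into b is copied
   to the end of p at the tracked offset len, so nothing rescans the
   accumulated text the way strcat(p, b) would. The '\n' is kept, as with
   fgets. Returns NULL if nothing could be read or memory runs out. */
char *read_line_fgets(FILE *fp)
{
    char b[100];
    char *p = NULL;           /* realloc(NULL, n) behaves like malloc(n) */
    size_t len = 0;           /* characters currently stored in p */
    size_t cap = 0;           /* bytes allocated for p */

    while (fgets(b, sizeof b, fp) != NULL) {
        size_t n = strlen(b);             /* scans this chunk only */

        if (len + n + 1 > cap) {
            size_t newcap = cap ? cap : sizeof b;
            char *t;

            while (newcap < len + n + 1)
                newcap *= 2;
            t = realloc(p, newcap);
            if (t == NULL) {
                free(p);
                return NULL;
            }
            p = t;
            cap = newcap;
        }
        memcpy(p + len, b, n + 1);        /* append chunk plus its '\0' */
        len += n;
        if (n > 0 && b[n - 1] == '\n')
            break;                        /* the line is complete */
    }
    return p;
}
```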

Regards,

David Mathog

Aug 14 '07 #5

Vlad Dogaru wrote, On 14/08/07 11:46:
> Richard Heathfield wrote:
> <snip>
>> To start with, you can't reallocate a statically allocated buffer! Nor
>> can you have a pointer of equal size to a buffer except by sizing the
>> buffer to be the same size as a pointer. Nor can you append to a pointer.
>>
>> Once we get those impossibilities out of the way, we can dispense with
>> the unnecessary fgets call - your input is already buffered, so why
>> buffer it again through fgets?
>
> If anything, my lack of English skills has contributed to the
> misunderstanding. I was talking about:
> char b[100], *p;
> Reading into b with fgets, then reallocating p as necessary to do a
> strcat(p, b).

Since we do not know what p points to we cannot say whether you are
allowed to realloc what it points to or not. You can only pass pointers
returned by malloc, calloc or realloc (or a null pointer) to realloc.

Also beware of denial-of-service attacks where a user deliberately
creates a file with a line 5GB long.

<snip>
--
Flash Gordon
Aug 14 '07 #6

On 2007-08-14 17:43, Flash Gordon <sp**@flash-gordon.me.uk> wrote:
> Vlad Dogaru wrote, On 14/08/07 11:46:
>> Richard Heathfield wrote:
>>> To start with, you can't reallocate a statically allocated buffer! Nor
>>> can you have a pointer of equal size to a buffer except by sizing the
>>> buffer to be the same size as a pointer. Nor can you append to a pointer.
[...]
>> If anything, my lack of English skills has contributed to the
>> misunderstanding. I was talking about:
>> char b[100], *p;
>> Reading into b with fgets, then reallocating p as necessary to do a
>> strcat(p, b).
>
> Since we do not know what p points to we cannot say whether you are
> allowed to realloc what it points to or not.

We cannot *know*, but I think it is reasonable to assume from the
description that he uses malloc to get the initial value for
p. You don't always have to assume the stupidest possible version if
something isn't specified exactly ;-).

> Also beware of denial-of-service attacks where a user deliberately
> creates a file with a line 5GB long.

ACK. But that's probably not something which should be hard-coded into
the application. After all, the program might run on a machine with 64
GB RAM where 5 GB of memory usage is quite acceptable. You could use a
configurable limit or rely on OS features to limit memory consumption
(e.g. ulimit on unixoid systems).
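A configurable limit could be passed in by the caller rather than hard-coded, as in this hypothetical read_line_limited sketch. The return codes and their names are invented for illustration, and where max_len comes from (a configuration file, a command-line option) is left to the application. On RL_TOOLONG the rest of the over-long line is still in the stream, so the caller decides whether to skip it or abort.

```c
#include <stdio.h>
#include <stdlib.h>

/* Result codes for the hypothetical read_line_limited(). */
enum { RL_OK = 0, RL_EOF = 1, RL_NOMEM = 2, RL_TOOLONG = 3 };

/* Reads one line (newline not stored) into *out, refusing to store more
   than max_len characters. *out is NULL unless RL_OK is returned; the
   caller frees *out after RL_OK. */
int read_line_limited(FILE *fp, size_t max_len, char **out)
{
    size_t cap = 16, used = 0;
    char *p = malloc(cap);
    int ch;

    *out = NULL;
    if (p == NULL)
        return RL_NOMEM;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == max_len) {            /* configured limit exceeded */
            free(p);
            return RL_TOOLONG;
        }
        if (used == cap - 1) {            /* keep one byte for the '\0' */
            size_t newcap = cap * 2;
            char *t = realloc(p, newcap);
            if (t == NULL) {
                free(p);
                return RL_NOMEM;
            }
            p = t;
            cap = newcap;
        }
        p[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {         /* nothing left to read */
        free(p);
        return RL_EOF;
    }
    p[used] = '\0';
    *out = p;
    return RL_OK;
}
```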

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Aug 20 '07 #7

On Aug 20, 1:57 pm, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
> On 2007-08-14 17:43, Flash Gordon <s...@flash-gordon.me.uk> wrote:
>> Vlad Dogaru wrote, On 14/08/07 11:46:
>>> Richard Heathfield wrote:
>>>> To start with, you can't reallocate a statically allocated buffer! Nor
>>>> can you have a pointer of equal size to a buffer except by sizing the
>>>> buffer to be the same size as a pointer. Nor can you append to a pointer.
> [...]
>>> If anything, my lack of English skills has contributed to the
>>> misunderstanding. I was talking about:
>>> char b[100], *p;
>>> Reading into b with fgets, then reallocating p as necessary to do a
>>> strcat(p, b).
>> Since we do not know what p points to we cannot say whether you are
>> allowed to realloc what it points to or not.
>
> We cannot *know*, but I think it is reasonable to assume from the
> description that he uses malloc to get the initial value for
> p. You don't always have to assume the stupidest possible version if
> something isn't specified exactly ;-).

Reading Flash Gordon's post, I don't see him assuming anything.
He was simply aiming to cover all possibilities, and I'm all for
that; we do aim to be accurate around here.

Aug 20 '07 #8
