Reading long lines from a file

Hello,

I suspect this comes up quite often, but I haven't found an exact
solution in the FAQ. I have to read and parse a file with arbitrarily
long lines and have come up with the following plan:

1. start with a statically allocated buffer and a pointer of equal size
2. read into the buffer using fgets and append to the pointer
3. if buffer does not contain '\n', reallocate buffer and jump to 2
4. return the pointer

Do you see anything wrong with this? If so, how can I improve it?

Thanks in advance,
Vlad Dogaru

--
Number one reason to date an engineer:
The world does revolve around us; we pick the coordinate system.
Aug 14 '07 #1
Vlad Dogaru said:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal
>    size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this? If so, how can I improve it?

To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a
pointer.

Once we get those impossibilities out of the way, we can dispense with
the unnecessary fgets call - your input is already buffered, so why
buffer it again through fgets?

Here's the plan:

Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
at this allocation with P. Set U to 0. Have a temporary pointer T
kicking about the place.

While you can read a character successfully that isn't a newline:
  If U == C - 1
    You're about to run out of space, so get some more
    T = realloc(P, C * 2)
    If that didn't work, you might want to try lower multipliers
    (1.5, 1.25 maybe) or even use add instead of multiply - and
    warn the caller that you're running low on RAM.
    Eventually, either you give up (in which case tell the user
    you failed), or you succeed, in which case set P = T
    Increase C to describe the new allocation amount accurately
  Endif

  If all is well
    P[U++] = the character you read
  Endif
Endwhile
If all is well
  P[U] = '\0'
Endif
P now contains the line.

For a discussion of long-line issues, an implementation of a full line
capture function, and links to other such implementations, see
http://www.cpax.org.uk/prg/writings/fgetdata.php
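
A minimal sketch of the plan above in C. It is not the implementation from the linked page; the function name read_line and the starting capacity of 16 are assumptions made for illustration. On allocation failure it simply gives up instead of retrying with a smaller multiplier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from fp into a malloc'd buffer,
 * without the trailing newline. Returns NULL at end of file (nothing
 * read) or on allocation failure. The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 16;                /* C in the plan: bytes allocated */
    size_t used = 0;                /* U in the plan: bytes stored */
    char *buf;                      /* P in the plan */
    int ch;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == cap - 1) {      /* keep one byte free for '\0' */
            /* T in the plan; refuse to double if cap * 2 would overflow */
            char *tmp = (cap > SIZE_MAX / 2) ? NULL : realloc(buf, cap * 2);
            if (tmp == NULL) {      /* buf is still valid, so free it */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {   /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    return buf;
}
```

A caller would loop with something like `while ((line = read_line(fp)) != NULL) { ...; free(line); }`. Note that this sketch cannot distinguish end of file from an out-of-memory failure; a fuller version would report a status code.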

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 14 '07 #2
Vlad Dogaru wrote:
>
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this?

Possibly with the phrase "statically allocated".
There are three kinds of storage duration:
1 automatic
2 static
3 allocated

Only allocated memory can be reallocated.
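
A small sketch of the distinction, with hypothetical names (greeting, staging, grow_copy) chosen for illustration only. It stages a string in an automatic array, copies it to allocated storage, and resizes only the allocated block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

char greeting[] = "hello";          /* static storage duration */

/* Hypothetical helper: returns a heap copy of src in a block of new_cap
 * bytes, or NULL if src does not fit or memory runs out. */
char *grow_copy(const char *src, size_t new_cap)
{
    char staging[100];              /* automatic storage duration */
    size_t len;
    char *heap, *tmp;

    assert(src != NULL);
    len = strlen(src);
    if (len >= sizeof staging || new_cap <= len)
        return NULL;
    memcpy(staging, src, len + 1);

    /* realloc(staging, new_cap) or realloc(greeting, new_cap) would be
     * undefined behaviour: neither pointer came from an allocation
     * function. */

    heap = malloc(len + 1);         /* allocated storage duration */
    if (heap == NULL)
        return NULL;
    memcpy(heap, staging, len + 1);

    tmp = realloc(heap, new_cap);   /* legal: heap came from malloc */
    if (tmp == NULL) {
        free(heap);
        return NULL;
    }
    return tmp;
}
```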

> If so, how can I improve it?

A few of the regulars here
have written their own getline functions:
http://www.cpax.org.uk/prg/writings/...ta.php#related

--
pete
Aug 14 '07 #3
Richard Heathfield wrote:
> Vlad Dogaru said:
>> Hello,
>>
>> I suspect this comes up quite often, but I haven't found an exact
>> solution in the FAQ. I have to read and parse a file with arbitrarily
>> long lines and have come up with the following plan:
>>
>> 1. start with a statically allocated buffer and a pointer of equal
>>    size
>> 2. read into the buffer using fgets and append to the pointer
>> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
>> 4. return the pointer
>>
>> Do you see anything wrong with this? If so, how can I improve it?
>
> To start with, you can't reallocate a statically allocated buffer! Nor
> can you have a pointer of equal size to a buffer except by sizing the
> buffer to be the same size as a pointer. Nor can you append to a
> pointer.
>
> Once we get those impossibilities out of the way, we can dispense with
> the unnecessary fgets call - your input is already buffered, so why
> buffer it again through fgets?

If anything, my lack of English skills has contributed to the
misunderstanding. I was talking about:
char b[100], *p;
Reading into b with fgets, then reallocating p as necessary to do a
strcat(p, b).

But your solution is much more elegant and now I see why fgets is
unnecessary.

> Here's the plan:
>
> Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
> at this allocation with P. Set U to 0. Have a temporary pointer T
> kicking about the place.
>
> While you can read a character successfully that isn't a newline:
>   If U == C - 1
>     You're about to run out of space, so get some more
>     T = realloc(P, C * 2)
>     If that didn't work, you might want to try lower multipliers
>     (1.5, 1.25 maybe) or even use add instead of multiply - and
>     warn the caller that you're running low on RAM.
>     Eventually, either you give up (in which case tell the user
>     you failed), or you succeed, in which case set P = T
>     Increase C to describe the new allocation amount accurately
>   Endif
>
>   If all is well
>     P[U++] = the character you read
>   Endif
> Endwhile
> If all is well
>   P[U] = '\0'
> Endif
> P now contains the line.
>
> For a discussion of long-line issues, an implementation of a full line
> capture function, and links to other such implementations, see
> http://www.cpax.org.uk/prg/writings/fgetdata.php

Thank you for the clarification and the link. I will look into it and I
am confident that I can write a similar function.

Vlad
--
Number one reason to date an engineer:
The world does revolve around us; we pick the coordinate system.
Aug 14 '07 #4
Vlad Dogaru wrote:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this? If so, how can I improve it?

This may not apply to your particular case, but in some instances I have
encountered with "arbitrarily long lines" one can just read a character
at a time, examine it, perform some action, and then continue. This
removes the need for a huge buffer, which in the worst case, might not
even fit into the computer's memory. Obviously this won't work if any
modification to the front of the line depends on a value near the end of
the line.
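
As one illustration of the character-at-a-time approach, here is a sketch that finds the length of the longest line in a stream without ever storing a line. The task and the name longest_line are assumptions picked for the example, not something from the original question.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the length of the longest line in fp,
 * not counting the newline. Uses constant memory however long the
 * lines are. */
unsigned long longest_line(FILE *fp)
{
    unsigned long current = 0;      /* length of the line being read */
    unsigned long longest = 0;      /* longest complete line so far */
    int ch;

    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {
            if (current > longest)
                longest = current;
            current = 0;
        } else {
            current++;
        }
    }
    if (current > longest)          /* last line may lack a newline */
        longest = current;
    return longest;
}
```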

If you do go with the expanding buffer method be sure that you do
NOT use strcat() to append each new chunk of text. Doing so will result
in each such addition scanning from the front of the buffer for the
terminal '\0' in the string. I've seen this bug many, many times.
It can cause a huge performance hit. Instead, keep track of the
length of the string in the buffer and just copy the new string directly
to the appropriate position, then adjust the length variable, and repeat.
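
A sketch of that bookkeeping, assuming the fgets-into-a-fixed-chunk design from the original question. The name read_line_chunked and the chunk size of 100 are illustrative. The running length len tells memcpy where to append, so the accumulated text is never rescanned.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line (newline stripped) by appending
 * fgets chunks to a growing buffer. Returns NULL at end of file or on
 * allocation failure. The caller frees the result. */
char *read_line_chunked(FILE *fp)
{
    char chunk[100];                /* fixed-size read buffer */
    char *line = NULL;              /* realloc(NULL, n) acts like malloc(n) */
    size_t len = 0;                 /* length of the string held in line */

    assert(fp != NULL);
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);   /* scans only the new chunk */
        char *tmp = realloc(line, len + n + 1);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);   /* append at known offset */
        len += n;
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';   /* complete line: drop the newline */
            break;
        }
    }
    return line;
}
```

This version grows the buffer by one chunk at a time to keep the sketch short; doubling the capacity instead would reduce the number of realloc calls on very long lines.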

Regards,

David Mathog

Aug 14 '07 #5
Vlad Dogaru wrote, On 14/08/07 11:46:
> Richard Heathfield wrote:
<snip>
>> To start with, you can't reallocate a statically allocated buffer! Nor
>> can you have a pointer of equal size to a buffer except by sizing the
>> buffer to be the same size as a pointer. Nor can you append to a pointer.
>>
>> Once we get those impossibilities out of the way, we can dispense with
>> the unnecessary fgets call - your input is already buffered, so why
>> buffer it again through fgets?
>
> If anything, my lack of English skills has contributed to the
> misunderstanding. I was talking about:
> char b[100], *p;
> Reading into b with fgets, then reallocating p as necessary to do a
> strcat(p, b).

Since we do not know what p points to we cannot say whether you are
allowed to realloc what it points to or not. You can only pass pointers
returned by malloc or realloc to realloc.

Also beware of denial-of-service attacks where a user deliberately
creates a file with a line 5GB long.

<snip>
--
Flash Gordon
Aug 14 '07 #6
On 2007-08-14 17:43, Flash Gordon <sp**@flash-gordon.me.uk> wrote:
> Vlad Dogaru wrote, On 14/08/07 11:46:
>> Richard Heathfield wrote:
>>> To start with, you can't reallocate a statically allocated buffer! Nor
>>> can you have a pointer of equal size to a buffer except by sizing the
>>> buffer to be the same size as a pointer. Nor can you append to a pointer.
[...]
>> If anything, my lack of English skills has contributed to the
>> misunderstanding. I was talking about:
>> char b[100], *p;
>> Reading into b with fgets, then reallocating p as necessary to do a
>> strcat(p, b).
>
> Since we do not know what p points to we cannot say whether you are
> allowed to realloc what it points to or not.

We cannot *know*, but I think it is reasonable to assume from the
description that he uses malloc to get the initial value for
p. You don't always have to assume the stupidest possible version if
something isn't specified exactly ;-).

> Also beware of denial-of-service attacks where a user deliberately
> creates a file with a line 5GB long.

ACK. But that's probably not something which should be hard-coded into
the application. After all, the program might run on a machine with 64
GB RAM where 5 GB of memory usage is quite acceptable. You could use a
configurable limit or rely on OS features to limit memory consumption
(e.g. ulimit on unixoid systems).
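
A sketch of the configurable-limit idea, assuming the caller obtains max_len from a configuration file or command-line option. The names read_line_limited and enum line_status are hypothetical. The status code lets the caller tell an over-long line apart from end of file or memory exhaustion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum line_status { LINE_OK, LINE_EOF, LINE_TOO_LONG, LINE_NO_MEMORY };

/* Hypothetical helper: reads one line of at most max_len characters
 * into a malloc'd buffer stored in *out. On any status other than
 * LINE_OK, *out is NULL. */
enum line_status read_line_limited(FILE *fp, size_t max_len, char **out)
{
    size_t cap = 16, used = 0;
    char *buf;
    int ch;

    assert(fp != NULL && out != NULL);
    *out = NULL;
    buf = malloc(cap);
    if (buf == NULL)
        return LINE_NO_MEMORY;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == max_len) {      /* limit reached: refuse to grow */
            free(buf);
            return LINE_TOO_LONG;   /* rest of the line stays unread */
        }
        if (used == cap - 1) {
            char *tmp = (cap > SIZE_MAX / 2) ? NULL : realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return LINE_NO_MEMORY;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {
        free(buf);
        return LINE_EOF;
    }
    buf[used] = '\0';
    *out = buf;
    return LINE_OK;
}
```

After LINE_TOO_LONG the caller decides what to do next, for example skip to the next newline or abort. The OS-level alternative (ulimit) needs no code in the program at all.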

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Aug 20 '07 #7
On Aug 20, 1:57 pm, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
> On 2007-08-14 17:43, Flash Gordon <s...@flash-gordon.me.uk> wrote:
>> Vlad Dogaru wrote, On 14/08/07 11:46:
>>> Richard Heathfield wrote:
>>>> To start with, you can't reallocate a statically allocated buffer! Nor
>>>> can you have a pointer of equal size to a buffer except by sizing the
>>>> buffer to be the same size as a pointer. Nor can you append to a pointer.
> [...]
>>> If anything, my lack of English skills has contributed to the
>>> misunderstanding. I was talking about:
>>> char b[100], *p;
>>> Reading into b with fgets, then reallocating p as necessary to do a
>>> strcat(p, b).
>> Since we do not know what p points to we cannot say whether you are
>> allowed to realloc what it points to or not.
>
> We cannot *know*, but I think it is reasonable to assume from the
> description that he uses malloc to get the initial value for
> p. You don't always have to assume the stupidest possible version if
> something isn't specified exactly ;-).

Reading Flash Gordon's post I don't see him assuming anything.
He was simply aiming to cover all possibilities and I'm all for
that; we do aim to be accurate around here.

Aug 20 '07 #8
