Bytes | Software Development & Data Engineering Community

Reading long lines from a file

Hello,

I suspect this comes up quite often, but I haven't found an exact
solution in the FAQ. I have to read and parse a file with arbitrarily
long lines and have come up with the following plan:

1. start with a statically allocated buffer and a pointer of equal size
2. read into the buffer using fgets and append to the pointer
3. if buffer does not contain '\n', reallocate buffer and jump to 2
4. return the pointer

Do you see anything wrong with this? If so, how can I improve it?

Thanks in advance,
Vlad Dogaru

--
Number one reason to date an engineer:
The world does revolve around us; we pick the coordinate system.
Aug 14 '07 #1
Vlad Dogaru said:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this? If so, how can I improve it?

To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a
pointer.

Once we get those impossibilities out of the way, we can dispense with
the unnecessary fgets call - your input is already buffered, so why
buffer it again through fgets?

Here's the plan:

Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
at this allocation with P. Set U to 0. Have a temporary pointer T
kicking about the place.

While you can read a character successfully that isn't a newline:
    If U == C - 1
        You're about to run out of space, so get some more:
        T = realloc(P, C * 2)
        If that didn't work, you might want to try lower multipliers
        (1.5, 1.25 maybe) or even use add instead of multiply - and
        warn the caller that you're running low on RAM.
        Eventually, either you give up (in which case tell the user
        you failed), or you succeed, in which case set P = T
        Increase C to describe the new allocation amount accurately
    Endif

    If all is well
        P[U++] = the character you read
    Endif
Endwhile
If all is well
    P[U] = '\0'
Endif
P now contains the line.
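
A minimal C sketch of the plan above, for illustration only. The function name read_line, the 16-byte starting size and the exact fallback sequence (2x, then 1.5x, 1.25x, then +16 bytes) are assumptions. Rather than warning the caller, it simply returns NULL when every attempt fails. It also returns NULL at end of input, so a caller can use feof() to tell the two cases apart.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from fp into a malloc'd,
 * null-terminated buffer. The '\n' is consumed but not stored.
 * Returns NULL at end of input or if memory runs out; check feof(fp)
 * to tell the two apart. The caller must free() the result. */
char *read_line(FILE *fp)
{
    size_t cap = 16;          /* C: bytes currently allocated (> 1) */
    size_t used = 0;          /* U: characters stored so far */
    char *p = malloc(cap);    /* P */
    int ch;                   /* int, so that EOF can be distinguished */

    if (p == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == cap - 1) {            /* only room left for '\0' */
            /* Assumed fallback: try 2x, then 1.5x, 1.25x, then +16. */
            size_t sizes[4];
            char *t = NULL;               /* T */
            size_t i;

            sizes[0] = cap * 2;
            sizes[1] = cap + cap / 2;
            sizes[2] = cap + cap / 4;
            sizes[3] = cap + 16;
            for (i = 0; i < 4 && t == NULL; i++) {
                if (sizes[i] <= cap)      /* skip sizes that overflowed */
                    continue;
                t = realloc(p, sizes[i]);
                if (t != NULL)
                    cap = sizes[i];
            }
            if (t == NULL) {              /* give up; old block still valid */
                free(p);
                return NULL;
            }
            p = t;
        }
        p[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {         /* no more lines */
        free(p);
        return NULL;
    }
    p[used] = '\0';
    return p;
}
```

Typical use would be a loop such as while ((line = read_line(stdin)) != NULL) { /* parse line */ free(line); }.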

For a discussion of long-line issues, an implementation of a full line
capture function, and links to other such implementations, see
http://www.cpax.org.uk/prg/writings/fgetdata.php

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 14 '07 #2
Vlad Dogaru wrote:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this?

Possibly with the phrase "statically allocated".
There are three kinds of storage duration:
1 automatic
2 static
3 allocated

Only allocated memory can be reallocated.
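
To make the distinction concrete, here is a small C sketch with one object of each storage duration. The names demo_durations and static_buf are illustrative and do not come from the thread. Only the allocated block is ever passed to realloc. A fixed array's contents can still end up in a growable buffer, but only by copying them into allocated storage first.

```c
#include <stdlib.h>
#include <string.h>

static char static_buf[16] = "static";    /* static storage duration */

/* Hypothetical demo: builds "automatic+static" in allocated storage.
 * Only the malloc'd block is ever handed to realloc. */
char *demo_durations(void)
{
    char auto_buf[16] = "automatic";      /* automatic storage duration */
    size_t alen = strlen(auto_buf);
    size_t slen = strlen(static_buf);
    char *heap = malloc(alen + 1);        /* allocated storage duration */
    char *t;

    if (heap == NULL)
        return NULL;
    memcpy(heap, auto_buf, alen + 1);

    /* realloc(auto_buf, 64) or realloc(static_buf, 64) would be undefined
     * behaviour: neither pointer came from malloc, calloc or realloc. */
    t = realloc(heap, alen + 1 + slen + 1);
    if (t == NULL) {
        free(heap);
        return NULL;
    }
    t[alen] = '+';                               /* overwrites the '\0' */
    memcpy(t + alen + 1, static_buf, slen + 1);  /* includes the '\0' */
    return t;
}
```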

> If so, how can I improve it?

A few of the regulars here
have written their own getline functions:
http://www.cpax.org.uk/prg/writings/...ta.php#related

--
pete
Aug 14 '07 #3
Richard Heathfield wrote:
> Vlad Dogaru said:
>> Hello,
>>
>> I suspect this comes up quite often, but I haven't found an exact
>> solution in the FAQ. I have to read and parse a file with arbitrarily
>> long lines and have come up with the following plan:
>>
>> 1. start with a statically allocated buffer and a pointer of equal size
>> 2. read into the buffer using fgets and append to the pointer
>> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
>> 4. return the pointer
>>
>> Do you see anything wrong with this? If so, how can I improve it?
>
> To start with, you can't reallocate a statically allocated buffer! Nor
> can you have a pointer of equal size to a buffer except by sizing the
> buffer to be the same size as a pointer. Nor can you append to a
> pointer.
>
> Once we get those impossibilities out of the way, we can dispense with
> the unnecessary fgets call - your input is already buffered, so why
> buffer it again through fgets?

If anything, my lack of English skills has contributed to the
misunderstanding. I was talking about:
char b[100], *p;
Reading into b with fgets, then reallocating p as necessary to do a
strcat(p, b).

But your solution is much more elegant and now I see why fgets is
unnecessary.
> Here's the plan:
>
> Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
> at this allocation with P. Set U to 0. Have a temporary pointer T
> kicking about the place.
>
> While you can read a character successfully that isn't a newline:
>     If U == C - 1
>         You're about to run out of space, so get some more:
>         T = realloc(P, C * 2)
>         If that didn't work, you might want to try lower multipliers
>         (1.5, 1.25 maybe) or even use add instead of multiply - and
>         warn the caller that you're running low on RAM.
>         Eventually, either you give up (in which case tell the user
>         you failed), or you succeed, in which case set P = T
>         Increase C to describe the new allocation amount accurately
>     Endif
>
>     If all is well
>         P[U++] = the character you read
>     Endif
> Endwhile
> If all is well
>     P[U] = '\0'
> Endif
> P now contains the line.
>
> For a discussion of long-line issues, an implementation of a full line
> capture function, and links to other such implementations, see
> http://www.cpax.org.uk/prg/writings/fgetdata.php

Thank you for the clarification and the link. I will look into it and I
am confident that I can write a similar function.

Vlad
--
Number one reason to date an engineer:
The world does revolve around us; we pick the coordinate system.
Aug 14 '07 #4
Vlad Dogaru wrote:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this? If so, how can I improve it?

This may not apply to your particular case, but in some instances I have
encountered involving "arbitrarily long lines", one can just read a
character at a time, examine it, perform some action, and then continue.
This removes the need for a huge buffer, which, in the worst case, might
not even fit into the computer's memory. Obviously this won't work if any
modification to the front of the line depends on a value near the end of
the line.
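
As an illustration of the character-at-a-time approach, here is a C sketch that gathers per-line statistics (line count and longest line) without ever storing a line. The function line_stats and the choice of "action" are assumptions made for the example. A real parser would do its own processing where the counters are updated. Memory use stays constant regardless of line length.

```c
#include <stdio.h>

/* Hypothetical example of the character-at-a-time approach: each
 * character is examined as it arrives and only running totals are
 * kept, so memory use does not depend on how long a line is. */
void line_stats(FILE *fp, unsigned long *lines, unsigned long *longest)
{
    unsigned long cur = 0;      /* length of the line in progress */
    int ch;

    *lines = 0;
    *longest = 0;
    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {       /* end of a line: act on it */
            ++*lines;
            if (cur > *longest)
                *longest = cur;
            cur = 0;
        } else {
            cur++;              /* a real parser would process ch here */
        }
    }
    if (cur > 0) {              /* last line had no trailing newline */
        ++*lines;
        if (cur > *longest)
            *longest = cur;
    }
}
```

Tasks like counting fields per line or rejecting forbidden characters fit this pattern. Anything that needs the end of a line before it can deal with the start does not.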

If you do go with the expanding buffer method, be sure that you do NOT
use strcat() to append each new chunk of text. Doing so will result in
each such addition scanning from the front of the buffer for the
terminating '\0' in the string, so the total work grows quadratically
with the length of the line. I've seen this bug many, many times, and it
can cause a huge performance hit. Instead, keep track of the length of
the string in the buffer and copy each new chunk directly
to the appropriate position, then adjust the length variable, and repeat.
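
Here is a C sketch of that advice, assuming the fgets-into-a-fixed-chunk design from the original question. The length of the stored text is kept in len, each chunk is copied to buf + len with memcpy, and strlen is applied only to the new chunk. The name read_line_chunked and the 128-byte chunk size are hypothetical. Like any fgets-based reader, it cannot cope with input containing '\0' bytes, and it keeps the trailing '\n'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader: collects one line (including its '\n', if any)
 * from fp in 128-byte fgets chunks. Returns a malloc'd string, or NULL
 * at end of input or on allocation failure; the caller must free() it. */
char *read_line_chunked(FILE *fp)
{
    char chunk[128];
    char *buf = NULL;   /* realloc(NULL, n) behaves like malloc(n) */
    size_t len = 0;     /* length of the text already in buf */
    size_t cap = 0;     /* bytes allocated for buf */

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);   /* scans only the new chunk */

        if (len + n + 1 > cap) {
            size_t newcap = cap ? cap * 2 : sizeof chunk;
            char *t;

            while (newcap < len + n + 1)
                newcap *= 2;
            t = realloc(buf, newcap);
            if (t == NULL) {
                free(buf);
                return NULL;
            }
            buf = t;
            cap = newcap;
        }
        memcpy(buf + len, chunk, n + 1);    /* append, '\0' included */
        len += n;
        if (len > 0 && buf[len - 1] == '\n')
            break;                          /* whole line collected */
    }
    return buf;
}
```

Each character is copied a bounded number of times, so the cost stays roughly proportional to the line length instead of its square.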

Regards,

David Mathog

Aug 14 '07 #5
Vlad Dogaru wrote, On 14/08/07 11:46:
> Richard Heathfield wrote:
> <snip>
>> To start with, you can't reallocate a statically allocated buffer! Nor
>> can you have a pointer of equal size to a buffer except by sizing the
>> buffer to be the same size as a pointer. Nor can you append to a pointer.
>>
>> Once we get those impossibilities out of the way, we can dispense with
>> the unnecessary fgets call - your input is already buffered, so why
>> buffer it again through fgets?
>
> If anything, my lack of English skills has contributed to the
> misunderstanding. I was talking about:
> char b[100], *p;
> Reading into b with fgets, then reallocating p as necessary to do a
> strcat(p, b).

Since we do not know what p points to, we cannot say whether you are
allowed to realloc what it points to or not. You can only pass realloc a
null pointer or a pointer returned by malloc, calloc or realloc.

Also beware of denial-of-service attacks where a user deliberately
creates a file with a line 5GB long.

<snip>
--
Flash Gordon
Aug 14 '07 #6
On 2007-08-14 17:43, Flash Gordon <sp**@flash-gordon.me.uk> wrote:
> Vlad Dogaru wrote, On 14/08/07 11:46:
>> Richard Heathfield wrote:
>>> To start with, you can't reallocate a statically allocated buffer! Nor
>>> can you have a pointer of equal size to a buffer except by sizing the
>>> buffer to be the same size as a pointer. Nor can you append to a pointer.
>> [...]
>> If anything, my lack of English skills has contributed to the
>> misunderstanding. I was talking about:
>> char b[100], *p;
>> Reading into b with fgets, then reallocating p as necessary to do a
>> strcat(p, b).
>
> Since we do not know what p points to, we cannot say whether you are
> allowed to realloc what it points to or not.

We cannot *know*, but I think it is reasonable to assume from the
description that he uses malloc to get the initial value for p. You
don't always have to assume the stupidest possible version if something
isn't specified exactly ;-).

> Also beware of denial-of-service attacks where a user deliberately
> creates a file with a line 5GB long.

ACK. But that's probably not something which should be hard-coded into
the application. After all, the program might run on a machine with 64
GB RAM where 5 GB of memory usage is quite acceptable. You could use a
configurable limit or rely on OS features to limit memory consumption
(e.g. ulimit on unixoid systems).
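
One way to follow that advice is to pass the limit in rather than hard-coding it. The C sketch below assumes a hypothetical read_line_max whose max_len would come from a configuration setting or command-line option. Lines longer than the limit are truncated (the rest is read and discarded) and flagged via *truncated. Truncating is only one possible policy; skipping or rejecting an over-long line are equally valid choices.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reader with a caller-supplied cap: at most max_len
 * characters are stored; the remainder of an over-long line is read
 * and discarded, and *truncated is set to 1. The '\n' is not stored.
 * Returns NULL at end of input or on allocation failure. */
char *read_line_max(FILE *fp, size_t max_len, int *truncated)
{
    size_t cap = 64;          /* bytes allocated */
    size_t used = 0;          /* characters stored */
    char *p = malloc(cap);
    int ch;

    *truncated = 0;
    if (p == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == max_len) {            /* limit reached: skip the rest */
            *truncated = 1;
            continue;
        }
        if (used == cap - 1) {            /* grow, but never past the limit */
            size_t newcap = cap * 2;
            char *t;

            if (newcap - 1 > max_len)
                newcap = max_len + 1;
            t = realloc(p, newcap);
            if (t == NULL) {
                free(p);
                return NULL;
            }
            p = t;
            cap = newcap;
        }
        p[used++] = (char)ch;
    }

    if (ch == EOF && used == 0 && !*truncated) {
        free(p);
        return NULL;
    }
    p[used] = '\0';
    return p;
}
```

With a limit in place, a hostile 5 GB line costs at most max_len + 1 bytes of buffer, plus the time needed to read past it.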

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Aug 20 '07 #7
On Aug 20, 1:57 pm, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
> On 2007-08-14 17:43, Flash Gordon <s...@flash-gordon.me.uk> wrote:
>> Vlad Dogaru wrote, On 14/08/07 11:46:
>>> Richard Heathfield wrote:
>>>> To start with, you can't reallocate a statically allocated buffer! Nor
>>>> can you have a pointer of equal size to a buffer except by sizing the
>>>> buffer to be the same size as a pointer. Nor can you append to a pointer.
>>> [...]
>>> If anything, my lack of English skills has contributed to the
>>> misunderstanding. I was talking about:
>>> char b[100], *p;
>>> Reading into b with fgets, then reallocating p as necessary to do a
>>> strcat(p, b).
>> Since we do not know what p points to, we cannot say whether you are
>> allowed to realloc what it points to or not.
>
> We cannot *know*, but I think it is reasonable to assume from the
> description that he uses malloc to get the initial value for p. You
> don't always have to assume the stupidest possible version if something
> isn't specified exactly ;-).

Reading Flash Gordon's post, I don't see him assuming anything.
He was simply aiming to cover all possibilities, and I'm all for
that; we do aim to be accurate around here.

Aug 20 '07 #8
