Reading whole text files

Cheerio,
I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.
- Default: Use fgets(); ugly, if we are not interested in lines
and have many newline characters to read.
- Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
XSTR(BUFLEN) gives me BUFLEN in a string literal.

From the labels, it is pretty obvious that I would favour the
last one, so there is the question about possible pitfalls
(yes, I will use the return value and "read") and whether there
are environmental limits for BUFLEN.
If I missed some obvious source (looking for the wrong sort of
stuff in the FAQ and google archives), then please point me
toward it :-)
Regards,
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 14 '05 #1
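
The third variant from the post above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the helper name `read_chunk`, the `STR`/`XSTR` definitions and the tiny `BUFLEN` of 8 are made up for illustration (the post suggests something like 8K), the stream argument that the post's shorthand leaves out is added, and the post's `read` is called `nread` here.

```c
#include <assert.h>
#include <stdio.h>

#define BUFLEN 8            /* hypothetical value, kept small for illustration */
#define STR(x)  #x
#define XSTR(x) STR(x)      /* expands BUFLEN first, then turns it into "8" */

/* Asks fscanf for BUFLEN characters in one go; whitespace is not skipped
   and no terminator is stored. Returns the count reported by %n, or -1
   if fscanf did not report exactly one successful conversion. */
static int read_chunk(FILE *fp, char *curr)
{
    int nread = 0;

    assert(fp != NULL && curr != NULL);
    /* The format string is assembled at compile time: "%8c%n". */
    if (fscanf(fp, "%" XSTR(BUFLEN) "c%n", curr, &nread) != 1)
        return -1;
    return nread;
}
```

One pitfall to check before relying on this: what happens when the last chunk is shorter than BUFLEN. By my reading of the standard text that is a matching failure, but I would not assume every library agrees, so the sketch only promises something for full chunks and the tail of the file would need separate handling.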
Michael Mair wrote:

Cheerio,

I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.
Why inefficient? I'd prefer getc in case you're fortunate enough
to have it implemented as a macro, but it should be efficient
enough.
- Default: Use fgets(); ugly, if we are not interested in lines
and have many newline characters to read.
And you have to maintain /two/ buffers (quite apart from the buffer
maintained by your text stream handler) - your expanding buffer,
and the buffer you give to fgets (unless you use the expanding
buffer for that too, which is certainly doable but probably gives
you more headaches).
- Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
XSTR(BUFLEN) gives me BUFLEN in a string literal.

From the labels, it is pretty obvious that I would favour the
last one,


Fine, so use that. But it wouldn't be my choice.

Vive la difference!
Nov 14 '05 #2
infobahn wrote:
Michael Mair wrote:
Cheerio,

I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.
Why inefficient? I'd prefer getc in case you're fortunate enough
to have it implemented as a macro, but it should be efficient
enough.


"Probably" inefficient in that I cannot rely on getc() being
implemented as a macro and that I do not want to make assumptions
about the underlying library. So, essentially, the question is
for me whether having a loop in my code is "better" than just
telling fscanf() to get, say 8K characters in one go.
The main beauty of this approach lies for me in the clarity of the
code. Thanks for reminding me of getc() vs. fgetc().
- Default: Use fgets(); ugly, if we are not interested in lines
and have many newline characters to read.


And you have to maintain /two/ buffers (quite apart from the buffer
maintained by your text stream handler) - your expanding buffer,
and the buffer you give to fgets (unless you use the expanding
buffer for that too, which is certainly doable but probably gives
you more headaches).


Actually, I have implemented it first with fgets() and one extending
buffer but found, looking at the final code, that approach too unwieldy
and error prone, as you need more code and variables.
Usually, I would have gone for the "Low" approach due to the clarity
of the resulting code but -- as I was at it -- I just asked myself
which options do I have.

- Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
XSTR(BUFLEN) gives me BUFLEN in a string literal.

From the labels, it is pretty obvious that I would favour the
last one,


Fine, so use that. But it wouldn't be my choice.


I _was_ asking for opinions.

Vive la difference!


:-)
Thank you for your input!
Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

Nov 14 '05 #3
Michael Mair wrote:
Cheerio,
I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.
- Default: Use fgets(); ugly, if we are not interested in lines
and have many newline characters to read.
- Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
XSTR(BUFLEN) gives me BUFLEN in a string literal.

From the labels, it is pretty obvious that I would favour the
last one, so there is the question about possible pitfalls
(yes, I will use the return value and "read") and whether there
are environmental limits for BUFLEN.
If I missed some obvious source (looking for the wrong sort of
stuff in the FAQ and google archives), then please point me
toward it :-)
Regards,
Michael


What about this?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *ReadFileIntoRam(char *fname,int *plen)
{
    FILE *infile;
    char *contents;
    int actualBytesRead=0;
    unsigned int len;

    infile = fopen(fname,"rb");
    if (infile == NULL) {
        fprintf(stderr, "impossible to open %s\n",fname);
        return NULL;
    }
    fseek(infile,0, SEEK_END);
    len = ftell(infile);
    fseek(infile,0, SEEK_SET);
    contents = calloc(len+1,1);
    if (contents) {
        actualBytesRead = fread(contents, 1,len,infile);
    }
    else {
        fprintf(stderr, "Can't allocate memory to read the file\n");
    }
    fclose(infile);
    *plen = actualBytesRead;
    return contents;
}

int main(int argc,char *argv[])
{
    if (argc < 2) {
        printf("usage: readfile <filename>\n");
        exit(1);
    }
    int len=0;
    char *contents=ReadFileIntoRam(argv[1],&len);
    // work with the contents of the file
}
Nov 14 '05 #4


jacob navia wrote:
Michael Mair wrote:
Cheerio,
I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.
- Default: Use fgets(); ugly, if we are not interested in lines
and have many newline characters to read.
- Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
XSTR(BUFLEN) gives me BUFLEN in a string literal.

From the labels, it is pretty obvious that I would favour the
last one, so there is the question about possible pitfalls
(yes, I will use the return value and "read") and whether there
are environmental limits for BUFLEN.
If I missed some obvious source (looking for the wrong sort of
stuff in the FAQ and google archives), then please point me
toward it :-)
Regards,
Michael

What about this?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *ReadFileIntoRam(char *fname,int *plen)
{
FILE *infile;
char *contents;
int actualBytesRead=0;
unsigned int len;

infile = fopen(fname,"rb");


Here is the crux: I want/have to work with a _text_ file.
Everything else may give me wrong results.
if (infile == NULL) {
fprintf(stderr, "impossible to open %s\n",fname);
return NULL;
}
fseek(infile,0, SEEK_END);
len = ftell(infile);
fseek(infile,0, SEEK_SET);
contents = calloc(len+1,1);
if (contents) {
actualBytesRead = fread(contents, 1,len,infile);
This is what I would do for binary files.
Essentially, I am looking for the text file equivalent of fread().
}
else {
fprintf(stderr, "Can't allocate memory to read the file\n");
}
fclose(infile);
*plen = actualBytesRead;
return contents;
}

int main(int argc,char *argv[])
{
if (argc < 2) {
printf("usage: readfile <filename>\n");
exit(1);
}
int len=0;
char *contents=ReadFileIntoRam(argv[1],&len);
// work with the contents of the file
}


Thank you for trying :-)
Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

Nov 14 '05 #5
infobahn <in******@btinternet.com> wrote:
Michael Mair wrote:

Cheerio,

I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.
Why inefficient? I'd prefer getc in case you're fortunate enough
to have it implemented as a macro, but it should be efficient
enough.


In thread-safe libraries getc() family functions can actually
be quite inefficient, because they must lock the stream object,
which takes time. This is the reason why some systems provide
getc_unlocked() (thread-unsafe) family (I remember a noticeable
difference between them in my tests some time ago).

+++

Excuse my ignorance, I have no experience with text files in
the C Std context. Why wouldn't fread() be suitable for
reading text files? 7.19.8p2 says that, at bottom, the fread()
call is performed as if by calls to the fgetc() function.
I haven't spotted any mention that these functions are
constrained to binary streams only.

--
Stan Tobias
mailx `echo si***@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`
Nov 14 '05 #6
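
The locking point above can be illustrated with the POSIX interfaces mentioned in the post. A minimal sketch, assuming a POSIX system (these functions are not part of ISO C, so this steps outside the "standard library only" constraint of the original question); the helper name `read_unlocked` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* flockfile/getc_unlocked are POSIX, not ISO C */
#include <assert.h>
#include <stdio.h>

/* Reads up to cap characters from fp into buf (no terminator is added),
   taking the stream lock once for the whole loop instead of once per
   character. Returns the number of characters stored. */
static size_t read_unlocked(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int ch;

    assert(fp != NULL && buf != NULL);
    flockfile(fp);                       /* lock the stream once */
    while (n < cap && (ch = getc_unlocked(fp)) != EOF)
        buf[n++] = (char)ch;
    funlockfile(fp);                     /* release it again */
    return n;
}
```

Whether this is measurably faster than plain getc() depends on the library; the post only reports a noticeable difference in one set of tests.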


S.Tobias wrote:
infobahn <in******@btinternet.com> wrote:
Michael Mair wrote:
Cheerio,

I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.

Why inefficient? I'd prefer getc in case you're fortunate enough
to have it implemented as a macro, but it should be efficient
enough.

In thread-safe libraries getc() family functions can actually
be quite inefficient, because they must lock the stream object,
which takes time. This is the reason why some systems provide
getc_unlocked() (thread-unsafe) family (I remember a noticeable
difference between them in my tests some time ago).


Interesting.
+++

Excuse my ignorance, I have no experience with text files in
the C Std context. Why wouldn't fread() be suitable for
reading text files? 7.19.8p2 says that, at bottom, the fread()
call is performed as if by calls to the fgetc() function.
I haven't spotted any mention that these functions are
constrained to binary streams only.


It seems I am plain stupid... Somewhere in my brain, there was
"fread()/fwrite() <-> binary I/O" hardwired :-/
So, if I open the stream as text stream, everything should be
fine. (If this is wrong, please correct me.)
Moreover, if I read the data into dynamically allocated
storage pointed to by an unsigned char *, I circumvent potential
problems with the is** functions from <ctype.h> (as I asked in
another thread).

Thank you :-)
Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

Nov 14 '05 #7
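
The conclusion of this subthread, fread() on a stream opened in text mode, can be sketched without the fseek()/ftell() size probe, since ftell() on a text stream only yields a value meant to be handed back to fseek(), not a character count. This is a minimal sketch, not anyone's posted code: the name `slurp_stream`, the initial capacity of 4096 and the doubling strategy are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads everything left on fp into a malloc'd, NUL-terminated buffer and
   stores the character count in *plen (if plen is non-NULL). Returns NULL
   on allocation failure or read error; the caller frees the result. */
static char *slurp_stream(FILE *fp, size_t *plen)
{
    size_t cap = 4096, len = 0, got;
    char *buf, *tmp;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Always keep one byte free for the terminator. */
    while ((got = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += got;
        if (cap - len - 1 == 0) {               /* buffer full: double it */
            if (cap > (size_t)-1 / 2 ||
                (tmp = realloc(buf, cap * 2)) == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {                           /* stopped by an error, not EOF */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (plen != NULL)
        *plen = len;
    return buf;
}
```

To treat the file as text, the caller opens it with fopen(name, "r") and passes the stream in. The count stored in *plen is then the number of characters after any newline translation, which on some systems is smaller than the file's size in bytes.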


Michael Mair wrote:


jacob navia wrote:
Michael Mair wrote:
Cheerio,
I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.
- Default: Use fgets(); ugly, if we are not interested in lines
and have many newline characters to read.
- Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
XSTR(BUFLEN) gives me BUFLEN in a string literal.

From the labels, it is pretty obvious that I would favour the
last one, so there is the question about possible pitfalls
(yes, I will use the return value and "read") and whether there
are environmental limits for BUFLEN.
If I missed some obvious source (looking for the wrong sort of
stuff in the FAQ and google archives), then please point me
toward it :-)
Regards,
Michael
What about this?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *ReadFileIntoRam(char *fname,int *plen)
{
FILE *infile;
char *contents;
int actualBytesRead=0;
unsigned int len;

infile = fopen(fname,"rb");

Here is the crux: I want/have to work with a _text_ file.
Everything else may give me wrong results.


Sorry, the "b" brought me back onto the wrong track I already
was on. See the other subthread.
Cheers
Michael
if (infile == NULL) {
fprintf(stderr, "impossible to open %s\n",fname);
return NULL;
}
fseek(infile,0, SEEK_END);
len = ftell(infile);
fseek(infile,0, SEEK_SET);
contents = calloc(len+1,1);
if (contents) {
actualBytesRead = fread(contents, 1,len,infile);

This is what I would do for binary files.
Essentially, I am looking for the text file equivalent of fread().
}
else {
fprintf(stderr, "Can't allocate memory to read the file\n");
}
fclose(infile);
*plen = actualBytesRead;
return contents;
}

int main(int argc,char *argv[])
{
if (argc < 2) {
printf("usage: readfile <filename>\n");
exit(1);
}
int len=0;
char *contents=ReadFileIntoRam(argv[1],&len);
// work with the contents of the file
}

Thank you for trying :-)
Cheers
Michael

--
E-Mail: Mine is a gmx dot de address.

Nov 14 '05 #8
Michael Mair <Mi**********@invalid.invalid> wrote:
# Cheerio,
#
#
# I would appreciate opinions on the following:
#
# Given the task to read a _complete_ text file into a string:
# What is the "best" way to do it?
# Handling the buffer is not the problem -- the character
# input is a different matter, at least if I want to remain within
# the bounds of the standard library.
#
# Essentially, I can think of three variants:
# - Low: Use fgetc(). Simple, straightforward, probably inefficient.

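char *contents=0,*tmp; size_t m=0,n=0; int ch;
while ((ch=fgetc(file))!=EOF) {
    if (n+2>=m) {                       /* grow: room for ch and the terminator */
        m = 2*n+2;
        if ((tmp = realloc(contents,m)) == 0) {free(contents); contents = 0; break;}
        contents = tmp;
    }
    contents[n++] = ch; contents[n] = 0;
}
/* contents stays 0 for an empty file or after an allocation failure */
if (contents && (tmp = realloc(contents,n+1)) != 0) contents = tmp;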

You might also include #ifdef/#endif code to use memory mapping on systems
that support it.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
This is one wacky game show.
Nov 14 '05 #9


Michael Mair wrote:
Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.

Why inefficient? I'd prefer getc in case you're fortunate enough
to have it implemented as a macro, but it should be efficient
enough.

"Probably" inefficient in that I cannot rely on getc() being
implemented as a macro and that I do not want to make assumptions
about the underlying library. So, essentially, the question is
for me whether having a loop in my code is "better" than just
telling fscanf() to get, say 8K characters in one go.
The main beauty of this approach lies for me in the clarity of the
code. Thanks for reminding me of getc() vs. fgetc().
- Default: Use fgets(); ugly, if we are not interested in lines
and have many newline characters to read.


My intuition is that the definition of a "_complete_" text file
would require the "ugly". Hence, I would use function fgets in
a loop.

And you have to maintain /two/ buffers (quite apart from the buffer
maintained by your text stream handler) - your expanding buffer,
and the buffer you give to fgets (unless you use the expanding
buffer for that too, which is certainly doable but probably gives
you more headaches).

Actually, I have implemented it first with fgets() and one extending
buffer but found, looking at the final code, that approach too unwieldy
and error prone, as you need more code and variables.


Use fgets to copy into a buffer and then append to an
expanding, dynamically allocated char array. This is not unwieldy.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    char buffer[128],*fstr, *tmp;
    size_t slen, blen;
    FILE *fp;

    if((fp = fopen("test.c", "r")) == NULL) exit(EXIT_FAILURE);
    for(slen = 0, fstr = NULL;
        (fgets(buffer,sizeof buffer, fp)) ; slen+=blen)
    {
        blen = strlen(buffer);
        if((tmp = realloc(fstr,slen+blen+1)) == NULL)
        {
            free(fstr);
            exit(EXIT_FAILURE);
        }
        if(slen == 0) *tmp = '\0';
        fstr = tmp;
        strcat(fstr,buffer);
    }
    fclose(fp);
    if(fstr != NULL) puts(fstr);   /* fstr is still NULL if the file was empty */
    free(fstr);
    return 0;
}
--
Al Bowers
Tampa, Fl USA
mailto: xa******@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/

Nov 14 '05 #10
