An exercise in fread optimisation

Hi everyone

BTW, thanks to everyone in this group, your collective advice has been
very helpful. I have to say, the C guys are definitely much nicer
than the Lisp guys ;-).

Tonight, I was thinking about freads, and how to get them faster. I
initially wrote a program like this:

-- START CODE --
#include <stdio.h>
#include <string.h> /* for memset */
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];
memset(buf, '\0', BUFSIZE + 1);

FILE *file = fopen("stuff.txt", "rb");

while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {
//printf("%s", buf);
memset(buf, '\0', BUFSIZE);
}
//printf("%s", buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}
-- END CODE --

Average CPU time: 140-156

BTW, stuff.txt is a 200MB binary file of randomness.

I looked at the memsets, and thought - this could maybe be faster.

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];
buf[BUFSIZE] = '\0';

FILE *file = fopen("stuff.txt", "rb");

// Prepare
fseek(file, 0 , SEEK_END);
int size = ftell(file);
int iterations = size / BUFSIZE;
int remaining = size % BUFSIZE;
rewind(file);

/*
printf("Size: %d\n", size);
printf("Iterations: %d\n", iterations);
printf("Remainder: %d\n", remaining);
*/

// Iterate
int i;
for (i = 0; i < iterations; i++) {
fread(buf, BUFSIZE, CHAR_SIZE, file);
//printf("%s", buf);
}

fread(buf, BUFSIZE, CHAR_SIZE, file);
buf[remaining] = '\0';
//printf("%s", buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}
-- END CODE --

Average CPU time: 125

What does everyone think? Could it be better?

Hope this was useful to someone.

Chris
Dec 10 '07 #1
Khookie wrote:
> [snip]
> #define BUFSIZE 32768
> #define CHAR_SIZE 1
> [snip]
> while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {

That's a perverse way of expressing it, IMHO... You are asking for
one item (CHAR_SIZE is used where a count is expected) of size
BUFSIZE. I'm not sure whether fread is guaranteed to produce consistent
data when the file length is not a whole multiple of BUFSIZE bytes.

> //printf("%s", buf);
> memset(buf, '\0', BUFSIZE);
> }
[snip]
> BTW, stuff.txt is a 200MB binary file of randomness.
>
> I looked at the memsets, and thought - this could maybe be faster.

Why are you memset()ing at all?

If it's truly random, then it could contain a '\0' anywhere, surely?

So there's no point in trying to use '\0' as a terminator.

fread() will tell you how many items it read. If you used
fread(buf,CHAR_SIZE,BUFSIZE,file)
you would get a useful return code (the number of bytes read) from it.
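
A minimal sketch of that suggestion, assuming a hypothetical helper named count_bytes (the name and the error convention are illustrative, not from the thread). With a size of 1 and a count equal to the buffer size, the value returned by fread() is a byte count, so the short final chunk needs no terminator and no memset():

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): consume a stream in
   chunks and report how many bytes were read in total.
   Returns 0 on success, -1 on a read error. */
int count_bytes(FILE *fp, unsigned long *total)
{
    char buf[32768];   /* same chunk size as the original post */
    size_t n;

    *total = 0;
    /* Size 1, count sizeof buf: fread() returns the number of bytes
       actually read, so a short final chunk is reported exactly. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        *total += (unsigned long)n;
    }
    /* A zero return means end of file or an error; ferror() tells
       the two apart. */
    return ferror(fp) ? -1 : 0;
}
```

A caller would open the file in "rb" mode, check the result of fopen(), and pass the stream in. Any processing of the data would go inside the loop and use n as the length.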
Dec 10 '07 #2
On Dec 10, 6:55 am, Khookie <chris.k...@gmail.com> wrote:
[snip]
> #define BUFSIZE 32768
> #define CHAR_SIZE 1
>
> int main(int argc, char **argv) {
> clock_t start = clock();
>
> char buf[BUFSIZE + 1];

This is useless:
/*
memset(buf, '\0', BUFSIZE + 1);
*/

> FILE *file = fopen("stuff.txt", "rb");

/* Create a 16K I/O buffer: */
setvbuf ( file , NULL , _IOFBF , 1024*16 );

/* We should check the return of setvbuf, as well as the file pointer
itself, of course. */
/* I will leave that to you. */

> while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {
> //printf("%s", buf);
> memset(buf, '\0', BUFSIZE);
> }
> //printf("%s", buf);
>
> fclose(file);
>
> clock_t finish = clock();
> printf("Total CPU time: %d\n", finish - start);
> }
[snip]
> What does everyone think? Could it be better?

1. The memset() calls are totally pointless. You set the data buffer
to zero and then overwrite it via reads. This is no different from
just setting the value via reads.
2. If you want to read faster, then enlarge the read buffer via
setvbuf(). It makes a bigger difference for writing than for
reading, but it should reduce the total number of reads.
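
The checks left as an exercise above could look roughly like this. It is a sketch only: open_buffered is a hypothetical name, and whether a larger stdio buffer actually helps depends on the implementation, so it is worth measuring.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): open a file for binary
   reading and ask stdio for a fully buffered stream of the given size.
   Returns NULL if either the open or the buffer request fails. */
FILE *open_buffered(const char *path, size_t bufsize)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* setvbuf() must be called before any other operation on the
       stream; it returns nonzero if the request cannot be honoured.
       Passing NULL lets the library allocate the buffer itself. */
    if (setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

Usage would be along the lines of `FILE *file = open_buffered("stuff.txt", 1024 * 16);` followed by a NULL check before the read loop.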
Dec 10 '07 #3
On 10 Dec, 14:55, Khookie <chris.k...@gmail.com> wrote:
> [snip]
> #define BUFSIZE 32768
> #define CHAR_SIZE 1

BUFSIZ is defined in stdio.h. It is selected (in part)
as the size that is most efficient for I/O on the
implementation. Unless you have a really good
reason not to, you should probably use it instead
of randomly selecting your own value.
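
Whether BUFSIZ or some other size is faster on a particular implementation can be checked by measuring. The following is a rough sketch, assuming a hypothetical helper time_read that reads an already opened stream from the start with a caller-chosen chunk size. It reports CPU time as measured by clock(), not wall-clock time, and operating-system file caching will affect repeated runs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not from the thread): read fp from the start
   in chunks of `chunk` bytes and return the CPU time used in seconds,
   or -1.0 on failure. */
double time_read(FILE *fp, size_t chunk)
{
    char *buf;
    clock_t start, finish;

    if (chunk == 0 || (buf = malloc(chunk)) == NULL)
        return -1.0;

    rewind(fp);   /* also clears any earlier EOF or error indicator */
    start = clock();
    while (fread(buf, 1, chunk, fp) == chunk)
        ;         /* keep going until a short or empty read */
    finish = clock();
    free(buf);

    if (ferror(fp))
        return -1.0;
    /* clock() counts in units of 1/CLOCKS_PER_SEC seconds. */
    return (double)(finish - start) / CLOCKS_PER_SEC;
}
```

Calling it in a loop over a few candidate sizes (for example BUFSIZ, 4096, 32768) on the same file gives a first impression; several repetitions per size are needed before drawing conclusions.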

Dec 10 '07 #4
On Dec 11, 6:40 am, William Pursell <bill.purs...@gmail.com> wrote:
> BUFSIZ is defined in stdio.h. It is selected (in part)
> as the size that is most efficient for I/O on the
> implementation. Unless you have a really good
> reason not to, you should probably use it instead
> of randomly selecting your own value.

whoops - thanks everyone for pointing mistakes out & giving me
suggestions... will definitely fix it

Chris
Dec 10 '07 #5
On Dec 11, 9:29 am, Khookie <chris.k...@gmail.com> wrote:
> whoops - thanks everyone for pointing mistakes out & giving me
> suggestions... will definitely fix it

Here is a hopefully more sane version.

#include <stdio.h>
#include <time.h>

#define BUFSIZE BUFSIZ

int main() {
clock_t start = clock();

FILE* file = fopen("stuff.txt", "rb");

char buf[BUFSIZE + 1];
buf[BUFSIZE] = '\0';

int length;
while ((length = fread(buf, 1, BUFSIZE, file)) == BUFSIZE) {
printf(buf);
}
buf[length] = '\0';
printf(buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}
Dec 11 '07 #6
Khookie <ch********@gmail.com> writes:
> On Dec 11, 9:29 am, Khookie <chris.k...@gmail.com> wrote:
[...]
> Here is a hopefully more sane version.
>
> #include <stdio.h>
> #include <time.h>
>
> #define BUFSIZE BUFSIZ

That's a bit silly. Why not use BUFSIZ directly?

> int main() {

Better: int main(void)

> clock_t start = clock();
>
> FILE* file = fopen("stuff.txt", "rb");

You open the file (in binary mode), but you don't check whether the
open succeeded. (It seems odd to open a file named "stuff.txt" in
binary mode, but it's not necessarily wrong.)

> char buf[BUFSIZE + 1];
> buf[BUFSIZE] = '\0';
>
> int length;
> while ((length = fread(buf, 1, BUFSIZE, file)) == BUFSIZE) {

fread() returns a size_t result. Why not assign that result to a
size_t variable?

> printf(buf);

This is dangerous. You said before that the input file contains
random data (though that's not very specific; it could be random
bytes, random printable characters, random letters, or random
Shakespeare quotations). You pass buf as the format string to printf.
If buf happens to contain a '%' character, it will likely be
interpreted as a format specifier. Kaboom.

And if "stuff.txt" contains random binary data, it likely contains
null bytes ('\0'), which will terminate the string early.

In other words, you're reading data as if it were binary, and writing
it as if it were text (with no '%' characters). Be consistent.

> }
> buf[length] = '\0';
> printf(buf);

See above.

> fclose(file);
>
> clock_t finish = clock();
> printf("Total CPU time: %d\n", finish - start);

"%d" expects an argument of type int. ``finish - start'' is of type
clock_t. And since the value returned by clock() is scaled by
CLOCKS_PER_SEC, the value printed may not mean much anyway.

I'd add a "return 0;" here.

> }

You mix declarations and statements within a block. This is a
C99-specific feature, not supported by all compilers.
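
Two of the points above can be sketched as follows, assuming hypothetical helpers copy_stream and cpu_seconds (names chosen for illustration, not from the thread). Binary data is written with fwrite(), which takes an explicit length and so treats '%' and '\0' as ordinary bytes. The clock() difference is converted to seconds before printing.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: copy every byte from `in` to `out`.
   Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    char buf[BUFSIZ];
    size_t n;   /* fread() returns size_t, so store it as size_t */

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        /* fwrite() is given the length explicitly, so the data is
           never interpreted as a format string or a C string. */
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}

/* Hypothetical helper: convert a clock() difference to seconds. */
double cpu_seconds(clock_t start, clock_t finish)
{
    return (double)(finish - start) / CLOCKS_PER_SEC;
}
```

The timing line then becomes something like `printf("Total CPU time: %.3f s\n", cpu_seconds(start, finish));`, where "%.3f" matches the double argument.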

--
Keith Thompson (The_Other_Keith) <ks***@mib.org>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Dec 11 '07 #7
[Yo, Keith. Just a little nit, here.]

Keith Thompson said:
> Khookie <ch********@gmail.com> writes:
>> #define BUFSIZE BUFSIZ
>
> That's a bit silly.

No, it isn't.

> Why not use BUFSIZ directly?

Because he might want to change it later.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Dec 11 '07 #8
Richard Heathfield <rj*@see.sig.invalid> writes:
> [Yo, Keith. Just a little nit, here.]
>
> Keith Thompson said:
>> Khookie <ch********@gmail.com> writes:
>>> #define BUFSIZE BUFSIZ
>>
>> That's a bit silly.
>
> No, it isn't.
>
>> Why not use BUFSIZ directly?
>
> Because he might want to change it later.

Yes, good point. But in that case, I'd really want to use a name
other than BUFSIZE, something easier to distinguish from BUFSIZ.

--
Keith Thompson (The_Other_Keith) <ks***@mib.org>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Dec 11 '07 #9
On Dec 11, 8:45 pm, Keith Thompson <ks...@mib.org> wrote:
> Yes, good point. But in that case, I'd really want to use a name
> other than BUFSIZE, something easier to distinguish from BUFSIZ.

Apologies - the above code is meant for a sockets application, hence
"#define BUFSIZE BUFSIZ" might look odd, especially in the context of
printf. I'm still trying to determine what makes an optimal buffer
size for sending data via sockets.

Chris
Dec 11 '07 #10
On Dec 11, 3:08 am, Khookie <chris.k...@gmail.com> wrote:
> Apologies - the above code is meant for a sockets application, hence
> "#define BUFSIZE BUFSIZ" might look odd, especially in the context of
> printf. I'm still trying to determine what makes an optimal buffer
> size for sending data via sockets.

Ludicrously off-topic, but this is what you want for that:
http://dast.nlanr.net/Projects/Iperf/
Dec 11 '07 #11
