Bytes IT Community

fseek speed

P: n/a
Hello All,

I am baffled ... I am trying to improve the speed of a program
that I have written that performs random access within a file. It
relies heavily on fseek and is very slow. To test, I wrote the following
test program which just writes the numbers 1-167721 sequentially to
a binary file:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

static long MAXNUM = 16777214;

int main() {
    FILE *fp;
    long i, tmp;
    int j;
    unsigned char c;

    fp = NULL;
    if ((fp = fopen("test.out", "w+b")) == NULL) {
        fprintf(stderr, "Failed.\n");
        exit(22);
    }

    fprintf(stderr, "Writing ... ");
    for (i = 0; i < MAXNUM/10; i++) {
        //fseek(fp, 0, SEEK_CUR);
        tmp = i;
        for (j = 0; j < 3; j++) {
            c = (unsigned char)(tmp % 256);
            tmp /= 256;
            fwrite(&c, 1, 1, fp);
        }
    }
    fprintf(stderr, "done.\n");

    fclose(fp);

    return 0;
}

When compiled and run on a linux-2.4.## system and an old DOS system it
is very fast. Now if you uncomment the line that says fseek(fp, 0,
SEEK_CUR), it runs 17x slower!

Is there any way to reduce the speed hit incurred by the call to
fseek?

Thanks in advance for any thoughts,
TJ Walls
Ph.D. Candidate - Physics Dept. Stony Brook University
Nov 14 '05 #1
15 Replies


P: n/a

Ooops ... of course I meant 'writes the numbers 0-167720' ...
Nov 14 '05 #2

P: n/a
TJ Walls wrote:
[...]
for (i = 0; i < MAXNUM/10; i++) {
    //fseek(fp, 0, SEEK_CUR);
    tmp = i;
    for (j = 0; j < 3; j++) {
        c = (unsigned char)(tmp % 256);
        tmp /= 256;
        fwrite(&c, 1, 1, fp);
    }
}
[...]
When compiled and run on a linux-2.4.## system and an old DOS system it
is very fast. Now if you uncomment the line that says fseek(fp, 0,
SEEK_CUR), it runs 17x slower!

Is there any way to reduce the speed hit incurred by the call to
fseek?


The fseek() call you have is, in essence, a no-op. You are fseek'ing to
the current location.

However, a side-effect of the fseek is the flushing of the buffer. Without
the fseek(), your output will (actually, I suppose "can" is correct in the
general sense) be buffered, and only written when the buffer fills. With
the fseek(), you are forcing the buffer to be written on every pass
through the loop, that is, after every three bytes.

What is your purpose for using fseek() here?
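
One way to check the flushing explanation above is to time the same write loop with and without the seek. The following is only a sketch: the function name `write_records` and its parameters are made up for illustration, it writes each three-byte value with a single fwrite() rather than byte by byte, and clock() reports CPU time, which may not capture time spent waiting on the disk.

```c
#include <stdio.h>
#include <time.h>

/* Write n three-byte little-endian records to path.
 * If seek_each is nonzero, call fseek(fp, 0, SEEK_CUR) before each record,
 * as in the test program under discussion.
 * Returns the resulting file size in bytes, or -1 on error.
 * If seconds is not NULL, the CPU time spent in the loop is stored there. */
long write_records(const char *path, long n, int seek_each, double *seconds)
{
    FILE *fp = fopen(path, "w+b");
    clock_t start;
    long i, size;

    if (fp == NULL)
        return -1;

    start = clock();
    for (i = 0; i < n; i++) {
        unsigned char rec[3];

        rec[0] = (unsigned char)(i & 0xFF);
        rec[1] = (unsigned char)((i >> 8) & 0xFF);
        rec[2] = (unsigned char)((i >> 16) & 0xFF);

        if (seek_each && fseek(fp, 0L, SEEK_CUR) != 0) {
            fclose(fp);
            return -1;
        }
        if (fwrite(rec, 1, sizeof rec, fp) != sizeof rec) {
            fclose(fp);
            return -1;
        }
    }
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    if (seconds != NULL)
        *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    size = ftell(fp);          /* byte offset, since the stream is binary */
    if (fclose(fp) != 0)
        return -1;
    return size;
}
```

Calling it twice with the same n, once with seek_each set to 0 and once set to 1, and comparing the two `seconds` values shows how much the per-record seek costs on a given C library. Both runs should produce files of identical size and content.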

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody at spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+

Nov 14 '05 #3

P: n/a
What is your purpose for using fseek() here?


My purpose for putting the fseek() here is to test
what kind of time hit fseek() is giving me (in this case, I thought it
should be a no-op too ... and thus give me a time penalty of 0, but in
fact the cost is HUGE). In my real program I am fseeking to various
places in the file and reading 3 bytes of data, so the fseek has a purpose
there, but it runs very slowly and I am trying to figure out a way to
speed it up.
-TJ Walls
Nov 14 '05 #4

P: n/a
"TJ Walls" <tj*****@mindspring.nospam.com> writes:
Is there any way to reduce the speed hit incurred by the call to
fseek?


Generally, map the file into memory with mmap() on Unix systems.
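
As a rough illustration of the mmap() suggestion, here is a sketch for POSIX systems only (it does not apply to DOS). The type `mapped_file` and the function names are hypothetical, and the three-byte little-endian record layout is taken from the test program earlier in the thread.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    unsigned char *base;  /* start of the mapping */
    size_t size;          /* file size in bytes */
} mapped_file;

/* Map the whole file read-only. Returns 0 on success, -1 on failure. */
int map_file(const char *path, mapped_file *m)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;              /* cannot stat, or nothing to map */
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    m->base = p;
    m->size = (size_t)st.st_size;
    return 0;
}

/* Decode the record at index idx, or return -1 if it is out of range. */
long mapped_record(const mapped_file *m, size_t idx)
{
    const unsigned char *p;

    if (idx >= m->size / 3)
        return -1;
    p = m->base + idx * 3;
    return p[0] | ((long)p[1] << 8) | ((long)p[2] << 16);
}

void unmap_file(mapped_file *m)
{
    munmap(m->base, m->size);
    m->base = NULL;
    m->size = 0;
}
```

Once the file is mapped, each "seek and read" becomes plain pointer arithmetic, and the operating system pages data in on demand.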

--
Brian Gough

Network Theory Ltd,
Publishing Free Software Manuals --- http://www.network-theory.co.uk/
Nov 14 '05 #5

P: n/a
On Tue, 27 Jul 2004 14:42:54 -0400, TJ Walls wrote:
What is your purpose for using fseek() here?


My purpose for putting the fseek() here is to test
what kind of time hit fseek() is giving me (in this case, I thought it
should be a no-op too ... and thus give me a time penalty of 0, but in
fact the cost is HUGE). In my real program I am fseeking to various
places in the file and reading 3 bytes of data, so the fseek has a purpose
there, but it runs very slowly and I am trying to figure out a way to
speed it up.

If you only read a few bytes at a time, you could try making the stream
unbuffered with setvbuf().
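
A minimal sketch of the unbuffered approach, assuming the three-byte little-endian records from the original test program. The helper names `open_unbuffered` and `read_record` are illustrative, not part of any library. Note that setvbuf() has to be called before any other operation on the stream.

```c
#include <stdio.h>

/* Open path for binary reading with library buffering switched off.
 * Returns NULL if the file cannot be opened or setvbuf() fails. */
FILE *open_unbuffered(const char *path)
{
    FILE *fp = fopen(path, "rb");

    if (fp != NULL && setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* Seek to record idx and decode its three bytes.
 * Returns the value, or -1 if the seek or the read fails. */
long read_record(FILE *fp, long idx)
{
    unsigned char rec[3];

    if (idx < 0 || fseek(fp, idx * 3L, SEEK_SET) != 0)
        return -1;
    if (fread(rec, 1, sizeof rec, fp) != sizeof rec)
        return -1;
    return rec[0] | ((long)rec[1] << 8) | ((long)rec[2] << 16);
}
```

With no library buffer, each read asks the system for exactly three bytes instead of refilling a whole buffer that the next seek would discard. Whether that is faster depends on the C library and the operating system, so it is worth measuring.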

Nov 14 '05 #6

P: n/a
TJ Walls wrote:

[...] In my real program I am fseeking to various
places in the file and reading 3 bytes of data, so the fseek has a purpose
there, but it runs very slowly and I am trying to figure out a way to
speed it up.


The file in your test program was only about 160Kbytes
long. If your actual file is of similar size, or even a
hundred or so times larger, you might do well to read a
copy of the whole thing into memory (sequentially) and
access the data from there.

(Pedants may advance a number of reasons why you should
not do this. There's no fully portable way to discover the
size of a file prior to reading it, there's no guarantee
that the C implementation can handle objects of that size,
and the C language Standard says nothing about the relative
speeds of sequential and random input. Pay attention to
such arguments only long enough to ascertain that you can
safely ignore them.)
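
The "read it all sequentially, then index into memory" idea might look like the sketch below. It sidesteps the file-size question by growing the buffer as it reads. The function names are hypothetical, and the three-byte little-endian record layout is assumed from the original test program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc'd buffer using sequential fread()
 * calls, doubling the buffer as needed. On success returns the buffer
 * (the caller frees it) and stores the byte count in *size_out.
 * Returns NULL on any failure. */
unsigned char *slurp_file(const char *path, size_t *size_out)
{
    FILE *fp = fopen(path, "rb");
    unsigned char *buf = NULL;
    size_t cap = 0, len = 0;

    if (fp == NULL)
        return NULL;

    for (;;) {
        size_t got;

        if (len == cap) {
            size_t new_cap = cap ? cap * 2 : 4096;
            unsigned char *tmp = realloc(buf, new_cap);

            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        got = fread(buf + len, 1, cap - len, fp);
        len += got;
        if (got == 0)
            break;              /* end of file or read error */
    }
    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *size_out = len;
    return buf;
}

/* Decode record idx from the in-memory copy, or return -1 if out of range. */
long record_in_memory(const unsigned char *buf, size_t size, size_t idx)
{
    const unsigned char *p;

    if (idx >= size / 3)
        return -1;
    p = buf + idx * 3;
    return p[0] | ((long)p[1] << 8) | ((long)p[2] << 16);
}
```

This only helps when the whole file fits in available memory, which is exactly the caveat raised in the parenthetical above.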

--
Er*********@sun.com

Nov 14 '05 #7

P: n/a
In article <41***************@spamcop.net>,
Kenneth Brody <ke******@spamcop.net> wrote:
However, a side-effect of the fseek is the flushing of the buffer.


Is that necessary? It seems like an obvious optimisation to not flush
the buffer when the seek is to somewhere within the buffer.

-- Richard
Nov 14 '05 #8

P: n/a
In article <pa****************************@mindspring.nospam.com>,
TJ Walls <tj*****@mindspring.nospam.com> wrote:
In my real program I am fseeking to various
places in the file and reading 3 bytes of data


Your example program writes rather than reads...

For most operating systems, provided the size of the file is small
compared with the amount of real memory, I would expect caching in the
file system interface to avoid real disk accesses once the file has
all been read. There will still be the overhead of copying between
the cache and your program; you may be able to avoid that by using
some (system-specific) method to map the file into your program's
memory.

-- Richard
Nov 14 '05 #9

P: n/a

Thanks for all the responses! I should mention that the system
I find myself programming on is quite limited ...
In my real program I am fseeking to various
places in the file and reading 3 bytes of data
Your example program writes rather than reads...


True ... sorry for the slight ambiguity. My original test program
did both, but I thought for the sake of example size I would delete
the bottom half (and I think my example illustrates my point?). :)
For most operating systems, provided the size of the file is small
compared with the amount of real memory, I would expect caching in the
file system interface to avoid real disk accesses once the file has
all been read. There will still be the overhead of copying between
the cache and your program; you may be able to avoid that by using
some (system-specific) method to map the file into your program's
memory.


This is the unfortunate part ... The system I am working on is running
DOS 6.22 with 256K of RAM. I was hoping to _not_ have to map the file into
memory, but if that is my only choice, I guess I'll have to make it work.

Any other ideas would be greatly appreciated ...

Sincerely,
TJ Walls
Ph.D. Candidate - Physics Dept. Stony Brook University
Nov 14 '05 #10

P: n/a
On Tue, 27 Jul 2004 21:30:00 +0000, Richard Tobin wrote:
In article <41***************@spamcop.net>,
Kenneth Brody <ke******@spamcop.net> wrote:
However, a side-effect of the fseek is the flushing of the buffer.


Is that necessary? It seems like an obvious optimisation to not flush
the buffer when the seek is to somewhere within the buffer.

-- Richard

True ... it still seems a little strange to me that the cost
of doing effectively nothing is so dramatic ...

-TJ
Nov 14 '05 #11

P: n/a
In article <pa****************************@mindspring.nospam.com>,
TJ Walls <tj*****@mindspring.nospam.com> wrote:
This is the unfortunate part ... The system I am working on is running
DOS 6.22 with 256K of RAM. I was hoping to _not_ have to map the file into
memory, but if that is my only choice, I guess I'll have to make it work.


I have no idea what DOS does for caching file data.

But if you seek around randomly in a file that's bigger than physical
memory, then you're going to have to do a lot of disk reads.

Some possibilities:

- Do you have to do the reads in a random order? If you could list
them all and put them in order, that would be much better. But if
one value points to the next, that won't work.

- Is it really random? Are some parts of the file used more often than
others? If so, it may be worth caching just those parts. Is there
any "locality of reference" (ie, are successive seeks to nearby parts
of the file)? If so, implementing your own cache of recently used
blocks will help.
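
The first possibility, collecting the wanted positions and visiting them in file order, could be sketched as follows. This assumes the full list of record indices is known in advance and fits in memory. The `request` type and the function names are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    long index;   /* record number to fetch */
    size_t slot;  /* where the result belongs in the caller's array */
} request;

/* qsort comparator: ascending record number. */
static int by_index(const void *a, const void *b)
{
    const request *ra = a, *rb = b;

    return (ra->index > rb->index) - (ra->index < rb->index);
}

/* Fetch the records wanted[0..n-1] from fp, visiting the file in
 * ascending order, and store each value in out[] at the position of
 * the request that asked for it. Returns 0 on success, -1 on error. */
int read_sorted(FILE *fp, const long *wanted, long *out, size_t n)
{
    request *reqs;
    size_t i;
    int rc = 0;

    if (n == 0)
        return 0;
    reqs = malloc(n * sizeof *reqs);
    if (reqs == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        reqs[i].index = wanted[i];
        reqs[i].slot = i;
    }
    qsort(reqs, n, sizeof *reqs, by_index);

    for (i = 0; i < n && rc == 0; i++) {
        unsigned char rec[3];

        if (reqs[i].index < 0
            || fseek(fp, reqs[i].index * 3L, SEEK_SET) != 0
            || fread(rec, 1, sizeof rec, fp) != sizeof rec)
            rc = -1;
        else
            out[reqs[i].slot] =
                rec[0] | ((long)rec[1] << 8) | ((long)rec[2] << 16);
    }
    free(reqs);
    return rc;
}
```

Because the seeks only ever move forward, both the library buffer and the disk get a much friendlier access pattern. As noted in the list, this does not work when one value determines where the next read goes.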

Incidentally, I doubt that you can map files in DOS: it doesn't just
mean copying it into memory, it means having the operating system
automatically associate an area of memory with the file and keep the
two in sync, reading blocks as they are needed and writing them out
when modified.

-- Richard

Nov 14 '05 #12

P: n/a
"TJ Walls" <tj*****@mindspring.nospam.com> wrote:
# Hello All,
#
# I am baffled ... I am trying to improve the speed of a program
# that I have written that performs random access within a file. It
# relies heavily on fseek and is very slow. To test, I wrote the following
# test program which just writes the numbers 1-167721 sequentially to
# a binary file:

Library-provided buffering and random access don't usually go together
well. If you want to do this in ANSI C, you'll probably get better
performance if you turn off library buffering and do your own.
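
"Doing your own" buffering could take the form of a small block cache layered over an unbuffered stream. The sketch below is one possible design and not a tuned implementation: the block size, the slot count, the direct-mapped replacement policy and all of the names are assumptions chosen for brevity, and the stream passed to `cache_init` must be freshly opened so that setvbuf() is still allowed.

```c
#include <stdio.h>

#define BLOCK_SIZE 512   /* assumed block size */
#define NUM_SLOTS  8     /* assumed number of cached blocks (4 KB total) */

typedef struct {
    FILE *fp;
    long tag[NUM_SLOTS];      /* block number held in each slot, -1 if empty */
    size_t len[NUM_SLOTS];    /* number of valid bytes in each slot */
    unsigned char data[NUM_SLOTS][BLOCK_SIZE];
    unsigned long misses;     /* how many blocks were read from the file */
} block_cache;

/* Attach the cache to a freshly opened binary stream. */
void cache_init(block_cache *c, FILE *fp)
{
    int i;

    c->fp = fp;
    c->misses = 0;
    for (i = 0; i < NUM_SLOTS; i++) {
        c->tag[i] = -1;
        c->len[i] = 0;
    }
    setvbuf(fp, NULL, _IONBF, 0);   /* the cache replaces library buffering */
}

/* Return the byte at file offset pos, or -1 past end of file or on error. */
static int cache_byte(block_cache *c, long pos)
{
    long block = pos / BLOCK_SIZE;
    int slot = (int)(block % NUM_SLOTS);    /* direct-mapped placement */
    size_t off = (size_t)(pos % BLOCK_SIZE);

    if (c->tag[slot] != block) {
        c->misses++;
        c->tag[slot] = -1;                  /* invalid until the read works */
        if (fseek(c->fp, block * (long)BLOCK_SIZE, SEEK_SET) != 0)
            return -1;
        c->len[slot] = fread(c->data[slot], 1, BLOCK_SIZE, c->fp);
        if (ferror(c->fp))
            return -1;
        c->tag[slot] = block;
    }
    if (off >= c->len[slot])
        return -1;
    return c->data[slot][off];
}

/* Decode three-byte record idx through the cache, or return -1 on failure.
 * Going byte by byte handles records that straddle two blocks. */
long cache_record(block_cache *c, long idx)
{
    int b0, b1, b2;

    if (idx < 0)
        return -1;
    b0 = cache_byte(c, idx * 3);
    b1 = cache_byte(c, idx * 3 + 1);
    b2 = cache_byte(c, idx * 3 + 2);
    if (b0 < 0 || b1 < 0 || b2 < 0)
        return -1;
    return b0 | ((long)b1 << 8) | ((long)b2 << 16);
}
```

The `misses` counter gives a quick way to see whether the access pattern has enough locality for a cache to pay off. On a machine with very little memory, BLOCK_SIZE and NUM_SLOTS are the knobs to adjust.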

--
SM Ryan http://www.rawbw.com/~wyrmwif/
So....that would make Bethany part black?
Nov 14 '05 #13

P: n/a
TJ Walls wrote:
Hello All,

I am baffled ... I am trying to improve the speed of a program
that I have written that performs random access within a file. It
relies heavily on fseek and is very slow. To test, I wrote the following
test program which just writes the numbers 1-167721 sequentially to
a binary file: [snip]


When compiled and run on a linux-2.4.## system and an old DOS system it
is very fast. Now if you uncomment the line that says fseek(fp, 0,
SEEK_CUR), it runs 17x slower!

Is there any way to reduce the speed hit incurred by the call to
fseek?

Thanks in advance for any thoughts,
TJ Walls
Ph.D. Candidate - Physics Dept. Stony Brook University


In general, fseek is used to position the file pointer (cursor) at
an existing location in a file. A file is created containing data,
then the fseek function is applied to the file in read mode. The
fseek function can be applied in write mode too, as if you were going
to overwrite data in the file.

To validate the fseek performance, you should use a better
benchmarking algorithm: for example, seeking to random positions
or using a butterfly pattern. Each test should read the same amount of
data from the position it seeks to. Buffering should be turned off, or
the algorithm should fseek to positions outside the size of the buffer.
Buffering, whether by the device, the operating system or the C run-time
library, affects the measurements of the fseek function. Most of the
buffering exists to speed up file accesses. Some hard drive controllers
have large buffers to reduce the number of seeks (thus wreaking havoc on
your experiment).

Also, remember that you will need to perform a huge number of
iterations to get a stable average. Something over 1E+06 should
be enough to average out minor fluctuations.
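
A benchmark along these lines might look like the sketch below. The function name and parameters are hypothetical, the records are assumed to be three bytes each as in the original test program, and clock() measures CPU time, so a wall-clock timer may be more appropriate when disk latency is what matters. On compilers where RAND_MAX is small (32767 is permitted), the positions will not cover a large file evenly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Seek to iters pseudo-random records in fp and read three bytes at each.
 * Returns the number of reads that succeeded. If mean_seconds is not NULL,
 * the mean CPU time per iteration is stored there. */
long bench_random_reads(FILE *fp, long nrecords, long iters,
                        unsigned seed, double *mean_seconds)
{
    clock_t start;
    long i, ok = 0;

    if (nrecords <= 0 || iters <= 0)
        return 0;

    srand(seed);                /* fixed seed makes runs repeatable */
    start = clock();
    for (i = 0; i < iters; i++) {
        unsigned char rec[3];
        /* scale rand() into the range [0, nrecords) */
        long idx = (long)(((double)rand() / ((double)RAND_MAX + 1.0))
                          * (double)nrecords);

        if (fseek(fp, idx * 3L, SEEK_SET) == 0
            && fread(rec, 1, sizeof rec, fp) == sizeof rec)
            ok++;
    }
    if (mean_seconds != NULL)
        *mean_seconds =
            ((double)(clock() - start) / CLOCKS_PER_SEC) / (double)iters;
    return ok;
}
```

Running it on the same file with buffering left on and then with setvbuf(fp, NULL, _IONBF, 0) applied right after fopen() separates the cost of the seek itself from the cost of refilling the library buffer.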

Consult your applied statistics knowledge.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.comeaucomputing.com/learn/faq/
Other sites:
http://www.josuttis.com -- C++ STL Library book

Nov 14 '05 #14

P: n/a
TJ Walls wrote:

What is your purpose for using fseek() here?


My purpose for putting the fseek() here is to test
what kind of time hit fseek() is giving me (in this case, I thought it
should be a no-op too ... and thus give me a time penalty of 0, but in
fact the cost is HUGE). In my real program I am fseeking to various
places in the file and reading 3 bytes of data, so the fseek has a purpose
there, but it runs very slowly and I am trying to figure out a way to
speed it up.


The fseek(f,0,SEEK_CUR) basically _is_ a no-op, with the single (yet very
important in this case) exception of flushing the buffer.

To see that fseek() otherwise has virtually no impact on the speed of your
program, compare speeds in an unbuffered environment. (Use setbuf or
setvbuf to get rid of the buffer.)

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody at spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+

Nov 14 '05 #15

P: n/a
Normally seeking doesn't have such a huge impact. But if you seek before
*every* few bytes you read or write, it can indeed really slow things
down; that's not a big surprise.

--
cody

Freeware Tools, Games and Humour
http://www.deutronium.de.vu || http://www.deutronium.tk

Nov 14 '05 #16
