fast multiple file access

Hello,

I am hoping that someone can answer a question or two regarding file
access. I have created an app that reads an image from a file then
displays it (using OpenGL). It works well using fopen() with fgetc()
to access each byte. I have decided to move further with this app and
allow the user to select the first file of an image sequence and it
will play the sequence back at 24 frames per second. I have almost
everything worked out but am curious if using fgetc() is the fastest
and most efficient means of reading in data. Each image is at least
1.1 MB in file size. My main question is: When one uses fopen() does
this just provide a pointer (via a stream) to the file on disk or does
this actually load the image into RAM and return a pointer to that?
Also, can anyone recommend a decent workflow for reading in files very
quickly (the files have to be parsed to retrieve the image data as well
as the header info)? My drives are fast enough but I want to make sure
that I am not slowing things down with poor file access. Thanks for
any advice.

Cable

Nov 15 '05 #1
In article <11**********************@g49g2000cwa.googlegroups .com>,
Cable <tr*********@yahoo.com> wrote:
> I have almost
> everything worked out but am curious if using fgetc() is the fastest
> and most efficient means of reading in data.

It depends on how good the optimizer is.

Generally speaking, when you know you are reading a number of bytes,
fread() is faster, as it avoids the overhead of invoking fgetc()
each time. However, if you have a good optimizer then it might all
come out the same.
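
As a rough illustration of the fread() approach, here is a minimal sketch in standard C that pulls a whole file into memory in large chunks instead of one fgetc() call per byte. The helper name read_whole_file and the 1 MiB starting size are assumptions made for this example; they do not come from the original post.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a malloc'd buffer using
 * fread(). Returns NULL on failure; the caller frees the result. */
unsigned char *read_whole_file(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 1u << 20;          /* assumed starting size: 1 MiB */
    size_t len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    for (;;) {
        /* Ask for everything that still fits in the buffer. */
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)              /* short read: end of file or error */
            break;

        /* Buffer is full, so double it and keep reading. */
        unsigned char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            fclose(fp);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *out_len = len;
    return buf;
}
```

The header and image data can then be parsed out of the returned buffer with ordinary pointer arithmetic. Whether this beats a getc() loop on a given system is something to measure rather than assume.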

> Each image is at least
> 1.1 MB in file size. My main question is: When one uses fopen() does
> this just provide a pointer (via a stream) to the file on disk or does
> this actually load the image into RAM and return a pointer to that?

fopen() does NOT read any of the file. The first fgetc() or fread()
or equivalents will read the first bufferful into memory, according to
the size of buffer that has been configured. Subsequent fgetc()
or fread() read out of the in-memory buffer until they get to the
end of it, then read another bufferful, then go back to reading
out of memory, and so on.
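
The buffer size mentioned above can be requested through setvbuf() in standard C. The sketch below is only an illustration: open_buffered is a hypothetical helper name, and the standard says the size passed with a null buffer "may" determine the allocated buffer size, so an implementation is free to treat it as a hint.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a file for binary reading and request a
 * fully buffered stream with a buffer of about bufsize bytes. */
FILE *open_buffered(const char *path, size_t bufsize)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* setvbuf() must be called before any other operation on the stream.
     * Passing NULL lets the library allocate the buffer itself. */
    if (setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

After this call, fgetc() and fread() behave as described above, just with a different bufferful size. Larger is not automatically faster, so it is worth trying a few sizes.
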
The rest of this message gets into non-portable extensions.

> Also, can anyone recommend a decent workflow for reading in files very
> quickly (the files have to be parsed to retrieve the image data as well
> as the header info)? My drives are fast enough but I want to make sure
> that I am not slowing things down with poor file access.

> I have created an app that reads an image from a file then
> displays it (using OpenGL)


The OpenGL part is not part of the C standard, so you are already
using non-portable constructs. You need to decide how far into
non-portability you are willing to go. If you find that your
current fgetc() scheme isn't fast enough, and fread() isn't either,
then you should consider using system extensions such as:

- read() -- implemented on all Unix systems and many others

- open( O_DIRECT ) -- in association with read(), allows direct I/O
bypassing system buffers; not supported in all Unixes

- mmap() -- allows a file to be mapped into memory -- possibly more
common than O_DIRECT

- readv() -- allows scatter/gather I/O -- probably not particularly
common

- real-time filesystems such as via SGI's grio extensions and
XFS real-time volumes

- placing the files into a raw partition and handling the filesystem
management yourself

- writing your own device driver

- turning on Command Tag Queuing on SCSI devices

- pre-processing the files into raw data files that can be DMA'd
directly into a buffer suitable for passing to OpenGL

- read the files through once so as to bring their contents into
the system file cache, before starting the graphics process

- figuring out which part of your disk delivers data most quickly,
and ensuring that the files are written to that part of the disk

- when writing the files, figure out about how big they are
going to be, seek to that position, write a byte, and seek
back to the beginning and fill in the data. On many systems,
this will result in contiguous blocks being allocated for the
storage, whereas if you did the standard write of a buffer at
a time, the buffers could end up fragmented all over the disk

- pay attention to time needed to finish processing one file
and open the next, and to the relative positions on disk.
Ideally, when you issue the next read to disk, the disk block
you need should be the very next one that spins under the
head of the current track, so that there is no track-to-track
seek time and no time spent waiting for the appropriate sector
to spin around. This may require fetching information about the
drive geometry -- and for most SCSI disks, geometry is only
an approximation because there are variable number of sectors
per track (outer tracks hold more.)

- issue the largest read request that you can, so that the disk
can read as many consecutive blocks as practical

- for SCSI disks, examine the bad-block information so as to
ensure that you aren't seeking wildly to a replacement block
in the middle of an important stream

- if you really get cramped for time, use a solid-state disk
if you can
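
As one illustration of the mmap() item in the list above, here is a minimal sketch that assumes a POSIX system. map_file and unmap_file are hypothetical helper names invented for this example, and the code is not portable C.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only into memory.
 * Returns NULL on failure or for an empty file. */
const unsigned char *map_file(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);      /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *out_len = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file(). */
void unmap_file(const unsigned char *data, size_t len)
{
    if (data != NULL)
        munmap((void *)data, len);
}
```

With a mapping like this the header and pixel data can be parsed in place, and pages are brought in by the OS as they are touched. Whether that is faster than read() or fread() depends on the system.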

I'm sure there are many additional disk optimization methods.
See if your OS has a tool named 'diskperf' available for it.

You may have noticed that nearly all of these optimizations are
system and/or hardware specific. The C language itself is
not concerned with filesystem representations or I/O
optimization.
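
One item from the list above that can be tried in plain standard C is reading each file through once before playback starts, so that its contents are likely to be in the system file cache. The sketch below assumes the OS keeps recently read file data cached, which the C standard does not promise; warm_file is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read a file from start to finish and throw the
 * data away. Returns the number of bytes read, or -1 on failure. */
long long warm_file(const char *path)
{
    static unsigned char scratch[1 << 16];  /* 64 KiB, contents discarded */

    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long long total = 0;
    size_t got;
    while ((got = fread(scratch, 1, sizeof scratch, fp)) > 0)
        total += (long long)got;

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : total;
}
```

Calling this for each frame of the sequence before the first frame is displayed only helps if the whole sequence fits in the cache.
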
--
"Never install telephone wiring during a lightning storm." -- Linksys
Nov 15 '05 #2
>In article <11**********************@g49g2000cwa.googlegroups .com>,
>Cable <tr*********@yahoo.com> wrote:
>> ... My main question is: When one uses fopen() does
>> this just provide a Pointer (via a stream) to the file on disk or does
>> this actually load the image into RAM and return a pointer to that?

In article <d9**********@canopus.cc.umanitoba.ca>
Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
> fopen() does NOT read any of the file.

Unless, of course, it does. (Consider, e.g., FTP-based file
systems, that act as an FTP client to an FTP server. Wind River
has one in VxWorks, so they do exist. There are some catches
though; and merely opening the file does not always read the
entire thing.)

> The first fgetc() or fread() or equivalents will read the first
> bufferful into memory, according to the size of buffer that has
> been configured.

Ideally, anyway. If your C implementation has been reasonably
optimized, it should choose "reasonably good" buffer sizes
automatically as well, and your input should proceed at something
approaching maximum possible speed without any foolery at all.

Of course, there are always exceptions ... but then you have to:

> The rest of this message gets into non-portable extensions.

... get into those non-portable extensions. You also have to
experiment, as many attempts to go faster will prove to go slower
instead. Things like interrupt latency, DMA, and overlapping
read transactions can have surprising interactions. For instance,
this trick stands to reason, and does work on some systems:

> - issue the largest read request that you can, so that the disk
> can read as many consecutive blocks as practical

but on others it backfires badly since a request for (say) one
megabyte has to wait for the entire megabyte, while 16 requests
for 64K each can process each 64K chunk "on the fly" while the
next chunk arrives.

> - if you really get cramped for time, use a solid-state disk
> if you can

Be aware, however, that flash memory is significantly *slower*
than rotating media (though reads are not as bad as writes).
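
The trade-off between one huge request and several smaller ones can be explored with a loop that hands each chunk to a callback as soon as it arrives. This is a sketch in standard C under stated assumptions: read_in_chunks, chunk_fn, sum_ctx and sum_bytes are hypothetical names, and because fread() is synchronous, any overlap between processing one chunk and fetching the next relies on the OS doing read-ahead.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Callback invoked once per chunk as it is read. */
typedef void (*chunk_fn)(const unsigned char *data, size_t len, void *ctx);

/* Hypothetical helper: read a file chunk_size bytes at a time and pass
 * each chunk to fn. Returns the number of chunks, or -1 on failure. */
long read_in_chunks(const char *path, size_t chunk_size, chunk_fn fn, void *ctx)
{
    assert(path != NULL && fn != NULL && chunk_size > 0);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    unsigned char *buf = malloc(chunk_size);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    long chunks = 0;
    size_t got;
    while ((got = fread(buf, 1, chunk_size, fp)) > 0) {
        fn(buf, got, ctx);          /* process this chunk "on the fly" */
        chunks++;
    }

    int failed = ferror(fp);
    free(buf);
    fclose(fp);
    return failed ? -1 : chunks;
}

/* Example callback: add up every byte seen so far. */
typedef struct {
    unsigned long sum;
} sum_ctx;

void sum_bytes(const unsigned char *data, size_t len, void *ctx)
{
    sum_ctx *s = ctx;
    for (size_t i = 0; i < len; i++)
        s->sum += data[i];
}
```

Running the same file through this with chunk sizes of, say, 64K and 1 MB and timing both is the kind of experiment suggested above.
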
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 15 '05 #3
Walter Roberson wrote:

.... snip ...

> Generally speaking, when you know you are reading a number of
> bytes, fread() is faster, as it avoids the overhead of invoking
> fgetc() each time. However, if you have a good optimizer then it
> might all come out the same.

Not necessarily so. getc may be faster if the purpose is to scan
for something. Using it may well provide controlled access to the
file buffer, without ever moving any data, and with no function
call overhead (getc can be a macro, while fgetc may not).

Typical expansion of a getc call might be the inline equivalent of:

if (f->n--) fillbuffer(f);
return *f->p++;

which the implementation can do, because it knows the structure of
a FILE.

--
"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

Nov 15 '05 #4
CBFalconer wrote:
> Walter Roberson wrote:
>
> ... snip ...
>
>> Generally speaking, when you know you are reading a number of
>> bytes, fread() is faster, as it avoids the overhead of invoking
>> fgetc() each time. However, if you have a good optimizer then it
>> might all come out the same.
>
> Not necessarily so. getc may be faster if the purpose is to scan
> for something. Using it may well provide controlled access to the
> file buffer, without ever moving any data, and with no function
> call overhead (getc can be a macro, while fgetc may not).
>
> Typical expansion of a getc call might be the inline equivalent of:
>
> if (f->n--) fillbuffer(f);

ITYM
if (!(f->n--))

> return *f->p++;
>
> which the implementation can do, because it knows the structure of
> a FILE.
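
To make the macro idea concrete, here is a toy version that reads from an in-memory array instead of a real file. Everything in it (toy_file, toy_fill, toy_getc, the 4-byte buffer) is made up for illustration and is not how any particular C library defines FILE or getc. It follows the classic pattern in which the count is decremented first and the refill routine is called only when the buffer is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFSIZ 4
#define TOY_EOF (-1)

/* A toy stand-in for FILE. */
typedef struct {
    int n;                          /* bytes left in buf */
    unsigned char *p;               /* next byte to hand out */
    unsigned char buf[TOY_BUFSIZ];  /* the in-memory buffer */
    const unsigned char *src;       /* backing "file" contents */
    size_t src_left;                /* bytes of src not yet buffered */
} toy_file;

/* Refill the buffer and return its first byte, or TOY_EOF at the end. */
int toy_fill(toy_file *f)
{
    size_t take = f->src_left < TOY_BUFSIZ ? f->src_left : TOY_BUFSIZ;
    if (take == 0) {
        f->n = 0;
        return TOY_EOF;
    }
    memcpy(f->buf, f->src, take);
    f->src += take;
    f->src_left -= take;
    f->p = f->buf;
    f->n = (int)take - 1;           /* one byte is returned right now */
    return *f->p++;
}

/* The common case is a decrement, a compare and a pointer bump, with no
 * function call. Note that f is evaluated more than once. */
#define toy_getc(f) (--(f)->n >= 0 ? *(f)->p++ : toy_fill(f))
```

A zero-initialised toy_file with src and src_left set is ready to use: the first toy_getc() finds the buffer empty and triggers a refill.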

--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 15 '05 #5
On Wed, 29 Jun 2005 17:05:00 +0000, CBFalconer wrote:
> Walter Roberson wrote:
>
> ... snip ...
>
>> Generally speaking, when you know you are reading a number of
>> bytes, fread() is faster, as it avoids the overhead of invoking
>> fgetc() each time. However, if you have a good optimizer then it
>> might all come out the same.
>
> Not necessarily so. getc may be faster if the purpose is to scan
> for something. Using it may well provide controlled access to the
> file buffer, without ever moving any data, and with no function
> call overhead (getc can be a macro, while fgetc may not).


The standard requires both getc() and fgetc() to be available as a
function e.g. (getc)(stdin) is valid. It also allows <stdio.h> to provide
macro definitions for both. However it allows a getc() macro to violate
normal function-call-like semantics by evaluating its FILE * argument more
than once. fgetc() must evaluate it exactly once so possibilities for
implementing it as a macro are limited.
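
Both points can be shown in a few lines of standard C. The function names count_bytes and first_bytes are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Count the remaining bytes of a stream using the real getc function.
 * Parenthesising the name suppresses any function-like macro. */
long count_bytes(FILE *fp)
{
    long n = 0;
    while ((getc)(fp) != EOF)
        n++;
    return n;
}

/* Read the first byte of each stream in an array. The stream argument
 * has a side effect (next++), so fgetc() is used here: it evaluates its
 * argument exactly once, which a getc() macro is not required to do. */
void first_bytes(FILE **streams, int count, int *out)
{
    assert(streams != NULL && out != NULL);

    FILE **next = streams;
    for (int i = 0; i < count; i++)
        out[i] = fgetc(*next++);
}
```

Writing getc(*next++) in the loop instead would be risky, because a macro version might advance next more than once per call.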

Lawrence
Nov 15 '05 #6
Lawrence Kirby wrote:
> On Wed, 29 Jun 2005 17:05:00 +0000, CBFalconer wrote:
>> Walter Roberson wrote:
>>
>> ... snip ...
>>
>>> Generally speaking, when you know you are reading a number of
>>> bytes, fread() is faster, as it avoids the overhead of invoking
>>> fgetc() each time. However, if you have a good optimizer then it
>>> might all come out the same.
>>
>> Not necessarily so. getc may be faster if the purpose is to scan
>> for something. Using it may well provide controlled access to the
>> file buffer, without ever moving any data, and with no function
>> call overhead (getc can be a macro, while fgetc may not).
>
> The standard requires both getc() and fgetc() to be available as a
> function e.g. (getc)(stdin) is valid. It also allows <stdio.h> to
> provide macro definitions for both. However it allows a getc()
> macro to violate normal function-call-like semantics by evaluating
> its FILE * argument more than once. fgetc() must evaluate it
> exactly once so possibilities for implementing it as a macro are
> limited.


Yes, your exposition is more accurate than mine, and better exposes
cases where you should or should not prefer one over the other.
However, in the context of efficiency vis a vis fread, the point is
that getc may well be the better choice, may break even, and in
some cases may be the poorer choice. It depends on the
implementation. If it matters, test.
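
A simple way to run that test is to time both approaches on the same file. The sketch below uses only standard C; read_timing, time_getc and time_fread are hypothetical names. One caveat: clock() reports processor time, so time spent waiting on the disk may not be counted, and the second function to run may benefit from data the first one left in the file cache.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Result of one timing run; seconds stays negative if the file could
 * not be opened. */
typedef struct {
    unsigned long bytes;
    double seconds;
} read_timing;

/* Read the whole file one byte at a time with getc(). */
read_timing time_getc(const char *path)
{
    read_timing t = { 0, -1.0 };
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return t;

    clock_t start = clock();
    while (getc(fp) != EOF)
        t.bytes++;
    t.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    fclose(fp);
    return t;
}

/* Read the whole file in 64 KiB blocks with fread(). */
read_timing time_fread(const char *path)
{
    static unsigned char chunk[1 << 16];
    read_timing t = { 0, -1.0 };
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return t;

    clock_t start = clock();
    size_t got;
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0)
        t.bytes += (unsigned long)got;
    t.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    fclose(fp);
    return t;
}
```

Run each function several times on a representative image file and compare the seconds fields; a single run on a small file tells you very little.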

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 15 '05 #7
