How to speed up ftell()/fseek()



Hello,

I am trying to read large binary files (on the order of 100-200 MB)
quickly using ftell() and fseek(). My class gets a pointer to the
data stored in the file, and then uses fseek() to access
and read the data. The problem is that as the file grows
in size, the access time also increases. I initially used
fseek() with the SEEK_SET option, but later switched to SEEK_CUR
in the hope that this would speed up the access, but there
was no improvement. My question is: is there anything else
one can do to make the access time independent
of the file size?
Stream classes are not an option here, as they are even
slower.

L.B.
*-------------------------------------------------------------------*
| Dr. Leslaw Bieniasz, |
| Institute of Physical Chemistry of the Polish Academy of Sciences,|
| Department of Electrochemical Oxidation of Gaseous Fuels, |
| ul. Zagrody 13, 30-318 Cracow, Poland. |
| tel./fax: +48 (12) 266-03-41 |
| E-mail: nb******@cyf-kr.edu.pl |
*-------------------------------------------------------------------*
| Interested in Computational Electrochemistry? |
| Visit my web site: http://www.cyf-kr.edu.pl/~nbbienia |
*-------------------------------------------------------------------*
Jul 23 '05 #1
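
For readers following along, here is a minimal sketch of the access pattern described in #1, assuming that "pointer" means a byte offset obtained from ftell(). The record layout (RECORD_LEN doubles) and the names write_record and read_record are hypothetical and only serve the illustration.

```c
#include <stdio.h>

/* Hypothetical record layout: fixed-size blocks of doubles. */
#define RECORD_LEN 4

/* Append one record and return the offset where it starts (-1 on error). */
static long write_record(FILE *fp, const double *rec)
{
    long where = ftell(fp);              /* remember where the record begins */
    if (where < 0)
        return -1;
    if (fwrite(rec, sizeof *rec, RECORD_LEN, fp) != RECORD_LEN)
        return -1;
    return where;
}

/* Jump to a previously saved offset and read the record back. */
static int read_record(FILE *fp, long where, double *rec)
{
    if (fseek(fp, where, SEEK_SET) != 0)
        return -1;
    return fread(rec, sizeof *rec, RECORD_LEN, fp) == RECORD_LEN ? 0 : -1;
}
```

The file should be opened in binary mode (for example "w+b"), since only then are ftell() values plain byte counts. Each read_record() call costs one fseek() plus one fread(), which is the pair of calls whose cost the question is about.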
Let me guess: You are using a Microsoft compiler. I once wrote a language
interpreter that did all the necessary token recognition, parsing and
expression evaluation, and it turned out that an lseek I was doing just to
keep track of the current file position (and not to actually seek anywhere)
was taking 50% of the execution time! That was easy to fix because I only
had to use my own counter to keep track of the position myself. In your case
the fseek is really seeking, so I don't know what you can do. Are you sure
the delays are excessive? You would expect some degradation in performance
as the file size increases and the physical seek distances on the disk get
larger.

DW
Jul 23 '05 #2
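
A minimal sketch of the "own counter" idea from #2, for the case where the position only needs to be known, not changed. The names tracked_file, tf_open, tf_read, tf_seek, tf_tell and tf_close are hypothetical, not part of any library. The sketch assumes the file is opened in binary mode, so the byte count returned by fread() matches the change in file offset.

```c
#include <stdio.h>

/* Hypothetical wrapper: a FILE* plus a byte offset we maintain ourselves. */
typedef struct {
    FILE *fp;
    long  pos;   /* current offset, updated on every read and seek */
} tracked_file;

/* Open for binary reading; returns 0 on success, -1 on failure. */
static int tf_open(tracked_file *tf, const char *path)
{
    tf->fp = fopen(path, "rb");
    tf->pos = 0;
    return tf->fp ? 0 : -1;
}

/* Read up to n bytes and advance the counter by the bytes actually read.
   After a read error the real file position is indeterminate, so a caller
   should check ferror() and re-seek before trusting the counter again. */
static size_t tf_read(tracked_file *tf, void *buf, size_t n)
{
    size_t got = fread(buf, 1, n, tf->fp);
    tf->pos += (long)got;
    return got;
}

/* Seek to an absolute offset; skip the library call if already there. */
static int tf_seek(tracked_file *tf, long offset)
{
    if (offset == tf->pos)
        return 0;
    if (fseek(tf->fp, offset, SEEK_SET) != 0)
        return -1;
    tf->pos = offset;
    return 0;
}

/* Stand-in for ftell(): no library call, just the cached value. */
static long tf_tell(const tracked_file *tf)
{
    return tf->pos;
}

static void tf_close(tracked_file *tf)
{
    if (tf->fp)
        fclose(tf->fp);
    tf->fp = NULL;
}
```

This removes the ftell() calls and the redundant seeks, but, as DW notes, it does nothing for seeks that really have to move the file position.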
Perhaps the following links will give some tips:
http://groups-beta.google.com/group/...5e065030?hl=en
http://groups-beta.google.com/group/...a4c4e9bb?hl=en

--
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Jul 23 '05 #3
Did you consider mmap'ing the file instead? I don't know whether
it is available on your platform, or whether it is faster than fseek,
but it might be worth a try.

Tom
Jul 23 '05 #4

How do I mmap a file?
I have heard about it, but I don't know how to actually do it.
L.B.

Jul 23 '05 #5
Be aware that it's basically a Unix thing... see if you have a header file called sys/mman.h in your system include path.

http://www.gnu.org/software/libc/man...mapped-I_002fO

I think you'll find some sample code if you follow the links in the post by Alex Vinokur earlier in this thread (it's a
bit tricky getting all the parameters right, I seem to recall).

--
Lionel B

Jul 23 '05 #6
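
Since the question in #5 asks how to actually do it, here is a minimal sketch for a POSIX system that has the sys/mman.h header mentioned in #6. The helper names map_file and unmap_file are hypothetical; open(), fstat(), mmap() and munmap() are the standard POSIX calls. It maps the whole file read-only, which assumes the file fits into the process address space.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
   Returns the mapping and stores its length in *len, or NULL on failure. */
static const unsigned char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {  /* zero-length mappings fail */
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file(). */
static void unmap_file(const unsigned char *p, size_t len)
{
    munmap((void *)p, len);
}
```

Once mapped, data at a saved offset is read as plain memory, for example with memcpy() from p + offset after checking that the range lies within len, so no fseek() or fread() call is involved. Whether this is faster in practice depends on the platform and the access pattern and would need measuring. On Windows there is no mmap(); the corresponding Win32 calls are CreateFileMapping() and MapViewOfFile().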
When it comes to POSIX and UNIX, the best places to go are, IMHO:
- http://www.opengroup.org
(in this case
http://www.opengroup.org/onlinepubs/.../xsh/mmap.html)
- Usenet: comp.unix.programmer
- the man pages on your system
- the docs of your system provider (e.g. http://docs.sun.com)

Tom
Jul 23 '05 #7
Thomas Maier-Komor wrote:
> when it comes to POSIX and UNIX the best places to go to are IMHO
> - http://www.opengroup.org

Didn't know about this resource - looks very handy.

Thanks,

--
Lionel B

Jul 23 '05 #8
