Designing Data Interface for Very Large Files [more than GB size]

Hi,

I need to design data interfaces for accessing very large files
efficiently. The data will be accessed in fixed-size chunks (perhaps
blocks of 16 KB). The interface should support random seeks within the
file as well as sequential, block-by-block access.

One aspect of the usage is that there is a good chance the application
will access the same blocks again and again, so some caching might be
needed for an efficient implementation.

I was wondering how such a data interface should be implemented. I
could not find much literature on the issues involved in handling very
large (multi-GB) files. Are the C++ fstream classes suitable for this
problem?

Can somebody help me with some information about how to tackle this
problem, or point me to where relevant information can be found?

Thanks and regards,
Shailesh Kumar
Jul 19 '05 #1
> I need to design data interfaces for accessing files of very large
> sizes efficiently. The data will be accessed in chunks of fixed size
> [may be a block of 16 KB]... My data interface should be able to do a
> random seek in the file, as well as sequential access block by
> block....
>
> One aspect of the usage of this interface is that there is quite good
> chance of accessing same blocks again and again by the application..
> Hence, some caching might be needed for efficient implementation..

Chances are that your OS and/or the implementation of the standard
library already does some caching. Personally I would not implement
caching right away, but design the interface in such a way that caching
can be added transparently later if the need arises.
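
To make the "add caching later" idea concrete, here is a minimal sketch
of one way to shape such an interface. All names (BlockSource,
CachedBlockSource, FakeSource, kBlockSize) are made up for illustration,
and the LRU policy is just one possible choice; FakeSource stands in for
a real file-backed reader so the sketch runs without a file.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names throughout; nothing here comes from a real library.
constexpr std::size_t kBlockSize = 16 * 1024;  // 16 KB, as in the question
using Block = std::vector<char>;

// The only interface the application sees.
class BlockSource {
public:
    virtual ~BlockSource() = default;
    // Returns block number `index`.
    virtual Block read_block(std::uint64_t index) = 0;
};

// Stand-in for a real file-backed source; counts how often it is asked.
class FakeSource : public BlockSource {
public:
    std::size_t reads = 0;
    Block read_block(std::uint64_t index) override {
        ++reads;
        return Block(kBlockSize, static_cast<char>(index & 0x7F));
    }
};

// Optional decorator: a small least-recently-used cache in front of any
// BlockSource. The application code does not change when it is added.
class CachedBlockSource : public BlockSource {
public:
    CachedBlockSource(BlockSource& inner, std::size_t capacity)
        : inner_(inner), capacity_(capacity) {}

    Block read_block(std::uint64_t index) override {
        auto hit = map_.find(index);
        if (hit != map_.end()) {
            // Cache hit: mark the entry as most recently used.
            lru_.splice(lru_.begin(), lru_, hit->second);
            return hit->second->second;
        }
        Block data = inner_.read_block(index);  // cache miss
        if (capacity_ == 0) return data;        // caching disabled
        if (lru_.size() >= capacity_) {
            // Evict the least recently used block.
            map_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(index, data);
        map_[index] = lru_.begin();
        return data;
    }

private:
    using Entry = std::pair<std::uint64_t, Block>;
    BlockSource& inner_;
    std::size_t capacity_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> map_;
};
```

Code written against BlockSource& works the same with or without the
cache, so you can measure first and wrap the real reader in
CachedBlockSource only if the OS cache turns out not to be enough.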

> I was wondering how should such a data interface be implemented. I
> could not find much literature on issues in handling very large files
> of GB size.. I am wondering Whether C++ fstream classes are suitable
> for the above problem or not?

One potential problem you may run into is that the data type used for
file positioning isn't large enough. When dealing with files larger
than 2 or 4 GByte this may very well be a problem. AFAIK there is no
portable solution that is guaranteed to allow random access to all the
data in files larger than those sizes.
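
A small sketch of the arithmetic behind that concern, with made-up
helper names (block_offset, fits_in, streamoff_can_address). It computes
block offsets in 64 bits and checks whether a given position type can
hold them; whether std::streamoff is wide enough depends on the
implementation, which is why it is checked rather than assumed.

```cpp
#include <cstdint>
#include <ios>
#include <limits>

constexpr std::uint64_t kBlockSize = 16 * 1024;  // 16 KB blocks

// Byte offset of a block, computed in 64 bits so it does not wrap at 4 GB.
constexpr std::uint64_t block_offset(std::uint64_t block_index) {
    return block_index * kBlockSize;
}

// True if `offset` can be represented in the signed position type Pos.
template <typename Pos>
constexpr bool fits_in(std::uint64_t offset) {
    return offset <=
           static_cast<std::uint64_t>(std::numeric_limits<Pos>::max());
}

// Whether this implementation's stream offset type can address `offset`.
inline bool streamoff_can_address(std::uint64_t offset) {
    return fits_in<std::streamoff>(offset);
}
```

With 16 KB blocks, block 131072 starts exactly at 2 GB, which is already
one byte past what a signed 32-bit position can express.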

--
Peter van Merkerk
peter.van.merkerk(at)dse.nl
Jul 19 '05 #2

This is just a suggestion: if you are using Windows, use memory-mapped
files to open large files rather than iostreams. Memory-mapped file
access is fast, and the load time will also be lower than with
iostreams.

And as someone already said, on 32-bit systems it is not possible to
load a file larger than 4 GB.

Jul 19 '05 #3
amit gulati <am********@cox.net> wrote in message news:<bp**********@news2.news.larc.nasa.gov>...
> This is just a suggestion, If you are using windows, then use the Memory
> Mapped files to open large files rather than using iostreams.
> Memory Mapped file access is fast and the load time will also be less
> than the iostreams.

I am developing on Windows only for now, but the code needs to be
portable, so I am not sure whether using memory-mapped files would be a
good idea.

> And as someone already said, on 32 bit systems it is not possible to
> load a file of size more than 4 GB.

This is exactly one of my concerns. Does the Visual C++ compiler have
some support for 64-bit file access? Or, in general, which file systems
really support such a thing?

regards
shailesh
Jul 19 '05 #4


shailesh kumar wrote:
> amit gulati <am********@cox.net> wrote in message news:<bp**********@news2.news.larc.nasa.gov>...
>> This is just a suggestion, If you are using windows, then use the Memory
>> Mapped files to open large files rather than using iostreams.
>> Memory Mapped file access is fast and the load time will also be less
>> than the iostreams.
>
> I am developing on windows only, but the code is going to be portable,
> hence
> i am not sure if using these memory mapped files would be good.

I think Linux has some sort of file memory-mapping mechanism. You can
use #ifdef and #define to separate the machine-dependent code.

>> And as someone already said, on 32 bit systems it is not possible to
>> load a file of size more than 4 GB.
>
> This is exactly one of my concerns... Does the Visual C++ compiler
> have some support
> for 64-bit file access? Or in general which file systems really
> support such a thing?

You need a 64-bit processor and operating system. Windows has not come
out with a 64-bit operating system for x86; AMD recently came out with
a 64-bit x86-based processor.

> regards
> shailesh

Jul 19 '05 #5
sh******@interrasystems.com (shailesh kumar) wrote in message news:<cc**************************@posting.google.com>...
> amit gulati <am********@cox.net> wrote in message news:<bp**********@news2.news.larc.nasa.gov>...
>> This is just a suggestion, If you are using windows, then use the Memory
>> Mapped files to open large files rather than using iostreams.
>> Memory Mapped file access is fast and the load time will also be less
>> than the iostreams.
>
> I am developing on windows only, but the code is going to be portable,
> hence
> i am not sure if using these memory mapped files would be good.


AFAIK memory-mapped files are available on Linux via mmap(), which is a
POSIX call and so exists on other Unix variants too. The usage
semantics are quite similar to Windows, so you can create a thin
abstraction layer over the two.

Sandeep
Jul 19 '05 #6
> > This is exactly one of my concerns... Does the Visual C++ compiler
> > have some support for 64-bit file access? Or in general which file
> > systems really support such a thing?

File access is not a compiler issue, but a library and OS API issue. The
Win32 API does have support for 64-bit file access.

> You need a 64 bit processor and operating system. Windows has not come
> out with a 64 bit operating system for x86, AMD recently came out
> with a 64 bit x86 based processor.

That is not true. The maximum file size an OS can handle is not related
to whether it runs on a 32-bit or a 64-bit processor. Just as the good
old 16-bit OSes could handle files larger than 64 KB, 32-bit Windows
(and many other 32-bit OSes, for that matter) can handle files larger
than 4 GB. If you look, for example, at the Win32 API function
SetFilePointer(), you will see that it takes the position as two 32-bit
signed long values, so it can potentially address 2^63 bytes. More than
seven years ago I wrote software for the Windows NT platform that
handled files larger than 4 GB. A 64-bit OS only comes in handy if you
intend to load or map the complete file into memory.
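
To illustrate the two-halves idea without tying it to any platform
header, here is a sketch of splitting a 64-bit offset into a low and a
high 32-bit part and joining it again. SplitOffset, split_offset and
join_offset are made-up names, and the sketch assumes non-negative
offsets; the actual SetFilePointer() call is not shown.

```cpp
#include <cstdint>

// Hypothetical helper type: a 64-bit file offset as two 32-bit halves.
struct SplitOffset {
    std::uint32_t low;   // least significant 32 bits
    std::int32_t  high;  // most significant 32 bits (signed)
};

// Split a non-negative 64-bit offset into its two halves.
inline SplitOffset split_offset(std::int64_t offset) {
    SplitOffset s;
    s.low  = static_cast<std::uint32_t>(offset & 0xFFFFFFFFLL);
    s.high = static_cast<std::int32_t>(offset >> 32);
    return s;
}

// Recombine the halves (multiplication avoids shifting a signed value).
inline std::int64_t join_offset(SplitOffset s) {
    return static_cast<std::int64_t>(s.high) * 4294967296LL + s.low;
}
```

For an offset of 5 GB plus 7 bytes, the high half is 1 and the low half
is 0x40000007, so neither half on its own needs more than 32 bits.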

The real problem is that there is no standard way that is guaranteed to
work with large (>2 GByte) files. You will have to use platform-specific
functions for that. If you intend to port your software to another
platform, it is best to write one or more wrappers around those
platform-specific function calls. When you port to another platform you
will only have to rewrite those wrappers. As long as there are no
platform- or compiler-specific things in the interface of the wrappers,
porting to another platform should be relatively straightforward. When
designing the wrapper interface it might be wise to look at various OS
APIs to see if there is a common denominator. I also recommend looking
for cross-platform libraries that wrap the OS API; it can save you a lot
of work. However, if reading 16 KByte chunks is all you are ever going
to need, I think the interface can be very straightforward and simple.

--
Peter van Merkerk
peter.van.merkerk(at)dse.nl

Jul 19 '05 #7
