
Huge file access in C


I am writing a program in C to parse binary data files in excess of 4
GB. They could conceivably be up to 2 terabytes long. I am using
fopen and fread to do the file I/O. I am compiling right now with the
Visual C++ .NET compiler (it's a console application), but ultimately I
want to compile it for Linux. Using the Windows compiler, fread craps
out around the 1 GB mark of my test file, so I'm assuming it has
something to do with pointers not being long enough (32 bits = 4 bytes
= 4 GB max). What could the problem be? Is there anything I can do to
read huge files in C? Thanks.

Dec 21 '06 #1
9 Replies


BlackMagic wrote:
> I am writing a program in C to parse binary data files in excess of 4
> GB. They could conceivably be up to 2 terabytes long. [...] Is there
> anything I can do to read huge files in C?
You have 64-bit file primitives in the Win32 API.
Use those. The API names are:

CreateFile/ReadFile/WriteFile and CloseHandle
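
A minimal sketch of that approach (untested; SetFilePointerEx is assumed
in addition to the calls named above, since its LARGE_INTEGER argument
is the usual way to seek with a full 64-bit offset):

#include <windows.h>

/* Read len bytes at a 64-bit offset; returns bytes read, or -1. */
long read_at(const char *path, long long offset, void *buf, DWORD len)
{
    LARGE_INTEGER pos;
    DWORD got = 0;
    BOOL ok;
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    pos.QuadPart = offset;             /* full 64-bit file offset */
    if (!SetFilePointerEx(h, pos, NULL, FILE_BEGIN)) {
        CloseHandle(h);
        return -1;
    }
    ok = ReadFile(h, buf, len, &got, NULL);
    CloseHandle(h);
    return ok ? (long)got : -1;
}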

Dec 21 '06 #2

jacob navia wrote:
> BlackMagic wrote:
>> [original question snipped]
> You have 64-bit file primitives in the Win32 API.
> Use those. The API names are:
>
> CreateFile/ReadFile/WriteFile and CloseHandle
That would not run under Linux, though...

:-(
Dec 21 '06 #3

BlackMagic wrote:
> I am writing a program in C to parse binary data files in excess of 4
> GB. They could conceivably be up to 2 terabytes long. [...] Is there
> anything I can do to read huge files in C?
Remaining within the C standard, I think you are dependent on what the
compiler and library support. In particular, since fseek takes a
long int offset, you are limited to whatever a long int is on your
platform, at least as far as seeking is concerned. For fread, the
element size and count are both size_t, so the maximum single read may
depend on how their product is computed or on other limits.
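
As an illustration (not from the original post), a sketch of staying
within standard C: reading sequentially in fixed-size chunks needs no
fseek at all, and the running total fits in an unsigned long long even
for multi-terabyte files. Whether fread itself can stream past 4 GB
still depends on the implementation:

#include <stdio.h>

/* Stream through a file of any size, counting its bytes. */
unsigned long long stream_bytes(const char *path)
{
    unsigned char buf[65536];
    unsigned long long total = 0;
    size_t got;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0)
        total += got;          /* 64-bit running total */
    fclose(f);
    return total;
}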

<OT> On Linux/POSIX, you can look at open with the O_LARGEFILE flag.
If you have questions about it, try in comp.unix.programmer or some
other group. If you want it to work on windows, try asking in a
Windows newsgroup, as the answer is no doubt platform dependent. You
might find this illuminating too:
http://en.wikipedia.org/wiki/Large_file_support</OT>

-David

Dec 21 '06 #4

On 21 Dec 2006 13:15:24 -0800, in comp.lang.c, "BlackMagic"
<al***********@yahoo.com> wrote:
> I am writing a program in C to parse binary data files in excess of 4
> GB. They could conceivably be up to 2 terabytes long.
> [...]
> Using the Windows compiler, fread craps out around the 1 GB mark of my
> test file,
Use OS-specific file-handling calls. These will handle whatever size
your OS can support. And you can't read files whose size exceeds the
max your OS supports anyway, so forget about 2TB files till you're
using a 64-bit OS.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Dec 21 '06 #5

Mark McIntyre wrote:
> Use OS-specific file-handling calls. These will handle whatever size
> your OS can support. And you can't read files whose size exceeds the
> max your OS supports anyway, so forget about 2TB files till you're
> using a 64-bit OS.
Wrong. The NTFS file system supports files up to 17,592,186,044,416 −
65,536 bytes, i.e. around 16 terabytes (with 1 TB = 1,099,511,627,776
bytes). All this on a 32-bit OS...

The Linux file system ext3 has a maximum file size of 2 TB when using a
4K block size.

Obviously you will need several disks (a virtual volume) to store those
files...
Dec 21 '06 #6

In article <jq********************************@4ax.com>,
Mark McIntyre <ma**********@spamcop.net> wrote:
> Use OS-specific file-handling calls. These will handle whatever size
> your OS can support. And you can't read files whose size exceeds the
> max your OS supports anyway, so forget about 2TB files till you're
> using a 64-bit OS.
"64-bit OS" usually means an OS that can handle processes with 64-bit
address spaces, rather than ones that can handle 64-bit file offsets.
The two are not necessarily related.

-- Richard

--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Dec 21 '06 #7

On 2006-12-21, in <11*********************@48g2000cwx.googlegroups.com>,
David Resnick wrote:
> BlackMagic wrote:
>> [original question snipped]
>

> Remaining within the C standard, I think you are dependent on what the
> compiler and library support. In particular, since fseek takes a
> long int offset, you are limited to whatever a long int is on your
> platform, at least as far as seeking is concerned. For fread, the
> element size and count are both size_t, so the maximum single read may
> depend on how their product is computed or on other limits.
>
> <OT> On Linux/POSIX, you can look at open with the O_LARGEFILE flag.
Um, no and NO. That (a) is not POSIX and (b) is not the right way to do
it on Linux unless you're writing your own C library.

The same goes for llseek().

The correct way would be fopen64 and friends.

Or, more easily, write with fopen, fread, etc. as you normally would
(but use fseeko/ftello or fgetpos/fsetpos instead of fseek/ftell), and
#define _FILE_OFFSET_BITS 64 before all #includes.
On Windows, you can use _fseeki64 and _ftelli64, which take __int64
instead of long; apparently 64-bit file sizes are otherwise cleanly
supported. You could just use fgetpos/fsetpos unless you need to seek
directly to a computed offset.

So, a simple stdio64.h for Linux and Windows:

#if defined(__linux__)
/* must come before any system #include */
#define _FILE_OFFSET_BITS 64
#endif

#include <stdio.h>

#if defined(_WIN32)
#define fseeko _fseeki64
#define ftello _ftelli64
typedef __int64 off_t;
#elif defined(__linux__)
/* nothing to do: off_t, fseeko and ftello are already 64-bit
   because of _FILE_OFFSET_BITS above */
#else
#define fseeko fseek
#define ftello ftell
typedef long off_t;
#endif
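
And a hedged usage sketch, assuming the header above is saved as
stdio64.h and that huge.bin is at least 5 GB (both names are
placeholders for illustration):

#include "stdio64.h"   /* hypothetical header from the sketch above */

int main(void)
{
    unsigned char byte;
    FILE *f = fopen("huge.bin", "rb");
    if (f == NULL)
        return 1;
    /* seek to the 5 GB mark with a 64-bit offset */
    if (fseeko(f, (off_t)5 << 30, SEEK_SET) != 0) {
        fclose(f);
        return 1;
    }
    if (fread(&byte, 1, 1, f) != 1) {
        fclose(f);
        return 1;
    }
    fclose(f);
    return 0;
}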

</OT>
Dec 22 '06 #8

On Fri, 22 Dec 2006 00:01:44 +0100, in comp.lang.c, jacob navia
<ja***@jacob.remcomp.fr> wrote:
> Mark McIntyre wrote:
>> [snip]
>
> Wrong. The NTFS file system supports files up to 17,592,186,044,416 −
> 65,536 bytes, i.e. around 16 terabytes
Well actually, XP's version of NTFS can support a volume size of
256TB, provided you use special formatting and don't care about
wastage.
Accessing individual bytes of this would require very OS-specific
handling of course. Certainly you could not expect a simple
integer=pointer model to be useful...
> All this on a 32-bit OS...
Just goes to show why people shouldn't answer off-topic questions here,
then. A lesson for us both there, perhaps?

--
Mark McIntyre
Dec 22 '06 #9

On 21 Dec 2006 23:26:04 GMT, in comp.lang.c , ri*****@cogsci.ed.ac.uk
(Richard Tobin) wrote:
> In article <jq********************************@4ax.com>,
> Mark McIntyre <ma**********@spamcop.net> wrote:
>> [snip]
>
> "64-bit OS" usually means an OS that can handle processes with 64-bit
> address spaces, rather than ones that can handle 64-bit file offsets.
> The two are not necessarily related.
True. I was thinking more of loading the file into memory and usefully
fseeking around in it.
--
Mark McIntyre
Dec 22 '06 #10
