Storing/processing binary file input help needed

I need to read a binary file and store it into a buffer in memory (system
has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the
whole file has been processed. I haven't worked with binary files before so
I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since
that's what the function is taking as input? Or do I need to know the format
of the original data that binary file is encoding and store it in that?
That's the part that is really confusing me.

I believe I'll need to use fread to copy the file to that array. I plan on
getting the size of the file, then determining how many DWORDs are present in it
(for example 9000) and using that as the number-of-objects parameter in fread. So
in this case:

fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
file

Is that right?

Once I get the file into the buffer, I can then do a loop where I pass 512
elements of the array to a function until all 9000 elements are processed. I
hope that's right. Any other tips on improving speed and efficiency would be
appreciated. Thanks.

Nov 14 '05 #1
[Cross-post to comp.lang.c++ removed. If you want a C answer, ask here.
If you want a C++ answer, ask there. Don't ask in both places. C and C++
are two very different languages. The best solution in one may not even
be valid in the other.]

Arnold wrote:
I need to read a binary file and store it into a buffer in memory (system
has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD).
You should leave the Microsoftisms at the door when you ask a question
here. We discuss standard, portable C in this group. We know what
unsigned long is. We don't know or care about DWORD.
I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the
whole file has been processed. I haven't worked with binary files before so
I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since
that's what the function is taking as input?
You can declare your buffer basically any way you want, but the
functions for reading will always read a sequence of chars. The problem
with declaring the buffer as something other than char[] is that it
results in basically reinterpreting the raw bits, and the result may be
incorrect or even illegal (resulting in undefined behavior - possibly a
program crash) if the format of the file doesn't match the exact layout
that the C implementation uses for the type (unsigned long in this case).

Basically, you are talking about allowing the C implementation to
dictate the file format. Not only is this a bad idea, but it sounds like
it's backward in your case - the file format is already defined.

The correct, portable way to read a binary file is almost always to read
it as raw bytes, then convert the raw bytes according to the format of
the file. So if your file is made up of 4-byte unsigned values, stored
most-significant-byte first, you could do something like this:

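#include <limits.h>   /* CHAR_BIT */
#include <stdio.h>

#define FIELD_BYTES 4

unsigned char buf[FIELD_BYTES];
unsigned long value = 0;
size_t i;

/* fp is a FILE * opened in binary mode ("rb") */
if (fread(buf, FIELD_BYTES, 1, fp) == 1)
{
    for (i=0; i<FIELD_BYTES; ++i)
    {
        /* shift in one byte at a time, most significant first */
        value = (value << CHAR_BIT) | buf[i];
    }
}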

You could also handle more than one value at a time, with a little more
work.
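
As a rough sketch of the "more than one value at a time" variant: the function below reads up to 512 fields in one fread call and decodes each one byte by byte. The names read_values and MAX_VALUES are made up for illustration, and the code assumes the same layout as above (4-byte unsigned values, most significant byte first).

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>
#include <stdio.h>

#define FIELD_BYTES 4     /* bytes per value in the file, as in the post above */
#define MAX_VALUES  512   /* per-call limit mentioned in the question */

/* Hypothetical helper: read up to MAX_VALUES big-endian fields from fp
   into out[]. Returns the number of complete values decoded, which is 0
   at end of file. A trailing partial field is ignored. */
size_t read_values(FILE *fp, unsigned long out[MAX_VALUES])
{
    unsigned char raw[MAX_VALUES * FIELD_BYTES];
    size_t nbytes = fread(raw, 1, sizeof raw, fp);
    size_t nvalues = nbytes / FIELD_BYTES;
    size_t i, j;

    for (i = 0; i < nvalues; ++i) {
        unsigned long value = 0;
        /* shift in one byte at a time, most significant first */
        for (j = 0; j < FIELD_BYTES; ++j)
            value = (value << CHAR_BIT) | raw[i * FIELD_BYTES + j];
        out[i] = value;
    }
    return nvalues;
}
```

Calling read_values in a loop until it returns 0 walks the whole file, and the decoded array can be handed straight to the consumer.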
Or do I need to know the format
of the original data that binary file is encoding and store it in that?
Not sure what you mean by that. Of course you need to know the format of
the file, and write the code accordingly. You can't wave a magic wand
and make your code handle files in an unknown format.
That's the part that is really confusing me.

I believe I'll need to used fread to copy the file to that array. I plan on
getting the size of file, then determining how many DWORD are present in it
(for example 9000) and use that my number of object parameter in fread. So
in this case:

fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
file

Is that right?
It's a possible starting point. It's certainly not a complete, portable
solution.

Once I get the file into the buffer, I can then do a loop where I pass 512
elements of the array to a function until all 9000 elements are processed. I
hope that's right. Any other tips on improving speed and efficiency would be
appreciated. Thanks.


My main tip for improving speed and efficiency is don't even try to.
Write simple, correct code first. Only worry about making it faster if
it's determined to be too slow, and then profile to determine where the
time is being lost so you can target optimizing effort appropriately.

In particular, if you are only able to handle 512 elements at a time, I
wouldn't bother reading more than that from the file each iteration.
There's probably no need to read the entire file into memory, and it
would probably be more complicated. On the other hand, reading larger
blocks (and thus minimizing I/O function calls) /might/ improve
execution speed, but don't worry about that until it's time (as
described above).
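
A minimal sketch of that read-512-then-process loop follows. The thread never shows the prototype of the receiving function, so chunk_fn and sum_chunk are stand-ins invented here. The sketch also takes the simple route of reading straight into unsigned long, which only works if the file layout happens to match the platform's unsigned long exactly.

```c
#include <stddef.h>
#include <stdio.h>

#define CHUNK_WORDS 512   /* the consumer's per-call limit, from the question */

/* Hypothetical consumer signature; the real prototype is not in the thread. */
typedef void (*chunk_fn)(const unsigned long *words, size_t count);

/* Stand-in consumer for illustration: sums the words and counts the calls. */
unsigned long chunk_sum;
size_t chunk_calls;

void sum_chunk(const unsigned long *words, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i)
        chunk_sum += words[i];
    ++chunk_calls;
}

/* Feed the stream to consume() in pieces of at most CHUNK_WORDS words.
   Returns the number of words processed, or (size_t)-1 on a read error. */
size_t process_file(FILE *fp, chunk_fn consume)
{
    unsigned long buf[CHUNK_WORDS];
    size_t total = 0, got;

    while ((got = fread(buf, sizeof buf[0], CHUNK_WORDS, fp)) > 0) {
        consume(buf, got);   /* the last call may carry fewer than 512 words */
        total += got;
    }
    return ferror(fp) ? (size_t)-1 : total;
}
```

Because fread reports how many elements it actually delivered, the short final chunk needs no special case.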

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.
Nov 14 '05 #2
On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:
Once I get the file into the buffer, I can then do a loop where I pass 512
elements of the array to a function until all 9000 elements are processed. I
hope that's right. Any other tips on improving speed and efficiency would be
appreciated. Thanks.


As an alternative to the mmap solution from Glanni, the easiest way to do
this would be to read 512 words, process them, write back the result, and repeat
until end-of-file. No need to read the whole file into memory.

You can write back results in place, if they should occupy the same
storage, or to some other file. If the data has to be replaced, it is
often best to write the output to a new file, then move the new file over
the old file. That way you will not corrupt the original file if your
program crashes half way through.
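
For what it's worth, the write-to-a-new-file-then-move idea might look roughly like this. replace_file and its parameters are invented for the example. Note that the C standard leaves it implementation-defined what rename does when the target already exists: POSIX systems replace the target, while others may fail, in which case the original would have to be removed first.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write len bytes to tmp_path, then move tmp_path
   over path. The original file is untouched until the final rename.
   Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const char *tmp_path,
                 const unsigned char *data, size_t len)
{
    FILE *out = fopen(tmp_path, "wb");
    if (out == NULL)
        return -1;

    if (fwrite(data, 1, len, out) != len) {
        fclose(out);
        remove(tmp_path);
        return -1;
    }
    if (fclose(out) != 0) {      /* fclose can report a failed final flush */
        remove(tmp_path);
        return -1;
    }
    /* Assumption: rename replaces an existing target (true on POSIX). */
    if (rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```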

HTH,
M4

Nov 14 '05 #3
"Arnold" <ar****@nothpole.com> wrote in message news:<g4***************@newssvr29.news.prodigy.com >...

I am not a C wizard but I have some suggestions.
I need to read a binary file and store it into a buffer in memory (system
has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the
whole file has been processed.
By the term "words", do you mean chunks of chars delimited by an ASCII
space? Or is each "word" 512 bytes in size?
I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since
that's what the function is taking as input? Or do I need to know the format
of the original data that binary file is encoding and store it in that?
That's the part that is really confusing me.
By the term binary file and file format are you talking about the
first two letters in a file according to the DOS assembly language
(example MZ in .exe file) or the format of data present in a file
(fields and record with a kind of delimiter). If it is the second then
it is more related with the file's record design concept.

I believe I'll need to used fread to copy the file to that array. I plan on
getting the size of file, then determining how many DWORD are present in it
(for example 9000) and use that my number of object parameter in fread. So
in this case:

fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
file

Is that right?

Is it just 512 elements, or is the count unknown until run time? Might this
be the time to use a linked list rather than an array data type?
Once I get the file into the buffer, I can then do a loop where I pass 512
elements of the array to a function until all 9000 elements are processed. I
hope that's right. Any other tips on improving speed and efficiency would be
appreciated. Thanks.


Optimizing in C is not a kind of "instructions management" like in
asm.
Nov 14 '05 #4

"Martijn Lievaart" <m@remove.this.part.rtij.nl> wrote in message
news:pa****************************@remove.this.pa rt.rtij.nl...
On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:
Once I get the file into the buffer, I can then do a loop where I pass 512 elements of the array to a function until all 9000 elements are processed. I hope that's right. Any other tips on improving speed and efficiency would be appreciated. Thanks.
As an alternative to the mmap solution from Glanni, the easiest way to do
this would be to read 512 words, process them, write back the result, and repeat
until end-of-file. No need to read the whole file into memory.


I thought of that but speed is a concern so I want to keep the number of
disk accesses at a minimum.

You can write back results in place, if they should occupy the same
storage, or to some other file. If the data has to be replaced, it is
often best to write the output to a new file, then move the new file over
the old file. That way you will not corrupt the original file if your
program crashes half way through.

In my case, I don't have to write any data back to the original file. Thanks
for the suggestions.

HTH,
M4

Nov 14 '05 #5

"sathyashrayan" <sa************@yahoo.co.in> wrote in message
news:23**************************@posting.google.c om...
"Arnold" <ar****@nothpole.com> wrote in message news:<g4***************@newssvr29.news.prodigy.com >...
I am not a C wizard but I have some suggestions.
I need to read a binary file and store it into a buffer in memory (system has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the whole file has been processed.
By the term "words", do you mean chunks of chars delimited by an ASCII
space? Or is each "word" 512 bytes in size?


Each word is a DWORD, so each one is 32 bits. I can pass a maximum of 512
DWORDs at a time to the function.

I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since that's what the function is taking as input? Or do I need to know the format of the original data that binary file is encoding and store it in that?
That's the part that is really confusing me.


By the term binary file and file format are you talking about the
first two letters in a file according to the DOS assembly language
(example MZ in .exe file) or the format of data present in a file
(fields and record with a kind of delimiter). If it is the second then
it is more related with the file's record design concept.


It is the second.

I believe I'll need to used fread to copy the file to that array. I plan on getting the size of file, then determining how many DWORD are present in it (for example 9000) and use that my number of object parameter in fread. So in this case:

fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary file

Is that right?

Is it just 512 elements, or is the count unknown until run time? Might this
be the time to use a linked list rather than an array data type?


512 is the maximum the function can handle at a time so that is fixed,
except for the last iteration though as the file won't have a multiple of
512 number of DWORDs.

Once I get the file into the buffer, I can then do a loop where I pass 512 elements of the array to a function until all 9000 elements are processed. I hope that's right. Any other tips on improving speed and efficiency would be appreciated. Thanks.


Optimizing in C is not a kind of "instructions management" like in
asm.

Nov 14 '05 #6

"Arnold" <ar****@nothpole.com> wrote in message
news:g4***************@newssvr29.news.prodigy.com. ..
I need to read a binary file and store it into a buffer in memory (system
has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the whole file has been processed. I haven't worked with binary files before so I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since
that's what the function is taking as input? Or do I need to know the format of the original data that binary file is encoding and store it in that?
That's the part that is really confusing me.

I believe I'll need to used fread to copy the file to that array. I plan on getting the size of file, then determining how many DWORD are present in it (for example 9000) and use that my number of object parameter in fread. So
in this case:

fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary file

Is that right?


You don't need to read the whole file; you can read 512 bytes at a time into
a buffer of appropriate size:

char buffer[512];
size_t x;
x = fread(buffer, 1, sizeof buffer, fp); /* x is the number of bytes actually read; don't forget to check it */
...

You can then pass a pointer to this buffer to your function, which has been
prototyped to accept an array of DWORD and the number of elements to process
(which will be x/4 from the fread above), e.g.

int process_buf(DWORD *my_array, int number_of_elements);

Then your function can iterate across this array as follows:

int process_buf(DWORD *my_array, int number_of_elements)
{
    int i;
    DWORD next_val;
    for (i = 0; i < number_of_elements; i++) {
        /* You might need to convert from big-endian to
           little-endian here (see below) */
        next_val = my_array[i];
    }
    return 0;
}
Of course this makes an assumption that the data in the file is stored in
the same byte order as the processor you are running your program on (most
likely you are using an Intel Pentium so Little-Endian is the byte order you
are assuming). If the file uses another byte order then you can write
(or google for) a macro that will do the conversion for you.
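
For illustration, such a conversion could be written as a small function rather than a macro. swap32 and swap32_array are names invented here. The sketch assumes each value occupies the low 32 bits of an unsigned long, and any higher bits are discarded.

```c
#include <stddef.h>

/* Reverse the byte order of a 32-bit value held in an unsigned long.
   Bits above the low 32 (if unsigned long is wider) are dropped. */
unsigned long swap32(unsigned long v)
{
    return ((v & 0x000000FFUL) << 24) |
           ((v & 0x0000FF00UL) << 8)  |
           ((v & 0x00FF0000UL) >> 8)  |
           ((v & 0xFF000000UL) >> 24);
}

/* Convert a whole buffer in place, e.g. right after reading it. */
void swap32_array(unsigned long *words, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i)
        words[i] = swap32(words[i]);
}
```

Swapping only makes sense when the file's byte order is known to differ from the machine's. Assembling each value from individual bytes avoids having to know the machine's order at all.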

Hope this helps
Sean

Nov 14 '05 #7
On Tue, 06 Jan 2004 08:10:52 GMT, "Arnold" <ar****@nothpole.com>
wrote:
I need to read a binary file and store it into a buffer in memory (system
has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the
whole file has been processed. I haven't worked with binary files before so
I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since
that's what the function is taking as input? Or do I need to know the format
of the original data that binary file is encoding and store it in that?
That's the part that is really confusing me.
The I/O function (fread as you suggest below) does not care how you
define the buffer. However, how you use the buffer may make a
difference. If you define the buffer as unsigned char, then you are
guaranteed that all possible 256 values are acceptable (unsigned char
cannot have trap values) and the buffer will be portable (at least for
systems which have CHAR_BIT defined as 8). If you define the buffer
as DWORD, are you sure that all 4 billion plus possible values that
could come from a binary file are acceptable and your program will
never execute on a machine with a different sizeof(unsigned long)?

I believe I'll need to used fread to copy the file to that array. I plan on
getting the size of file, then determining how many DWORD are present in it
(for example 9000) and use that my number of object parameter in fread. So
in this case:
There is no portable way to get the file size (unless you read the
entire file) so you probably need to use a system specific extension
or function for this.
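
The "read the entire file" route can be sketched like this, under the assumption that one extra pass over the data is acceptable. stream_size is a name made up for the example. The common fseek(fp, 0, SEEK_END) plus ftell trick is left out on purpose, because the C standard does not require binary streams to support SEEK_END meaningfully.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes in a binary stream by reading it
   to the end, then rewind so the caller can start over.
   Returns 0 on success and stores the count in *size; -1 on read error. */
int stream_size(FILE *fp, unsigned long *size)
{
    unsigned char scratch[4096];
    unsigned long total = 0;
    size_t got;

    rewind(fp);
    while ((got = fread(scratch, 1, sizeof scratch, fp)) > 0)
        total += (unsigned long)got;
    if (ferror(fp))
        return -1;

    rewind(fp);   /* also clears the end-of-file indicator */
    *size = total;
    return 0;
}
```

Dividing the result by 4 gives the number of 4-byte values. If the loop that processes the data already stops at end of file, the size is not needed in the first place.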

fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
file
You meant 9000.

Is that right?

Once I get the file into the buffer, I can then do a loop where I pass 512
elements of the array to a function until all 9000 elements are processed. I
hope that's right. Any other tips on improving speed and efficiency would be
appreciated. Thanks.


How you pass a quantity of array elements will determine the
suitability of your design. (Actually, the method of passing the
argument(s) should drive the design.) What is the prototype for the
receiving function?

The odds of the file containing an exact multiple of 512 DWORDs are
about 1 in 500, so you may want to be able to handle the last set as a
smaller quantity.

<<Remove the del for email>>
Nov 14 '05 #8
On Tue, 06 Jan 2004 03:41:14 -0500, Michael B Allen
<mb*****@ioplex.com> wrote in comp.lang.c:
On Tue, 06 Jan 2004 03:10:52 -0500, Arnold wrote:
I need to read a binary file and store it into a buffer in memory
(system has large amount of RAM, 2GB+) then pass it to a function. The
function accepts input as 32 bit unsigned longs (DWORD). I can pass a
max of 512 words to it at a time. So I would pass them in chunks of 512
words until the whole file has been processed. I haven't worked with
binary files before so I'm confused with how to store the binary file
into memory.
The term "binary file" is a bit of a misnomer. It just means it's not
text. Otherwise *everything* is "binary".
What sort of array do I use? Does C allow char only? Can I
declare a DWORD buffer since that's what the function is taking as
input? Or do I need to know the format of the original data that binary
file is encoding and store it in that? That's the part that is really
confusing me.


Pretend for a minute that you have a really big array in memory:

struct mystruct {
int foo;
char bar[10];
float zap;
};
...
struct mystruct *s = malloc(100000 * sizeof(struct mystruct));


Are you new in comp.lang.c? Everybody here by now should know the clc
preferred idiom:

struct mystruct *s = malloc(100000 * sizeof *s);

...and the magic number is anathema, of course, so:

#define NUM_STRUCTS 100000

struct mystruct *s = malloc(NUM_STRUCTS * sizeof *s);
populate(s);

If you write this array to a file you have a "binary file". Now you could
do the reverse and read in your array from the file. At least you can
on the same machine. If you write the file on a little-endian i386 and
read it in on a big-endian Sparc you're going to have endianness problems.
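
A small sketch of that write-then-read-back round trip, assuming the same machine and compiler on both sides. save_structs and load_structs are invented names. The raw dump includes whatever padding and byte order the implementation uses, which is exactly why it does not travel between platforms.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct mystruct {
    int foo;
    char bar[10];
    float zap;
};

/* Dump n structs to fp as raw bytes. Returns 0 on success, -1 on error. */
int save_structs(FILE *fp, const struct mystruct *s, size_t n)
{
    return fwrite(s, sizeof *s, n, fp) == n ? 0 : -1;
}

/* Read n structs back into freshly allocated memory.
   Returns NULL on allocation failure or short read; caller frees. */
struct mystruct *load_structs(FILE *fp, size_t n)
{
    struct mystruct *s = malloc(n * sizeof *s);
    if (s == NULL)
        return NULL;
    if (fread(s, sizeof *s, n, fp) != n) {
        free(s);
        return NULL;
    }
    return s;
}
```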

Mike

PS: This question didn't warrant cross-posting to two different news
groups. Please refrain from doing that. Some people will simply not
answer your question when they see that.


Why not? The fread() function is part of the standard C++ library as
well, so the post is topical there, and two is certainly not an
excessive number of groups for a cross-post.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 14 '05 #9
On Tue, 06 Jan 2004 18:01:29 +0000, Arnold wrote:

"Martijn Lievaart" <m@remove.this.part.rtij.nl> wrote in message
news:pa****************************@remove.this.pa rt.rtij.nl...
On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:
Once I get the file into the buffer, I can then do a loop where I pass 512 elements of the array to a function until all 9000 elements are processed. I hope that's right. Any other tips on improving speed and efficiency would be appreciated. Thanks.


As an alternative to the mmap solution from Glanni, the easiest way to do
this would be to read 512 words, process them, write back the result, and repeat
until end-of-file. No need to read the whole file into memory.


I thought of that but speed is a concern so I want to keep the number of
disk accesses at a minimum.


Memory mapping the file is probably still the best way, but it suffers from a
size limit. To get around this, you can also read in large chunks of the
file. Instead of 512 words, read a few hundred KB at a time and operate on
that. Experiment with buffer sizes to see what gives the best result.
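
Roughly, the large-chunk variant could look like the sketch below: one big fread, then the block is handed out in slices of at most 512 words. BLOCK_CHUNKS is an arbitrary starting value to experiment with, and chunk_fn and note_chunk are stand-ins, since the real consumer's prototype is not shown in the thread. As before, reading directly into unsigned long assumes the file layout matches the platform.

```c
#include <stddef.h>
#include <stdio.h>

#define CHUNK_WORDS  512   /* per-call limit of the consumer */
#define BLOCK_CHUNKS 64    /* chunks per read; a guess, tune by measuring */

/* Hypothetical consumer signature. */
typedef void (*chunk_fn)(const unsigned long *words, size_t count);

/* Stand-in consumer: records how often it ran and the largest slice seen. */
size_t calls_seen;
size_t largest_chunk;

void note_chunk(const unsigned long *words, size_t count)
{
    (void)words;
    ++calls_seen;
    if (count > largest_chunk)
        largest_chunk = count;
}

/* Read big blocks, then pass them on in slices of at most CHUNK_WORDS.
   Returns the total number of words handed to consume(). */
size_t process_in_blocks(FILE *fp, chunk_fn consume)
{
    static unsigned long block[CHUNK_WORDS * BLOCK_CHUNKS];
    const size_t capacity = sizeof block / sizeof block[0];
    size_t total = 0, got, off;

    while ((got = fread(block, sizeof block[0], capacity, fp)) > 0) {
        for (off = 0; off < got; off += CHUNK_WORDS) {
            size_t n = got - off < CHUNK_WORDS ? got - off : CHUNK_WORDS;
            consume(block + off, n);
        }
        total += got;
    }
    return total;
}
```

stdio already buffers reads internally, so whether this beats the plain 512-word loop is something only a measurement on the target system can tell.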

I'm not sure what will be faster. Large buffers reduce the number of
system calls slightly (good), but decrease locality of reference (bad).
The mmap solution does not suffer from either of these disadvantages, I think.

Note that the number of disk accesses will be the same whatever solution
you choose. You have to read the whole file, period. I guess the main speed
factors are the number of system calls and how effectively you use your
memory. Also, you should try to do some useful work while waiting for the
disk; maybe asynchronous I/O or multithreading can be of help?

(If you look into multithreading, be sure you know which synchronisation
mechanisms are lightweight and which are heavyweight; there is a huge difference.)

I would just try a simple solution. If it isn't fast enough, try others.
Profile to see where your program spends its time. If most of the time is
spent on calculations, all of the above will give only very marginal
speedups. If run on a fast machine, maybe a naive implementation will be
fast enough for your needs. Remember the old truism about optimizing:
Don't (until you have proven you need it).

HTH,
M4

Nov 14 '05 #10
