
reading binary files on windows and fedora using fstream

vermarajeev
180 100+
Hi all,
I'm trying to read a binary file containing some data for my cross-platform project. Here is a code snippet that will help me explain my problem.

const int AVAILSIZE = 100000;
streampos availtable[AVAILSIZE + 1];
static fstream availfile;
static void openavailfile(bool writemode);

int main()
{
    streampos j;
    int hashvalue = calchashvalue(molecule);
    /* here calchashvalue() calculates the position of molecule (some molecule
       in chemistry) in the file "availfile" by considering its molecular weight */

    openavailfile(false);

    if (!(j = availtable[hashvalue])) // NO MOLECULE WITH THIS HASH #
        return 0;
    else
    {
        [.....] // operation on the molecule, obtained from file "availfile"
    }
    return 0;
}

static void openavailfile(bool writemode)
{
    string filename = "availfile";
    if (writemode)
    {
        availfile.open(filename.c_str(), ios::in | ios::out | ios::ate);
        [.....] // remaining code
    }
    else
    {
        availfile.open(filename.c_str(), ios::in | ios::binary);
        availfile.seekg(0, ios::beg);
        availfile.read((char*)availtable, sizeof(availtable));
    }
}

Now here is the problem.
This is a snippet from a chemistry project that uses a file called "availfile". The file is a repository of some 5000 molecules; in plain technical terms, "availfile" acts as a database of molecules. I need to calculate the hash value (the exact position) of a molecule in that database, so I use a function called "calchashvalue(molecule)" to calculate the molecule's position. Once that is done I open the file in ios::in mode (the else branch of "openavailfile") and read its contents into availtable (an array of streampos with 100001 elements). Once control returns from "openavailfile" I put availtable[hashvalue] into j (which is also of type streampos). Then I check whether that position is zero or non-zero. If it is zero I simply return (meaning the molecule is not in the database); otherwise I perform some action on that molecule.

On both Linux and Windows, hashvalue contains 35450 (i.e. int hashvalue = 35450 in the above code), but when I check
"if (!(j = availtable[hashvalue]))", on Linux I get "j" as 29925552 whereas on Windows "j" contains 0, so the lookup fails on Windows.

I have absolutely no idea what I should do or how I should proceed, because I need the same value in "j" on both Windows and Linux.

Please help me fix this problem.
Thanks in advance
Rajeev
Aug 9 '06 #1
Banfa
9,051 Expert Mod 8TB
Check your read of availfile

availfile.read((char*)availtable,sizeof(availtable ));


If this is failing for some reason then it will leave all entries in availtable as 0.
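A minimal sketch of such a check, assuming a hypothetical helper named read_bytes_checked (it is not part of the original code): it opens the file in binary mode, attempts the read, and reports how many bytes actually arrived, so a short read shows up instead of silently leaving zeros behind.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <iostream>
#include <string>

// Hypothetical helper: reads up to `wanted` bytes from `path` into `dest`
// and returns how many bytes were actually read (0 if the file won't open).
std::size_t read_bytes_checked(const std::string& path, char* dest,
                               std::size_t wanted)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        std::cerr << "cannot open " << path << '\n';
        return 0;
    }
    in.read(dest, static_cast<std::streamsize>(wanted));
    // gcount() reports the bytes delivered by the last unformatted read,
    // even when read() reached end-of-file early and set failbit.
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    if (got != wanted) {
        std::cerr << "short read: wanted " << wanted
                  << ", got " << got << '\n';
    }
    return got;
}
```

Calling it as read_bytes_checked("availfile", (char*)availtable, sizeof(availtable)) and comparing the result with sizeof(availtable) on each platform would show whether the file is as large as the table expects. It may also be worth printing sizeof(streampos) on both machines, since that size is implementation-defined.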
Aug 9 '06 #2
vermarajeev
180 100+
Hi,
Thanx for your reply.

I have checked the condition as follows

availfile.read((char*)availtable, sizeof(availtable));
if (!availfile)
{
    cout << "Failed to read" << endl;
    exit(99);
}
But unfortunately that doesn't work either. I thought the problem might be with carriage return and "\n" on Windows, but since I have opened the file in binary mode that shouldn't make a difference.

So please suggest something solid and let me know how I can overcome this problem. This is very important to me as I need to deliver the product by the deadline.

Thanx
Aug 9 '06 #3
Banfa
9,051 Expert Mod 8TB
You read it in binary but you don't write it in binary, which may be a bit of a mistake.

Since the data is binary, the end-of-line termination shouldn't make a difference; in a binary file it is irrelevant, since end of line only really applies to text files.

I don't know what your Linux machine is, but it is inherently non-portable to try to read and write binary data just by copying to and from an array or structure.

This is because you do not know whether the structure has different padding on the different machines, and different endianness also makes this type of binary copy incompatible. The only portable way to read and write binary data is to read the individual bytes of the file and then place them into the array/structure in the correct manner.

Check the actual binary data in the file (using a hex editor) against what appears in the program. Visually compare in binary (hexadecimal) and see if you can spot the pattern of how the data is being imported from the file into the program.
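If no hex editor is at hand, the comparison can be done from inside the program. The following is only a sketch: hex_dump is a hypothetical helper name, and it simply formats a buffer as offset-prefixed rows of 16 bytes so the in-memory array can be compared with the file contents on each platform.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats `len` bytes as rows of up to 16 two-digit hex values, each row
// prefixed with its 8-digit offset, and returns the resulting text.
std::string hex_dump(const unsigned char* data, std::size_t len)
{
    std::string out;
    char buf[16];
    for (std::size_t row = 0; row < len; row += 16) {
        // snprintf never writes past the buffer, even for huge offsets.
        std::snprintf(buf, sizeof buf, "%08lX    ",
                      static_cast<unsigned long>(row));
        out += buf;
        for (std::size_t i = row; i < row + 16 && i < len; ++i) {
            std::snprintf(buf, sizeof buf, "%02X",
                          static_cast<unsigned>(data[i]));
            out += buf;
            // Separate bytes with a space, but not after the last in a row.
            if (i + 1 < row + 16 && i + 1 < len) out += ' ';
        }
        out += '\n';
    }
    return out;
}
```

Dumping the first few hundred bytes of availtable right after the read, on both machines, and comparing them with the same bytes of the file should show whether the bytes arrive unchanged and only their interpretation differs.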
Aug 9 '06 #4
vermarajeev
180 100+
Hi, thanks for your suggestions.
I have read the individual bytes of the file and got the following output.

It is difficult to show you the output in "*.bmp" format, so I'm writing out a sample that looks like the original.
On Windows
00000000     00 00 00 00 00 00 00 00 00 00 00 00 00 D6 00 00
00000010     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020     00 58 00 00 00 00 00 00 A4 00 00 00 00 00 00 00
00000030     00 00 00 00 00 58 00 00 00 00 00 1E 9C 00 00 00
00000040     00 00 00 00 00 00 00 00 00 DA 00 00 00 00 00 00
00000050     00 00 00 00 00 00 00 00 00 00 00 00 00 1A 00 00

On Linux
00000000     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010     00 00 00 00 D6 00 00 00 00 00 00 00 00 00 00 00
00000020     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030     00 00 58 00 00 00 00 00 00 00 00 00 A4 00 00 00
00000040     00 00 00 00 00 00 C4 00 00 00 00 00 00 00 00 00
00000050     58 00 00 00 00 00 00 00 00 1E 9C 00 00 00 00 00
So here is the problem: the position of the molecule differs between the two platforms. I have found the problem, but how do I solve this offset issue so that I can read the same offset position on both platforms?
Please suggest something.

Thankx.
Aug 10 '06 #5
Banfa
9,051 Expert Mod 8TB
OK, the problem is that your code is non-portable in some way; this is what I was talking about in my previous post. When writing to and reading from the file you have to decide what the structure of the file will be and write your code so that it maintains that structure.

Consider this structure

struct {
    char c;
    int i;
    long l;
} ms = {8, 24, 40};

Assume int is 2 bytes long and long is 4 bytes long (padding bytes are shown as 00).

On a little endian machine with 1 byte packing this will appear in memory as

08 18 00 28 00 00 00

On a little endian machine with 2 byte packing this will appear in memory as

08 00 18 00 28 00 00 00

On a little endian machine that aligns every member to a 4 byte boundary this will appear in memory as

08 00 00 00 18 00 00 00 28 00 00 00

And if the machine is big endian then those will respectively be

08 00 18 00 00 00 28

08 00 00 18 00 00 00 28

08 00 00 00 00 18 00 00 00 00 00 28

And there are many, many other structure packing schemes.
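Rather than guessing which scheme a given compiler uses, the layout can be printed. This is a sketch only: the struct is given the hypothetical name Sample so that offsetof can be applied to it, and the numbers it prints depend entirely on the compiler and platform.

```cpp
#include <cstddef>
#include <cstdio>

// Same members as the example structure, named so offsetof can be used.
struct Sample {
    char c;
    int  i;
    long l;
};

// Prints the total size and each member's offset; gaps between
// (offset + member size) and the next offset are padding.
void print_layout()
{
    std::printf("sizeof(Sample) = %zu\n", sizeof(Sample));
    std::printf("c at %zu, size %zu\n", offsetof(Sample, c), sizeof(char));
    std::printf("i at %zu, size %zu\n", offsetof(Sample, i), sizeof(int));
    std::printf("l at %zu, size %zu\n", offsetof(Sample, l), sizeof(long));
}

// True when the low-order byte of an integer is stored first in memory.
bool is_little_endian()
{
    const unsigned int one = 1;
    return *reinterpret_cast<const unsigned char*>(&one) == 1;
}
```

Running print_layout() and is_little_endian() on both the Windows and the Linux build gives a concrete picture of how the two in-memory layouts differ.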

So if you try to write that structure to a file using

fwrite(&ms, sizeof ms, 1, OutFile);

then you have no way of knowing whether another machine will be able to read it, because of the different structure packings.

Additionally, notice that I assumed int to be 2 bytes in size. That might not be true: int is commonly 2 or 4 bytes in size, but all you can really count on is that

sizeof(char) == 1

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

On today's computers int could be anything from 1 to 8 bytes long.

So even

fwrite(&ms.i, sizeof ms.i, 1, OutFile);

is not guaranteed to write data that can be read on another machine.
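One way to pin the sizes down, assuming a C++11 or later compiler, is to use the fixed-width types from <cstdint> and let the compiler check the assumptions. The names Record and kRecordFileSize are hypothetical. Note that this fixes only the sizes; padding and byte order still vary between machines.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Compile-time checks: the build fails if the platform breaks an assumption.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(std::uint16_t) == 2, "uint16_t must be 2 bytes");
static_assert(sizeof(std::uint32_t) == 4, "uint32_t must be 4 bytes");

// In-memory record with explicitly sized members. sizeof(Record) may
// still include padding, so it must not be used as the on-disk size.
struct Record {
    std::uint8_t  c;
    std::uint16_t i;
    std::uint32_t l;
};

// Bytes the record occupies in the file format: 1 + 2 + 4, no padding.
constexpr std::size_t kRecordFileSize = 1 + 2 + 4;
```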

Fix the file structure

The file will have this structure

A number 1 byte long
A number 2 bytes long big endian
A number 4 bytes long big endian

And write the file explicitly in that structure

fputc(ms.c, OutFile);
fputc(((ms.i>>8)&0xFF), OutFile);
fputc((ms.i&0xFF), OutFile);
fputc(((ms.l>>24)&0xFF), OutFile);
fputc(((ms.l>>16)&0xFF), OutFile);
fputc(((ms.l>>8)&0xFF), OutFile);
fputc((ms.l&0xFF), OutFile);

And read the file back in the same way. That is the only real way to guarantee that the file written will be readable on another platform.
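A sketch of the matching write and read pair, using C++ streams instead of fputc so it can be tried against an in-memory stream. Record, write_record, read_record and round_trip_ok are hypothetical names, and the fixed-width member types are an assumption standing in for the 1, 2 and 4 byte numbers of the format described above.

```cpp
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct Record {
    std::uint8_t  c;  // 1 byte in the file
    std::uint16_t i;  // 2 bytes, big endian
    std::uint32_t l;  // 4 bytes, big endian
};

// Writes exactly 7 bytes, most significant byte first for i and l.
void write_record(std::ostream& out, const Record& r)
{
    out.put(static_cast<char>(r.c));
    out.put(static_cast<char>((r.i >> 8) & 0xFF));
    out.put(static_cast<char>(r.i & 0xFF));
    out.put(static_cast<char>((r.l >> 24) & 0xFF));
    out.put(static_cast<char>((r.l >> 16) & 0xFF));
    out.put(static_cast<char>((r.l >> 8) & 0xFF));
    out.put(static_cast<char>(r.l & 0xFF));
}

// Reads 7 bytes and reassembles the members; returns false on a short read.
bool read_record(std::istream& in, Record& r)
{
    unsigned char b[7];
    if (!in.read(reinterpret_cast<char*>(b), sizeof b)) return false;
    r.c = b[0];
    r.i = static_cast<std::uint16_t>((b[1] << 8) | b[2]);
    r.l = (static_cast<std::uint32_t>(b[3]) << 24) |
          (static_cast<std::uint32_t>(b[4]) << 16) |
          (static_cast<std::uint32_t>(b[5]) << 8) |
           static_cast<std::uint32_t>(b[6]);
    return true;
}

// Usage sketch: serialise to an in-memory stream and read it back.
bool round_trip_ok()
{
    std::stringstream s(std::ios::in | std::ios::out | std::ios::binary);
    const Record a = {8, 24, 40};
    write_record(s, a);
    Record b = {0, 0, 0};
    return read_record(s, b) && b.c == 8 && b.i == 24 && b.l == 40;
}
```

For the values {8, 24, 40} the bytes written are 08 00 18 00 00 00 28 whatever the host's packing or byte order, because each byte is produced by shifting rather than by copying memory.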

True portability gurus would say avoid binary and write text, but I think you could probably get away with binary.
Aug 10 '06 #6
vermarajeev
180 100+
Hi,
Thanks for your reply. I'm really glad that I joined this forum to post my queries, and the way you explain things is really fantastic.

OK, what you mean is that I need to fix the file structure itself. The solution sounds perfect. But I have one more doubt: suppose the file already exists and I then need to fix the file structure, so that when I read the existing file byte by byte I can write its contents to another file in the format I want, say (1 char, 2 int, 4 long), and then read that file.

I have read your previous post, where you suggest
"Check the actual binary data in the file (using a hex editor) against what appears in the program. Visually compare in binary (hexadecimal) and see if you can spot the pattern of how the data is being imported from the file into the program." But that doesn't make much sense to me.

Sorry for so many doubts; this is my first experience of porting.
Thanks
Aug 11 '06 #7
Banfa
9,051 Expert Mod 8TB
If the file already exists then take your structure from the existing file.

If, on the other hand, the file already exists on both systems in different forms and is in the wild (i.e. in use, so you can't just delete the files with the wrong structure), you have more of a problem.

Assuming you control the file structure, fix it now. Also add a few bytes (1-4) at the front of the file and put a fixed set of data into them (0xF00D for instance); then you can read those first bytes of any file and tell whether it is your file type.

The next 2 or 4 bytes should be a version number for the file format, so that if you need to change it you can enable the new code to recognise and import an old file (the old code will not be able to import the new format, but it will at least be able to tell).

Then provide a conversion utility from the current format to the new format for the different platforms.
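A sketch of such a header, under stated assumptions: a 4 byte marker starting with F0 0D (the last two marker bytes are made up for illustration) followed by a 2 byte big endian version number. The names kMagic, write_header and read_header are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Assumed 4-byte file marker; only the leading F0 0D comes from the
// suggestion above, the remaining two bytes are illustrative.
const unsigned char kMagic[4] = {0xF0, 0x0D, 'A', 'V'};

// Writes the marker followed by a 2-byte big endian format version.
void write_header(std::ostream& out, std::uint16_t version)
{
    out.write(reinterpret_cast<const char*>(kMagic), sizeof kMagic);
    out.put(static_cast<char>((version >> 8) & 0xFF));
    out.put(static_cast<char>(version & 0xFF));
}

// Returns the format version, or -1 if the stream is too short or the
// marker does not match (i.e. it is not one of our files).
long read_header(std::istream& in)
{
    unsigned char h[6];
    if (!in.read(reinterpret_cast<char*>(h), sizeof h)) return -1;
    if (std::memcmp(h, kMagic, sizeof kMagic) != 0) return -1;
    return (static_cast<long>(h[4]) << 8) | h[5];
}
```

A reader can then branch on the returned version: import older versions through a conversion path and refuse versions newer than it understands.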
Aug 11 '06 #8
vermarajeev
180 100+
Did I make sense in the previous post? I think not. My problem is that the file is already there (given by my client) and I need to read that file; I don't even know in what format the file was written.
So is it not possible for me to fix this file structure on my own?

Suggestions please.

Thankx
Aug 11 '06 #9
Banfa
9,051 Expert Mod 8TB
That will complicate things but not make them impossible. Obviously if the files already exist you cannot change the structure; however, you can still read and write them in a portable manner.

You will have to analyse the file contents (this is where a hex editor comes in handy) and work out what the file structure is if the customer doesn't know. You have an advantage, by the sounds of it, since you know the type of data the file is meant to contain and it sounds like on at least one platform you are already managing to load the file successfully.

It may just be a case of changing the code to do portable file reads and writes.
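While working out the layout, it can help to decode the raw bytes with the entry width and byte order as parameters and see which combination reproduces the values the working platform reports. This is a sketch only; decode_uint is a hypothetical helper, and the real entry size and byte order of the existing file are unknown here.

```cpp
#include <cstddef>
#include <cstdint>

// Decodes `width` bytes (1 to 8) starting at `p` as an unsigned integer,
// treating them as big endian or little endian as requested.
std::uint64_t decode_uint(const unsigned char* p, std::size_t width,
                          bool big_endian)
{
    std::uint64_t value = 0;
    for (std::size_t k = 0; k < width; ++k) {
        // Always consume the most significant byte first.
        const std::size_t idx = big_endian ? k : width - 1 - k;
        value = (value << 8) | p[idx];
    }
    return value;
}
```

If the table turns out to consist of fixed-size entries, entry n would start at byte offset n * width in the file, so the same decoding can then be used on every platform instead of reading straight into an array of streampos.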
Aug 11 '06 #10
