Telling an empty binary file from a "full" one

Michel Rouzic

I have a binary file used to store the values of variables in order to
use them again. I easily know whether the file exists or not, but the
problem is, in case the program has been earlier interrupted before it
could write the variables to the file, the file is gonna be empty, and
then it's gonna load a load of crap into variables, which i want to
avoid.

That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

I thought that using fseek and ftell could work if the end of file
could be told but i read that "Setting the file position indicator to
end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior
for a binary stream (because of possible trailing null characters)"

My file has lots of zero bytes in it, so I guess it means it can't tell
the end of file reliably, right? I'd just like to know how I can, in a
reliable and portable way, tell the size of my binary file, and if not,
tell whether my file is empty or not

Nov 15 '05 #1

Walter Roberson

In article <11**********************@o13g2000cwo.googlegroups.com>,
Michel Rouzic <Mi********@yahoo.fr> wrote:

I have a binary file used to store the values of variables in order to
use them again. I easily know whether the file exists or not, but the
problem is, in case the program has been earlier interrupted before it
could write the variables to the file, the file is gonna be empty, and
then it's gonna load a load of crap into variables, which i want to
avoid.

In that case, your load routine is programmed without due regard
to the circumstances.

Each fread() or fgetc() or fscanf() that you perform returns a
status. If there is any serious chance that the file might not
be of the proper size, you should be testing those return statuses,
and taking appropriate steps if you do not get enough data.

That file is always 36 bytes big (it contains 4 double-precision floats
and one integer)

Probably the easiest portable way to tell if the file is the right
size would be to attempt to fread() 37 bytes, and see whether you were
handed fewer bytes (file truncated), 36 bytes (right size), or 37 bytes
(file is too long.)
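
A minimal sketch of that read-one-extra-byte test, assuming the 36-byte size from the original post. The names check_file_size, EXPECTED_SIZE and the enum values are made up for illustration. Note that a read error also shows up as a short count here, so it is reported as a truncated file.

```c
#include <stdio.h>

#define EXPECTED_SIZE 36  /* size quoted in the original post (an assumption) */

enum size_check { SIZE_SHORT, SIZE_OK, SIZE_LONG, SIZE_OPEN_FAILED };

/* Try to read one byte more than expected and classify the result. */
enum size_check check_file_size(const char *path)
{
    unsigned char buf[EXPECTED_SIZE + 1];
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return SIZE_OPEN_FAILED;

    /* The element size is 1, so the return value is a byte count. */
    got = fread(buf, 1, sizeof buf, fp);
    fclose(fp);

    if (got < EXPECTED_SIZE)
        return SIZE_SHORT;   /* truncated, empty, or a read error */
    if (got == EXPECTED_SIZE)
        return SIZE_OK;
    return SIZE_LONG;        /* at least one byte too many */
}
```

Passing 1 as the element size is what makes the count come back in bytes; with fread(buf, sizeof buf, 1, fp) the result would only ever be 0 or 1.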

I would, though, make the point that you have emphasized portability
for the test, but the size of double-precision floats is not certain
to be 8 bytes, and integers are not certain to be 4 bytes.
It also appears that you might not have left room for any flags
to indicate representation format and to indicate which "endian"
the data is in. Portably stitching together a double from a binary
number is no fun -- fixed point or printable text or XDR are easier
to deal with in that regard.
--
Feep if you love VT-52's.

Nov 15 '05 #2

Christopher Benson-Manica

Michel Rouzic <Mi********@yahoo.fr> wrote:

That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Nov 15 '05 #3

Anonymous 7843

In article <11**********************@o13g2000cwo.googlegroups.com>,
Michel Rouzic <Mi********@yahoo.fr> wrote:

My file has lots of zero bytes in it, so I guess it means it can't tell
the end of file reliably, right? I'd just like to know how I can, in a
reliable and portable way, tell the size of my binary file, and if not,
tell whether my file is empty or not

I'm curious as to what existing OS's do not accurately
report the lengths of binary files. Does anyone
have any examples?

The only situation like this that I've encountered is
that systems with unix-like device mapping can often
be coerced into opening a raw disk partition. Since
there is no file system present, fseek(SEEK_END) will
often position to the end of the partition whether
you have written meaningful data or not.

Nov 15 '05 #4

Kenneth Brody

Christopher Benson-Manica wrote:

Michel Rouzic <Mi********@yahoo.fr> wrote:
That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.

s/36 bytes/4*sizeof(double)/

In my mind, there's a world of difference between "portable code" and
"code that generates portable data files". (One could also argue that
writing the file as text isn't 100% portable, as ASCII files won't read
correctly on an EBCDIC system.)

In any case, the "real" answer here is to check the return values from
the fread() calls to make sure the data was there to be read.

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Nov 15 '05 #5

Gordon Burditt

My file has lots of zero bytes in it, so I guess it means it can't tell
the end of file reliably, right? I'd just like to know how I can, in a
reliable and portable way, tell the size of my binary file, and if not,
tell whether my file is empty or not

I'm curious as to what existing OS's do not accurately
report the lengths of binary files. Does anyone
have any examples?

Under CP/M, the length of a file is a multiple of 128 bytes (the
size of a single-density floppy disk sector). A text file used ^Z
as an end-of-file marker, which was used when the file was opened
in text mode, to give a more fine-grained end-of-file. No such
marker could be used in binary mode, as ^Z is a legitimate binary
value that the file could contain. The file size was a number of
disk sectors.

C implementations (and there were some, although many of them left
out stuff like floating point) had to deal with the imprecise size
of binary files. A few used highly non-standard implementation
decisions (like int = 8 bits, on a z80 or 8080 processor). Others
were much closer to Standard C, but rather cramped as the machine
generally had only a 64k address space total (although some had
memory paging setups).

The only situation like this that I've encountered is
that systems with unix-like device mapping can often
be coerced into opening a raw disk partition. Since
there is no file system present, fseek(SEEK_END) will
often position to the end of the partition whether
you have written meaningful data or not.

That's another example.

Gordon L. Burditt

Nov 15 '05 #6

Skarmander

Kenneth Brody wrote:

Christopher Benson-Manica wrote:
Michel Rouzic <Mi********@yahoo.fr> wrote:

That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.

s/36 bytes/4*sizeof(double)/

In my mind, there's a world of difference between "portable code" and
"code that generates portable data files". (One could also argue that
writing the file as text isn't 100% portable, as ASCII files won't read
correctly on an EBCDIC system.)

And by extension of that argument, "100% portable data" does not exist.
There is only data that is read more and less easily by various
languages on various platforms. But the intent is probably to literally
convey the sense of effort one has to expend to hoist it from one end to
another. ASCII-encoded integers are portable. Doubles encoded by
someone's C implementation are too heavy.

If your program is writing files, it's doing so because it needs to
communicate something across process boundaries. By some Rule or Law
someone no doubt coined, the mere potential to do things inspires the
desire to have them done. Therefore, it's wise to accommodate as broad a
range of processes as you can afford.

Writing binary data in the native format of your C implementation is
probably the narrowest range possible, and only justifiable by laziness.
It may be justifiable laziness, of course, but it's still laziness. Know
that it only works if the process on the other side of the boundary is
a program compiled by the exact same C implementation, running on the
exact same platform. Even upgrading your C library is taking chances --
very mild ones, but you should nevertheless be aware of them.

What am I saying? Oh yes, right. What the other posters said. Use a text
file. It's really not much more involved and saves you ever so much
potential grief.
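
For what it's worth, a text version of such a file could look roughly like the sketch below. The struct layout and the names settings, save_settings_text and load_settings_text are assumptions made for illustration, not anything from the original program. The useful property is that fscanf() reports how many fields it converted, so an empty or truncated file is caught by the same check that loads the data.

```c
#include <stdio.h>

struct settings {            /* hypothetical field names */
    double a, b, c, d;
    int n;
};

/* Write the five values as one line of text. Returns 1 on success. */
int save_settings_text(const char *path, const struct settings *s)
{
    int ok;
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return 0;
    /* 17 significant digits round-trip an IEEE 754 double (assumed here). */
    ok = fprintf(fp, "%.17g %.17g %.17g %.17g %d\n",
                 s->a, s->b, s->c, s->d, s->n) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok;
}

/* Read them back. *s is only modified if all five fields were found. */
int load_settings_text(const char *path, struct settings *s)
{
    struct settings tmp;
    int fields;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return 0;
    fields = fscanf(fp, "%lf %lf %lf %lf %d",
                    &tmp.a, &tmp.b, &tmp.c, &tmp.d, &tmp.n);
    fclose(fp);
    if (fields != 5)         /* EOF or a short count: empty or damaged file */
        return 0;
    *s = tmp;
    return 1;
}
```

Loading into a temporary and copying only on success means a bad file never overwrites the variables already in memory.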

S.

Nov 15 '05 #7

Keith Thompson

Kenneth Brody <ke******@spamcop.net> writes:

Christopher Benson-Manica wrote:

Michel Rouzic <Mi********@yahoo.fr> wrote:
> That file is always 36 bytes big (it contains 4 double-precision floats
> and one integer) and i'd like to be able to test whether it is 36 bytes
> long or not, but it seems like quite a big problem to get to do it in a
> portable way.

If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.

s/36 bytes/4*sizeof(double)/

s/4*sizeof(double)/4*sizeof(double)+sizeof(int)/

(assuming that "one integer" means "one int").

The most sensible approach *if* you don't care about portability of
the file is probably to declare a struct type

    struct foo {
        double a;
        double b;
        double c;
        double d;
        int e;
    };

and use fread/fwrite to read and write values of type struct foo
directly to the file (in binary mode, of course). Never refer to
"36"; always use "sizeof(struct foo)" or "sizeof obj" where obj is of
type struct foo.

The code should be portable to other platforms, but the data file will
not be; it will only be usable on the system where it was created.
That's likely to be good enough. (If it isn't, use some portable
external representation of the data; plain text is a good choice.)
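
A rough sketch of what that could look like on both the writing and the reading side, under the assumption that the file never leaves the machine and compiler that produced it. The names config, save_config and load_config are placeholders standing in for struct foo and whatever the real program calls things.

```c
#include <stdio.h>

struct config {              /* placeholder for struct foo */
    double a, b, c, d;
    int e;
};

/* Write one struct to the file. Returns 1 on success, 0 on failure. */
int save_config(const char *path, const struct config *cfg)
{
    size_t written;
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return 0;
    written = fwrite(cfg, sizeof *cfg, 1, fp);   /* counts items, not bytes */
    /* fclose() flushes buffered output, so its result matters as well. */
    if (fclose(fp) != 0)
        return 0;
    return written == 1;
}

/* Read one struct back. *cfg is untouched unless the whole item was read. */
int load_config(const char *path, struct config *cfg)
{
    struct config tmp;
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return 0;
    got = fread(&tmp, sizeof tmp, 1, fp);
    fclose(fp);
    if (got != 1)            /* empty or truncated file */
        return 0;
    *cfg = tmp;
    return 1;
}
```

No byte count appears anywhere: sizeof follows the struct if a member is added later.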

And, of course, choose more descriptive names than a, b, c, d, e, and
foo.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 15 '05 #8

Michel Rouzic

Skarmander wrote:

Writing binary data in the native format of your C implementation is
probably the narrowest range possible, and only justifiable by laziness.
It may be justifiable laziness, of course, but it's still laziness. Know
that it only works if the process on the other side of the boundary is
a program compiled by the exact same C implementation, running on the
exact same platform.

No, I don't have this problem. The reason for that is that it's a
configuration file, it writes to a file what's in memory in order to use
it later. So it works both on big endian and little endian machines,
and indeed it can take absolutely any way of writing double-precision
floats, since it reads only what it writes.

and then, it's not laziness, rather ignorance, i never dealt yet with
using text files (i'm only at my second C program)

Nov 15 '05 #9

Michel Rouzic

Keith Thompson wrote:

The most sensible approach *if* you don't care about portability of
the file is probably to declare a struct type

    struct foo {
        double a;
        double b;
        double c;
        double d;
        int e;
    };

and use fread/fwrite to read and write values of type struct foo
directly to the file (in binary mode, of course). Never refer to
"36"; always use "sizeof(struct foo)" or "sizeof obj" where obj is of
type struct foo.

The code should be portable to other platforms, but the data file will
not be; it will only be usable on the system where it was created.
That's likely to be good enough. (If it isn't, use some portable
external representation of the data; plain text is a good choice.)

And, of course, choose more descriptive names than a, b, c, d, e, and
foo.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

thx, but it's quite off topic. I just want to test what's the size of
the file like, I already know how to write my data to it, or even read
it, my problem, is that I don't want to read a file if it's empty.

Nov 15 '05 #10

Michel Rouzic

Walter Roberson wrote:

In that case, your load routine is programmed without due regard
to the circumstances.

Each fread() or fgetc() or fscanf() that you perform returns a
status. If there is any serious chance that the file might not
be of the proper size, you should be testing those return statuses,
and taking appropriate steps if you do not get enough data.
That file is always 36 bytes big (it contains 4 double-precision floats
and one integer)

Probably the easiest portable way to tell if the file is the right
size would be to attempt to fread() 37 bytes, and see whether you were
handed fewer bytes (file truncated), 36 bytes (right size), or 37 bytes
(file is too long.)

I would, though, make the point that you have emphasized portability
for the test, but the size of double-precision floats is not certain
to be 8 bytes, and integers are not certain to be 4 bytes.
It also appears that you might not have left room for any flags
to indicate representation format and to indicate which "endian"
the data is in. Portably stitching together a double from a binary
number is no fun -- fixed point or printable text or XDR are easier
to deal with in that regard.
--
Feep if you love VT-52's.

That's getting helpful, but I don't really know how to deal with what
fread returns (indeed i have never dealt with size_t's before, nor
included stddef.h).

Anyways, my file has no risk of being over the right size, but only
under, so I guess i should try to read
(4*sizeof(double)+sizeof(int32_t)) bytes and see what it returns (when
i'll have figured out what to do with what fread returns)

btw, right now, that file is empty, uneditable and undeletable, and `ls
-l` in cygwin tells me "ls: freq.cfg: No such file or directory", is it
because i killed the process before it fclosed the file?

Nov 15 '05 #11

Michel Rouzic

Christopher Benson-Manica wrote:

Michel Rouzic <Mi********@yahoo.fr> wrote:
That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

yeah, i know, i should rather say that its size is
(4*sizeof(double)+sizeof(int32_t)) bytes.

The reason why I don't "simply store a small amount of data as text" is
that I'm a beginner and until now I only dealt with binary files and I
wouldn't know how to deal with reading and writing to a text file. I guess
that's what I should try then, even if I wanted to avoid having to do
that.

Nov 15 '05 #12

Michel Rouzic

Kenneth Brody wrote:

Christopher Benson-Manica wrote:

Michel Rouzic <Mi********@yahoo.fr> wrote:
That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.

s/36 bytes/4*sizeof(double)/

In my mind, there's a world of difference between "portable code" and
"code that generates portable data files". (One could also argue that
writing the file as text isn't 100% portable, as ASCII files won't read
correctly on an EBCDIC system.)

In any case, the "real" answer here is to check the return values from
the fread() calls to make sure the data was there to be read.

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

The file is not meant to be portable. It's created in the first place
by the program and only used by the program. But I'll try to act like
it has to be portable then...

Nov 15 '05 #13

Keith Thompson

"Michel Rouzic" <Mi********@yahoo.fr> writes:

Keith Thompson wrote:
The most sensible approach *if* you don't care about portability of
the file is probably to declare a struct type

    struct foo {
        double a;
        double b;
        double c;
        double d;
        int e;
    };

and use fread/fwrite to read and write values of type struct foo
directly to the file (in binary mode, of course). Never refer to
"36"; always use "sizeof(struct foo)" or "sizeof obj" where obj is of
type struct foo.

The code should be portable to other platforms, but the data file will
not be; it will only be usable on the system where it was created.
That's likely to be good enough. (If it isn't, use some portable
external representation of the data; plain text is a good choice.)

And, of course, choose more descriptive names than a, b, c, d, e, and
foo.

thx, but it's quite off topic. I just want to test what's the size of
the file like, I already know how to write my data to it, or even read
it, my problem, is that I don't want to read a file if it's empty.

Sure, but the best way to do this is to attempt to read it (using
fread() with appropriate arguments) and detect whether it succeeded.
If you use "sizeof(struct foo)" rather than magic numbers like 36, the
code that does this will be easier to read and more maintainable.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 15 '05 #14

Michel Rouzic

Keith Thompson wrote:

"Michel Rouzic" <Mi********@yahoo.fr> writes:
Keith Thompson wrote:
The most sensible approach *if* you don't care about portability of
the file is probably to declare a struct type

    struct foo {
        double a;
        double b;
        double c;
        double d;
        int e;
    };

and use fread/fwrite to read and write values of type struct foo
directly to the file (in binary mode, of course). Never refer to
"36"; always use "sizeof(struct foo)" or "sizeof obj" where obj is of
type struct foo.

The code should be portable to other platforms, but the data file will
not be; it will only be usable on the system where it was created.
That's likely to be good enough. (If it isn't, use some portable
external representation of the data; plain text is a good choice.)

And, of course, choose more descriptive names than a, b, c, d, e, and
foo.

thx, but it's quite off topic. I just want to test what's the size of
the file like, I already know how to write my data to it, or even read
it, my problem, is that I don't want to read a file if it's empty.

Sure, but the best way to do this is to attempt to read it (using
fread() with appropriate arguments) and detect whether it succeeded.
If you use "sizeof(struct foo)" rather than magic numbers like 36, the
code that does this will be easier to read and more maintainable.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

OK, I managed to do it. And indeed, it looks like a way to really
determine the size of the file: do an fread of one char in a loop,
stop the loop when fread returns 0, and take the size
of the file from the iteration at which the loop stops.

Isn't it a reliable and relatively portable way of telling the size of
a file?

Nov 15 '05 #16

Jack Klein

On 19 Sep 2005 11:25:08 -0700, "Michel Rouzic" <Mi********@yahoo.fr>
wrote in comp.lang.c:

I have a binary file used to store the values of variables in order to
use them again. I easily know whether the file exists or not, but the
problem is, in case the program has been earlier interrupted before it
could write the variables to the file, the file is gonna be empty, and
then it's gonna load a load of crap into variables, which i want to
avoid.

That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

I thought that using fseek and ftell could work if the end of file
could be told but i read that "Setting the file position indicator to
end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior
for a binary stream (because of possible trailing null characters)"

My file has lots of zero bytes in it, so I guess it means it can't tell
the end of file reliably, right? I'd just like to know how I can, in a
reliable and portable way, tell the size of my binary file, and if not,
tell whether my file is empty or not

You are only addressing one part of the issue. If you have a reason
to verify that your data is truly valid, then make sure it is truly
valid. That means verifying more than just the size, but the actual
contents, by means of a checksum or some such similar mechanism.

Consider:

    struct my_data
    {
        double v1;
        double v2;
        double v3;
        double v4;
        int i1;
    };

    struct my_storage
    {
        struct my_data data;
        unsigned long checksum;
    };

When your structure is filled with data and you are ready to write it
to file, pass its address to a function that will compute a checksum
on the inner structure and return it. Assign it to the checksum
member of the outer structure and store it with fwrite() into a binary
file.

To read it back, use fread() on the binary file to read it into a
my_storage structure. Call the checksum function again and compare
the value it returns to the checksum value in the outer structure.

If you read the correct number of bytes from the file, you verify that
the file has no more bytes by getc() returning EOF, and the checksum
matches, then you may have a high degree of confidence that you have
recovered valid data.
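
One way the three checks could fit together is sketched below. The checksum here is a deliberately simple multiply-and-add over the bytes of each member, chosen only for illustration; it is not a CRC, and a real program might prefer a stronger one. The helper names add_bytes, data_checksum, store_data and fetch_data are invented for this sketch. The sum is taken member by member so that padding bytes inside the struct never enter the calculation.

```c
#include <stdio.h>
#include <stddef.h>

struct my_data { double v1, v2, v3, v4; int i1; };
struct my_storage { struct my_data data; unsigned long checksum; };

/* Fold n bytes into a running sum (illustrative only, not a CRC). */
static unsigned long add_bytes(unsigned long sum, const void *p, size_t n)
{
    const unsigned char *b = p;
    size_t i;

    for (i = 0; i < n; i++)
        sum = (sum * 31UL + b[i]) & 0xFFFFFFFFUL;
    return sum;
}

/* Checksum over the members only, skipping any padding. */
unsigned long data_checksum(const struct my_data *d)
{
    unsigned long sum = 17UL;   /* nonzero seed so all-zero data is not 0 */

    sum = add_bytes(sum, &d->v1, sizeof d->v1);
    sum = add_bytes(sum, &d->v2, sizeof d->v2);
    sum = add_bytes(sum, &d->v3, sizeof d->v3);
    sum = add_bytes(sum, &d->v4, sizeof d->v4);
    sum = add_bytes(sum, &d->i1, sizeof d->i1);
    return sum;
}

/* Wrap the data with its checksum and write it. Returns 1 on success. */
int store_data(const char *path, const struct my_data *d)
{
    struct my_storage s;
    size_t written;
    FILE *fp;

    s.data = *d;
    s.checksum = data_checksum(&s.data);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return 0;
    written = fwrite(&s, sizeof s, 1, fp);
    if (fclose(fp) != 0)
        return 0;
    return written == 1;
}

/* Read back and verify: full item, nothing after it, checksum matches. */
int fetch_data(const char *path, struct my_data *d)
{
    struct my_storage s;
    int ok;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return 0;
    ok = fread(&s, sizeof s, 1, fp) == 1
      && getc(fp) == EOF
      && s.checksum == data_checksum(&s.data);
    fclose(fp);
    if (ok)
        *d = s.data;
    return ok;
}
```

An empty file fails the first test, a file with trailing junk fails the second, and a file of the right length with damaged contents fails the third.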

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Nov 15 '05 #17

Keith Thompson

"Michel Rouzic" <Mi********@yahoo.fr> writes:

Keith Thompson wrote:
"Michel Rouzic" <Mi********@yahoo.fr> writes:
> Keith Thompson wrote:
>> The most sensible approach *if* you don't care about portability of
>> the file is probably to declare a struct type
>>
>> struct foo {
>> double a;
>> double b;
>> double c;
>> double d;
>> int e;
>> };
>>
>> and use fread/fwrite to read and write values of type struct foo
>> directly to the file (in binary mode, of course). Never refer to
>> "36"; always use "sizeof(struct foo)" or "sizeof obj" where obj is of
>> type struct foo.
>>
>> The code should be portable to other platforms, but the data file will
>> not be; it will only be usable on the system where it was created.
>> That's likely to be good enough. (If it isn't, use some portable
>> external representation of the data; plain text is a good choice.)
>>
>> And, of course, choose more descriptive names than a, b, c, d, e, and
>> foo.
>
> thx, but it's quite off topic. I just want to test what's the size of
> the file like, I already know how to write my data to it, or even read
> it, my problem, is that I don't want to read a file if it's empty.

Sure, but the best way to do this is to attempt to read it (using
fread() with appropriate arguments) and detect whether it succeeded.
If you use "sizeof(struct foo)" rather than magic numbers like 36, the
code that does this will be easier to read and more maintainable.

Please snip signatures when you quote.

OK, I managed to do it. And indeed, it looks like a way to really
determine the size of the file: do an fread of one char in a loop,
stop the loop when fread returns 0, and take the size
of the file from the iteration at which the loop stops.

Isn't it a reliable and relatively portable way of telling the size of
a file?

Yeah, that's one way to do it. In your case, though, you probably
don't need to care exactly how big the file is, just whether it's big
enough.

Given the type "struct foo" above, you could just do this:

    FILE *config_file = fopen("filename", "rb");
    /* Insert error checking here */
    struct foo buffer;
    /* fread() returns the number of items read, not bytes: 1 on success */
    size_t items_read = fread(&buffer, sizeof buffer, 1, config_file);
    if (items_read == 1) {
        /* ok */
    }
    else {
        /* read failed, or the file was too short */
    }

This is untested code.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 15 '05 #18

Michel Rouzic

Keith Thompson wrote:

"Michel Rouzic" <Mi********@yahoo.fr> writes:
Keith Thompson wrote:
"Michel Rouzic" <Mi********@yahoo.fr> writes:
> Keith Thompson wrote:
>> The most sensible approach *if* you don't care about portability of
>> the file is probably to declare a struct type
>>
>> struct foo {
>> double a;
>> double b;
>> double c;
>> double d;
>> int e;
>> };
>>
>> and use fread/fwrite to read and write values of type struct foo
>> directly to the file (in binary mode, of course). Never refer to
>> "36"; always use "sizeof(struct foo)" or "sizeof obj" where obj is of
>> type struct foo.
>>
>> The code should be portable to other platforms, but the data file will
>> not be; it will only be usable on the system where it was created.
>> That's likely to be good enough. (If it isn't, use some portable
>> external representation of the data; plain text is a good choice.)
>>
>> And, of course, choose more descriptive names than a, b, c, d, e, and
>> foo.
>
> thx, but it's quite off topic. I just want to test what's the size of
> the file like, I already know how to write my data to it, or even read
> it, my problem, is that I don't want to read a file if it's empty.

Sure, but the best way to do this is to attempt to read it (using
fread() with appropriate arguments) and detect whether it succeeded.
If you use "sizeof(struct foo)" rather than magic numbers like 36, the
code that does this will be easier to read and more maintainable.

Please snip signatures when you quote.

OK, I managed to do it. And indeed, it looks like a way to really
determine the size of the file: do an fread of one char in a loop,
stop the loop when fread returns 0, and take the size
of the file from the iteration at which the loop stops.

Isn't it a reliable and relatively portable way of telling the size of
a file?

Yeah, that's one way to do it. In your case, though, you probably
don't need to care exactly how big the file is, just whether it's big
enough.

Given the type "struct foo" above, you could just do this:

    FILE *config_file = fopen("filename", "rb");
    /* Insert error checking here */
    struct foo buffer;
    /* fread() returns the number of items read, not bytes: 1 on success */
    size_t items_read = fread(&buffer, sizeof buffer, 1, config_file);
    if (items_read == 1) {
        /* ok */
    }
    else {
        /* read failed, or the file was too short */
    }

This is untested code.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Yeah, that's right, and I didn't actually use it to determine the size,
but to see if the last char read returns 0 or 1. I was just saying it
could be used to tell the size too.

Nov 15 '05 #19

Lawrence Kirby

On Mon, 19 Sep 2005 18:56:28 -0700, Michel Rouzic wrote:

....

OK, I managed to do it. And indeed, it looks like a way to really
determine the size of the file: do an fread of one char in a loop,
stop the loop when fread returns 0, and take the size
of the file from the iteration at which the loop stops.

It would be simpler to use getc().

Isn't it a reliable and relatively portable way of telling the size of
a file?

Depends what you mean by "size of a file". It will tell you how much data
you can read from a file which is a reasonable definition. It won't
necessarily tell you how much data was written to the file in the first
place. Others have mentioned systems where the "file size" stored by the
system is a number of blocks and not a byte count.
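
The getc() version of the counting loop might look like this sketch; count_readable_bytes is a made-up name. It returns the number of bytes that can be read back, which on the block-counting systems mentioned above may be more than what was originally written.

```c
#include <stdio.h>

/* Count the bytes that can be read from a file opened in binary mode.
   Returns -1 if the file cannot be opened or a read error occurs. */
long count_readable_bytes(const char *path)
{
    long count = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1L;
    /* getc() returns an int so that EOF is distinct from every byte value,
       including the zero bytes the file is full of. */
    while (getc(fp) != EOF)
        count++;
    if (ferror(fp))          /* EOF can also mean an error, so check which */
        count = -1L;
    fclose(fp);
    return count;
}
```

For a file this small the byte-at-a-time loop costs nothing noticeable, since stdio buffers the reads anyway.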

Lawrence

Nov 15 '05 #20

Bryan Donlan

Michel Rouzic wrote:

Skarmander wrote:
Writing binary data in the native format of your C implementation is
probably the narrowest range possible, and only justifiable by laziness.
It may be justifiable laziness, of course, but it's still laziness. Know
that it only works if the process on the other side of the boundary is
a program compiled by the exact same C implementation, running on the
exact same platform.
No, I don't have this problem. The reason is that it's a
configuration file: it writes what's in memory to a file in order to use
it later. So it works on both big-endian and little-endian machines,
and indeed it can cope with absolutely any way of writing double-precision
floats, since it only reads what it writes.

This is true, but note that you'll need to reload it with the same
implementation that saved it.
and then, it's not laziness, rather ignorance: I've never yet dealt with
text files (I'm only on my second C program)
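
For what it's worth, a text file need not be much harder. The following is only a sketch: struct settings and the two function names are invented for illustration, and the %.17g format assumes IEEE 754 doubles, for which 17 significant digits are enough to read back the same value.

```c
#include <stdio.h>

/* Hypothetical settings record: four doubles and one int, as in the thread. */
struct settings {
    double v[4];
    int n;
};

/* Write the settings as one line of text. Returns 0 on success, -1 on failure. */
int save_settings_text(const char *path, const struct settings *s)
{
    FILE *fp = fopen(path, "w");
    int ok;

    if (fp == NULL)
        return -1;
    /* %.17g: enough digits to round-trip an IEEE 754 double (assumption). */
    ok = fprintf(fp, "%.17g %.17g %.17g %.17g %d\n",
                 s->v[0], s->v[1], s->v[2], s->v[3], s->n) > 0;
    if (fclose(fp) != 0)        /* buffered output can still fail here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the settings back. *s is only modified if all five values were
   converted, so an empty or truncated file leaves the caller's data alone. */
int load_settings_text(const char *path, struct settings *s)
{
    FILE *fp = fopen(path, "r");
    struct settings tmp;
    int got;

    if (fp == NULL)
        return -1;
    got = fscanf(fp, "%lf %lf %lf %lf %d",
                 &tmp.v[0], &tmp.v[1], &tmp.v[2], &tmp.v[3], &tmp.n);
    fclose(fp);
    if (got != 5)               /* includes EOF on an empty file */
        return -1;
    *s = tmp;
    return 0;
}
```

Since fscanf reports how many items it converted, the "empty file" case falls out of the same check, and the file can be read by a program built with a different compiler.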

--
λz.λi.i(i((λn.λm.λz.λi.nz(λq.mqi))((λn.λz .λi.n(nzi)i)(λz.λi.i(((λn.λz.λi.n
(nzi)i)(λz.λi.i(iz)))zi)))((λn.λz.λi.n(nzi)i) (λz.λi.i(iz)))zi))

Nov 15 '05 #21

Bryan Donlan

Michel Rouzic wrote:

Walter Roberson wrote:
In that case, your load routine is programmed without due regard
to the circumstances.

Each fread() or fgetc() or fscanf() that you perform returns a
status. If there is any serious chance that the file might not
be of the proper size, you should be testing those return statuses,
and taking appropriate steps if you do not get enough data.
>That file is always 36 bytes big (it contains 4 double-precision floats
>and one integer)

Probably the easiest portable way to tell if the file is the right
size would be to attempt to fread() 37 bytes, and see whether you were
handed fewer bytes (file truncated), 36 bytes (right size), or 37 bytes
(file is too long).

I would, though, make the point that you have emphasized portability
for the test, but the size of double-precision floats is not certain
to be 8 bytes, and integers are not certain to be 4 bytes.
It also appears that you might not have left room for any flags
to indicate representation format and to indicate which "endian"
the data is in. Portably stitching together a double from a binary
number is no fun -- fixed point or printable text or XDR are easier
to deal with in that regard.
--
Feep if you love VT-52's.

That's getting helpful, but I don't really know how to deal with what
fread returns (indeed i have never dealt with size_t's before, nor
included stddef.h).

Anyways, my file has no risk of being over the right size, but only
under, so I guess i should try to read
(4*sizeof(double)+sizeof(int_32)) bytes and see what it returns (when
i'll have figured out what to do with what fread returns)

btw, right now, that file is empty, uneditable and undeletable, and `ls
-l` in cygwin tells me "ls: freq.cfg: No such file or directory", is it
because i killed the process before it fclosed the file?

Sounds like it doesn't exist. Did fopen return NULL?

--
λz.λi.i(i((λn.λm.λz.λi.nz(λq.mqi))((λn.λz .λi.n(nzi)i)(λz.λi.i(((λn.λz.λi.n
(nzi)i)(λz.λi.i(iz)))zi)))((λn.λz.λi.n(nzi)i) (λz.λi.i(iz)))zi))

Nov 15 '05 #22

Keith Thompson

Bryan Donlan <bd*****@gmail.com> writes:

Michel Rouzic wrote:

[...]

btw, right now, that file is empty, uneditable and undeletable, and `ls
-l` in cygwin tells me "ls: freq.cfg: No such file or directory", is it
because i killed the process before it fclosed the file?

Sounds like it doesn't exist. Did fopen return NULL?

<WAY_OT>
I think he means that "ls -l", with no arguments, says
ls: freq.cfg: No such file or directory
-- i.e., there's a directory entry for it (so it shows up in a plain
"ls"), but any attempt to read the file itself acts as if it doesn't
exist.

See www.cygwin.com for pointers to mailing lists where you can ask
about this.
</WAY_OT>

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Nov 15 '05 #23

Michel Rouzic

yeah, but you'll also need to recompile the whole program before,
indeed

Bryan Donlan wrote:

Michel Rouzic wrote:

Skarmander wrote:
Writing binary data in the native format of your C implementation is
probably the narrowest range possible, and only justifiable by laziness.
It may be justifiable laziness, of course, but it's still laziness. Know
that it only works if the process on the other side of the boundary is
a program compiled by the exact same C implementation, running on the
exact same platform.

No, I don't have this problem. The reason is that it's a
configuration file: it writes what's in memory to a file in order to use
it later. So it works on both big-endian and little-endian machines,
and indeed it can cope with absolutely any way of writing double-precision
floats, since it only reads what it writes.

This is true, but note that you'll need to reload it with the same
implementation that saved it.
and then, it's not laziness, rather ignorance: I've never yet dealt with
text files (I'm only on my second C program)

Nov 15 '05 #24

Michel Rouzic

Bryan Donlan wrote:

Michel Rouzic wrote:

Walter Roberson wrote:
In that case, your load routine is programmed without due regard
to the circumstances.

Each fread() or fgetc() or fscanf() that you perform returns a
status. If there is any serious chance that the file might not
be of the proper size, you should be testing those return statuses,
and taking appropriate steps if you do not get enough data.

>That file is always 36 bytes big (it contains 4 double-precision floats
>and one integer)

Probably the easiest portable way to tell if the file is the right
size would be to attempt to fread() 37 bytes, and see whether you were
handed fewer bytes (file truncated), 36 bytes (right size), or 37 bytes
(file is too long).

I would, though, make the point that you have emphasized portability
for the test, but the size of double-precision floats is not certain
to be 8 bytes, and integers are not certain to be 4 bytes.
It also appears that you might not have left room for any flags
to indicate representation format and to indicate which "endian"
the data is in. Portably stitching together a double from a binary
number is no fun -- fixed point or printable text or XDR are easier
to deal with in that regard.
--
Feep if you love VT-52's.

That's getting helpful, but I don't really know how to deal with what
fread returns (indeed i have never dealt with size_t's before, nor
included stddef.h).

Anyways, my file has no risk of being over the right size, but only
under, so I guess i should try to read
(4*sizeof(double)+sizeof(int_32)) bytes and see what it returns (when
i'll have figured out what to do with what fread returns)

btw, right now, that file is empty, uneditable and undeletable, and `ls
-l` in cygwin tells me "ls: freq.cfg: No such file or directory", is it
because i killed the process before it fclosed the file?

Sounds like it doesn't exist. Did fopen return NULL?

No, never mind. I had deleted that message and reposted it without the
end so nobody would reply to that, but you did anyway. I don't know how
I did it, but even if it didn't look like it in cygwin, my program was
still running... I only had to kill it to fix it...

Nov 15 '05 #25

Richard Bos

"Michel Rouzic" <Mi********@yahoo.fr> wrote:

Walter Roberson wrote:
Probably the easiest portable way to tell if the file is the right
size would be to attempt to fread() 37 bytes, and see whether you were
handed fewer bytes (file truncated), 36 bytes (right size), or 37 bytes
(file is too long).
That's getting helpful, but I don't really know how to deal with what
fread returns (indeed i have never dealt with size_t's before, nor
included stddef.h).
One hardly ever #includes <stddef.h>, since nearly everything in it is
also defined in other headers, where necessary (TTBOMK the only
exceptions are ptrdiff_t and offsetof()). For example, size_t is also
defined in <stdio.h>. And you probably _have_ dealt with size_t's
without realising it: sizeof evaluates to a size_t.

FWIW, a size_t is simply an unsigned integer type of the necessary size.
Nothing mysterious about it.
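
As a small illustration of handling what fread returns (read_doubles is a name invented for this sketch): the return value is a size_t giving the number of complete items read, so it can be compared directly with the number requested.

```c
#include <stdio.h>

/* Try to read exactly `count` doubles from an already-open binary stream.
   Returns 1 if all of them were read, 0 if the stream ran short or failed.
   read_doubles is an illustrative name, not a library function. */
int read_doubles(FILE *fp, double *out, size_t count)
{
    /* fread returns the number of complete items read, as a size_t. */
    size_t got = fread(out, sizeof *out, count, fp);

    return got == count;
}
```

Note that only <stdio.h> is included; size_t comes along with it.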
Anyways, my file has no risk of being over the right size, but only
under, so I guess i should try to read
(4*sizeof(double)+sizeof(int_32)) bytes and see what it returns (when
i'll have figured out what to do with what fread returns)

Yes... except that if I were you, I'd make that a macro, so you can use
it both for reading and for writing; and people who say "there is no
risk of foo" _will_ encounter foo the day after they hand in their
programs, so I'd make certain and check for over-sized files anyway.
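
Something along these lines, as an untested-in-anger sketch: CFG_SIZE, enum cfg_status, save_cfg and load_cfg are names made up for illustration, and a plain int stands in for the integer field. The reader asks for one byte more than the record size so that an over-long file shows up as well as a short one.

```c
#include <stdio.h>
#include <string.h>

/* One definition of the record size, shared by the writer and the reader.
   The layout (4 doubles followed by 1 int) is an assumption for illustration. */
#define CFG_SIZE (4 * sizeof(double) + sizeof(int))

enum cfg_status { CFG_OK, CFG_TOO_SHORT, CFG_TOO_LONG, CFG_IO_ERROR };

/* Write the record. Returns 0 on success, -1 on failure. */
int save_cfg(const char *path, const double v[4], int n)
{
    unsigned char buf[CFG_SIZE];
    size_t put;
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    memcpy(buf, v, 4 * sizeof(double));
    memcpy(buf + 4 * sizeof(double), &n, sizeof n);
    put = fwrite(buf, 1, sizeof buf, fp);
    if (fclose(fp) != 0 || put != sizeof buf)
        return -1;
    return 0;
}

/* Read the record, touching v and n only if the size is exactly right. */
enum cfg_status load_cfg(const char *path, double v[4], int *n)
{
    unsigned char buf[CFG_SIZE + 1];    /* one spare byte to detect excess */
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return CFG_IO_ERROR;
    got = fread(buf, 1, sizeof buf, fp);    /* bytes actually read */
    if (ferror(fp)) {
        fclose(fp);
        return CFG_IO_ERROR;
    }
    fclose(fp);

    if (got < CFG_SIZE)
        return CFG_TOO_SHORT;
    if (got > CFG_SIZE)
        return CFG_TOO_LONG;

    memcpy(v, buf, 4 * sizeof(double));
    memcpy(n, buf + 4 * sizeof(double), sizeof *n);
    return CFG_OK;
}
```

On a system that pads binary files, the CFG_TOO_LONG case could in principle trigger on a correctly written file, so whether to treat it as fatal is a judgment call.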

Richard

Nov 15 '05 #26

Chris Torek

In article <BDEXe.15274$mH.10157@fed1read07>
Anonymous 7843 <an******@example.com> wrote:

I'm curious as to what existing OS's do not accurately
report the lengths of binary files. Does anyone
have any examples?

A whole bunch of old mainframe and minicomputer OSes did this.
They allocated only whole sectors to files, and files were always
sized in whole-sector units. Text files used special encodings so
as to be able to hold "lines of text" that did not come out to an
even number of disk sectors. For instance, each line might be
prefixed by a byte-count indicating how much space the line occupied
within the text file and how many bytes of that were to be treated
as file text (with the extra bytes, if any, being ignored -- this
allows one to shorten lines without rewriting the file). (Each
line might also be numbered, so that lines could be lengthened
without rewriting the entire file, by marking the original line as
deleted -- zero valid bytes -- and placing the new text into whatever
existing space could be found, or at the end of the file.)

VMS's RMS took care of dealing with all the various file-formats
for you; you just told it to open a "text" file and it would map
out the magic. Open the same file as "binary", however, and all
the magic encoding shows up. It was not until VMS version 5 that
"stream-LF" text files appeared; before then, *all* text files had
magic encoding. (The encoding for a "stream-LF" file is basically
the same as that used on Unix systems, i.e., no encoding at all,
just a sequence of bytes with "lines" indicated by newline bytes.)

One interesting consequence of byte-count-encoded (and optionally
numbered) lines is that there is no such thing as a final line that
does not end with a newline. That is:

FILE *somefile = fopen("somefile.txt", "w");
/* ... check for errors as needed ... */
fprintf(somefile, "ab\nc\nd");
fclose(somefile);

is faced with a problem: should it write the three lines saying
"line 1: two bytes, ab; line 2: one byte, c; line 3: one byte, d"
-- which is "ab\nc\nd\n", which is not what you wrote -- or should
it write "line 1: two bytes, ab; line 2: one byte, c" -- which is
"ab\nc\n", which is *also* not what you wrote? The file format is
such that it is physically impossible to reproduce what you *did*
write. A file is a sequence of complete lines; there is no such
thing, in this file format, as an incomplete, not-newline-terminated,
line.

The C standard allows the runtime library to have either of the
two above behaviors, and different C libraries did different things.
If you want "ab\nc\nd\n" to appear in the file, you must write that
final newline yourself; only then is line 3, consisting of the
letter "d", sure to make its way into the file. (Assuming no disk
errors or other similar problems, of course.)
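
A sketch of the safe version, with the final newline written explicitly and the results of the write and the close both checked. The function name write_three_lines is invented for this example.

```c
#include <stdio.h>

/* Write three complete lines to a text file.
   Returns 0 on success, -1 on failure. */
int write_three_lines(const char *path)
{
    FILE *fp = fopen(path, "w");
    int failed;

    if (fp == NULL)
        return -1;
    /* The trailing "\n" is supplied explicitly so line 3 is complete. */
    failed = (fputs("ab\nc\nd\n", fp) == EOF);
    if (fclose(fp) != 0)        /* buffered data may only fail at close time */
        failed = 1;
    return failed ? -1 : 0;
}
```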
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 15 '05 #27
