Bytes | Developer Community
Programming in standard C

In my "Happy Christmas" message, I proposed a function to read
a file into a RAM buffer and return that buffer or NULL if
the file doesn't exist or some other error is found.

It is interesting to see that the answers to that message prove that
programming exclusively in standard C is completely impossible even
for a small and ridiculously simple program like the one I proposed.

1) I read the file contents in binary mode, which should allow me
to use ftell/fseek to determine the file size.

No objections to this were raised, except of course the obvious
one, if the "file" was some file associated with stdin, for
instance under some unix machine /dev/tty01 or similar...

I did not test for this since it is impossible in standard C:
isatty() is not in the standard.

2) There is NO portable way to determine which characters should be
ignored when transforming a binary file into a text file. One
reader (CB Falconer) proposed to open the file in binary mode
and then in text mode and compare the two buffers to see which
characters were missing... Well, that would be too expensive.

3) I used different values for errno defined by POSIX, but not by
the C standard, which defines only a few. Again, error handling
is not something important to be standardized, according to
the committee. errno is there but its usage is absolutely
not portable at all and goes immediately beyond what standard C
offers.

We hear again and again that this group is about standard C *"ONLY"*.
Could someone here then, tell me how this simple program could be
written in standard C?

This confirms my arguments about the need to improve the quality
of the standard library!

You can't do *anything* in just standard C.
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Dec 26 '07
On Sun, 30 Dec 2007 18:26:02 -0600, jacob navia wrote
(in article <fl**********@aioe.org>):
Randy Howard wrote:
>On Sun, 30 Dec 2007 17:50:58 -0600, Serve Lau wrote
(in article <68***************************@cache5.tilbu1.nb.home.nl>):
>>"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:dO******************************@bt.com...
Serve Lau said:

<snip>

You win a toaster with the software written in embedded
java! They tried C at first but The Management decided to make the
toaster XML configurable over the internet and the developers could not
get the size of the xml files and no network connection with standard C
so they opted for embedded java.
Java faces the same issues as C with regard to establishing the size of a
file. These are software engineering issues, not language issues.
Java already faced them you mean because such an operation is included in
*tadam* the standard library :)

Enlighten as to how this Java function handles real time changes to
file size between the time you make the call and the time you use the
result.


It doesn't. This is up to the programmer.
He said that Java had faced the software engineering issues. Apparently
that assertion was incorrect.
And when I do

fseek(f,0,SEEK_END);
fread(...);

If the file grows and I am not reading at the end this
is not the language problem but the programmer's problem.

Should we discard fseek/fread too?
Is that what passes for logic now?

--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Dec 31 '07 #201
Serve Lau wrote:
>
"Harald van Dijk" <tr*****@gmail.com> wrote in message
news:b3***************************@cache4.tilbu1.nb.home.nl...
>On Sun, 30 Dec 2007 22:29:32 +0100, Serve Lau wrote:
>>"Army1987" <ar******@NOSPAM.it> wrote in message
news:fl**********@tdi.cu.mi.it...
Serve Lau wrote:
fopen("LPT1:", "r");

Yeah, reading from a printer is veeeeery non-portable...

Not sure what you mean with this, does fopen("LPT1", "r") return a valid
FILE * on Linux for instance?

If a file named LPT1 exists, sure. It's just another name.

But on those systems where LPT1 is a printer device, you should probably
be writing to it, not reading from it.

usenet is funny lol, I'm sure you are smarter irl than you make it out to be
right now

But ok, let me rephrase!

If you want to write a program that opens the printer for reading (yes
reading because you just want to get the hypothetical filesize) and you
use fopen("LPT1", "r"), will you expect the program to actually open the
printer on Linux?

Is it clear now what I meant? rofl
Until you get some printer that sends back status info and you really can open
it for reading.

Dec 31 '07 #202
>No, you malloc() enough memory for the size of the file *AFTER* the
>time of interest. Every operation in C takes at least two moments,
and on a multi-tasking system, someone else can make changes the
moment before.

So? Like Jacob has indeed said 1000 times already, by that same logic you
can't use fread or fgets anymore either.
Somebody else could've made the file smaller when you started fgets on the
last line.
In which case you get an end-of-file indication, or an error, not a buffer
overflow which can let someone execute arbitrary code.
Dec 31 '07 #203
>Use of filesize() to allocate a buffer, and then assuming the file size
>has not changed encourages virus-friendly programming. You seem to be
advocating buffer overflows. The other problems you mention seem to be
limited to loss of data.

I allocate the buffer given by filesize, then I read exactly that with
fread.
You do, perhaps, but others in the thread seem to think otherwise, and
gave procedures or code demonstrating as much.
>If the file has grown I will ignore what goes beyond, and if the file
shrinks, I will have a short read and a too large buffer.

There is NO WAY I COULD HAVE A BUFFER OVERFLOW!

For the nth time:
You seem to have the idea that USENET is instantaneous. Chances are
all the posts you are complaining about were posted before you posted
your first rebuttal.
>You are telling just NONSENSE!
No, it's not nonsense, and just because YOU wouldn't do it doesn't mean
others won't. Some of them have described doing just that, with
pseudo-code or code, in this discussion.

Dec 31 '07 #204
>>malloc enough memory for the size of the file at the time of
>>interest. it really doesn't get much simpler.

No, you malloc() enough memory for the size of the file *AFTER* the
time of interest. Every operation in C takes at least two moments,
and on a multi-tasking system, someone else can make changes the
moment before.

Do you get paid by the buffer overflow?

There is no buffer overflow as I told you in another message.

I repeat it now since you went silent instead of acknowledging it.
On USENET, it takes at least a week to "go silent". It hasn't
been anywhere near that long. Some people have lives outside
of USENET. And article transmission is not instantaneous.
Messages "cross in the mail" all the time.
>1) I call filesize
2) I allocate a buffer with the result+1.
3) I read with fread into that buffer.
You still haven't said anything up to this point about error checking,
or what size you use for fread().
>If the file is bigger, the fread argument will ensure I read ONLY
what I allocated. I ignore the rest. If the file is smaller I get
a short read and the buffer is too big.
You still haven't stated what you're using as the argument to fread().
>THERE IS NO POSSIBILITY OF A BUFFER OVERFLOW.

I suppose you will stay silent now, instead of acknowledging your
error.
Lack of any error checking is an improvement over the possibility of
a buffer overflow, but not by much.

Dec 31 '07 #205
jacob navia said:
Ben Pfaff wrote:
>jacob navia <ja***@nospam.com> writes:
>>Why do you ignore what I am saying?

The usual reason in Usenet would be that you are in his killfile.
(I don't know whether that is actually the case.)

Of course I am not in his killfile when it comes to
attacking me.

My point is proved. The silence of those people speaks for itself.
If you talk nonsense and we say so, you claim we are attacking you. If you
talk nonsense and we don't say so, you claim we are ignoring you.

Either way, your claim is specious and without foundation. When you talk
nonsense, all it means is that you're talking nonsense. Since you rarely
talk anything else, you should not be surprised when people either say so
(and risk being accused of "attacking" you) or don't say so (and risk
being accused of "ignoring" you).

What are you after? Intelligent discourse about C with your peers? Fine, so
start talking intelligently about C. You may think you are already doing
so but, alas, this is rarely the case.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Dec 31 '07 #206
Serve Lau said:
>
"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:dO******************************@bt.com...
>Serve Lau said:

<snip>
>>You win a toaster with the software written in embedded
java! They tried C at first but The Management decided to make the
toaster XML configurable over the internet and the developers could not
get the size of the xml files and no network connection with standard C
so they opted for embedded java.

Java faces the same issues as C with regard to establishing the size of
a file. These are software engineering issues, not language issues.

Java already faced them you mean because such an operation is included in
*tadam* the standard library :)
Fine. All this means is that the Java standard library designers have taken
some arbitrary decisions and enshrined them in a standard library. Because
they are not fools, we can presume that these arbitrary decisions were
taken as intelligently as possible, but nevertheless the decisions they
have made will necessarily ignore some of the issues discussed in this
thread.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Dec 31 '07 #207
On Dec 26, 12:54 pm, jacob navia <ja...@nospam.com> wrote:
In my "Happy Christmas" message, I proposed a function to read
a file into a RAM buffer and return that buffer or NULL if
the file doesn't exist or some other error is found.

It is interesting to see that the answers to that message prove that
programming exclusively in standard C is completely impossible even
for a small and ridiculously simple program like the one I proposed.

1) I read the file contents in binary mode, which should allow me
to use ftell/fseek to determine the file size.

No objections to this were raised, except of course the obvious
one, if the "file" was some file associated with stdin, for
instance under some unix machine /dev/tty01 or similar...

I did not test for this since it is impossible in standard C:
isatty() is not in the standard.
I would expect fseek() to return with an error if the file is not
"seekable", such as stdin.
2) There is NO portable way to determine which characters should be
ignored when transforming a binary file into a text file. One
reader (CB Falconer) proposed to open the file in binary mode
and then in text mode and compare the two buffers to see which
characters were missing... Well, that would be too expensive.
Oh, well, if you want to do this stuff in text mode, you need to
iterate character by character. This distinction of "text mode"
versus "binary mode" is nothing but performance problems no matter
what. Personally, log files are the only thing I ever open in text
mode any more.
3) I used different values for errno defined by POSIX, but not by
the C standard, which defines only a few. Again, error handling
is not something important to be standardized, according to
the committee. errno is there but its usage is absolutely
not portable at all and goes immediately beyond what standard C
offers.
Yeah, it's just one of the many ways the C language standard encourages
vendors to be non-portable. In any event, it's not friendly to re-
entrancy anyway, as the error value needs to be copied out to make
way for other errors to be reported. If you are going to hang onto
error values anyway, you might as well just return them, and make up
your own. That's what I do. (OTOH, I don't consider mutable
static context, such as errno, to be a valid interface for anything.)
We hear again and again that this group is about standard C *"ONLY"*.
Could someone here then, tell me how this simple program could be
written in standard C?
Personally, I think trying to do something like this according to the
standard is a complete waste of time. I have files on disk today that
are in excess of ULONG_MAX in size. intmax_t (which is int64_t on my
system) is plenty large enough to hold the size, so it's not like it
couldn't or shouldn't be representable. And of course, my system also
has some nice 64-bit extensions to fseek and ftell, so we know it's
possible.

If we ignore file sizes and, say, allow INT_MAX-1 to be good enough
for the size of files, then I would recommend downloading "The Better
String Library" and just calling bread() (or breada()). It
iteratively allocates more and more memory to fit the data that is
read -- you can then call ballocmin() on the result if you want to
keep the memory usage tight. It's just an extra O(n) anyway. Meh.
This confirms my arguments about the need to improve the quality
of the standard library!
Well for fseek and ftell, the case is very clear and obvious.
"unsigned long" is not an appropriate type for file offsets. This was
"corrected" with fgetpos() and fsetpos() except that they took all
arithmetic away from you for some reason, and there is no way to
simply seek to the end of the file with them (without simply reading
it all.) Clearly we need to add something like:

intmax_t fgetfilesize (FILE * fp);
int faddpos (fpos_t * pos, intmax_t offset);
You can't do *anything* in just standard C.
Well, that might be an exaggeration. But what can definitively be
said is that as technology and ideas improve, what the C standard
specifies becomes applicable to a smaller and smaller percentage of
that space. And certainly portable C is diminishing to zero.

What the standard specifies does not scale, and does not take
improving technology into account, except in that it allows platforms
to extend away and do their own thing with respect to implementation
specific behavior. (Which makes the idea of a "standard"
meaningless.)

A fairly straightforward application like Bittorrent cannot be
written in anything resembling portable C code, not even just the file
system back end. It's just an embarrassment that a tool that should be
seen as a system-level tool is best written in Python or Java.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Dec 31 '07 #208

"Gordon Burditt" <go***********@burditt.org> wrote in message
news:13*************@corp.supernews.com...
>>>malloc enough memory for the size of the file at the time of
interest. it really doesn't get much simpler.

No, you malloc() enough memory for the size of the file *AFTER* the
time of interest. Every operation in C takes at least two moments,
and on a multi-tasking system, someone else can make changes the
moment before.

Do you get paid by the buffer overflow?

There is no buffer overflow as I told you in another message.

I repeat it now since you went silent instead of acknowledging it.

On USENET, it takes at least a week to "go silent". It hasn't
been anywhere near that long. Some people have lives outside
of USENET. And article transmission is not instantaneous.
Messages "cross in the mail" all the time.
>>1) I call filesize
2) I allocate a buffer with the result+1.
3) I read with fread into that buffer.

You still haven't said anything up to this point about error checking,
or what size you use for fread().
Let me answer for you. He uses the value returned by filesize as the size
argument. Is it that hard to see what he's trying to say, or am
I just smart? Let me explain some more. He says "1) I call filesize". Then he
says "2) allocate the result + 1". In my unending wisdom I could deduce that
"the result" means The Result from filesize! Applause for me.

And uh, let's just assume for the sake of argument that all functions are
checked for errors, or we could just go on arguing about it being
omitted.
I thought people here were mature?
>
>>If the file is bigger, the fread argument will ensure I read ONLY
what I allocated. I ignore the rest. If the file is smaller I get
a short read and the buffer is too big.

You still haven't stated what you're using as the argument to fread().
>>THERE IS NO POSSIBILITY OF A BUFFER OVERFLOW.

I suppose you will stay silent now, instead of acknowledging your
error.

Lack of any error checking is an improvement over the possibility of
a buffer overflow, but not by much.
Dec 31 '07 #209

"Gordon Burditt" <go****@hammy.burditt.org> wrote in message
news:13*************@corp.supernews.com...
>>No, you malloc() enough memory for the size of the file *AFTER* the
time of interest. Every operation in C takes at least two moments,
and on a multi-tasking system, someone else can make changes the
moment before.

So? Like Jacob has indeed said 1000 times already, by that same logic you
can't use fread or fgets anymore either.
Somebody else could've made the file smaller when you started fgets on the
last line.

In which case you get an end-of-file indication, or an error, not a buffer
overflow which can let someone execute arbitrary code.
Explain to me how this code will cause a buffer overflow when during the
fread the file gets smaller or bigger.

file = fopen("multi_accessed_file", "rb");
size = filesize(file);
buffer = malloc(size+1);
numread = fread(buffer, 1, size, file);

And don't start about not checking for errors; in real code the appropriate
functions would not be called when a function failed, and the error would
be reported to the user. Now please tell us: where's the buffer overflow in those 2
situations?
>
Dec 31 '07 #210

"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:6q*********************@bt.com...
Serve Lau said:
>>
"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:dO******************************@bt.com...
>>Serve Lau said:

<snip>

You win a toaster with the software written in embedded
java! They tried C at first but The Management decided to make the
toaster XML configurable over the internet and the developers could not
get the size of the xml files and no network connection with standard C
so they opted for embedded java.

Java faces the same issues as C with regard to establishing the size of
a file. These are software engineering issues, not language issues.

Java already faced them you mean because such an operation is included in
*tadam* the standard library :)

Fine. All this means is that the Java standard library designers have
taken
some arbitrary decisions and enshrined them in a standard library. Because
they are not fools, we can presume that these arbitrary decisions were
taken as intelligently as possible, but nevertheless the decisions they
have made will necessarily ignore some of the issues discussed in this
thread.
That is probably true. Making arbitrary decisions as intelligently as
possible is sometimes called "software engineering". Of course this class
can't handle all files in all situations ever possible. They apparently chose
to include a class that handles many common cases, and it's up to the
programmer to understand that when he now has "sparse files",
File.length() might not fit his needs. I found a thread where a programmer
wanted to use File.length() on Windows to get the real bytes occupied in
clusters and sectors on disk and was surprised it didn't work. The answer was
to write a Windows-dependent JNI function if you wanted to do something like
that. Like jacob has said, you don't have to use it; it is optional. When the
function doesn't fit your needs, there's probably a function that does, so use
that.

Dec 31 '07 #211

"Randy Howard" <ra*********@FOOverizonBAR.net> wrote in message
news:00*****************************@news.verizon.net...
They might be funny to someone who doesn't realize that C runs on a
lot of platforms other than those he/she is familiar with. I'm
not laughing, however.
C runs on more platforms than I'm familiar with? Is there more than windows
then? The point I'm trying to make is that a function like filesize could be
put into the standard easily, regardless of whether there are situations where this
function won't work well. The standard already has many functions that won't
work on many systems.
>
>There are BTW more uses for a filesize function than just get the size to
malloc memory with. How about for your DBMS and you want to create an
administration tool where you can constantly monitor how big the files
are.
Just an extra thingie you would be able to do in standard C.

So you could make sure that the normal I/O to the files happens slower
than it would otherwise. Yes, I could see how some DBMS admins might
think that's actually a good thing. I am laughing now, btw.
Could be interesting to see a graph over time when the files grow or shrink
most. But yeah funny haha, who needs management and marketing information
anyways. What DBMS is it that you work on?
>
>This is really getting silly, a function like filesize could be put into
the
standard and it would be useful. But as pointed out elsewhere in this
thread
this discussion won't help to get the function into it.

Or, you could write one to do what you desire, in far less time than
you have spent so far on this thread, which does whatever it is you
think is appropriate for your platform.
Don't worry about my time, I'm in "the usenet zone" at the moment. I would've
used fstat long ago if a situation called for getting a file's size and
where I expected this function to work. I would've kept it secret from clc of
course, because now I have written unportable code.
Dec 31 '07 #212
I am mostly trying to stay out of this thread at this point, but
I will add a note or two here.

In article <99**************************@cache6.tilbu1.nb.home.nl>,
Serve Lau <ni@hao.com> wrote:
>[The Java library designers] apparently chose to include a class
that handles many common cases, and it's up to the programmer to
understand that when he now has "sparse files", File.length()
might not fit his needs. I found a thread where a programmer wanted
to use File.length() on Windows to get the real bytes occupied in
clusters and sectors on disk and was surprised it didn't work. The
answer was to write a Windows-dependent JNI function if you wanted
to do something like that. Like jacob has said, you don't have to
use it; it is optional. When the function doesn't fit your needs,
there's probably a function that does, so use that.
It is worth considering these things here, too:

- The Java "base system" is larger (*much* larger) than that for C.
I suspect it handily exceeds even that for C++.

- Java is not really expected to work on the tiniest embedded
systems (although it *is* expected to work on what I would
consider "surprisingly small" systems). For instance, one
would probably not use Java to run the CPU inside a disk drive.
C is in fact used for such things (although C already splits
these off into "freestanding" systems, so increasing the size
of the "hosted" library, as C99 did, is not as fatal as it
might seem at first blush).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Dec 31 '07 #213
Serve Lau wrote:
>
"Richard Heathfield" <rj*@see.sig.invalid> wrote in message
news:6q*********************@bt.com...
>Fine. All this means is that the Java standard library designers have
taken
some arbitrary decisions and enshrined them in a standard library.
Because
they are not fools, we can presume that these arbitrary decisions were
taken as intelligently as possible, but nevertheless the decisions they
have made will necessarily ignore some of the issues discussed in this
thread.

That is probably true. Making arbitrary decisions as intelligently as
possible is sometimes called "software engineering".
Gosh, do not use words like that with the regulars!

They can't understand anything complicated like that; anyway, it
doesn't appear in the C89 standard...
Of course this
class can't handle all files in all situations ever possible. They
apparently chose to include a class that handles many common cases, and
it's up to the programmer to understand that when he now has "sparse
files", File.length() might not fit his needs.
The people designing Java were actually interested in providing a
good library, not just making everybody go to C++.


--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Dec 31 '07 #214
In article <13*************@corp.supernews.com>,
Gordon Burditt <go***********@burditt.org> wrote:
....
>On USENET, it takes at least a week to "go silent". It hasn't
been anywhere near that long. Some people have lives outside
of USENET. And article transmission is not instantaneous.
Messages "cross in the mail" all the time.
That's true in the theoretical, but not in the practical, sense.

Here, if Jacob posts something and heathfield doesn't post an attack
within the hour, it is probably a good idea to send an ambulance over to
heathfield's place.

Dec 31 '07 #215
jacob navia wrote:
Waiting for a certain time is impossible in standard C. (As I said,
you can't do much in standard C.) But that could be a portable
action if the interface level of C weren't so low.
One logical conclusion to adding such a function to C would be "You shalt
not use C for an 8bit microcontroller without a timer!" Amen.

I'm not sure if you're too focused on PC-style hardware to understand how
daft your standardisation propositions are. The same libraries on
everything from a small microcontroller with a couple of K of memory to a
large supercomputer?
Even Java, which is highly portable, has different libraries for PCs
and small devices. And even other ones for smartcards.

Some things just don't belong in the language standard. If it's part of
the language standard, it has to be implementable on every device that
can be programmed in that language, simple as that.
Dec 31 '07 #216
On Dec 30, 11:17 pm, gordonb.po...@burditt.org (Gordon Burditt) wrote:
>malloc enough memory for the size of the file at the time of
interest. it really doesn't get much simpler.
No, you malloc() enough memory for the size of the file *AFTER* the
time of interest. Every operation in C takes at least two moments,
and on a multi-tasking system, someone else can make changes the
moment before.
Do you get paid by the buffer overflow?
There is no buffer overflow as I told you in another message.
I repeat it now since you went silent instead of acknowledging it.

On USENET, it takes at least a week to "go silent". It hasn't
been anywhere near that long. Some people have lives outside
of USENET. And article transmission is not instantaneous.
Messages "cross in the mail" all the time.
1) I call filesize
2) I allocate a buffer with the result+1.
3) I read with fread into that buffer.

You still haven't said anything up to this point about error checking,
or what size you use for fread().
If the file is bigger, the fread argument will ensure I read ONLY
what I allocated. I ignore the rest. If the file is smaller I get
a short read and the buffer is too big.

You still haven't stated what you're using as the argument to fread().
THERE IS NO POSSIBILITY OF A BUFFER OVERFLOW.
I suppose you will stay silent now, instead of acknowledging your
error.

Lack of any error checking is an improvement over the possibility of
a buffer overflow, but not by much.
As we recall, the original goal of this thread was to read a file into
memory.
Jacob's algorithm fails to do that if someone has extended the file.
Dec 31 '07 #217
In article <47***********************@newsspool4.arcor-online.net>,
Syren Baran <sy***@gmx.de> wrote:
>Some things just don't belong in the language standard. If it's part of
the language standard, it has to be implementable on every device that
can be programmed in that language, simple as that.
I don't see how that follows. For example, Appendix F of C99,
describing behaviours of floating point arithmetic, is part of
the C standard and yet specifically says that the rules there
only apply if a certain macro is defined. It is completely conforming
to have floating point that behaves differently than described in
Appendix F, as long as that macro is not defined. Thus, there are
parts of the C language standard which need not be implementable
on every device that can be programmed in C.
--
"History is a pile of debris" -- Laurie Anderson
Dec 31 '07 #218
Walter Roberson wrote:
In article <47***********************@newsspool4.arcor-online.net>,
Syren Baran <sy***@gmx.de> wrote:
>Some things just don't belong in the language standard. If it's part of
the language standard, it has to be implementable on every device that
can be programmed in that language, simple as that.

I don't see how that follows. For example, Appendix F of C99,
describing behaviours of floating point arithmetic, is part of
the C standard and yet specifically says that the rules there
only apply if a certain macro is defined. It is completely conforming
to have floating point that behaves differently than described in
Appendix F, as long as that macro is not defined. Thus, there are
parts of the C language standard which need not be implementable
on every device that can be programmed in C.
And in machines with no file system files do not have any sense.

In an 8-bit microcontroller maybe there is no sense in implementing long
longs...

Etc.

filesize() could very well be restricted to a certain subset of all
the machines that implement C.

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Dec 31 '07 #219
jacob navia <ja***@nospam.com> writes:
[...]
And in machines with no file system files do not have any sense.
And of course the standard allows for this by permitting freestanding
implementations, which don't have to provide any of the standard
library (apart from a few headers that don't declare any functions).

(jacob, the English idiom is "do not make any sense".)
In an 8-bit microcontroller maybe there is no sense in implementing long
longs...
But the standard doesn't permit a conforming implementation, either
hosted or freestanding, to omit support for long long, which must be
at least 64 bits. This leaves an implementer for such a target with
two choices: either emulate 64-bit arithmetic in software (which
probably isn't much harder than emulating 32-bit arithmetic, and
needn't affect programs that don't use it), or produce a
non-conforming implementation.
Etc.

filesize() could very well be restricted to a certain subset of all
the machines that implement C.
*If* filesize() were added to the standard, it should probably be
required for all hosted implementations; it can always return an error
indication if the size of a particular file can't be determined. This
is what fseek() does; many files are not seekable.

There is at least one sticking point. What if the only way to
determine the exact size of a file is to read the entire file and
count the bytes? Should filesize() for, say, a 2-gigabyte file
actually be required to do that, or may it immediately return an error
indication? Probably it should be implementation-defined -- which
means there will be cases where the implementation *could* determine
the exact size of a file, but refuses to do so. The alternative is
for a seemingly innocuous query to tie up the program for a long time.

I wouldn't object to seeing filesize() added to the standard, as long
as all these issues are resolved and the description includes
sufficient warnings. But I don't actually expect it to happen.

Note that, in many implementations, the trick of fseek()ing to the end
of a file and calling ftell() to get the offset actually works, at
least for files whose size doesn't exceed LONG_MAX. It's not
portable, of course, but if you happen to know that your program will
run only on systems where that works, it's probably no worse than
invoking some system-specific function like fstat().

--
Keith Thompson (The_Other_Keith) <ks***@mib.org>
[...]
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Dec 31 '07 #220
In article <87************@kvetch.smov.org>,
Keith Thompson <ks***@mib.org> wrote:
>There is at least one sticking point. What if the only way to
determine the exact size of a file is to read the entire file and
count the bytes? Should filesize() for, say, a 2-gigabyte file
actually be required to do that, or may it immediately return an error
indication? Probably it should be implementation-defined -- which
means there will be cases where the implementation *could* determine
the exact size of a file, but refuses to do so. The alternative is
for a seemingly innocuous query to tie up the program for a long time.
The amount of work to put into getting the size could be a parameter.

--
"History is a pile of debris" -- Laurie Anderson
Dec 31 '07 #221
On Mon, 31 Dec 2007 22:34:41 +0000, Walter Roberson wrote:
In article <87************@kvetch.smov.org>, Keith Thompson
<ks***@mib.org> wrote:
>>refuses to do so. The alternative is for a seemingly innocuous query to
tie up the program for a long time.

The amount of work to put into getting the size could be a parameter.
interesting idea. Some new macros in limits.h, to assist:

#define LOOK_LIKE_YOURE_TRYING 1
#define UNDERGRADUATE_EFFORT_LEVEL 2
.....
#define UTTERLY_ANAL_CLC_MODE -1LL

Dec 31 '07 #222
jacob navia wrote:
Keith Thompson wrote:
.... snip ...
>>
Obviously if you blindly read the entire file into your 1-megabyte
buffer, you risk an overrun. So don't do that.

The above was written by Keith Thompson <ks***@mib.org>. Gordon,
if you post a followup, please quote this paragraph. I do not
give permission to quote my words without attribution.

Why do you ignore what I am saying?

I posted an identical message in this thread proving I could not
possibly have a buffer overflow.

You say the same thing without quoting me.
This is Usenet. There is no guaranteed delivery. That is why
articles should stand by themselves, and is the basic reason for
quotation mechanism. I would have thought you already knew this.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee, Frohe Weihnachten
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Jan 1 '08 #223
Serve Lau wrote:
"Army1987" <ar******@NOSPAM.it> wrote in message
.... snip ...
>
>Yeah, reading from a printer is veeeeery non-portable...

Not sure what you mean with this, does fopen("LPT1", "r") return
a valid FILE * on Linux for instance?
It would normally depend on how the file LPT1 is made, and the
details of the system interface to it. Note this does not preclude
(nor enable) use of LPT1 as a printer device.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee, Frohe Weihnachten
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>


Jan 1 '08 #224
Gordon Burditt wrote:
>
>>Use of filesize() to allocate a buffer, and then assuming the
file size has not changed encourages virus-friendly programming.
You seem to be advocating buffer overflows. The other problems
you mention seem to be limited to loss of data.

I allocate the buffer given by filesize, then I read exactly that
with fread.

You do, perhaps, but others in the thread seem to think otherwise,
and gave procedures or code demonstrating as much.
>If the file has grown I will ignore what goes beyond, and if the
file shrinks, I will have a short read and a too large buffer.

There is NO WAY I COULD HAVE A BUFFER OVERFLOW!

For the nth time:

You seem to have the idea that USENET is instantaneous. Chances
are all the posts you are complaining about were posted before
you posted your first rebuttal.
>You are telling just NONSENSE!

No, it's not nonsense, and just because YOU wouldn't do it doesn't
mean others won't. Some of them have described doing just that,
with pseudo-code or code, in this discussion.
I have no idea who you are quoting. This leaves off any possible
explanations for idiotic comments. Kindly include proper
attribution lines in your posts.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee, Frohe Weihnachten
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>


Jan 1 '08 #225
# >>>AlphaServer DS10L Noname pc clone (my machine) 466Mhz CPU
# >>>EV6 2GHZ Dual core AMD
# >>>
# >>Remember, CISC vs RISC !
# >>
# >The RISC idea was to reduce the instructions and speed up the clock.
# >
# No. Suggest you read up on this.
#
# Pedantic.

Program to cope with or avoid endian and integer size issues.
Then you don't have to worry about cisc vs risc.

It turns out the only time I know if I'm running on Intel Macs
or PPC Macs is if I have a program error (like parameter type
mismatches) that show different errors on the machines. Apple
can fight with IBM and Intel if they want. My code runs the
same either way.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
I'm not even supposed to be here today.
Jan 1 '08 #226
On 2007-12-26, user923005 wrote:
On Dec 26, 3:10 pm, jacob navia <ja...@nospam.com> wrote:
>user923005 wrote:
>>On Dec 26, 2:12 pm, jacob navia <ja...@nospam.com> wrote:
jameskuy...@verizon.net wrote:
jacob navia wrote:
Ah well. I am dreaming then all the time.

I write

dir

and the multi-user file system tells me the size of each file.

And in unix I do

ls

and (WONDER) I get a meaningless result with the file size of
each file.

And if you do ls again, one second later, all of those files might be
gone or different sized. If you relied on the answer that you got,
you would (at best) get the wrong answer on occasion. If you expected
that the size you got would hold all of the file, and if someone added
a record, then your memory allocation to hold it is too small and when
you read the data, the operation will overwrite memory. If you can
come up with a simple work-around for this obvious and fundamental
problem, I would like to hear of it.
....
>If we define filesize as the number of bytes that would
be returned when reading the file we do not have to special case
anything.

I am surprised that you do not understand the ramifications of not
being the only one allowed to access a file.
I've stayed out of this part of the discussion, but I've decided that I
have to defend at least this aspect of Jacob's code. Sure, it's possible
for a file to change its contents while you are working on it.
However, whether or not it's necessary or even appropriate to write code
which copes with that possibility depends upon the application.

I only remember one time in the last 3 decades where I wrote a program
which was actually intended to produce its desired results despite the
possibility that its input file might change while reading it. That
program worked by making sure the file didn't change, by using POSIX
file and record locking (I remember that I needed both file and record
locking in the same program - obviously not for the same file). For most
of the other programs I've ever written, failing gracefully by producing
an informative error message and avoiding undefined behavior is all that
they are expected to do in such a situation. For most of those programs,
it's the only thing that they could do.

It's completely normal for a program to require that its input files be
changed, if at all, only by the program itself. The overwhelming
majority of the programs I've written, and most of the programs I use,
have precisely that requirement. It is entirely routine and normal to
take the precautions needed to ensure that a program's input files are
not subject to change during the running of the program: we use file
permissions to prevent unauthorized writes. We write scripts which make
sure that the programs which create the input files are complete, before
starting the programs which read them. We set aside working areas, and
count on other users who do have group-write permissions to those
working areas to be polite enough not to exercise those permissions
while we're using those areas.

There are obviously programs that do need to deal with input files that
might change while they are being read. I frequently use mail readers
and version control systems that must be able to deal with such
situations. However I think it's inappropriate to impose such a
requirement on all programs, regardless of their purpose.
Jan 1 '08 #227
Serve Lau wrote:
"Army1987" <ar******@NOSPAM.it> wrote in message
news:fl**********@tdi.cu.mi.it...
>Serve Lau wrote:
[snip] In fact writing
>>fopen("LPT1:", "r"); is already not portable anymore in the sense that
your
code wont work on multiple systems. Same goes for most of the other
system
dependant files mentioned.
Yeah, reading from a printer is veeeeery non-portable...

Not sure what you mean with this, does fopen("LPT1", "r") return a valid
FILE * on Linux for instance?
It was a sarcastic way of saying 'You meant fopen("LPT1:", "w"), not "r",
right?'.

--
Army1987 (Replace "NOSPAM" with "email")
Jan 1 '08 #228

"Army1987" <ar******@NOSPAM.it> wrote in message
news:fl**********@tdi.cu.mi.it...
Serve Lau wrote:
>"Army1987" <ar******@NOSPAM.it> wrote in message
news:fl**********@tdi.cu.mi.it...
>>Serve Lau wrote:
[snip] In fact writing
>>>fopen("LPT1:", "r"); is already not portable anymore in the sense that
your
code wont work on multiple systems. Same goes for most of the other
system
dependant files mentioned.
Yeah, reading from a printer is veeeeery non-portable...

Not sure what you mean with this, does fopen("LPT1", "r") return a valid
FILE * on Linux for instance?

It was a sarcastic way of saying 'You meant fopen("LPT1:", "w"), not "r",
right?'.
With "r", fopen returns a valid pointer on Windows. Why would you want to
open for writing when you just want to ask for the file size?
>
--
Army1987 (Replace "NOSPAM" with "email")
Jan 1 '08 #229
jacob navia schrieb:
And in machines with no file system files do not have any sense.
Completely wrong.
Even on a machine without any persistent storage it would be wrong.
Actually the whole file system is a file, e.g. /dev/sda2 in my case. The
whole disk is a file, even if there is no filesystem in the media.
Actually even stdin and stdout are files. Same holds true for a mouse
and a joystick.
>
In a 8 bit micro controller maybe there is no sense in implementing long
longs...
Care to explain why?
Jan 1 '08 #230
On Tue, 01 Jan 2008 22:17:57 +0100, Syren Baran wrote:
jacob navia schrieb:
>And in machines with no file system files do not have any sense.
Completely wrong.
Completely right.
Even on a machine without any persistant storage it would be wrong.
Actually the whole file system is a file, e.g. /dev/sda2 in my case.
That's because your /dev/ is a filesystem, or a directory on a
filesystem, providing the sda2 file.
The
whole disk is a file, even if there is no filesystem in the media.
True, there's no file system on the media, but there's a file system in
memory, and your operating system is what provides the layer required to
access data by name.
Actually even stdin and stdout are files. Same holds true for a mouse
and a joystick.
Again they're files on a filesystem.

When an implementation doesn't support any file system, the concept of
files doesn't make sense. When an implementation does support file
systems, files make sense, even if the implementation can only arrange to
store the contents of files in memory.
>In a 8 bit micro controller maybe there is no sense in implementing
long longs...
Care to explain why?
Because while it's possible to emulate 64-bit arithmetic on a processor
that doesn't natively support it, depending on the processor and the
program, it may be too slow to be useful.
Jan 1 '08 #231
Syren Baran wrote:
jacob navia schrieb:
>And in machines with no file system files do not have any sense.
Completely wrong.
Even on a machine without any persistant storage it would be wrong.
Actually the whole file system is a file, e.g. /dev/sda2 in my case. The
whole disk is a file, even if there is no filesystem in the media.
You have a file system, or I do not know where /dev/sda2 comes from...
You have no file system in that particular disk, but there is a file
system to access that device as a single file...
Actually even stdin and stdout are files. Same holds true for a mouse
and a joystick.
>>
Yes, because you have a file system in your OS. I am speaking about
a machine for controlling some hardware where there is just a program,
some flash memory for data, and that's all.
>In a 8 bit micro controller maybe there is no sense in implementing long
longs...
Care to explain why?
If you can't understand that it means that you are deliberately trying
to start a polemic because "jacob is always wrong and always speaks
nonsense", to please your friends...

In an 8 bit microcontroller, there is with high probability no NEED
for 64 bit arithmetic, and if you would need it it would be so
slow that it would be useless. Besides, those machines have little
memory to go around, and moving 64 bit integers and loading all the
software needed for that would explode the requirements for the
program.

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Jan 1 '08 #232
Harald van Dijk schrieb:
On Tue, 01 Jan 2008 22:17:57 +0100, Syren Baran wrote:
>jacob navia schrieb:
>>And in machines with no file system files do not have any sense.
Completely wrong.

Completely right.
>Even on a machine without any persistant storage it would be wrong.
Actually the whole file system is a file, e.g. /dev/sda2 in my case.

That's because your /dev/ is a filesystem, or a directory on a
filesystem, providing the sda2 file.
Perhaps we have differing opinions on what a file system is.
Ext2/3,Reiserfs,ZFS,NTFS and FAT are surely filesystems.
A simple hierarchical structure is not necessarily a file system. Would
you consider a linked list a file system?
sda is placed within the filesystem hierarchy, but that is not necessary.
On a device without a filesystem it could be called just "sda".
On a device without an OS there could be only one file, call it "storage"
or whatever, which could be opened normally without requiring a file system.
>
>The
whole disk is a file, even if there is no filesystem in the media.

True, there's no file system on the media, but there's a file system in
memory, and your operating system is what provides the layer required to
access data by name.
Right, "access data by name". That's why, e.g. open has its char*
argument. This does not necessarily imply a file system.
>
>Actually even stdin and stdout are files. Same holds true for a mouse
and a joystick.

Again they're files on a filesystem.
Stdin and stdout are files on a filesystem? They have file handles
associated with them. Where is "1" on my filesystem, e.g. "somecommand
2>&1"?
>
When an implementation doesn't support any file system, the concept of
files doesn't make sense. When an implementation does support file
systems, files make sense, even if the implementation can only arrange to
store the contents of files in memory.
As long as there is something that is at least either readable or
writable the concept of a file can make sense.
>
>>In a 8 bit micro controller maybe there is no sense in implementing
long longs...
Care to explain why?

Because while it's possible to emulate 64-bit arithmetic on a processor
that doesn't natively support it, depending on the processor and the
program, it may be too slow to be useful.
The keyword here is "may". I will fully agree to "There may be no sense
in supporting long longs on a 8 bit microcontroller", but not to the
unconditional sentence above.
Jan 1 '08 #233
Syren Baran wrote:
Harald van Dijk schrieb:
>On Tue, 01 Jan 2008 22:17:57 +0100, Syren Baran wrote:
>>jacob navia schrieb:
In a 8 bit micro controller maybe there is no sense in implementing
long longs...
Care to explain why?

Because while it's possible to emulate 64-bit arithmetic on a
processor that doesn't natively support it, depending on the processor
and the program, it may be too slow to be useful.
The keyword here is "may". I will fully agree to "There may be no sense
in supporting long longs on a 8 bit microcontroller", but not to the
unconditional sentence above.
You are completely BLIND. You have just CITED this
sentence:

<quote>
In a 8 bit micro controller maybe there is no sense in implementing
long longs...
<end quote>

I see a clear "maybe" there. But YOU do not see it. Your brain doesn't
want to see it because jacob is always wrong.

Since you do not see it, it doesn't exist of course. And from
a conditional sentence you make an "unconditional sentence"
and there you go...
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Jan 1 '08 #234
jacob navia schrieb:
You are completely BLIND. You have just CITED this
sentence:

<quote>
In a 8 bit micro controller maybe there is no sense in implementing
long longs...
<end quote>
Please accept my apology. I overlooked that word.
Jan 1 '08 #235
jacob navia <ja***@nospam.com> writes:
Syren Baran wrote:
>Harald van Dijk schrieb:
>>On Tue, 01 Jan 2008 22:17:57 +0100, Syren Baran wrote:
jacob navia schrieb:
In a 8 bit micro controller maybe there is no sense in implementing
long longs...
Care to explain why?

Because while it's possible to emulate 64-bit arithmetic on a
processor that doesn't natively support it, depending on the
processor and the program, it may be too slow to be useful.
>The keyword here is "may". I will fully agree to "There may be no
sense in supporting long longs on a 8 bit microcontroller", but not
to the unconditional sentence above.

You are completely BLIND. You have just CITED this
sentence:

<quote>
In a 8 bit micro controller maybe there is no sense in implementing
long longs...
<end quote>

I see a clear "maybe" there. But YOU do not see it. Your brain doesn't
want to see it because jacob is always wrong.

Since you do not see it, it doesn't exist of course. And from
a conditional sentence you make an "unconditional sentence"
and there you go...
And it doesn't even occur to you that it might have been an honest
mistake.

Please take a moment to stop and think before assuming that everything
is part of the Vast Anti-jacob Conspiracy. Trust me, Syren Baran
hasn't even shown up at any of the meetings. 8-)}

--
Keith Thompson (The_Other_Keith) <ks***@mib.org>
[...]
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jan 1 '08 #236
On Tue, 1 Jan 2008 17:33:23 -0600, Keith Thompson wrote
(in article <87************@kvetch.smov.org>):
Please take a moment to stop and think before assuming that everything
is part of the Vast Anti-jacob Conspiracy. Trust me, Syren Baran
hasn't even shown up at any of the meetings. 8-)}
When/where is the next meeting being held?

--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Jan 2 '08 #237
In article <87************@kvetch.smov.org>,
Keith Thompson <ks***@mib.org> wrote:
>Note that, in many implementations, the trick of fseek()ing to the end
of a file and calling ftell() to get the offset actually works, at
least for files whose size doesn't exceed LONG_MAX. It's not
portable, of course, but if you happen to know that your program will
run only on systems where that works, it's probably no worse than
invoking some system-specific function like fstat().
I disagree that it's no worse.
If you use fseek() and ftell(), and try to build and run your program
on a system where the assumptions don't hold, it will silently accept
it until you run into a file that breaks one or more of the
assumptions. If you use fstat() and try to do the same thing, you'll
get a warning or error at compile time, and shouldn't have too much
trouble working out what it's trying to do and how to do it on the
system you're porting to, which ends up taking rather less total effort
to get the port working.

So unless you have an automated way to track the assumptions you're
making and verify that they hold on all the systems you try to build
on, you're probably better off with the explicitly nonportable code
than with code that makes nonportable assumptions about standard
interfaces.
dave

Jan 2 '08 #238
[snips]

On Sun, 30 Dec 2007 21:34:51 +0100, jacob navia wrote:
>So, explain to us how this file size you've calculated means anything
relevant to the C code issues discussed here - notably, trying to read the
entire file into a buffer.
Yes, this is 100% clear logic. I will apply it then.
I doubt it, but let's see.
1) You can't open a file with
fopen("name","a+")
since somebody else could grow the file after the file is positioned at
EOF, so you would overwrite his data.
You could, or the system could prevent one user writing, or it could do
the equivalent of a copy-on-write with subsequent merge, or it could
append your data to whatever end-of-file is at the exact time of write and
do the same for his, or...

The mechanics of this are beyond the scope of the language, for one thing,
and are beyond the scope of trivial file management for another. If you
need protection against this sort of thing, you need to be using at the
very least a locking mechanism, and perhaps a database instead of a simple
file. None of which has bugger all to do with the problem at hand.
2) ftell/fseek/ and in general all file primitives should be dropped
from the standard.
Oh, indeed. After all, if they're useless in _one_ case, they must, of
course, be useless in _all_ cases. You can't really be that simple.
I have answered this thousand times but you still come back with the
same nonsense!
If we keep coming back with the same nonsense, why can't you come up with
a single, valid response to any of it?
Jan 2 '08 #239
[snips]

On Sat, 29 Dec 2007 11:37:56 +0100, jacob navia wrote:
>Many C programs will need to get by on a lot less memory than this! I
know that for Jacob every computer is a 32-bit Windows box with
a couple of gigabytes of RAM, but out there in the real world C programs
often control things like toasters or kettles, where memory is severely
limited.

Look "CJ" whoever you are:

You know NOTHING of where I have programmed, or what I am doing.
Versions of lcc-win run in DSPs with 80k of memory, and only
20 usable.
And Win98 on a 486?
Jan 2 '08 #240
[snips]

On Fri, 28 Dec 2007 13:40:57 +0100, Richard wrote:
>Then the file size changes one microsecond later. What do they do
with this very expensive measurement that they have made?

How ridiculous. This is true for many things. Always "file size" in
"portable C" is useless. This is what Jacob is alluding to.
Yes, but Jacob persists in missing the obvious: this is not a limitation
of C, but of the very notion of determining file sizes. It doesn't matter
what language or OS you use, if there is *any* possibility of the file
being modified other than by the singular instance of the application in
question, you still face the same problems.

For some reason, Jacob seems to want to focus on supposed limitations of
C, without bothering to examine the underlying problem at all.
You seem to miss the point. filesize(fp) is portable if it's in the
standard. How it is implemented per platform is another issue.
And what actual use it is has yet to be determined.
>a generic function cannot produce a reliable answer. I guess that
other people have thought about this question and figured out that a
reliable file size function cannot be written in a generic manner.

Rubbish. All files have a byte size.
Good. So tell me the size of, oh, /var/log/apache/access.log. Oh,
whoops, it changed - a new log entry was written. Oh, whoops, it changed,
logrotate just archived it and set it to an empty file. The "size" you
got has absolutely no relation to the size now, so tell us what size the
file is, in bytes, in any meaningful sense: is it the 100K your function
reported? The 102K from a second ago? Or the 0K it is right now? Oh,
but you're going to allocate 100K, try to read 100K and get zero bytes
read - is that an error condition? A read failure? Or is that correct
behaviour, just completely inconsistent with the size you recorded?
>No one has ever said this. A portable file size function that gives a
correct answer is clearly impossible.

How silly.

"correct" would be platform specific. The API need not be.
So which is the correct size of a partially compressed sparse file? The
uncompressed size it would be if it was actually "full"? The compressed
size, based on what's actually in it? The size it currently occupies on
the disk? If the latter, keep in mind that it bears absolutely no
relationship to the actual number of data bytes in the file.

So do tell, which is the "correct" size.
Jan 2 '08 #241
dj******@csclub.uwaterloo.ca.invalid writes:
In article <87************@kvetch.smov.org>,
Keith Thompson <ks***@mib.org> wrote:
>>Note that, in many implementations, the trick of fseek()ing to the end
of a file and calling ftell() to get the offset actually works, at
least for files whose size doesn't exceed LONG_MAX. It's not
portable, of course, but if you happen to know that your program will
run only on systems where that works, it's probably no worse than
invoking some system-specific function like fstat().

I disagree that it's no worse.
If you use fseek() and ftell(), and try to build and run your program
on a system where the assumptions don't hold, it will silently accept
it until you run into a file that breaks one or more of the
assumptions. If you use fstat() and try to do the same thing, you'll
get a warning or error at compile time, and shouldn't have too much
trouble working out what it's trying to do and how to do it on the
system you're porting to, which ends up taking rather less total effort
to get the port working.

So unless you have an automated way to track the assumptions you're
making and verify that they hold on all the systems you try to build
on, you're probably better off with the explicitly nonportable code
than with code that makes nonportable assumptions about standard
interfaces.
You're right.

--
Keith Thompson (The_Other_Keith) <ks***@mib.org>
[...]
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jan 2 '08 #242
[snips]

On Wed, 02 Jan 2008 03:13:18 +0100, Richard wrote:
>Fine. All this means is that the Java standard library designers have taken
some arbitrary decisions and enshrined them in a standard library. Because
they are not fools, we can presume that these arbitrary decisions were
taken as intelligently as possible, but nevertheless the decisions they
have made will necessarily ignore some of the issues discussed in this
thread.

No. Not ignore them. Discard them as the kind of pedantic oneupsmanship
that has no place in the real world where real, practical solutions are
required.

So what's the real, practical size of /var/log/apache/access.log?
Calculate the file size _now_, you get, say, 100K. Wait half a second,
it's 102K. Wait another second while logrotate reaps it, it's now 0K. So
which size is it - the 100K you read, the 102K it was almost immediately
after, or the 0K it is now?

Presumably the Java functions will do the same thing the equivalent C
functions would do - report size at time of inspection. They, however,
still face the same problem, that the size reported has absolutely bugger
all to do with the size _now_.

You call this - real-world problems faced by real-world programs -
"pedantic oneupsmanship"; we look at it as simply one more problem which
has not been dealt with.

One way to deal with it is to do what Java apparently did: accept that you
can only get a value for file size at time of querying, which is a
reasonable position to take, but doing so also ignores the fact that the
size may change between querying the size and making use of the size.
Some applications might need to track the changes; others won't. That
some don't does not mean no application needs to, nor does it mean that
the issue is simply one of pedanticism; it is, rather, a question of
determining what the actual problem to be solved is.

Jacob's exemplar of reading a file into memory is a good example. He uses
whatever function to calculate the size, so far so good. He allocates a
buffer, still all good. So his size recorded is, say, 100K and he
allocated 100K, all's well.

When he actually reads the file, though, and discovers zero bytes to read
- because the file has been rotated and emptied - is this an error, or is
this as expected?

The reality is that it is as expected... but it is also reality that
unless his code is particularly smart - *and* knows the details of the
system upon which it is being executed - it is very likely to see this as
an error condition and fail in some manner.

Is this an error? Not really. Is his code going to be smart enough to
recognise this as being something other than an error? Possibly, if he
knows about this exact file, but probably not in the general case,
particularly if his code is used on multiple disparate systems. You want
to write off such issues as simple pedanticism, yet they're not - they're
real-world issues which come up in real-world systems, which need to be
dealt with by real-world code. Simply hand-waving them away doesn't work.

So rather than hand-waving them away, how about explaining how, exactly,
you're going to deal with them? How do you plan to determine the "size"
of a file, when that size keeps changing? How do you plan to determine
the "size" of a compressed, or partially compressed, or sparse file? Or
worse, sparse, partially compressed files? What actual "size" matters,
and how do you determine that this is the size that matters?

Determining the size - if you plan to - can be critical. For example,
take a sparse file. One size value says the file is 10GB, another says
it's 200K. One reports "theoretical" size, the other reports actual
usage. Here's the kicker: if you want to copy this file to another file
system, you _may_ be able to do it onto a file system with 200K free, if
it supports sparse files, or you may require 10GB, if it doesn't. So
which size is the proper one - the 200K actually allocated on disk, or the
10GB which is the "data size" required to store it on another file system
without risking losing data?

This doesn't even address the issue of translation - is that 100K size
that you read still going to be 100K if you want to read the file as a
text file? Or is it only applicable if you read it in binary mode?
Probably the latter, but if it is, in fact, a text file and you wish to
process it as such, how much space do you need to allocate to handle it?
You can't tell this from the size on disk, now can you?

The whole concept of "file size" is one where the pat answer sounds good
and even is good in many cases, but it is also one where the pat answer
falls on its face in far too many cases to naively rely upon it, something
the more experienced folks keep trying to point out, yet some folks, for
some reason, seem to want to ignore this.

Perhaps the simplest way to approach the issue is to ponder those
real-world cases where the pat answer doesn't work well and consider what,
if any, solution you would propose, a solution which actually covers those
cases.

I suggested one such, but it involves reporting not a single answer, but
several: size on disk, size reserved (eg how big a sparse file "actually
is"), size of contained compressed data, size of contained data after
decompression. That's four; perhaps we need to add another four,
basically the same values but after translation to text mode - though I
suspect this is going to make file management awfully inefficient. Yet
that's 8 different size values to describe *one* file, and still doesn't
deal with the fact the file size - any of those sizes - can change at a
moment's notice.

So which of the eight is "the size of the file"? Size on disk has little
bearing on what you'd need to allocate to store the contents in memory.
Size after decompression would be closer, but doesn't tell you how much
overhead is involved due to translation. Neither of these tells you how
much space will be required if you want to copy the file to a file system
which doesn't support sparse files.

Oh, and then there are "forks". Can't forget those; several file systems
include them. Those make things even more interesting. With forks, the
reported size on disk may well represent the total data in all forks, but
if you allocate a buffer that size and try to read the file, you're liable
to get somewhat less than you'd expect, since you're liable to only be
reading one fork, not all of them. So is that an error? Or is that an
expected situation? Does your code know the difference between reported
size on disk and expected size when reading?

There are so many issues to consider, yet some folks want to treat it as
if the only consideration in the world is getting a singular value
representing "the size of the file", without ever stopping to consider -
or worse, writing off as "pedanticism" - the fact that "file size" is a
meaningless concept in all but a few cases.

So explain to us, you - or the others who think this issue is so trivial -
which of the 8 values I mentioned, the ones which don't even deal with
forked files, is "the size of a file".

Chances are, the best answer you'll come up with is "size on disk", but
this value is useless for virtually all purposes, other than determining -
in *some* cases - whether you have enough room on another disk to store a
copy of the file, and can fail miserably even there.

C doesn't have a standardized filelength that does anything useful? Okay,
great, it doesn't. What would *you* put in its place, though? Which of
the umpteen possible size values would you report for a given file? How
would you have it determine - and report - whether that size meant size on
disk, size required to store another copy, size before or after
decompression, etc, etc, etc?

Java designers apparently decided to pick one particular value and report
that. Fine, great, wonderful and all, but it doesn't deal with all the
other cases. It is one possible answer, and presumably a perfectly
acceptable one - for the cases to which it applies. It is going to be
considerably less useful for other cases.
Jan 2 '08 #243
[snips]

On Sun, 30 Dec 2007 20:28:50 +0000, Gordon Burditt wrote:
>>There are file systems supporting sparse files, which in most cases appear
>>to be "normal" files, except that while they're reported to be of one
>>size, they're actually another size entirely - they claim to be 2GB, but
>>only occupy 200K on disk, for example.
>
>But they take 2GB to read into memory, if that is the number of
>interest.

So which size is reported as "file size"? 2GB, or 200K? Suppose it's
200K - actual space consumed. Reading 200K of consecutive data is going
to get you bogus data, most likely, if the writes were done in chunks less
than 200K and strewn about the sparse space.

>>Other systems allow for compressed or partially compressed files, where
>>the actual size of the data bears little, if any, relation to the size
>>of the file on disk.
>
>But the uncompressed size is the relevant number for space to read it
>into memory.

Sure. So which gets reported, size on disk, or size after decompression?

>Mandatory file locking is one way to accomplish this, but it has other
>problems. It opens the system up to denial-of-service attacks by a
>program that locks lots of important system files to keep administrators
>out, then proceeds to do something evil.

Yeah, and that's sorta the point to all this: some folks seem to want to
hand-wave away such issues.
Jan 2 '08 #244
On Sun, 30 Dec 2007 22:19:02 +0100, Serve Lau wrote:
>"Kelsey Bjarnason" <kb********@gmail.com> wrote in message
>news:ar************@spanky.localhost.net...
>>What is a "normal" file?
>>
>>There are file systems supporting sparse files, which in most cases appear
>>to be "normal" files, except that while they're reported to be of one
>>size, they're actually another size entirely - they claim to be 2GB, but
>>only occupy 200K on disk, for example.
>>
>>Other systems allow for compressed or partially compressed files, where
>>the actual size of the data bears little, if any, relation to the size of
>>the file on disk.
>
>And what do you expect that fread will put into the buffer on such a
>filesystem? The compressed or decompressed data?
Depends on the system, now don't it?

Take something like, oh, drivespace. Or compressed directories. Or other
variations on the theme. What gets read is decompressed data. Yet there
are now two distinct "size" values in play: size "apparent" on disk
(compressed size) and size of data in file. Which is reported? If you
get one when you were expecting the other, have fun.
Jan 2 '08 #245
Kelsey Bjarnason wrote:
[snips]

On Sat, 29 Dec 2007 11:37:56 +0100, jacob navia wrote:
>>Many C programs will need to get by on a lot less memory than this! I
>>know that for Jacob every computer is a 32-bit Windows box with
>>a couple of gigabytes of RAM, but out there in the real world C programs
>>often control things like toasters or kettles, where memory is severely
>>limited.
Look "CJ" whoever you are:

You know NOTHING of where I have programmed, or what I am doing.
Versions of lcc-win run in DSPs with 80k of memory, and only
20 usable.

And Win98 on a 486?
lcc-win runs perfectly in that environment. The debugger however, has
problems, probably because of bugs in Win98. By the way those
systems are no longer supported by Microsoft. I do not support
them either.
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Jan 2 '08 #246
[snips]

On Wed, 02 Jan 2008 11:11:28 +0100, jacob navia wrote:
>And Win98 on a 486?
>
>lcc-win runs perfectly in that environment. The debugger however, has
>problems, probably because of bugs in Win98. By the way those
>systems are no longer supported by Microsoft. I do not support
>them either.
MS has a reason to not support Win98: by not supporting it, they encourage
people to spend money buying newer versions, thus increasing the profits.

By contrast, unless you have some compelling reason to use features
specific to later versions of Windows, there's no reason for you to not
support such a configuration.

Or, put differently, by sticking with the maximally usable set of Windows
functionality, you gain the ability to market to anyone who has need to
use such a system and likely cannot find a competing compiler that will
work for them. Even if it's a small market, you could be the most
significant player in it, unless there is some compelling benefit to using
the new functionality which simply isn't available on such machines.
Jan 2 '08 #247
Kelsey Bjarnason wrote:
>[snips]
>
>On Wed, 02 Jan 2008 11:11:28 +0100, jacob navia wrote:
>>And Win98 on a 486?
>>
>>lcc-win runs perfectly in that environment. The debugger however, has
>>problems, probably because of bugs in Win98. By the way those
>>systems are no longer supported by Microsoft. I do not support
>>them either.
>
>MS has a reason to not support Win98: by not supporting it, they encourage
>people to spend money buying newer versions, thus increasing the profits.
>
>By contrast, unless you have some compelling reason to use features
>specific to later versions of Windows, there's no reason for you to not
>support such a configuration.
>
>Or, put differently, by sticking with the maximally usable set of Windows
>functionality, you gain the ability to market to anyone who has need to
>use such a system and likely cannot find a competing compiler that will
>work for them. Even if it's a small market, you could be the most
>significant player in it, unless there is some compelling benefit to using
>the new functionality which simply isn't available on such machines.
You are right, but I have a limited budget.
Supporting Win98 requires a system where I can test it: a machine with that
OS, time to set it up, time to debug, etc.

I tried last year to set up a system with a virtual machine, but the
installation of Windows 98 needs a DOS diskette, and I do not have a
floppy drive any more... It is quite a lot of work really...

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Jan 2 '08 #248
In article <fl**********@aioe.org>, jacob navia <ja***@nospam.org> wrote:
....
>I tried last year to set up a system with a virtual machine, but the
>installation of Windows 98 needs a DOS diskette, and I do not have a
>floppy drive any more... It is quite a lot of work really...
It doesn't (require a DOS disk), and never has. In fact, using VMWare,
you don't even need a CD drive (you can make an ISO file from the CD -
say on another machine) - and then install from the ISO file.

All very off-topic here, but if you're interested in discussing this,
shoot me an email.

Jan 2 '08 #249
"Kelsey Bjarnason" <kb********@gmail.com> wrote in message
news:9v************@spanky.localhost.net...
[snips]
>So what's the real, practical size of /var/log/apache/access.log?
>Calculate the file size _now_, you get, say, 100K. Wait half a second,
>it's 102K. Wait another second while logrotate reaps it, it's now 0K. So

Taking the size of a rapidly changing file like that is asking for problems.
But they need not be serious. Ask the OS to copy that file to a unique
filename. Then read that new file using any method you like. If there are
discrepancies then they are the OS's fault.
>Jacob's exemplar of reading a file into memory is a good example. He uses
>whatever function to calculate the size, so far so good. He allocates a
>buffer, still all good. So his size recorded is, say, 100K and he
>allocated 100K, all's well.
>
>When he actually reads the file, though, and discovers zero bytes to read
>- because the file has been rotated and emptied - is this an error, or is
>this as expected?
It's a discrepancy: if the file should have been static, then an error can
be raised. If it's known that possible live files could be being read, then
write some different code. Dealing with such files raises some difficulties
but getting rid of simplistic (but normally very useful) file functions
won't help.
>How do you plan to determine the "size"
>of a file, when that size keeps changing?
How do you do *anything* with the file when it keeps changing? I gave one
idea above.
>Determining the size - if you plan to - can be critical. For example,
>take a sparse file. One size value says the file is 10GB, another says
>it's 200K. One reports "theoretical" size, the other reports actual
Actually I thought files (on Windows for example) were already sparse; if I
create an empty file, write the first byte, then write the ten billionth,
will the OS really fill in all those intermediate blocks?

If the OS is responsible for sparse/compressed files, then I would expect
them to be transparent. It should report the full size. After all it
shouldn't take long to read in non-existent blocks! And it wouldn't do me
much good to have a sparse/compressed file of unknown format in my memory
space.

(Someone said the OS may not know the full size of compressed files. I
would call that a broken OS)
>usage. Here's the kicker: if you want to copy this file to another file
>system, you _may_ be able to do it onto a file system with 200K free, if
>it supports sparse files, or you may require 10GB, if it doesn't. So
OK, so it might need 10GB. This is an OS not a C issue.
>This doesn't even address the issue of translation - is that 100K size
>that you read still going to be 100K if you want to read the file as a
>text file? Or is it only applicable if you read it in binary mode?
Text mode files are a peculiarity of C; if there is a filesize() function
then it may need to be told whether the size is wanted in binary or text
mode, and to do the extra work to find out. Ideally one would forget text
mode.
>I suggested one such, but it involves reporting not a single answer, but
>several: size on disk, size reserved (eg how big a sparse file "actually
>is"), size of contained compressed data, size of contained data after
>decompression. That's four; perhaps we need to add another four,
The most useful is the number of data bytes seen by the application.
Compressed files, total bytes allocated, that's all OS stuff, of no interest
unless writing the OS, or doing some clever manipulations, then you wouldn't
be using the standard file functions.
>Oh, and then there are "forks". Can't forget those; several file systems
>include them. Those make things even more interesting. With forks, the
....
>There are so many issues to consider, yet some folks want to treat it as
>if the only consideration in the world is getting a singular value
>representing "the size of the file", without ever stopping to consider -
>or worse, writing off as "pedanticism" - the fact that "file size" is a
>meaningless concept in all but a few cases.
The world could do with simplifying. Why not have a concept of 'filesize',
then define what it might mean under all your extreme examples?

I don't know what forks are, but if people had been happy dealing with files
their way before they came along, why can't they continue to do so? The
introduction of forks will not break existing code surely?

Whatever benefits 'forks' bestow surely can be reaped without affecting
naive applications that know nothing about them.
>
>So explain to us, you - or the others who think this issue is so trivial -
>which of the 8 values I mentioned, the ones which don't even deal with
>forked files, is "the size of a file".
As mentioned, the one reported by a typical OS in file listings. Some OSs
apparently only know about complete blocks; in that case, the total of all
those blocks. To software that is aware of this, that's not a problem.
>Chances are, the best answer you'll come up with is "size on disk", but
Correct.

I did a little test in Windows: slowly writing file A while, in a command
window, asking the OS to copy A to B. The result: I got a partial copy of A
in B, which represented where it had got to in writing A.

Another test where the copying was done by an appl calling C functions. The
same result. Was this an error? If it was then the OS is in error too.

Using 'naive' file functions like this works 99% of the time when used
sensibly. In a few cases, where there is unexpected/malicious write access
to an appl's support files, they could fail; but the appl would stop anyway.

Perhaps such functions should be protected from the complexities of modern
file systems, rather than simply eliminated, which is not a solution.

Bart
Jan 3 '08 #250
