Programming in standard c

In my "Happy Christmas" message, I proposed a function to read
a file into a RAM buffer and return that buffer or NULL if
the file doesn't exist or some other error is found.

It is interesting to see that the answers to that message prove that
programming exclusively in standard C is completely impossible even
for a small and ridiculously simple program like the one I proposed.

1) I read the file contents in binary mode, which should allow me
to use ftell/fseek to determine the file size.

No objections to this were raised, except of course the obvious
one, if the "file" was some file associated with stdin, for
instance under some unix machine /dev/tty01 or similar...

I did not test for this since it is impossible in standard C:
isatty() is not in the standard.
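
Here is a minimal sketch of the ftell/fseek idiom in question (note
that even this is hedged by the standard itself: fseek to SEEK_END on
a binary stream need not be supported, and ftell's result is only
meaningful as a byte count where the implementation says so):

#include <stdio.h>

/* Apparent size of an open binary stream, or -1L on failure. */
long apparent_size(FILE *fp)
{
    long size;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fp);               /* -1L on failure */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1L;
    return size;
}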

2) There is NO portable way to determine which characters should be
ignored when transforming a binary file into a text file. One
reader (CB Falconer) proposed to open the file in binary mode
and then in text mode and compare the two buffers to see which
characters were missing... Well, that would be too expensive.
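
For what it's worth, here is a sketch of that comparison done with
counts rather than full buffers (my reading of the proposal; on a
POSIX system the two counts simply come out equal):

#include <stdio.h>

/* Count the characters a file yields in a given mode ("r" or "rb").
 * Where text mode translates line endings, the "r" count is smaller
 * than the "rb" count; the difference is the translation overhead. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}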

3) I used different values for errno defined by POSIX, but not by
the C standard, which defines only a few. Again, error handling
is not something important enough to be standardized, according to
the committee. errno is there, but its usage is not portable at all
and goes immediately beyond what standard C offers.
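
To make the portable core concrete: the C standard guarantees only the
EDOM, ERANGE and EILSEQ macros, and does not even require fopen to set
errno. A sketch of what strictly standard C allows:

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Strictly portable error reporting: values like ENOENT or EACCES
 * are POSIX, not standard C, so all we can portably do is print
 * whatever the implementation chose to put in errno (if anything). */
FILE *open_or_report(const char *path)
{
    FILE *fp;

    errno = 0;
    fp = fopen(path, "rb");
    if (fp == NULL)
        fprintf(stderr, "%s: %s\n", path,
                errno ? strerror(errno) : "unknown error");
    return fp;
}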

We hear again and again that this group is about standard C *"ONLY"*.
Could someone here then, tell me how this simple program could be
written in standard C?

This confirms my arguments about the need to improve the quality
of the standard library!

You can't do *anything* in just standard C.
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Dec 26 '07
270 9186
On Dec 27, 9:33 pm, "Ravishankar S" <ravishanka...@in.bosch.com>
wrote:
You can't do *anything* in just standard C.

This is quite true in the case of embedded systems. The standard
does not seem to have the notion of object files, sections,
addressing modes and linking.
Those difficulties exist whether or not C is for embedded systems.
Tools like editors, debuggers, and linkers are not discussed in the
standard. However, both the translation environment and the execution
environment are discussed, and so compiling and linking are somehow
implied, at least. This is especially so during the last pass of the
translation environment:

"8. All external object and function references are resolved. Library
components are linked to satisfy external references to functions and
objects not defined in the current translation. All such translator
output is collected into a program image which contains information
needed for execution in its execution environment."

and section:
"5.1.2 Execution environments"
which covers both hosted and freestanding versions.
Dec 28 '07 #101
jacob navia said:
Richard Heathfield wrote:
<snip>
>You might be able to say "I wouldn't be seen dead using
such a stupid system", but other people do have to use systems that you
wouldn't be seen dead using. C does not abandon them.

This is exactly levelling down to the worst.

Of course "C does not abandon them". They have just to
1) open the file
2) Read until they hit...
3) EOF!
Either you are trying hard not to understand, or you forgot to read my
previous reply.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Dec 28 '07 #102
ym******@gmail.com wrote:
>
.... snip ...
>
One could say that C shouldn't provide any file operations at
all, and let vendors provide extensions for that, since it may
not make sense to "open a file" on some systems. What exactly
is the difference between fopen() and fgetfilesizeforJN()
(except the obvious one, that one is standard, and the other
one is imaginary)? Sure thing, getting a thing like stat() into
the standard would be an incredibly hard task, but that's not the
kind of thing you are talking about, is it?
What is wrong with a system that provides stdin, stdout, and
stderr? fopen always returns NULL, as does fclose. You can access
fgetc, getc, fputc, putc. Cuts the library size down like magic.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee, Frohe Weihnachten
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Dec 28 '07 #103
jacob navia wrote:
>
Look at that:

AlphaServer DS10L         Noname pc clone (my machine)
466Mhz CPU EV6            2GHZ Dual core AMD
256MB Memory              2GB memory
30GB IDE 7200RPM Disk     1000 GB disk storage
Dual Serial Port          2 USB
Dual 10/100 Ethernet      Dual 100 Ethernet
1 Open PCI Slot           3 PCI slot
Power Cord                Power cord, keyboard, mouse
1 Year Island Warranty    1 year warranty
$699                      600 Euros
Well, you might buy the DS10L if you want a quick upgrade. You
will probably find it much more reliable. I have a strong
suspicion you can mount almost all the disk drives you want. I
note it has real serial ports, which are missing on many machines.
It probably also has a real floppy drive available.

It is actually cheaper than your machine. Euros are worth about
$1.40 today.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee, Frohe Weihnachten
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Dec 28 '07 #104
On Thu, 27 Dec 2007 14:21:37 -0600, jacob navia wrote
(in article <fl**********@aioe.org>):
Erik Trulsson wrote:
>jacob navia <ja***@nospam.com> wrote:
>>I just can't imagine a file system that doesn't provide a way
of knowing the length of a file.

Your imagination is obviously not very good.

Take for example CP/M.

This system is a good example of a currently widely
used system isn't it?

The solution for that system is very simple.
filesize reports number of blocks times the size of each block.

Period. Of course this is not the size of the useful bytes but...
who cares? CP/M users know that!
>>
Another example would be a file stored on a magnetic tape. There it
might not be possible to find out how large the file is without
reading the file until you reach an end-of-file marker. I am fairly
certain that such devices are still in use.

so what?

filesize searches till the end of the file is found or if the device is
not supported returns an error.
See also resource forks on mac file systems.

What is obvious is that there are two alternatives:

1) Take the worst possible file system. Then standardize only
what that file system supports.

2) Take the most common situation for current machines and file
systems and standardize what those systems support.
Note that if you choose two, you pretty much guarantee that this
mythical language isn't portable. As that seems to be your end goal in
all these threads, I'm not surprised you propose it.
For instance, and to go on with CP/M: you could store the number
of used bytes in each block in the last 2 bytes of each block,
couldn't you?

Not very difficult to do.

Yeah, let's go modify an OS that's been around for decades, along with
several others of both older and new vintage, along with all code for
them impacted by your changes, so that you can manage to write a
program portably, where others can do so without cracking open OS
internals to get there.

*sigh*

--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Dec 28 '07 #105
On Thu, 27 Dec 2007 15:11:21 -0600, Bart C wrote
(in article <Zx******************@text.news.blueyonder.co.uk>) :
>
"Richard Heathfield" <rj*@see.sig.invalidwrote in message
news:f4******************************@bt.com...
>jacob navia said:
>>>>I just can't imagine a file system that doesn't provide a way
of knowing the length of a file.
...
>>>Take for example CP/M.
...
>>I would propose that those obsolete systems aren't even considered.
...
>Fortunately, those people who are responsible for the Standard are less
ready than you to dismiss systems that are in current use.

Someone creates a new OS with a file system having the innovative idea of
storing the byte-size of each file.

The question is, why would they do that? Because nobody would be allowed
to make use of it; not in Standard C anyway. Nor in other languages,
because they are likely also implemented in C.

How then would a C programmer take advantage of this feature even though
it's only available on a billion or so computers?
The same way a C programmer (or any other language programmer) takes
advantage of platform-specific extensions which are, by definition, not
portable, and don't appear in any standardized cross-platform languages
as a core interface.

--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Dec 28 '07 #106
On Thu, 27 Dec 2007 16:01:25 -0600, jacob navia wrote
(in article <fl**********@aioe.org>):
Anyway, I always avoided the VAX, and I am glad I did!
I suspect the VAX developer community is similarly glad.
--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Dec 28 '07 #107
jacob navia wrote:
There is no point in discussing with somebody who doesn't want
to go into the arguments of the other side.
There is no point in discussing a problem that nobody but one person seems
to have, especially if that one person has already solved it for himself...
All multi-user OSes provide a "stat" function.
....whereas most others can rely on a proven out-of-the-box solution.

Everybody else can reliably solve the problem (gobble up an entire file in
RAM) in a ten-liner. If that many.
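
Something like this minimal sketch, say (untested; standard library
only; the byte count comes back through *np):

#include <stdio.h>
#include <stdlib.h>

/* Grow a buffer with realloc and fill it with fread until EOF.
 * Returns the malloc'd contents (size in *np), or NULL on error. */
char *gobble(FILE *fp, size_t *np)
{
    size_t n = 0, cap = 4096, got;
    char *buf = malloc(cap), *tmp;

    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + n, 1, cap - n, fp)) > 0) {
        n += got;
        if (n == cap) {
            if ((tmp = realloc(buf, cap *= 2)) == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *np = n;
    return buf;
}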

robert
Dec 28 '07 #108
Randy Howard wrote:
On Thu, 27 Dec 2007 14:21:37 -0600, jacob navia wrote
(in article <fl**********@aioe.org>):
>What is obvious is that there are two alternatives:

1) Take the worst possible file system. Then standardize only
what that file system supports.

2) Take the most common situation for current machines and file
systems and standardize what those systems support.

Note that if you choose two, you pretty much guarantee that this
mythical language isn't portable. As that seems to be your end goal in
all these threads, I'm not surprised you propose it.
I am not surprised you say this either. Till now, there hasn't been
ANY system where there wasn't an operation to get the file size.
VAX/VMS included. And if there was one, it could ALWAYS do
1) Open the file
2) Read until EOF

to get the file size. Granted, it wouldn't be efficient in some weird
systems but so what? It would be possible.
>For instance, and to go on with CP/M: you could store the number
of used bytes in each block in the last 2 bytes of each block,
couldn't you?

Not very difficult to do.


Yeah, let's go modify an OS that's been around for decades, along with
several others of both older and new vintage, along with all code for
them impacted by your changes, so that you can manage to write a
program portably, where others can do so without cracking open OS
internals to get there.
Nobody needs to modify the OS. But if those systems support C, they
MUST support

FILE *f = fopen("foo","a+");

And they HAVE to know where the end of the file is somehow. I am
amazed how you and the others just ignore the basic facts.

In C, any file is conceptually a sequence of bytes. Some file systems
do not support this well. But if they support C, THEN they must
ALREADY support this abstraction, so that filesize wouldn't take any effort.

You are making an alternative where only two bad possibilities exist
just to show your peers that you support the party line.
*sigh*
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Dec 28 '07 #109
On Wed, 26 Dec 2007 16:01:34 -0600, jacob navia wrote
(in article <fk**********@aioe.org>):
Eric Sosman wrote:
>jacob navia wrote:
>>[...]
You can't do *anything* in just standard C.

Then why do you bother with this newsgroup? Why do
you waste your time on a powerless language? Why don't
you go away and become a regular on comp.lang.mumps or
comp.lang.apl or any newsgroup devoted to a language you
consider more useful than C? Since C has zero utility
(in your stated estimation), even comp.lang.cobol would
be a forum of more value. Go! Spend your talent on
something more useful than the torment of us poor old
dinosaurs! Go!

Stop whining and see the sentence in my message:
<quote>
This confirms my arguments about the need to improve the quality
of the standard library!
<end quote>

The solution for me is to improve what is there.
Ok, then. If you are correct, then you should be able to release and
sell an "All the World's an Intel" library for "real C programmers" to
use, along with those that have tried and failed to write anything in
standard C, if they exist. If they exist in the numbers you believe
they do, then you should clean up financially. Stop encouraging others
to compete with you, and go do it.

Hint: Most of us that have been writing portable C programs for ages
already have "kitchen sink" libraries of our own, that do many of the
things that frequently need doing, but are not done the same way on a
dozen or more platforms that we have had to support, still support, or
will need to support in the near future.
Every time I point out something that needs to be improved,
the regulars are unable to put forward any coherent argumentation.
Factually incorrect. Every time you propose something, you propose a
solution that:
a) works only on your pseudo-C compiler and on no other compiler
b) something that is not portable at all, and works on win32, if you're
lucky.
c) is hopelessly naive, failing to recognize the existence of anything
outside of your immediate experience.

That's not very interesting, and lots of coherent arguments have been
put forth by many of the regulars. Your inability to realize it is
/not/ their fault.

There's an old joke, which is sadly not actually funny in your case,
hitting too close to home... "I can explain it to you, but I can't
comprehend it for you."
--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Dec 28 '07 #110
jacob navia wrote:
Nobody needs to modify the OS. But if those systems support C, they
MUST support

FILE *f = fopen("foo","a+");

And they HAVE to know where the end of the file is somehow. I am
amazed how you and the others just ignore the basic facts.
They also have to support fseek, ftell and friends. What's not to like?

robert
Dec 28 '07 #111
Randy Howard wrote:
Factually incorrect. Every time you propose something, you propose a
solution that:
a) works only on your pseudo-C compiler and on no other compiler
b) something that is not portable at all, and works on win32, if you're
lucky.
c) is hopelessly naive, failing to recognize the existence of anything
outside of your immediate experience.
d) already exists as a two-liner.

robert
Dec 28 '07 #112
On Fri, 28 Dec 2007 05:27:18 -0600, jacob navia wrote
(in article <fl**********@aioe.org>):
Randy Howard wrote:
>On Thu, 27 Dec 2007 14:21:37 -0600, jacob navia wrote
(in article <fl**********@aioe.org>):
>>What is obvious is that there are two alternatives:

1) Take the worst possible file system. Then standardize only
what that file system supports.

2) Take the most common situation for current machines and file
systems and standardize what those systems support.

Note that if you choose two, you pretty much guarantee that this
mythical language isn't portable. As that seems to be your end goal in
all these threads, I'm not surprised you propose it.

I am not surprised you say this either. Till now, there hasn't been
ANY system where there wasn't an operation to get the file size.
So you have verified this on every single system?
VAX/VMS included. And if there was one, it could ALWAYS do
1) Open the file
2) Read until EOF

to get the file size. Granted, it wouldn't be efficient in some weird
systems but so what? It would be possible.
You really should look at resource forks, then explain how a single
"answer" for file size would meet both the "common" desired answer and
the technically accurate one at the same time for files using them.

Not all the world sees files as you seem to think it does, hence your
confusion.
>>For instance, and to go on with CP/M: you could store the number
of used bytes in each block in the last 2 bytes of each block,
couldn't you?

Not very difficult to do.


Yeah, let's go modify an OS that's been around for decades, along with
several others of both older and new vintage, along with all code for
them impacted by your changes, so that you can manage to write a
program portably, where others can do so without cracking open OS
internals to get there.

Nobody needs to modify the OS. But if those systems support C, they
MUST support

FILE *f = fopen("foo","a+");

And they HAVE to know where the end of the file is somehow. I am
amazed how you and the others just ignore the basic facts.
What I am /not/ ignoring is your comment above that implies that CP/M
should store its file data differently than it actually does.
--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Dec 28 '07 #113
In article <ea******************************@bt.com>
Malcolm McLean <re*******@btinternet.com> wrote:
>/*
function to slurp in an ASCII file
Params: path - path to file
Returns: malloced string containing whole file
*/
I think we can improve this a great deal, with the result being
a function that is written entirely in Standard C and works in
every case in which it is possible for it to work, and -- by
calling a system-dependent function that the user is to supply,
but which may be replaced with a #define that simply returns 0
if desired -- is "reasonably efficient" as well.
>char *loadfile(char *path)
{
FILE *fp;
int ch;
long i = 0;
long size = 0;
char *answer;

fp = fopen(path, "r");
if(!fp)
{
printf("Can't open %s\n", path);
return 0;
}
In a posting I read earlier, Julienne Walker wrote a version
that used a user-supplied "FILE" instead of a name. I think
this is superior, since it allows one to skip over some initial
portion of the file. (It also eliminates the question of what
to do if the file cannot be opened.) So let us do that:

char *loadfile(FILE *fp, size_t *sizep) {
size_t n; /* number of bytes read so far */
size_t space; /* amount of space allocated */
char *buf; /* the buffer we are working with */
char *new; /* for realloc()ing */

Now we come to what I see as the real "point of argument" here.
We would like to get an "estimate" of the size of the file, so that
we can do a single malloc() to hold the contents. Of course at
this particular point, we might like to subtract any initial offset
as well -- but we do not know how to convert the result of ftell()
or fgetpos() into such a number, so I will just proceed as if the
"initially-skipped count" is always zero. (If loadfile() is to
open the file, this is correct; if loadfile takes an already-open
file as above, we could always add a "skipped bytes count" argument.)

Here is where the system-dependent function comes in:

size_t estimate = estimate_file_size(fp);

The "estimate_file_size" function can use fstat() (on POSIX
systems), or "SYS$GETFILEMETADATA" on some other system, or
we can just do:

#define estimate_file_size(fp) 0

because the result is only assumed to be an *estimate*, rather than
an exact answer. As a nice bonus, this means that, e.g., on POSIX
systems, where fstat() returns a handy exact answer that is entirely
wrong if the file is being modified as we read it, the code still
works.
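
For instance, a POSIX-only definition of the hook might look like the
sketch below (fileno() and fstat() are POSIX, not standard C, which is
exactly why they stay hidden behind this interface):

#include <stdio.h>
#include <sys/stat.h>   /* POSIX, not standard C */

/* POSIX sketch of the system-dependent hook: the file's current
 * length, or 0 for "no estimate" (which loadfile() handles). */
size_t estimate_file_size(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0 || st.st_size <= 0)
        return 0;
    return (size_t)st.st_size;
}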
answer = malloc(size + 100);
if(!answer)
{
printf("Out of memory\n");
fclose(fp);
return 0;
}
Instead of adding 100, we just use the estimate (plus 1 for the
'\0'). In case the estimate is zero, though, we use 1 (plus the
same 1).

To indicate failure, we return NULL (as you do here) but
do not print an error message, since general-purpose library routines
usually should not do so (the error may need to be logged rather
than printed, for instance):

if (estimate == 0)
estimate = 1;
space = estimate;
buf = malloc(space + 1);
if (buf == NULL)
return NULL;

(Since we eliminated the issue of opening the file, a NULL return
always means "unable to read", either due to an I/O error or due
to malloc() failure.)
while( (ch = fgetc(fp)) != EOF)
answer[i++] = ch;
Now we get to the part that deals with the fact that the estimate
is merely an estimate:

for (n = 0;;) {
size_t nsuccess, nattempt;
int c;

/*
* Attempt to fill in the rest of the buffer.
* We have read "n" bytes so far and we have
* space for "space" bytes (plus 1 extra, not
* counted in "space").
*
* If the read fails, we get a short count or
* zero. A short count could indicate normal EOF
* or an I/O error, so we must use feof() and
* ferror() to tell them apart.
*
* If we get everything we asked for, we hope
* that we are now at EOF, but we may not be,
* so check.
*/
nattempt = space - n;
nsuccess = fread(buf + n, 1, nattempt, fp);
n += nsuccess; /* we now have read this many */

if (nsuccess < nattempt) /* normal EOF or I/O error */
break;
c = getc(fp);
if (c == EOF) /* normal EOF or I/O error */
break;

/*
* Our estimate must have been too low. We actually
* have room to save c, so do that now, then enlarge
* the buffer and try again.
*/
buf[n] = c;
estimate *= 2; /* or any other suitable increase */
new = malloc(estimate + 1);
if (new == NULL) {
free(buf);
return NULL;
}
buf = new;
space = estimate;
}

We exit the above loop only on normal EOF or error (or, in a sense,
if the malloc() fails, but in that case we return to the caller,
rather than ending the loop). So now we check which is the case:

/*
* It might make for better flow to move the ferror() test
* here and let the non-ferror(), i.e., feof(), case be the
* last code in the function. I thought it was interesting
* to use feof() correctly in comp.lang.c for once, though. :-)
*/
if (feof(fp)) {
/*
* All is well -- we successfully read the entire file.
* Optionally, we can realloc down here:

new = realloc(buf, n + 1);
if (new != NULL)
buf = new;

* This usually saves a few bytes when the estimate is poor,
* and always saves one byte when the estimate was 0
* (including for actually-empty files), but tends to
* cost runtime. (Of course, it would also make sense
* to see if n differs from the estimate first.)
*/
buf[n] = '\0';
if (sizep != NULL)
*sizep = n;
return buf;
}

/*
* Since the loop terminated, but feof(fp) was not set, we
* must have had some kind of error (bad floppy disk?) while
* reading the file. Here, I choose to discard the data read
* so far and return NULL, but there may be reasons to do
* other things. In general, however, error recovery is
* extremely system-dependent.
*/
free(buf);
return NULL;
}

At the risk of redundancy, here is the complete function, with
a leading comment added, and the extensive internal commenting
shrunken down. I also removed the "estimate" variable as it is
essentially the same as the "space" variable.

The following is entirely untested. :-)

#include <stdio.h>
#include <stdlib.h>

/*
* Load from an existing opened file into memory, adding a
* terminating '\0' to make the result a valid C string. If
* sizep is non-NULL, set it to the number of bytes loaded
* (not including the terminating '\0').
*
* Returns NULL on failure, in which case *sizep is not useful.
*/
char *loadfile(FILE *fp, size_t *sizep) {
size_t n; /* number of bytes read so far */
size_t space; /* amount of space allocated */
char *buf; /* the buffer we are working with */
char *new; /* for realloc()ing */

space = estimate_file_size(fp);
if (space == 0)
space = 1; /* must attempt to read, even if empty file */
buf = malloc(space + 1);
if (buf == NULL)
return NULL;

/*
* If the estimate is 100% accurate or over-estimates,
* this loop runs only once. (If the estimate *is*
* accurate and all goes well, the getc() returns EOF.)
*/
for (n = 0;;) {
size_t nsuccess, nattempt;
int c;

/*
* Attempt to fill in the rest of the buffer. Note that
* fread() returns a short count or 0 on EOF or error.
*/
nattempt = space - n;
nsuccess = fread(buf + n, 1, nattempt, fp);
n += nsuccess;

/*
* Terminate loop on EOF or error. If the estimate was
* right, we have to attempt one more byte to see the EOF.
*/
if (nsuccess < nattempt || (c = getc(fp)) == EOF)
break;

buf[n] = c; /* under-estimated -- save c and expand */
space *= 2;
new = malloc(space + 1);
if (new == NULL) {
free(buf);
return NULL;
}
buf = new;
}

if (ferror(fp)) {
/* I/O error -- not dealt with very well here. */
free(buf);
return NULL;
}

/* Loop ended, and not due to error, so must be normal EOF. */
#ifdef OPTIONAL
if (n < space) {
new = realloc(buf, n + 1);
if (new != NULL)
buf = new;
}
#endif
buf[n] = '\0';
if (sizep != NULL)
*sizep = n;
return buf;
}
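
A hypothetical caller, just to show the intended use (it assumes the
loadfile() above plus some definition of estimate_file_size() are in
scope):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    FILE *fp;
    char *text;
    size_t len;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return EXIT_FAILURE;
    }
    fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        return EXIT_FAILURE;
    }
    text = loadfile(fp, &len);
    fclose(fp);
    if (text == NULL) {
        fprintf(stderr, "read or malloc failure\n");
        return EXIT_FAILURE;
    }
    printf("read %lu bytes\n", (unsigned long)len);
    free(text);
    return 0;
}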
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Dec 28 '07 #114

"Chris Torek" <no****@torek.netwrote in message
news:fl********@news4.newsguy.com...
In article <ea******************************@bt.com>
Malcolm McLean <re*******@btinternet.com> wrote:
>>/*
function to slurp in an ASCII file
Params: path - path to file
Returns: malloced string containing whole file
*/

I think we can improve this a great deal, with the result being
a function that is written entirely in Standard C and works in
every case in which it is possible for it to work, and -- by
calling a system-dependent function that the user is to supply,
but which may be replaced with a #define that simply returns 0
if desired -- is "reasonably efficient" as well.
....
At the risk of redundancy, here is the complete function, with
....
char *loadfile(FILE *fp, size_t *sizep) {
.....

Of course in practice one would write:

char *loadfile(FILE *fp, size_t *sizep) {
    if (thisiswindows)   /* or other capable OS */
        /* do it in a dozen lines */
    else
        /* do it the hard way */
}

The return value in *sizep is still a little worrying because according to
user923005 this information is useless, if it's assumed to bear any relation
to the size of the file just read, because that size could change any time.

Bart

Dec 28 '07 #115
Bart C said:

<snip>
The return value in *sizep is still a little worrying because according
to user923005 this information is useless, if it's assumed to bear any
relation to the size of the file just read, because that size could
change any time.
No, it's useful information because it represents the amount of data
actually read from the file into main storage. Presumably, one reads the
data from file for a reason - either to manipulate it or at least to
enquire it. Either way, one will need to know how much data there is to
manipulate (or enquire).

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Dec 28 '07 #116
Richard wrote:
user923005 <dc*****@connx.com> writes:
>[...]
a generic function cannot produce a reliable answer. I guess that
other people have thought about this question and figured out that a
reliable file size function cannot be written in a generic manner.

Rubbish. All files have a byte size.
How many bytes does /dev/tty hold? How about /dev/zero?
/dev/ptyp0? /dev/random? /dev/poll? CON:? LPT:?
>No one has ever said this. A portable file size function that gives a
correct answer is clearly impossible.

How silly.

"correct" would be platform specific. The API need not be.
One could assign an arbitrary pseudo-size -- minus one,
perhaps -- to the troublesome files, and then declare that
this pseudo-size was the "correct by decree" size of the
file. However, the value would then be useless for the
purpose to which Jacob's "Happy christmas" code wanted to put
it: to determine how many bytes to allocate to hold the file's
content.

And even for an ordinary file whose length can be known,
many systems permit the length to change between the moment
it's queried and the moment the result is used. How many bytes
should you allocate for an in-memory copy of /var/adm/messages?

--
Eric Sosman
es*****@ieee-dot-org.invalid
Dec 28 '07 #117

"Eric Sosman" <es*****@ieee-dot-org.invalidwrote in message
news:kM******************************@comcast.com. ..
Richard wrote:
>Rubbish. All files have a byte size.

How many bytes does /dev/tty hold? How about /dev/zero?
/dev/ptyp0? /dev/random? /dev/poll? CON:? LPT:?
....
And even for an ordinary file whose length can be known,
many systems permit the length to change between the moment
it's queried and the moment the result is used. How many bytes
should you allocate for an in-memory copy of /var/adm/messages?
These are all very interesting examples which should be kept in mind when
writing mission-critical code or code for life-support systems.

But there is a distinct class of well-behaved files (input files of a
compiler for example), which are unlikely to be huge and unlikely to change.
For a lot of applications and their files, this will be the case.

With shared files on multi-user/multi-process systems there can be pitfalls,
but the size of the file suddenly changing would be the least of the problems.

I don't know how to deal with /var/adm/messages or similar. Suppose I read
byte-by-byte as recommended, and then someone updates the beginning of the file?
Maybe it's foolhardy to even attempt making a copy of such a file. I'm not
allowed to lock it because that's also frowned upon. What exactly can one do
with such a file?
Bart
Dec 28 '07 #118
Bart C wrote:
I don't know how to deal with /var/adm/messages or similar. Suppose I read
byte-by-byte as recommended, and then someone updates the beginning of the file?
Maybe it's foolhardy to even attempt making a copy of such a file. I'm not
allowed to lock it because that's also frowned upon. What exactly can one do
with such a file?
You need to KNOW what you can do with a file before starting to do stuff
with it. That's an excellent reason why a generic gobble-into-RAM or
filesize function just doesn't make any sense.

robert
Dec 28 '07 #119
Chris Torek <no****@torek.net> writes:
In article <ea******************************@bt.com>
Malcolm McLean <re*******@btinternet.com> wrote:
>>/*
function to slurp in an ASCII file
Params: path - path to file
Returns: malloced string containing whole file
*/
<snip>
/*
* Our estimate must have been too low. We actually
* have room to save c, so do that now, then enlarge
* the buffer and try again.
*/
buf[n] = c;
estimate *= 2; /* or any other suitable increase */
new = malloc(estimate + 1);
Presumably you meant realloc here?

Nice example of hiving off the minimum of system-specific code, BTW.

--
Ben.
Dec 28 '07 #120
In article <47***************@yahoo.com>,
CBFalconer <cb********@maineline.net> wrote:
>Remember that ALL computers act just like Winders. Anything else
is not to be considered.
Interestingly, my current hobby project is sidetracked while I figure
out how to deal with multiple incompatible implementations[1] of the
Windows API.
So even when all the computers in your universe act just like Winderz,
a fanatical devotion to using portable constructs where you can and
cleanly separating the non-portable constructs that you find you must
use is still a net benefit.

ObTopic: Dealing with this has been made a lot easier by having written
big chunks of the code in portable standard C; when I find something in
the system interface that's broken I can ignore those parts entirely
and have much less code to worry about fixing.
dave

[1] "Native" Win32, Wine on Linux, and Wine on MacOS. I've run into at
least one not-all-that-weird construct that's broken on one of
those and works sensibly (but differently) on the other two.

Dec 28 '07 #121
On Dec 28, 4:32 am, Chris Torek <nos...@torek.net> wrote:
In article <eaednRrgxJMrC-7anZ2dnUVZ8h2dn...@bt.com>

Malcolm McLean <regniz...@btinternet.com> wrote:
/*
function to slurp in an ASCII file
Params: path - path to file
Returns: malloced string containing whole file
*/

I think we can improve this a great deal, with the result being
a function that is written entirely in Standard C and works in
every case in which it is possible for it to work, and -- by
calling a system-dependent function that the user is to supply,
but which may be replaced with a #define that simply returns 0
if desired -- is "reasonably efficient" as well.
[snip]

I work for a database company, and most of our customers are large
customers. (E.g. huge US company, large university, government of
country x, etc.)

It is not at all unusual for a single file to be 20-100 GB. Needless
to say, you would not want to put this file into memory even if you
could do it.
The files I work with are also being modified constantly (though there
are occasionally windows of inactivity for some of them).
I am quite sure that the goal of reliably reading these sorts of files
into memory and doing something useful with them is literally
infeasible (not impossible, but the cost would make it so stupid that
nobody would want to do it).

I think that usually, unless you lock the entire file, reading an
entire file into memory is begging for trouble. Perhaps the
probability of the file being extended while you have it open is less
than 0.1%. But would you rely on a strcpy() that was only 99.9%
reliable? Similarly, someone modifying the contents in the middle
while you have it in memory might have one chance in 10,000. But that
one instance can spell disaster. Now, some kinds of files beg to be
mapped into memory (e.g. executable files). So most operating systems
will have a provision for this. The memory mapping APIs will have
already worked out some of the issues that arise when trying to
read files into memory (e.g. 'What if it won't fit?'). For many file
types it makes literally no sense to try to map them into memory.

I think it can be possible to do what ACE (by D. Schmidt) has done
portably across a huge number of systems (because *cough* they have
already done it). This includes highly portable memory mapping,
threads, semaphores, etc.. What they have accomplished could also be
accomplished in C (or even if it is too difficult, it could be done
via a shared library to that existing project). But memory mapping is
more than reading a file into memory, and it only makes sense for
files that we know are totally static or which we have total control
over (single user).

My opinion is that Jacob chose what is probably one of the most
difficult possible projects to ridicule what is possible in standard
C, and also that he probably knew it beforehand. He acts like a
dummy, but he isn't one.

Dec 28 '07 #122
user923005 wrote:
>
I work for a database company, and most of our customers are large
customers. (E.g. huge US company, large university, government of
country x, etc.)

It is not at all unusual for a single file to be 20-100 GB. Needless
to say, you would not want to put this file into memory even if you
could do it.

Great!

So don't do it!

Nobody forces you to do that. The usage of all functions in the standard
library is OPTIONAL; you are NOT forced to use any of them.

For instance for those files fseek or ftell will not work if a long
is just 32 bits.

Your requirements are unusual I would say. Many applications use
configuration files, or other data files of size of a few K or at
most 1MB. In those situations a portable function to read the
whole file into memory would be useful.
The files I work with are also being modified constantly (though there
are occasionally windows of inactivity for some of them).
In many applications, the data files can be considered static, i.e.
they do not change during the application's lifetime. This is especially
true of configuration files, C source files, object files, makefiles,
and all developer-related files.
I am quite sure that the goal of reliably reading these sorts of files
into memory and doing something useful with them is literally
infeasible (not impossible, but the cost would make it so stupid that
nobody would want to do it).
For 20GB files this would be a bit difficult with today's memory sizes.
For 100GB files this makes no sense at all.
I think that usually, unless you lock the entire file, reading an
entire file into memory is begging for trouble.
It depends. Not everyone is working with the files you are working with.
You say very often that "not all the world is a windows PC". Well,
not everybody is working with such huge files all the time, are they?

And even YOUR application, I would bet that it has some configuration
files it reads at startup. Those aren't 100GB files but probably
20K-100K files.
Perhaps the
probability of the file being extended while you have it open is less
than 0.1%. But would you rely on a strcpy() that was only 99.9%
reliable? Similarly, someone modifying the contents in the middle
while you have it in memory might have one chance in 10,000. But that
one instance can spell disaster. Now, some kinds of files beg to be
mapped into memory (e.g. executable files). So most operating systems
will have a provision for this. The memory mapping APIs will have
already worked some of the the issues out that arise when trying to
read files into memory (e.g. 'What if it won't fit?'). For many file
types it makes literally no sense to try to map them into memory.
The problem with memory mapping is that it is very system specific.
A single STANDARD function that would do that on all supported
platforms would be quite useful. Normally you are not so interested
in performance for those relatively small files, but it would be nice
if you didn't have to write it again on each system you go to.
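
For comparison, here is roughly what the non-portable part looks like
on a POSIX system (a sketch only, assuming POSIX open/fstat/mmap;
every other platform needs its own version, which is exactly the
complaint):

#include <fcntl.h>      /* all four headers are POSIX, not standard C */
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* POSIX sketch: map a whole file read-only into memory.
 * Returns the mapping (release with munmap(p, *lenp)) or NULL.
 * Note mmap of a zero-length file fails; a real version would
 * special-case empty files. */
void *map_file(const char *path, size_t *lenp)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    void *p;

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);          /* the mapping survives the close */
    if (p == MAP_FAILED)
        return NULL;
    *lenp = (size_t)st.st_size;
    return p;
}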

I think it can be possible to do what ACE (by D. Schmidt) has done
portably across a huge number of systems (because *cough* they have
already done it). This includes highly portable memory mapping,
threads, semaphores, etc.. What they have accomplished could also be
accomplished in C (or even if it is too difficult, it could be done
via a shared library to that existing project). But memory mapping is
more than reading a file into memory, and it only makes sense for
files that we know are totally static or which we have total control
over (single user).

My opinion is that Jacob chose what is probably one of the most
difficult possible projects to ridicule what is possible in standard
C, and also that he probably knew it beforehand. He acts like a
dummy, but he isn't one.
No, I just wanted to make a point about the difficulty of using
standard C for a simple task like reading a file into memory.

And forward some solutions, like the standard error handling, or
a filesize() function.
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Dec 28 '07 #123
Chris Torek wrote:

[snip]

This is possible in standard C.

You are forced to read the data character by character until you reach
the end of the file. This is maybe OK (disk I/O could be the limiting
factor here) but it ignores the problem I tried to solve concerning
abstracting the text/binary difference.

If we had a function called filesize() in the standard (one of my main
complaints) your program could be written in a few lines, reading
the whole file into memory.
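
Something along these lines, assuming the proposed (hypothetical, not
standard) filesize(), and ignoring for the sketch the objection raised
elsewhere in this thread that the size can change between the two calls:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: filesize(fp) returns the stream's byte count,
 * or (size_t)-1 when it cannot be known. Not standard C. */
char *slurp(FILE *fp)
{
    size_t size = filesize(fp);
    char *buf;

    if (size == (size_t)-1)
        return NULL;            /* fall back to a read loop */
    buf = malloc(size + 1);
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, size, fp) != size) {
        free(buf);
        return NULL;
    }
    buf[size] = '\0';
    return buf;
}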

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Dec 28 '07 #124

"user923005" <dc*****@connx.comschreef in bericht
news:98**********************************@l1g2000h sa.googlegroups.com...
On Dec 27, 2:01 pm, jacob navia <ja...@nospam.com> wrote:
user923005 wrote:
Here is a machine (HP RX/1600 Itanium running OpenVMS) that reports
its file sizes in blocks.

[big snip]
Total of 28 files, 4404 blocks.
Next Cmd:
each of those file sizes is in 512-byte increments. You can buy a new
one today from HP, if you want.
http://h71000.www7.hp.com/openvms/hw_supportchart.html

If you have one of those systems and your trash is full so you can't get
rid of it, then you can do the following

http://h71000.www7.hp.com/wizard/wiz_5424.html
Obtaining RMS file size?
Hello Mr Wizard

I am looking for a method of obtaining the size of a file, NOT the
allocation, from within a Fortran routine. I thought this would be a
simple trivial task but alas I am proved wrong.

Can you help.

Regards Jim
Answer
RMS does NOT keep track of the number of user data bytes in a file.
The only reliable way to obtain that, is to read the file and count!

So there you have the answer!

If you are using that system expect that filesize() will take some time.

So What?

Should we ALL suffer because some brain-dead system exists somewhere?
remove() is in the standard while there are lots of systems without a
filesystem and where remove doesn't have any meaning. I happen to work on
such a system now but I don't know what remove does because I just don't
need it and never checked. I'm guessing it returns -1.
So why can't a function like filesize() be added? On systems where it's
meaningless it could return an error and the programmer can take over with
the byte-by-byte read or whatever works for him when filesize returns an
error. What's the logic for adding remove and not adding filesize?

Dec 28 '07 #125
user923005 <dc*****@connx.com> writes:
I work for a database company, and most of our customers are large
customers. (E.g. huge US company, large university, government of
country x, etc.)

It is not at all unusual for a single file to be 20-100 GB. Needless
to say, you would not want to put this file into memory even if you
could do it.
Sometimes you do. For some time, I worked for a company that had
a gigantic Perforce repository[*]. Every developer made heavy
use of this repository. Unfortunately, Perforce doesn't scale to
anything that big or that busy. The solution turned out to be to
put 128 GB of RAM in the Perforce server. Then the whole
database was cached. Performance was then tolerable, if still
not all that great.
[*] Perforce is a version control system.
--
"Programmers have the right to be ignorant of many details of your code
and still make reasonable changes."
--Kernighan and Plauger, _Software Tools_
Dec 28 '07 #126

"Eric Sosman" <es*****@ieee-dot-org.invalidschreef in bericht
news:kM******************************@comcast.com. ..
Richard wrote:
>user923005 <dc*****@connx.com> writes:
>>[...]
a generic function cannot produce a reliable answer. I guess that
other people have thought about this question and figured out that a
reliable file size function cannot be written in a generic manner.

Rubbish. All files have a byte size.

How many bytes does /dev/tty hold? How about /dev/zero?
/dev/ptyp0? /dev/random? /dev/poll? CON:? LPT:?
Programmers should be able to understand that doing filesize("LPT1:") or
something would be silly, and an incorrect value would be returned, or rather
an error code. That there are some files whose size can't be determined
should not be a reason not to add such a function to the standard. I don't
expect remove() to work on *every* file that exists on a system.

Dec 28 '07 #127
On Dec 28, 1:01 pm, "Serve La" <n...@hao.com> wrote:
"user923005" <dcor...@connx.com> wrote in message news:98**********************************@l1g2000hsa.googlegroups.com...
On Dec 27, 2:01 pm, jacob navia <ja...@nospam.comwrote:


user923005 wrote:
Here is a machine (HP RX/1600 Itanium running OpenVMS) that reports
its file sizes in blocks.
[big snip]
Total of 28 files, 4404 blocks.
Next Cmd:
each of those file sizes is in 512-byte increments. You can buy a new
one today from HP, if you want.
>http://h71000.www7.hp.com/openvms/hw_supportchart.html
If you have one of those systems and your trash is full so you can't get
rid of it, then you can do the following
http://h71000.www7.hp.com/wizard/wiz_5424.html
Obtaining RMS file size?
Hello Mr Wizard
I am looking for a method of obtaining the size of a file, NOT the
allocation, from within a Fortran routine. I thought this would be a
simple trivial task but alas I am proved wrong.
Can you help.
Regards Jim
Answer
RMS does NOT keep track of the number of user data bytes in a file.
The only reliable way to obtain that, is to read the file and count!
So there you have the answer!
If you are using that system expect that filesize() will take some time.
So What?
Should we ALL suffer because some brain-dead system exists somewhere?

remove() is in the standard while there are lots of systems without a
filesystem and where remove doesn't have any meaning. I happen to work on
such a system now but I don't know what remove does because I just don't
need it and never checked. I'm guessing it returns -1.
So why can't a function like filesize() be added? On systems where it's
meaningless it could return an error and the programmer can take over with
the byte-by-byte read or whatever works for him when filesize returns an
error. What's the logic for adding remove and not adding filesize?
Because filesize can never be anything but an estimate {in the generic
case}. For special types of files, it has meaning. For other file
types it is begging for trouble.

Back to the original question (reading a file into memory), for
systems where this is sensible to do, there is going to exist a memory
map method. Memory mapping can be made relatively portable. That is
a far more sensible solution than simply trying to read a file into
memory (note: memory mapping does a lot more than just reading a file
into memory).

I think that adding a function to the standard library that produces
an unreliable answer has limited utility. The cases where we need the
absolute number will also require lots of guarantees about the file
properties. It is starting to smell operating system specific, n'est-ce
pas? For example, someone might use that number to try to read the
file into memory. That is clearly a mistake. Instead, they should
memory map the file if their system supports it.

I guess that filesize() is not in the standard because the C language
implementors thought about it and realized that people might try to
use it. Then, they will have thousands of support calls to handle
when things go wrong.

On the other hand, I think that there was a call somewhere in this
thread for fileinformation() which I think might be a very nice
addition. I guess that even filesize() might not be so bad if they
renamed it currentfilesize(). Then (at least) it would be obvious
that it only contains an estimate.
Dec 28 '07 #128

"Eric Sosman" <es*****@ieee-dot-org.invalidwrote in message
news:jq******************************@comcast.com. ..
Bart C wrote:
>[...]
I don't know how to deal with /var/adm/messages or similar. Suppose I
read byte-by-byte as recommended then someone updates the beginning of
the file? Maybe it's foolhardy to even attempt making a copy of such a
file. I'm not allowed to lock it because that's also frowned upon. What
exactly can one do with such a file?

There are at least two problems to be faced when dealing
with /var/adm/messages. First, the file grows as the system
Well, we can class that sort of file as Special and deal with it With Care.
Or perhaps choose not to deal with it at all. The point is most input files
(it is mostly input ones you want to read!) are known to be well-behaved (eg
config files/support files of an application) and can be dealt with less
strictly.

If we have to worry about multi-process access then we may have to consider
that our entire application may suddenly disappear in a puff of smoke. It's best
not to worry about it.
Personally, I don't think much of the read-it-all-in approach
to handling data that's in files. That's because I'm from the
Old School, and learned my craft in the days when memory was
scarce and expensive.
Same here.
>You
read the enormous CAD model and build data structures while reading
it; all the data winds up in memory, but you never have or need
an "image" of the entire file as it looks on disk.
Yes, I've written CAD-type software for 8-bit and MSDOS. That was fun.

Now, however, you've got 1GB RAM or whatever sitting there and you need to
do *something* with it. I'm talking about at least desktop PCs of course.

Bart

Dec 28 '07 #129
jacob navia wrote, On 28/12/07 20:50:
user923005 wrote:
>>
I work for a database company, and most of our customers are large
customers. (E.g. huge US company, large university, government of
country x, etc.)

It is not at all unusual for a single file to be 20-100 GB. Needless
to say, you would not want to put this file into memory even if you
could do it.

Great!

So don't do it!

Nobody forces you to do that. The usage of all functions in the standard
library is OPTIONAL; you are NOT forced to use any of them.
True.
For instance for those files fseek or ftell will not work if a long
is just 32 bits.
This is why C provides other functions. It is also why systems that can
do it often provide other methods.
Your requirements are unusual I would say. Many applications use
configuration files, or other data files of size of a few K or at
most 1MB. In those situations a portable function to read the
whole file into memory would be useful.
I can't say I've needed (or wanted) to do it even with small files.
>The files I work with are also being modified constantly (though there
are occasionally windows of inactivity for some of them).

In many applications, the data files can be considered static, i.e.
they do not change during the application's lifetime. This is especially
true of configuration files, C source files, object files, makefiles,
and all developer-related files.
I would not go as far as all developer-related files, but some files
yes. However I would actually do the first pass on files into an
internal format whilst reading them normally rather than reading them
and then parsing them.
>I am quite sure that the goal of reliably reading these sorts of files
into memory and doing something useful with them is literally
infeasible (not impossible, but the cost would make it so stupid that
nobody would want to do it).

For 20GB files this would be a bit difficult with today's memory sizes.
For 100GB files this makes no sense at all.
>I think that usually, unless you lock the entire file, reading an
entire file into memory is begging for trouble.

It depends. Not everyone is working with the files you are working with.
You say very often that "not all the world is a windows PC". Well,
not everybody is working with such huge files all the time, are they?

And even YOUR application, I would bet that it has some configuration
files it reads at startup. Those aren't 100GB files but probably
20K-100K files.
I've never needed to read an entire configuration file into memory at
one time. I parse it as I read it since what I need in memory is *not*
the file but the information it provides.

<snip>
>My opinion is that Jacob chose what is probably one of the most
difficult possible projects to ridicule what is possible in standard
C, and also that he probably knew it before hand. He acts like a
dummy, but he isn't one.

No, I just wanted to make a point about the difficulty of using
standard C for a simple task like reading a file into memory.
It is only difficult to do it with the method you used. Doing it by growing
a buffer as required is easy. If you want to do something easily in
standard C you have to actually try to do it in the ways standard C allows.
And forward some solutions, like the standard error handling, or
a filesize() function.
Some of which may actually be nice, but since you put forward your
suggestions by attacking everyone and standard C you should not be
surprised by lack of sympathy for your position. Also claiming that any
systems where it is a problem to do what you want do not matter does not
help your case.
--
Flash Gordon
Dec 28 '07 #130
On Dec 28, 1:45 pm, "Bart C" <b...@freeuk.com> wrote:
"Eric Sosman" <esos...@ieee-dot-org.invalid> wrote in message

news:jq******************************@comcast.com...
Bart C wrote:
[...]
I don't know how to deal with /var/adm/messages or similar. Suppose I
read byte-by-byte as recommended, and then someone updates the beginning
of the file? Maybe it's foolhardy to even attempt making a copy of such a
file. I'm not allowed to lock it because that's also frowned upon. What
exactly can one do with such a file?
There are at least two problems to be faced when dealing
with /var/adm/messages. First, the file grows as the system

Well, we can class that sort of file as Special and deal with it With Care.
Or perhaps choose not to deal with it at all. The point is most input files
(it is mostly input ones you want to read!) are known to be well-behaved (eg
config files/support files of an application) and can be dealt with less
strictly.

If we have to worry about multi-process access then we may have to consider
that our entire application may suddenly disappear in a puff of smoke. It's best
not to worry about it.
Personally, I don't think much of the read-it-all-in approach
to handling data that's in files. That's because I'm from the
Old School, and learned my craft in the days when memory was
scarce and expensive.

Same here.
You
read the enormous CAD model and build data structures while reading
it; all the data winds up in memory, but you never have or need
an "image" of the entire file as it looks on disk.

Yes, I've written CAD-type software for 8-bit and MSDOS. That was fun.

Now, however, you've got 1GB RAM or whatever sitting there and you need to
do *something* with it. I'm talking about at least desktop PCs of course.
Contrast the:
1. Find file size
2. Allocate that size
3. Read file into RAM
4. Modify file
5. Write file to disk

Approach to:
1. Use memory mapping.

Seem identical? Now imagine if the file is on a shared drive or if
the file won't fit into physical RAM (or would consume more RAM than
desired).
Dec 28 '07 #131
On Dec 28, 1:05 pm, Ben Pfaff <b...@cs.stanford.edu> wrote:
user923005 <dcor...@connx.com> writes:
I work for a database company, and most of our customers are large
customers. (E.g. huge US company, large university, government of
country x, etc.)
It is not at all unusual for a single file to be 20-100 GB. Needless
to say, you would not want to put this file into memory even if you
could do it.

Sometimes you do. For some time, I worked for a company that had
a gigantic Perforce repository[*]. Every developer made heavy
use of this repository. Unfortunately, Perforce doesn't scale to
anything that big or that busy. The solution turned out to be to
put 128 GB of RAM in the Perforce server. Then the whole
database was cached. Performance was then tolerable, if still
not all that great.
[*] Perforce is a version control system.
Tell me, was the file read into a fixed memory buffer or memory
mapped?
Actually, since Perforce uses a database, the answer is obvious.
Dec 28 '07 #132
In article <eb**************************@cache3.tilbu1.nb.home.nl>,
Serve La <ni@hao.com> wrote:
>
"Eric Sosman" <es*****@ieee-dot-org.invalidschreef in bericht
news:kM******************************@comcast.com ...
>Richard wrote:
>>Rubbish. All files have a byte size.

How many bytes does /dev/tty hold? How about /dev/zero?
/dev/ptyp0? /dev/random? /dev/poll? CON:? LPT:?

>Programmers should be able to understand that doing filesize("LPT1:") or
something would be silly, and an incorrect value would be returned, or rather
an error code.
--------
int write_log_entry(const char *filename, struct log_entry *msg)
{
    FILE *logfile;
    int ret;
    int sz;

    sz = filesize(filename);
    if (sz < 0)
        return sz;
    if (sz >= globals.max_logfile_size)
        rotate_logfiles();

    logfile = fopen(filename, "a");
    if (logfile == NULL)        /* fopen can fail */
        return -1;
    ret = log_to_stdio_stream(logfile, msg);
    fclose(logfile);
    return ret;
}
--------
Now what happens when somebody decides to log to the printer? Does it
still sound silly to be passing "LPT1" to filesize?
dave
(maintains code that does something remarkably similar to this)

Dec 28 '07 #133
jacob navia wrote:
Chris Torek wrote:

[snip]

This is possible in standard C.

You are forced to read the data character by character until you reach
the end of the file. This is maybe OK (disk I/O could be the limiting
factor here) but it ignores the problem I tried to solve concerning
abstracting the text/binary difference.
... a difference the Standard C library already handles,
and handles better.
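For reference, here is a minimal sketch of that character-by-character
approach, using nothing outside the standard library (the function name
and the growth policy are just illustrative):

#include <stdio.h>
#include <stdlib.h>

/* Read a stream with getc() until EOF, growing a malloc'd buffer as
   needed.  No file size is ever asked for, so nothing non-portable
   is assumed. */
char *read_all(FILE *fp, size_t *sizep)
{
    size_t n = 0, space = 256;
    char *buf = malloc(space + 1);   /* +1 keeps room for a '\0' */
    char *new;
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (n == space) {
            space *= 2;
            new = realloc(buf, space + 1);
            if (new == NULL) {
                free(buf);
                return NULL;
            }
            buf = new;
        }
        buf[n++] = (char)c;
    }
    if (ferror(fp)) {                /* distinguish EOF from error */
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    if (sizep != NULL)
        *sizep = n;
    return buf;
}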

--
Eric Sosman
es*****@ieee-dot-org.invalid
Dec 28 '07 #134
Chris Torek wrote:
>
.... snip ...
>
/* Loop ended, and not due to error, so must be normal EOF. */
#ifdef OPTIONAL
if (n < space) {
new = realloc(buf, n + 1);
if (new != NULL)
buf = new;
}
#endif
buf[n] = '\0';
if (sizep != NULL)
*sizep = n;
return buf;
}
Minor error. If the realloc fails, buf[n] = '\0' will write beyond
the buffer. Try:

if (n >= space) buf[n] = '\0';
else if (new = realloc(buf, n + 1)) {
buf = new; buf[n] = '\0';
}
else buf[n - 1] = '\0'; /* minor loss */

if (sizep) *sizep = n;
return buf;
}

I haven't bothered to examine the meaning of OPTIONAL, which may
make a difference. However, this code makes me suspicious of the
relationship between n and space.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee, Frohe Weihnachten
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Dec 28 '07 #135
Serve La wrote, On 28/12/07 21:01:
>
"user923005" <dc*****@connx.com> wrote in message
news:98**********************************@l1g2000hsa.googlegroups.com...
On Dec 27, 2:01 pm, jacob navia <ja...@nospam.com> wrote:
>user923005 wrote:
Here is a machine (HP RX/1600 Itanium running OpenVMS) that reports
its file sizes in blocks.

[big snip]
Total of 28 files, 4404 blocks.
Next Cmd:
Each of those file sizes is in 512-byte increments. You can buy a new
one today from HP, if you want.
http://h71000.www7.hp.com/openvms/hw_supportchart.html

If you have one of those systems, and your trash is full so you can't
get rid of it, then you can do the following:

http://h71000.www7.hp.com/wizard/wiz_5424.html
Obtaining RMS file size?
Hello Mr Wizard

I am looking for a method of obtaining the size of a file, NOT the
allocation, from within a Fortran routine. I thought this would be a
simple, trivial task, but alas I am proved wrong.

Can you help?

Regards Jim
Answer
RMS does NOT keep track of the number of user data bytes in a file.
The only reliable way to obtain that, is to read the file and count!

So there you have the answer!

If you are using that system, expect that filesize() will take some time.

So what?

Should we ALL suffer because some brain-dead system exists somewhere?

remove() is in the standard, while there are lots of systems without a
filesystem, where remove() doesn't have any meaning. I happen to work on
such a system now, but I don't know what remove() does because I just
don't need it and never checked.
All such systems I've come across are freestanding implementations, a
class which is not required to provide most of the standard C library.
If there is a hosted implementation without anything like a file system,
where remove() makes no sense, I would be surprised.
I'm guessing it returns -1
So why can't a function like filesize() be added? On systems where it's
meaningless it could return an error, and the programmer can take over
with the byte-by-byte read or whatever works for him when filesize()
returns an error. What's the logic for adding remove() and not adding
filesize()?
One reason is probably existing practice when C was standardised.
Another is that if there is a file system, remove() is *probably* easy
to implement, but filesize() is *definitely* problematic on some
systems, even to the point that defining what you mean by "file size"
is an interesting problem.
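As the wizard's answer above says, the only fully portable route is to
read the file and count. A minimal standard-C sketch of that (the
function name is illustrative; the count is kept in a long, so a truly
huge file could overflow it):

#include <stdio.h>

/* Open in binary mode and count bytes one at a time.  Slow, but it
   works even where blocks, records, or padding make any stored
   "size" ambiguous. */
long filesize_by_counting(const char *name)
{
    FILE *fp = fopen(name, "rb");
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (getc(fp) != EOF)
        n++;
    if (ferror(fp))
        n = -1L;
    fclose(fp);
    return n;
}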
--
Flash Gordon
Dec 28 '07 #136

<dj******@csclub.uwaterloo.ca.invalid> wrote in message
news:fl**********@rumours.uwaterloo.ca...
In article <eb**************************@cache3.tilbu1.nb.home.nl>,
Serve La <ni@hao.com> wrote:
>>
"Eric Sosman" <es*****@ieee-dot-org.invalid> wrote in message
news:kM******************************@comcast.com...
>>Richard wrote:
>>>Rubbish. All files have a byte size.

How many bytes does /dev/tty hold? How about /dev/zero?
/dev/ptyp0? /dev/random? /dev/poll? CON:? LPT:?

Programmers should be able to understand that calling filesize("LPT1:")
or the like would be silly, and that it would return an incorrect
value, or rather an error code.

--------
int write_log_entry(const char *filename, struct log_entry *msg)
{
    FILE *logfile;
    int ret;
    int sz;

    /* Rotate before appending if the log has grown too large. */
    sz = filesize(filename);
    if (sz < 0)
        return sz;
    if (sz >= globals.max_logfile_size)
        rotate_logfiles();

    logfile = fopen(filename, "a");
    if (logfile == NULL)      /* fopen can fail; don't hand NULL on */
        return -1;
    ret = log_to_stdio_stream(logfile, msg);
    fclose(logfile);
    return ret;
}
--------
Now what happens when somebody decides to log to the printer? Does it
still sound silly to be passing "LPT1" to filesize?
Yes. If one decides that, I'm sure there will be other, unportable
means to do it.

Dec 28 '07 #137
>Chris Torek wrote:
[snippage occurred here]
>#ifdef OPTIONAL
if (n < space) {
new = realloc(buf, n + 1);
if (new != NULL)
buf = new;
}
#endif
buf[n] = '\0';
In article <47***************@yahoo.com>,
CBFalconer <cb********@maineline.net> wrote:
>Minor error. If the realloc fails, buf[n] = '\0' will write beyond
the buffer.
No -- the buffer's size is space+1 bytes, not n+1. This is the
whole point of the realloc(): to shrink the buffer from space+1
bytes long to n+1 bytes long, so that buf[n] is the last byte,
rather than somewhere before the last byte.

(As someone else noted, though, there was a malloc() that should
have been a realloc().)
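To make that invariant concrete, here is a condensed, hypothetical
helper modeled on the snippet above: buf always holds space+1 bytes and
n <= space, so buf[n] is writable whether or not the optional shrink
succeeds.

#include <stdlib.h>

/* Finish off a buffer that was allocated with space+1 bytes and now
   holds n bytes of data (n <= space). */
char *finish_buffer(char *buf, size_t n, size_t space, size_t *sizep)
{
    if (n < space) {
        char *new = realloc(buf, n + 1);  /* optional shrink-to-fit */
        if (new != NULL)
            buf = new;   /* on failure buf is untouched, still space+1 */
    }
    buf[n] = '\0';       /* in bounds either way */
    if (sizep != NULL)
        *sizep = n;
    return buf;
}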
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Dec 29 '07 #138
user923005 wrote:
>
.... snip ...
>
My opinion is that Jacob chose what is probably one of the most
difficult possible projects to ridicule what is possible in
standard C, and also that he probably knew it beforehand. He
acts like a dummy, but he isn't one.
I agree. However, he does have a very limited view of the software
and computer industry. That means that he systematically ignores
many real problems.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee, Frohe Weihnachten
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Dec 29 '07 #139
jacob navia wrote:
>
.... snip ...
>
You are forced to read the data character by character until you
reach the end of the file. This is maybe OK (disk I/O could be the
limiting factor here) but it ignores the problem I tried to solve
concerning abstracting the text/binary difference.

If we had a function called filesize() in the standard (one of my
main complaints) your program could be written in a few lines, and
read the whole file into memory.
I concede that there may be rare cases when you want the whole file
in memory. However, I can't really think of any for now. Remember
that you can't count on the availability of more than 64 Kbytes of
memory (although most systems provide more).

If you can think of cases where this function (full read-in) is
really useful, do specify a few in detail.

--
Merry Christmas, Happy Hanukah, Happy New Year
Joyeux Noel, Bonne Annee, Frohe Weihnachten
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Dec 29 '07 #140

"CBFalconer" <cb********@yahoo.com> wrote in message
news:47***************@yahoo.com...
I concede that there may be rare cases when you want the whole file
in memory. However, I can't really think of any for now. Remember
that you can't count on the availability of more than 64 Kbytes of
memory (although most systems provide more).
I can't count on more than 0 KB when memory runs out. If you're writing
an actual user *application*, assume only the minimum RAM on the box.
If you can think of cases where this function (full read-in) is
really useful, do specify a few in detail.
Sometimes it is not necessary but might be faster: reading a block into
memory is fast. Scanning bytes in that memory block is also fast.

But scanning bytes while needing to negotiate with the file system *per
byte* could be slower; in fact the file device might well be slow.

And sometimes random access is required to the whole file: data of various
kinds (e.g. uncompressed bitmap), executable data (e.g. native code, byte
code). Or some small application likes to store all its persistent
variables as disk files. When it starts again, it naturally wants to
re-load those files.

Sometimes you want to just grab the file in case something happens to it
(someone unplugs the flash drive, for example).

These are typical small-to-medium-sized files. Huge files (large
databases, video, etc.) are not suitable for this, but would use a
different approach anyway, or are designed for serial access.

I wouldn't call these examples rare cases, not on typical desktop computers
anyway. Common enough that a reading-entire-file-into-memory function would
be useful.
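For illustration only (the helper name is invented): once the bytes are
in memory, scanning them is plain pointer work, with no per-byte trips
through stdio.

#include <stddef.h>
#include <string.h>

/* Count newlines in an in-memory buffer, e.g. one filled by a slurp
   function like those sketched earlier in the thread. */
size_t count_lines(const char *buf, size_t len)
{
    const char *p = buf, *end = buf + len;
    size_t lines = 0;

    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;                 /* continue past the newline just found */
    }
    return lines;
}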

Bart

Dec 29 '07 #141

"Bart C" <bc@freeuk.com> wrote in message
"Chris Torek" <no****@torek.net> wrote in message

Of course in practice one would write:

char *loadfile(FILE *fp, size_t *sizep) {
if (thisiswindows) /* or other capable OS */
/*do it in a dozen lines */
else
/*do it the hard way*/
}
No, we want the code to work anywhere. In practice you need too many
#ifdefs for each compiler, and then you can't test the code easily.

My function would slurp in an ASCII file, reasonably portably. As Chris
Torek pointed out, it could be improved to make it both more reusable and to
support weirder systems. It wasn't written with very much thought.

In practice I don't expect the MiniBasic script executor to break anytime
soon, on any system anyone will actually want to run it on. In fact the
people who have used it seriously have given it a total rewrite - to not use
the standard IO streams, to use integer only arithmetic and take out math
library calls, and so on, because it seems to have found favour with small
embedded systems. There was no way I could have anticipated those
requirements.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Dec 29 '07 #142
"jacob navia" <ja***@nospam.com> wrote in message
news:fl**********@aioe.org...
Mark McIntyre wrote:
>On Fri, 28 Dec 2007 01:31:12 +0100, jacob navia wrote:
>>Look at that:

AlphaServer DS10L            Noname PC clone (my machine)
466 MHz EV6 CPU              2 GHz dual-core AMD

Remember, CISC vs RISC!

The RISC idea was to reduce the instruction set and speed up the clock.
Here we have a reduced instruction set with a clock that is slower by a
factor of 4 ...
No. The idea is to reduce the instruction set and make that reduced set
execute in fewer CPU cycles. That way a RISC CPU doesn't need to speed
up the clock; it just gets more work done in fewer cycles despite its
slower clock.
Comparing the clocks of different types of CPU isn't helpful at all;
benchmarks are needed for a meaningful comparison.

Bye, Jojo
Dec 29 '07 #143
In article <fl**********@aioe.org>, jacob navia <ja***@nospam.org> wrote:
>Mark McIntyre wrote:
>On Fri, 28 Dec 2007 01:31:12 +0100, jacob navia wrote:
>>Look at that:
>>AlphaServer DS10L            Noname PC clone (my machine)
>>466 MHz EV6 CPU              2 GHz dual-core AMD
>Remember, CISC vs RISC!
>The RISC idea was to reduce the instruction set and speed up the clock.
Here we have a reduced instruction set with a clock that is slower by a
factor of 4 ...
And?

The main desktop machines I use are 200 MHz / 128 MB (home) and
250 MHz / 256 MB (work) -- and their desktop is still more responsive
than my 2.0 GHz / 1.5 GB Windows PC. I won't deny that the browser is
noticeably faster on the 2.0 GHz Windows PC, but start one task on
the Windows PC and everything else crawls even when there
is plenty of memory, whereas my RISC boxes stay peppy until you
run something big enough to swap to disk. (And when they do swap
to disk, they still record the input events, unlike the high speed
Opteron Linux cluster at work, which loses most keypresses if it
is busy swapping to disk!)

A small clarification, by the way: the 200 MHz and 250 MHz are
the internal clock speeds, which are double the external clock
speeds on the system. The CPUs are being externally clocked at
only 100 MHz and 125 MHz respectively, and the memory is only 100 ns
on the boxes. In theory my Windows PC should be able to run rings
around those decade old boxes, but that's not what the user
experiences.

--
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth
Dec 29 '07 #144
CJ
On 28 Dec 2007 at 23:44, CBFalconer wrote:
Remember that you can't count on the availability of more than 64
Kbytes of memory (although most systems provide more).
Many C programs will need to get by on a lot less memory than this! I
know that for Jacob every computer is a 32-bit Windows box with
a couple of gigabytes of RAM, but out there in the real world C programs
often control things like toasters or kettles, where memory is severely
limited.

Dec 29 '07 #145
CJ wrote:
On 28 Dec 2007 at 23:44, CBFalconer wrote:
>Remember that you can't count on the availability of more than 64
Kbytes of memory (although most systems provide more).

Many C programs will need to get by on a lot less memory than this! I
know that for Jacob every computer is a 32-bit Windows box with
a couple of gigabytes of RAM, but out there in the real world C programs
often control things like toasters or kettles, where memory is severely
limited.
Look "CJ" whoever you are:

You know NOTHING of where I have programmed, or what I am doing.
Versions of lcc-win run on DSPs with 80K of memory, of which only
20K is usable.

You (like all the "regulars") repeat the same lies about me again
and again, but that doesn't make them true.

You feel like insulting someone?

Pick another target, or (much better) try to stop kissing ass, OK?
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
Dec 29 '07 #146
CBFalconer <cb********@yahoo.com> writes:
jacob navia wrote:
>>
... snip ...
>>
You are forced to read the data character by character until you
reach the end of the file. This is maybe OK (disk I/O could be the
limiting factor here) but it ignores the problem I tried to solve
concerning abstracting the text/binary difference.

If we had a function called filesize() in the standard (one of my
main complaints) your program could be written in a few lines, and
read the whole file into memory.

I concede that there may be rare cases when you want the whole file
in memory. However, I can't really think of any for now. Remember
that you can't count on the availability of more than 64 Kbytes of
memory (although most systems provide more).
You appear to have zero experience of using C in any REAL applications.

Files are used to store data all the time.

This data is often required in matrix operations, for example, where ALL
the data is required at one time.

This is one of a million similar scenarios.

Why do you have to be so contrary all the time? You never seem happy
unless you are prancing around trying to get one over on someone. You
are like RH but without the C skills.

As for your comments about memory ... get real. From that ridiculous
statement, are you saying that no "standard C" program can assume more
than 64K will be available? That would mean a hell of a lot of ISO C
programs bugging out with "malloc didn't work" errors.

Why do you do this?

Dec 29 '07 #147
CJ <no****@nospam.invalid> writes:
On 28 Dec 2007 at 23:44, CBFalconer wrote:
>Remember that you can't count on the availability of more than 64
Kbytes of memory (although most systems provide more).

Many C programs will need to get by on a lot less memory than this! I
And many do. What is your point? The point here is that many don't.
know that for Jacob every computer is a 32-bit Windows box with
a couple of gigabytes of RAM, but out there in the real world C programs
often control things like toasters or kettles, where memory is severely
limited.
Your comments have nothing to do with the subject.
Dec 29 '07 #148
"Bart C" <bc@freeuk.com> writes:
"CBFalconer" <cb********@yahoo.com> wrote in message
news:47***************@yahoo.com...
>I concede that there may be rare cases when you want the whole file
in memory. However, I can't really think of any for now. Remember
that you can't count on the availability of more than 64 Kbytes of
memory (although most systems provide more).

I can't count on more than 0 KB when memory runs out. If you're writing
an actual user *application*, assume only the minimum RAM on the box.
>If you can think of cases where this function (full read-in) is
really useful, do specify a few in detail.

Sometimes it is not necessary but might be faster: reading a block into
memory is fast. Scanning bytes in that memory block is also fast.

But scanning bytes while needing to negotiate with the file system *per
byte* could be slower; in fact the file device might well be slow.

And sometimes random access is required to the whole file: data of various
kinds (e.g. uncompressed bitmap), executable data (e.g. native code, byte
code). Or some small application likes to store all its persistent
variables as disk files. When it starts again, it naturally wants to
re-load those files.

Sometimes you want to just grab the file in case something happens to it
(someone unplugs the flash drive, for example).

These are typical small-to-medium-sized files. Huge files (large
databases, video, etc.) are not suitable for this, but would use a
different approach anyway, or are designed for serial access.

I wouldn't call these examples rare cases, not on typical desktop computers
anyway. Common enough that a reading-entire-file-into-memory function would
be useful.
Good reply. The clique won't like it, because programming on *smaller*
boards here immediately signifies that you are a "real C user" ...
garbage, I know.
Dec 29 '07 #149

"Bart C" <bc@freeuk.com> wrote in message
news:Na******************@text.news.blueyonder.co.uk...
>
"Eric Sosman" <es*****@ieee-dot-org.invalid> wrote in message
news:kM******************************@comcast.com...
>Richard wrote:
>>Rubbish. All files have a byte size.

How many bytes does /dev/tty hold? How about /dev/zero?
/dev/ptyp0? /dev/random? /dev/poll? CON:? LPT:?
...
> And even for an ordinary file whose length can be known,
many systems permit the length to change between the moment
it's queried and the moment the result is used. How many bytes
should you allocate for an in-memory copy of /var/adm/messages?

These are all very interesting examples which should be kept in mind when
writing mission-critical code or code for life-support systems.

But there is a distinct class of well-behaved files (input files of a
compiler for example), which are unlikely to be huge and unlikely to
change. For a lot of applications and their files, this will be the case.

With shared files on multi-user/multi-process systems there can be
pitfalls, but size of the file suddenly changing would be the least of the
problems.

I don't know how to deal with /var/adm/messages or similar. Suppose I
read byte by byte as recommended, and then someone updates the beginning
of the file?
How about letting filesize("/var/adm/messages") and the like be UB? On
some systems it could return correct values, on others an error.

fflush(stdin) is UB, so I don't see a reason why filesize() should have
every single file type perfectly defined. And here in clc people will
have one more reason to tell others that demons will fly out of their
noses when they try filesize() on /dev/random or something!

Dec 29 '07 #150
