Bytes | Developer Community
Malcolm's new book

The webpages for my new book are now up and running.

The book, Basic Algorithms, describes many of the fundamental algorithms
used in practical programming, with a bias towards graphics. It covers
mathematical routines from the basics up, including floating point
arithmetic; compression techniques, including the GIF and JPEG file formats;
hashing; red-black trees; 2D and 3D graphics; colour spaces; machine
learning with neural networks, hidden Markov models, and fuzzy logic;
clustering; fast memory allocators; and expression parsing.

(Follow the links)

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Jul 24 '07
Peter J. Holzer wrote:
On 2007-08-19 17:09, Kelsey Bjarnason <kb********@gmail.com> wrote:
>You can't? News to me. Last I checked, the requirement was that the
resultant value be representable as EOF or an unsigned char; this does
not prevent passing in a char, it simply requires that the value of the
char be in a particular range - which isn't terribly surprising, as if
you passed in a value of 679 to a function expecting a value in the range
-1..255, you're doing something wrong.

On many systems char has a range of -128 .. 127, and much of that range
is actually used by common text files (e.g. -96 .. 126 on ISO-8859
systems), so no, you cannot portably pass a char to one of the isxxx
functions unless you have checked that the value is non-negative. But
then what do you do with the negative values? The correct way is almost
always to cast the char to unsigned char. (unless you have multibyte
strings, but there are different functions for that)
Is the correct way to cast the char to unsigned char, or is it to
reinterpret the char as an unsigned char? In other words,

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void) {
    char line[100];
    if (fgets(line, sizeof line, stdin) && strchr(line, '\n')) {
#ifdef MAYBE
        char *p;
        for (p = line; *p; p++)
            *p = toupper((unsigned char) *p);
#else
        unsigned char *p;
        for (p = (unsigned char *) line; *p; p++)
            *p = toupper(*p);
#endif
        fputs(line, stdout);
    }
    return 0;
}

Should MAYBE be defined or undefined for a correct program? On most systems,
there will be no difference, and I sincerely hope there will be no
difference on any system (in other words, I hope that on any system where
signed char has fewer representable values than unsigned char, plain char is
unsigned), but I don't believe that's required, so I'm curious.
Aug 20 '07 #151
Harald van Dĳk wrote:
Is the correct way to cast the char to unsigned char, or is it to
reinterpret the char as an unsigned char? In other words,

<snip code>

Should MAYBE be defined or undefined for a correct program? On most
systems, there will be no difference, and I sincerely hope there will be
no difference on any system (in other words, I hope that on any system
where signed char has fewer representable values than unsigned char,
plain char is unsigned), but I don't believe that's required, so I'm
curious.
My feeling is that whether an implementation
uses signed magnitude or one's complement
to represent negative integers
shouldn't come into play with ctype functions.
I prefer a cast.

--
pete
Aug 20 '07 #152
pete wrote:
Harald van Dĳk wrote:
>Is the correct way to cast the char to unsigned char, or is it to
>reinterpret the char as an unsigned char?

<snip code and question>

My feeling is that whether an implementation
uses signed magnitude or one's complement
to represent negative integers
shouldn't come into play with ctype functions.
I prefer a cast.
Both forms use a cast, so I'm not completely sure which form you believe is
correct.
Aug 20 '07 #153
Harald van Dĳk wrote:
>
pete wrote:
Harald van Dĳk wrote:

<snip>

My feeling is that whether an implementation
uses signed magnitude or one's complement
to represent negative integers
shouldn't come into play with ctype functions.
I prefer a cast.

Both forms use a cast,
so I'm not completely sure which form you believe is correct.
"cast the char to unsigned char"

--
pete
Aug 20 '07 #154
pete wrote:
Harald van Dĳk wrote:
>pete wrote:

<snip>

>My feeling is that whether an implementation
>uses signed magnitude or one's complement
>to represent negative integers
>shouldn't come into play with ctype functions.
>I prefer a cast.

Both forms use a cast,
so I'm not completely sure which form you believe is correct.

"cast the char to unsigned char"
Thanks. Does this also imply that if you want to have helper functions that
operate on arrays of unsigned char (that contain text), you should not pass
them a converted char *, but you should use an array of unsigned char right
from the start?
Aug 20 '07 #155
Harald van Dĳk wrote:
>
pete wrote:
Harald van Dĳk wrote:

<snip>
"cast the char to unsigned char"

Thanks.
Does this also imply that if you want to have helper functions that
operate on arrays of unsigned char (that contain text),
you should not pass them a converted char *,
but you should use an array of unsigned char right
from the start?
I just changed my mind. Sorry.
It's more complicated than that.

I don't have a simple answer.

I was thinking of the way that the "Comparison functions"
(memcmp, strcmp, strncmp) work in the standard library,
and I got it backwards.
And after that, what I think gets even more complicated.

N869
7.21.4 Comparison functions
[#1] The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.

int str_cmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    for (;;) {
        if (*p1 != *p2) {
            return *p2 > *p1 ? -1 : 1;
        }
        if (*p1 == '\0') {
            return 0;
        }
        ++p1;
        ++p2;
    }
}

However, I recall another discussion with Eric Sosman
(let's see if I remember it correctly) in which we were discussing
what the functionality of case-insensitive variations
on string comparison functions should be,
and he suggested that perhaps they should not be modeled
so closely on the standard library functions
as to consider the bytes as unsigned char.

int str_ccmp(const char *s1, const char *s2)
{
    for (;;) {
        if (*s1 != *s2) {
            const int c1 = tolower((unsigned char)*s1);
            const int c2 = tolower((unsigned char)*s2);

            if (c2 != c1) {
                return c2 > c1 ? -1 : 1;
            }
        } else {
            if (*s1 == '\0') {
                return 0;
            }
        }
        ++s1;
        ++s2;
    }
}

I don't know anything about the way that the ctype functions
work in any locale other than the C locale.
And so I don't really know whether the ctype functions are really useful
when the arguments are converted to (unsigned char);
I just know that making the conversion avoids undefined behavior.

--
pete
Aug 20 '07 #156

"Eric Sosman" <Er*********@sun.com> wrote in message
news:1186440511.431078@news1nwk...
There's an fgets() replacement that seems never to
have been tested. Aside from the diagnostics the compiler
is required to emit, there's a backwards test that results
in double-freeing or in dereferencing NULL. Then there are
some baffling stylistic decisions: Why use fgetc() instead
of getc(), why invite bugs by writing buffer sizes and size
increments twice instead of once, why spell NULL as 0, why
put up with the slowness on very long lines when effective
countermeasures are easy?
I still haven't been able to find this bug. I've looked and looked and I
think it must be a false bug report. Except the slowness on long lines.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 20 '07 #157
Malcolm McLean said:
>
"Eric Sosman" <Er*********@sun.com> wrote in message
news:1186440511.431078@news1nwk...
<snip>
I still haven't been able to find this bug. I've looked and looked and
I think it must be a false bug report.
Look harder. Eric Sosman is better at spotting bugs than you are.

If you still can't find the bug, post the code here, and we'll find it
for you.
Except the slowness on long lines.
Avoiding that is trivial, and it has been discussed here often.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 20 '07 #158
Richard Heathfield wrote:
>
Malcolm McLean said:

"Eric Sosman" <Er*********@sun.com> wrote in message
news:1186440511.431078@news1nwk...
<snip>
I still haven't been able to find this bug. I've looked and looked and
I think it must be a false bug report.

Look harder. Eric Sosman is better at spotting bugs than you are.

If you still can't find the bug, post the code here, and we'll find it
for you.
I added and deleted some white space characters.
I could make a bunch of improvements to the code,
but I tested the function some, and didn't notice anything unusual.

/*
   readline() reads a line from an input file
   params: fp - pointer to an opened file
   returns: line read up to newline, 0 on out-of-memory or EOF.
   notes: trailing newline stripped. Return allocated with malloc().
*/

char *readline(FILE *fp)
{
    char *buff;
    int nread = 0;
    int buffsize = 128;
    int ch;
    char *temp;

    buff = malloc(128);
    if (!buff)
        return 0;
    while ((ch = fgetc(fp)) != '\n') {
        if (ch == EOF) {
            if (nread == 0) {
                free(buff);
                return 0;
            }
            break;
        }
        buff[nread] = (char) ch;
        nread++;
        if (nread == buffsize - 1) {
            temp = realloc(buff, buffsize + 64);
            if (!temp) {
                free(buff);
                return 0;
            }
            buff = temp;
            buffsize += 64;
        }
    }
    buff[nread] = 0;
    return buff;
}
Except the slowness on long lines.

Avoiding that is trivial, and it has been discussed here often.
--
pete
Aug 20 '07 #159
On Mon, 20 Aug 2007 22:03:22 +0100, Malcolm McLean wrote:
"Eric Sosman" <Er*********@sun.com> wrote in message
news:1186440511.431078@news1nwk...
<snip>
I still haven't been able to find this bug. I've looked and looked and I
think it must be a false bug report. Except the slowness on long lines.
Hmm. Let's see.

Bug 1: if realloc fails, instead of returning whatever has been read so
far, you simply discard it. Apparently it's better to get _no_ data than
partial data.

Bug 2: 16-bit implementation, long lines. Let's see what happens:

buffsize starts at 128, adding 64 bytes each time. The last allocation
would have been some 32704 bytes which, combined with the previous 32640
is still viable in a 64K memory, so presumably one can allocate the space
without issue on such a system[1].

Except... buffsize is 32704. Add 64 to this, it becomes 32768 - which, as
I recall, overflows the range of a signed 16-bit int, which is only
required to store -32767 to 32767.

Hmm.

You use realloc.

And integer overflow.

Can said integer overflow result in a buffsize value of zero? Let's see
what happens with realloc on this here implementation, when the size is
zero. According to the docs...

void *realloc(void *ptr, size_t size);

if size is equal to zero, the call is equivalent to free(ptr).
....
If size was equal to 0, either NULL or a pointer suitable to be passed
to free() is returned.

So...

temp = realloc (buff, buffsize + 64);

You've just freed buff and (possibly) set temp to a pointer which is
suitable to be passed to free, but which has _absolutely no usability_
otherwise. Which you promptly try to stuff data into.

Boom for signed integer overflow, boom again for stuffing data into bogus
pointers. And smack at least once for discarding data because _you_ think
it just doesn't matter, instead of letting the user make such a call.

I have no idea whether Eric saw these or something else, or whether he'd
even classify these as bugs or not. They're just some obvious points of
what I'd consider questionable coding.

[1] The 16-bit systems I'm used to tend to use one single 64K memory block
for all allocations, unless explicitly told otherwise in the code,
allowing such allocations as long as they all fit in one 64K block. YMMV.
No coupons with sale prices. 10% off for left-handed redheads.
Aug 20 '07 #160
Kelsey Bjarnason wrote:
>
On Mon, 20 Aug 2007 13:46:02 +0200, Peter J. Holzer wrote:
I see it the other way around:
(signed) int is the default type I use for integral
variables unless I have a good reason to use something else

Exactly.
I don't really need a very good reason to use type unsigned.

I need a good reason to use types
which are subject to integer promotions.
If I was doing a bunch of operations
and all my operands and all my results
were algorithmically locked into being between 0 and 127 inclusive,
I wouldn't use type char,
but I wouldn't have any problem using type unsigned for that.

--
pete
Aug 20 '07 #161
Richard Heathfield <rj*@see.sig.invalid> writes:
pete said:
<big snip>
> if (!temp) {
free(buff);
return 0;
}

This code gives up too easily. If you can't get a big wodge of extra
space, try for a smaller wodge. And still a smaller one, in a loop that
decreases the demand in some sensible way. All you really *need* at
this point is one extra character.
Curiously, no! The character has been read and stored; the next one
may be '\n', or fgetc may return EOF, so the function is allocating
storage it does not (certainly) need. Your real point is entirely
valid, of course; I'm just pointing out something else I don't like
about this particular readline -- it allocates too early.

--
Ben.
Aug 21 '07 #162
<snip>

In a long reply, pete said a great deal on which I have no particular
comment to make, but one point certainly bears further discussion:
Richard Heathfield wrote:
>>
FTR, the required headers are:

<stdio.h>
<stdlib.h>

It depends on whether or not you want to consider the
published function definition as a code fragment.
Code fragments don't always have to be compilable.
Absolutely right. But sometimes they *do*. I want to move straight into
the general case here - and do I hear a sigh of relief from MM?

Code fragments come in all manner of shapes and sizes, and are provided
for all sorts of reasons. I think the smallest (visible) code fragment
I've ever written is <code>.</code> - when describing the structure
member operator. Clearly, that fragment is not intended to be compiled
separately!

Generally, if a code fragment is written inline in a paragraph, it is
unreasonable to expect it to compile. For example, I have recently
written the following in an HTML page: "The declarator, <code>int
main(void)</code>, tells the compiler that <code>main</code> is a
function that returns a value of type <code>int</code>." I do not
expect people to expect that int main(void) will compile all by itself!

When a particular line (or short group of lines) of code is being
discussed, again it is unreasonable to expect it to compile:

"But we can use a <code>for</code> loop to make the job a lot easier:
</p>
<pre><code>
for(i = 0; i &lt; 64; i++)
{
putchar('-');
}
</code></pre>
<p>
This is of course the equivalent of:
</p>
<pre><code>
i = 0;
while(i &lt; 64)
{
putchar('-');
i++;
}
</code></pre>"

If an entire function is given, however, then it /is/ reasonable to
expect it to compile, provided that any relevant information supplied
earlier in the discussion is made available to the compiler (e.g.
headers, declarations for called functions, and so on). The obvious
exception is when we get something like:

int foo(void)
{
auto i = 42; /* perhaps this is what we're trying to talk about */
.
.
.
return i;
}

Here, the three lines of dot tell an eloquent story
- "stuff goes here"
- and we recognise that this is a device for generalising a function
that, in its present form, is not intended to be compiled.

All of the above are suitable for expositors, didacts, and exegetes to
employ during their expositions, didacticisms, and exegeses.

When one is seeking help, however ("my program is broken, please help,
here's the code"), it is normally essential to provide a complete,
minimal, compilable program - together with enough information to allow
easy reconstruction of test data.

It is also necessary for authors - not just book authors but also online
tutorial writers - to provide complete, compilable, compilED, and
TESTED versions of the code in their books. If this is not done via
some kind of side-channel (Web site, CD in the back of the book, or
whatever), then it must be done in the book itself. If the side-channel
is employed, the abridged book code should be annotated with an
explanation of where the full version can be found (or a general remark
covering this point should be given near the front of the book).

Did I miss anything obvious?

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 21 '07 #163
Ben Bacarisse said:
Richard Heathfield <rj*@see.sig.invalid> writes:
<snip>
>All you really *need*
at this point is one extra character.

Curiously, no! The character has been read and stored, the next one
may be '\n' or fgetc may return EOF so the function is allocating
storage it does not (certainly) need.
Oh, good spot, sir. It seemed a bit upside down to me, but I thought
that was just a harmless style difference between MM and myself. I
should read more closely in future.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 21 '07 #164
Richard Heathfield wrote:
If an entire function is given, however, then it /is/ reasonable to
expect it to compile, provided that any relevant information supplied
earlier in the discussion is made available to the compiler (e.g.
headers, declarations for called functions, and so on). The obvious
exception is when we get something like:

int foo(void)
{
auto i = 42; /* perhaps this is what we're trying to talk about */
.
.
.
return i;
}

Here, the three lines of dot tell an eloquent story
- "stuff goes here"
- and we recognise that this is a device for generalising a function
that, in its present form, is not intended to be compiled.

All of the above are suitable for expositors, didacts, and exegetes to
employ during their expositions, didacticisms, and exegeses.

When one is seeking help, however ("my program is broken, please help,
here's the code"), it is normally essential to provide a complete,
minimal, compilable program
- together with enough information to allow
easy reconstruction of test data.

It is also necessary for authors
- not just book authors but also online tutorial writers
- to provide complete, compilable, compilED, and
TESTED versions of the code in their books. If this is not done via
some kind of side-channel (Web site, CD in the back of the book, or
whatever), then it must be done in the book itself.
If the side-channel
is employed, the abridged book code should be annotated with an
explanation of where the full version can be found
(or a general remark
covering this point should be given near the front of the book).

Did I miss anything obvious?
No.

--
pete
Aug 21 '07 #165
Richard Heathfield wrote:
Malcolm McLean said:
"Eric Sosman" <Er*********@sun.com> wrote in message
news:1186440511.431078@news1nwk...
<snip>
I still haven't been able to find this bug. I've looked and looked and
I think it must be a false bug report.

Look harder. Eric Sosman is better at spotting bugs than you are.
I'm so good I can even spot them when they're not there.

I posted a retraction a few days later, but as a hint rather
than as an outright "I dun rong" message. I suspected that the
code's author was not reading the criticisms carefully, and the
fact that he didn't give me both barrels in the traditional c.l.c.
fashion -- not even after a hint -- seems to support my suspicion.

The error itself I blame on my own decreasing visual acuity,
exacerbated by the author's decision to use a small proportional
font for his code: the ! character is very narrow, and snuggled
concealingly into the embrace of the preceding ( in a way that
made it invisible to my aging eyes.

Let's see: I posted the original erroneous bug report on
August 6 and the veiled retraction on August 11. Now on August
20, two weeks after the original claim and nine days after the
broad hint, the author at last is actually paying attention.
I congratulate him on finally refuting my error, and I hope he
will pay similar attention to the other criticisms I and other
C users have raised. They deserve more attention than it turns
out this one did.

--
Eric Sosman
es*****@ieee-dot-org.invalid
Aug 21 '07 #166
Philip Potter wrote:
Why is getc() an improvement over fgetc()?
It's a speed issue, in practice.

getc is typically implemented as a macro.
fgetc is typically not implemented as a macro.

If fgetc were to be implemented in C code,
it would probably look like this:

int fgetc(FILE *stream)
{
    return getc(stream);
}

--
pete
Aug 21 '07 #167
pete wrote:
Philip Potter wrote:
>Why is getc() an improvement over fgetc()?

It's a speed issue, in practice.

getc is typically implemented as a macro.
fgetc is typically not implemented as a macro.
fgets() is required to be implemented as a function AFAIK.
If fgetc were to be implemented in C code,
it would probably look like this:

int fgetc(FILE *stream)
{
    return getc(stream);
}
Ah that's where my confusion comes from. I had assumed that getc would
be implemented in terms of fgetc rather than the other way round. But of
course your version makes much more sense.

Phil

--
Philip Potter pgp <at> doc.ic.ac.uk
Aug 21 '07 #168
Philip Potter said:

<snip>
Idioms are not universal;
Well said. Nevertheless, the reason that they are idioms is that they
are very common. Were they not very common, they would not be idioms.
it will depend on your project team which
idioms are recognised and common, and which are less common and harder
to read. I find your focus on your own idioms as "good C" to be a
little absolutist.
Well, "your own idioms" is a bit of a stretch - the inclusion of
pre-increment in conditional expressions is far from unique to me. On
the "absolutist" charge, I would agree that I nurse a stylistic
paradigm that is rather less liberal than that used by many people, but
I don't insist that "my" style is good C and theirs is not. When I note
departures from "my" style, I tend to note, too, that this is purely a
matter of style, not of correctness or "goodness".
>
>>I have been using C on and off for a number of years, and I would
probably consider myself "average" and certainly not as good as many
regulars on clc. I can read C expressions such as *to++ = *from++
well because they are idiomatic;

Quite so. As a C programmer, one ought to familiarise oneself with
the ideas of pre- and post-increment.

But there's a difference between knowing what the operators do and
being able to read idiomatic statements using those operators quickly.
Indeed, and there is a trade-off between expressive power (which is
sometimes described as "terseness") and clarity (which is sometimes
described as "verbosity") - and it's sometimes a fine line to tread.

<snip>
I don't think we disagree too much here.
Oh dear. Perhaps we need to try harder. :-)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 21 '07 #169
Philip Potter <pg*@see.sig.invalid> writes:
pete wrote:
>Philip Potter wrote:
>>Why is getc() an improvement over fgetc()?
It's a speed issue, in practice.
getc is typically implemented as a macro.
fgetc is typically not implemented as a macro.

fgets() is required to be implemented as a function AFAIK.
[...]

I think you meant fgetc, not fgets.

Nearly all function-like things in the C standard library are required
to be implemented as actual functions (meaning, for example, that you
can take their addresses and call them indirectly).

Any standard function may *in addition* be implemented as a macro, as
long as the macro is well-behaved. Such a macro must evaluate each of
its arguments exactly once, and each reference to an argument must be
fully parenthesized, so it can be used in an expression as if it were
a function. (See C99 7.1.4.)

assert() is an exception to this; it's required to be a macro, not a
function, because it needs to expand __FILE__ and __LINE__ at the
point where it's invoked.

getc() is equivalent to fgetc() *except* that if it's implemented as a
macro, it can evaluate its stream argument more than once. It turns
out that this makes it much easier to implement it efficiently (take a
look at your own system's implementation in stdio.h), and the stream
argument would have side effects only in a very unusual program.

For example, if you had an array of FILE* and you wanted to cycle
through it, reading a single character from each, you might write:
c = fgetc(file_array[i++]);
If you use getc rather than fgetc, 'i' might be incremented more than
once. Even though this is a very unusual usage, a conforming
implementation *must* arrange for it to work properly. Thus getc()
has this special permission so it can be more efficient for the 99.99%
of cases where it doesn't matter that the stream argument may be
evaluated more than once.

Note that even if a getc() macro is provided (as it almost certainly
is), there still has to be a getc() function as well.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 21 '07 #170

"Kelsey Bjarnason" <kb********@gmail.com> wrote in message
news:hf************@spanky.localhost.net...
On Mon, 20 Aug 2007 22:03:22 +0100, Malcolm McLean wrote:
>"Eric Sosman" <Er*********@sun.com> wrote in message
news:1186440511.431078@news1nwk...
<snip>
I still haven't been able to find this bug. I've looked and looked and I
think it must be a false bug report. Except the slowness on long lines.

Hmm. Let's see.

Bug 1: if realloc fails, instead of returning whatever has been read so
far, you simply discard it. Apparently it's better to get _no_ data than
partial data.
Enlightenment dawns. Sure it is. No results beats wrong results.
Bug 2: 16-bit implementation, long lines. Let's see what happens:

buffsize starts at 128, adding 64 bytes each time. The last allocation
would have been some 32704 bytes which, combined with the previous 32640
is still viable in a 64K memory, so presumably one can allocate the space
without issue on such a system[1].

Except... buffsize is 32704. Add 64 to this, it becomes 32768 - which, as
I recall, overflows the range of a signed 16-bit int, which is only
required to store -32767 to 32767.

Hmm.

You use realloc.

And integer overflow.

Can said integer overflow result in a buffsize value of zero? Let's see
what happens with realloc on this here implementation, when the size is
zero. According to the docs...

void *realloc(void *ptr, size_t size);

if size is equal to zero, the call is equivalent to free(ptr).
...
If size was equal to 0, either NULL or a pointer suitable to be passed
to free() is returned.

So...

temp = realloc (buff, buffsize + 64);

You've just freed buff and (possibly) set temp to a pointer which is
suitable to be passed to free, but which has _absolutely no usability_
otherwise. Which you promptly try to stuff data into.

Boom for signed integer overflow, boom again for stuffing data into bogus
pointers. And smack at least once for discarding data because _you_ think
it just doesn't matter, instead of letting the user make such a call.

I have no idea whether Eric saw these or something else, or whether he'd
even classify these as bugs or not. They're just some obvious points of
what I'd consider questionable coding.
You're right. Converting to size_t won't necessarily solve the problem. You
can fix it with fancy coding, inappropriate for what the purpose is. That's
why I specify the functions can fail for extreme inputs. I specifically
reiterate that the function is not acceptable as a library function.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 21 '07 #171
Malcolm McLean wrote:
Richard Heathfield wrote:
>And that's basically what he's trying to do, except that he's trying to
improve on fgets (by making it possible to read arbitrarily long lines
in a single call) - but failing, IMHO. His poor design and poor
implementation doom the attempt from the start.
The function is a reasonable drop-in for fgets() in a non-security,
non-safety critical environment. It means the programmer doesn't have to
worry about buffer size. A malicious user can crash things by passing a
massive line to the function, but we don't all have to consider that
possibility.
A very simple check is all that would be required to avoid this DoS issue.
Surely one line is not going to confuse anyone?

What is potentially more confusing is the use of signed types when unsigned
ones are appropriate.

Aug 21 '07 #172

"Eric Sosman" <es*****@ieee-dot-org.invalidwrote in message
news:Qb******************************@comcast.com. ..
I posted a retraction a few days later, but as a hint rather
than as an outright "I dun rong" message. I suspected that the
code's author was not reading the criticisms carefully, and the
fact that he didn't give me both barrels in the traditional c.l.c.
fashion -- not even after a hint -- seems to support my suspicion.
There was so much of it. I decided to wait until the thread had died down,
then list all the errors.
>
The error itself I blame on my own decreasing visual acuity,
exacerbated by the author's decision to use a small proportional
font for his code: the ! character is very narrow, and snuggled
concealingly into the embrace of the preceding ( in a way that
made it invisible to my aging eyes.
In a way a false bug report is a much more damning criticism than a real
one. A real bug is usually easily corrected. A false bug means that the code
isn't sufficiently clear, which is a lot harder to fix. However dynamically
expanding buffers are inherently hard in C. I cut the code down as much as
possible, and I don't think it can really be made any simpler. As it is
there are not really enough error checks - if the buffer overflows the range
of an int there will be problems.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 21 '07 #173

"santosh" <sa*********@gmail.comwrote in message
news:fa**********@aioe.org...
>
What is potentially more confusing is the use of signed types when
unsigned
ones are appropriate.
That's a design decision. Insisting on certain data types means that the
algorithms are harder to port to languages other than C, which may lack
unsigned types.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 21 '07 #174
[snips]

On Tue, 21 Aug 2007 21:14:09 +0100, Malcolm McLean wrote:
>Bug 1: if realloc fails, instead of returning whatever has been read so
far, you simply discard it. Apparently it's better to get _no_ data than
partial data.
Enlightenment dawns. Sure it is. No results beats wrong results.
Not _wrong results_, incomplete data.

If I were trying to make a copy of the data off a flaky disk onto a good
one, for example, I'd much rather get as much as possible than have the
routine dictate to me what I'm allowed and not allowed to read. The data
*has been read*. It's not *your* place to tell me I can't use it, that's
my decision.
You're right. Converting to size_t won't necessarily solve the problem. You
can fix it with fancy coding, inappropriate for what the purpose is. That's
why I specify the functions can fail for extreme inputs. I specifically
reiterate that the function is not acceptable as a library function.
Nor for any other purpose, other than to show people how *not* to write
software. Yes, we're agreed.
Aug 21 '07 #175
On Tue, 21 Aug 2007 21:53:28 +0100, Malcolm McLean wrote:
"santosh" <sa*********@gmail.comwrote in message
news:fa**********@aioe.org...
>>
What is potentially more confusing is the use of signed types when
unsigned
ones are appropriate.
That's a design decision. Insisting on certain data types means that the
algorithms are harder to port to languages other than C, which may lack
unsigned types.
Writing in a sensible pseudocode makes it even easier to port, but you
didn't do that. Thus the only viable conclusion is that you weren't
concerned about ease of porting, you just can't write C code.
Aug 21 '07 #176
[snips]

On Tue, 21 Aug 2007 12:42:21 -0700, Keith Thompson wrote:
Any standard function may *in addition* be implemented as a macro, as
long as the macro is well-behaved. Such a macro must evaluate each of
its arguments exactly once
Exactly once, or _at most_ once?

I ponder the case of, say, strchr, where it might do something akin to:

if ( *haystack == '\0' )
break;

and as a result, never evaluate "needle" at all.

Is it required to evaluate needle anyways? If so, to what end?
Aug 21 '07 #177
Philip Potter wrote:
Richard Heathfield wrote:
.... snip ...
>
>getc would be an improvement over fgetc, and I'd rather see the
EOF test within the loop control, but again, neither of these is
strictly a correction.

Why is getc() an improvement over fgetc()? I've heard this stated
but never explained.
Because getc can be implemented as a macro, provided its action
doesn't have to evaluate the argument more than once. This can
avoid a good deal of procedure calling, and also avoid having to
assign an input buffer. There are special provisions for getc and
putc in the standard to allow this.
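The saving only shows up when getc sits in a tight per-character loop. A small sketch of the usual pattern (count_lines is an illustrative name):

```c
#include <stdio.h>

/* Count '\n' characters with getc, which may expand to a macro and so
   avoid one function call per character. Because a getc macro may
   evaluate its stream argument more than once, fp must be a
   side-effect-free expression -- never something like files[i++]. */
static long count_lines(FILE *fp)
{
    int c;
    long n = 0;

    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    return n;
}
```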

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 21 '07 #178
Kelsey Bjarnason wrote On 08/21/07 17:28,:
[snips]

On Tue, 21 Aug 2007 12:42:21 -0700, Keith Thompson wrote:

>>Any standard function may *in addition* be implemented as a macro, as
long as the macro is well-behaved. Such a macro must evaluate each of
its arguments exactly once


Exactly once, or _at most_ once?
Exactly once (7.1.4p1).
I ponder the case of, say, strchr, where it might do something akin to:

if ( *haystack == '\0' )
break;

and as a result, never evaluate "needle" at all.

Is it required to evaluate needle anyways? If so, to what end?
Yes, it is required, to ensure that

strchr(lineptr[i++], delim[--j])

work the same way with strchr-the-macro as with strchr-
the-function.

--
Er*********@sun.com
Aug 21 '07 #179
Kelsey Bjarnason <kb********@gmail.com> writes:
[snips]
On Tue, 21 Aug 2007 12:42:21 -0700, Keith Thompson wrote:
>Any standard function may *in addition* be implemented as a macro, as
long as the macro is well-behaved. Such a macro must evaluate each of
its arguments exactly once

Exactly once, or _at most_ once?
Exactly once. C99 7.1.4:

Any invocation of a library function that is implemented as a
macro shall expand to code that evaluates each of its arguments
exactly once, fully protected by parentheses where necessary, so
it is generally safe to use arbitrary expressions as arguments.
I ponder the case of, say, strchr, where it might do something akin to:

if ( *haystack == '\0' )
break;

and as a result, never evaluate "needle" at all.

Is it required to evaluate needle anyways? If so, to what end?
Yes, it must evaluate needle anyway, so that any side effects occur
exactly as they would in a function call. This function call:

(strchr)(foo++, bar++);

must increment foo and bar exactly once. This possible macro
invocation:

strchr(foo++, bar++);

must do the same thing.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 21 '07 #180
Malcolm McLean wrote, On 21/08/07 21:53:
>
"santosh" <sa*********@gmail.comwrote in message
news:fa**********@aioe.org...
>>
What is potentially more confusing is the use of signed types when
unsigned
ones are appropriate.
That's a design decision. Insisting on certain data types means that the
algorithms are harder to port to languages other than C, which may lack
unsigned types.
Not really. When checking for overflow you still check for overflow,
just as you should with an unsigned type.
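A sketch of the point: whichever type is used, the test comes before the addition (the helper names are illustrative):

```c
#include <limits.h>
#include <stddef.h>

/* With an unsigned type, wrap-around is defined but still wrong for a
   buffer size, so it is tested for up front. */
static int can_add_size(size_t a, size_t b)
{
    return b <= (size_t)-1 - a;
}

/* With a signed type, overflow is undefined behaviour, so the same
   shape of test is mandatory (a and b assumed non-negative here). */
static int can_add_int(int a, int b)
{
    return a <= INT_MAX - b;
}
```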
--
Flash Gordon
Aug 21 '07 #181
CBFalconer wrote, On 21/08/07 22:21:
Philip Potter wrote:
>Richard Heathfield wrote:
... snip ...
>>getc would be an improvement over fgetc, and I'd rather see the
EOF test within the loop control, but again, neither of these is
strictly a correction.
Why is getc() an improvement over fgetc()? I've heard this stated
but never explained.

Because getc can be implemented as a macro, provided its action
doesn't have to evaluate the argument more than once.
You seem to have the above backwards. getc is explicitly allowed to
evaluate its parameter more than once when implemented as a macro,
unlike fgetc.
This can
avoid a good deal of procedure calling, and also avoid having to
assign an input buffer. There are special provisions for getc and
putc in the standard to allow this.
Indeed getc and putc are a special case and it is to save on overheads.
--
Flash Gordon
Aug 21 '07 #182
On 2007-08-20 16:47, Harald van Dijk <tr*****@gmail.com> wrote:
Peter J. Holzer wrote:
>On many systems char has a range of -128 .. 127, and much of that range
is actually used by common text files (e.g. -96 .. 126 on ISO-8859
systems), so no, you cannot portably pass a char to one of the isxxx
functions unless you have checked that the value is non-negative. But
then what do you do with the negative values? The correct way is almost
always to cast the char to unsigned char. (unless you have multibyte
strings, but there are different functions for that)

Is the correct way to cast the char to unsigned char, or is it to
reinterpret the char as an unsigned char? In other words,

#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(void) {
char line[100];
if (fgets(line, sizeof line, stdin) && strchr(line, '\n')) {
#ifdef MAYBE
char *p;
for (p = line; *p; p++)
*p = toupper((unsigned char) *p);
#else
unsigned char *p;
for (p = (unsigned char *) line; *p; p++)
*p = toupper(*p);
#endif
fputs(line, stdout);
}
return 0;
}

Should MAYBE be defined or undefined for a correct program?
Good question.
On most systems,
there will be no difference, and I sincerely hope there will be no
difference on any system (in other words, I hope that on any system where
signed char has less representable values than unsigned char, plain char is
unsigned), but I don't believe that's required, so I'm curious.
I hope so, too, but I can't see a requirement, either. I was trying to
construct one from the "interpreted as unsigned char" requirement for
strcmp, but anything I try invokes undefined behaviour.

In any case even if it was allowed, I would consider an implementation
which forced me to cast all strings to a pointer to an incompatible type
to correctly access the individual characters as unacceptably bad.

hp
--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Aug 21 '07 #183
Harald van Dijk wrote:
>
pete wrote:
Harald van Dijk wrote:
pete wrote:
Harald van Dijk wrote:
Is the correct way to cast the char to unsigned char, or is it to
reinterpret the char as an unsigned char? In other words,

#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(void) {
char line[100];
if (fgets(line, sizeof line, stdin) && strchr(line, '\n')) {
#ifdef MAYBE
char *p;
for (p = line; *p; p++)
*p = toupper((unsigned char) *p);
#else
unsigned char *p;
for (p = (unsigned char *) line; *p; p++)
*p = toupper(*p);
#endif
fputs(line, stdout);
}
return 0;
}

Should MAYBE be defined or undefined
for a correct program? On most systems,
there will be no difference, and I sincerely hope there will be no
difference on any system
(in other words, I hope that on any system where
signed char has less representable values than unsigned char,
plain char is unsigned),
but I don't believe that's required, so I'm curious.

My feeling is that whether an implementation
uses signed magnitude or one's complement,
to represent negative integers,
shouldn't come into play with ctype functions.
I prefer a cast.

Both forms use a cast,
so I'm not completely sure which form you believe is correct.
"cast the char to unsigned char"

Thanks.
Does this also imply that if you want to have helper functions that
operate on arrays of unsigned char (that contain text),
you should not pass them a converted char *,
but you should use an array of unsigned char right
from the start?
As I said in my other post, it's complicated.

I just remembered what it is that I like about casting the values:
It's because putchar works that way.

If you define these negative int values:
#define NEG_5 ('5' - 1 - (unsigned char)-1)
#define NEG_A ('A' - 1 - (unsigned char)-1)

Then putchar(NEG_5) will equal '5'
and putchar(NEG_A) will equal 'A'
(or EOF, but that's not the point)

Also
isupper((unsigned char)NEG_5) is 0
islower((unsigned char)NEG_5) is 0
toupper((unsigned char)NEG_5) is '5'
tolower((unsigned char)NEG_5) is '5'

isupper((unsigned char)NEG_A) is 1
islower((unsigned char)NEG_A) is 0
toupper((unsigned char)NEG_A) is 'A'
tolower((unsigned char)NEG_A) is 'a'

--
pete
Aug 22 '07 #184
[snips]

On Tue, 21 Aug 2007 18:31:33 -0400, Eric Sosman wrote:
Exactly once (7.1.4p1).
Yes, it is required, to ensure that

strchr(lineptr[i++], delim[--j])

work the same way with strchr-the-macro as with strchr-
the-function.
Er... umm... <smacks forehead of course. 'Scuse me while I brain fart.
Aug 22 '07 #185
[snips]

On Tue, 21 Aug 2007 15:49:38 -0700, Keith Thompson wrote:
Yes, it must evaluate needle anyway, so that any side effects occur
exactly as they would in a function call. This function call:

(strchr)(foo++, bar++);
Zackly.

It's one of those cases of "Yeah, I know it works just like the function...
and that the code will work either way... but I missed the implication of
it."
Aug 22 '07 #186
Malcolm McLean wrote:
>
.... snip ...
>
The function is a reasonable drop-in for fgets() in a non-security,
non-safety critical environment. It means the programmer doesn't
have to worry about buffer size. A malicious user can crash things
by passing a massive line to the function, but we don't all have
to consider that possibility.
That is precisely the purpose of ggets(char **). It is simple and
can't crash. Take a look at:

<http://cbfalconer.home.att.net/download/>

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 22 '07 #187
Malcolm McLean wrote:
>
.... snip ...
>
You're right. Converting to size_t won't necessarily solve the
problem. You can fix it with fancy coding, inappropriate for what
the purpose is. That's why I specify the functions can fail for
extreme inputs. I specifically reiterate that the fucntion is not
acceptable as a library function.
Very simple. Just pass malloc (and realloc) through extenders:

void *xmalloc(size_t sz) {
if (!sz) sz++;
return malloc(sz);
}

and now the NULL test for failure is accurate.
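The realloc half of that pair could be sketched the same way (xrealloc is the obvious hypothetical companion, not code from the downloads page):

```c
#include <stdlib.h>

/* Companion to xmalloc: bump a zero size to one so realloc never takes
   its free-like zero-size path, and a NULL return always means failure. */
void *xrealloc(void *ptr, size_t sz)
{
    if (!sz) sz++;
    return realloc(ptr, sz);
}
```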

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 22 '07 #188
Philip Potter wrote:
>
pete wrote:
Philip Potter wrote:
Why is getc() an improvement over fgetc()?
It's a speed issue, in practice.

getc is typically implemented as a macro.
fgetc is typically not implemented as a macro.

fgetc() is required to be implemented as a function AFAIK.
Both of them are required to be implemented as functions.

getc is typically also implemented as a macro.
fgetc is typically not also implemented as a macro.

If fgetc were to be implemented in C code,
it would probably look like this:

int fgetc(FILE *stream)
{
return getc(stream);
}

Ah that's where my confusion comes from. I had assumed that getc would
be implemented in terms of fgetc rather than the other way round.
But of
course your version makes much more sense.
I suspect that fgetc and fputc exist, at least in part,
to simplify the standard's description of input and output.

N869
7.19.3 Files
[#11]
The byte input functions
read characters from the stream as if by successive calls to
the fgetc function.

However, programmers are likely to think of getc,
as the real building block, as the description of getchar suggests.

Description
[#2] The getchar function is equivalent to getc with the
argument stdin.
--
pete
Aug 22 '07 #189
Flash Gordon wrote:
CBFalconer wrote, On 21/08/07 22:21:
>Philip Potter wrote:
>>Richard Heathfield wrote:
... snip ...
>>>>
getc would be an improvement over fgetc, and I'd rather see the
EOF test within the loop control, but again, neither of these is
strictly a correction.

Why is getc() an improvement over fgetc()? I've heard this stated
but never explained.

Because getc can be implemented as a macro, provided its action
doesn't have to evaluate the argument more than once.

You seem to have the above backwards. getc is explicitly allowed to
evaluate its parameter more than once when implemented as a macro,
unlike fgetc.
Yup. Wrote too fast. Thanks for the correction.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 22 '07 #190
CBFalconer <cb********@yahoo.com> writes:
Malcolm McLean wrote:
>>
... snip ...
>>
The function is a reasonable drop-in for fgets() in a non-security,
non-safety critical environment. It means the programmer doesn't
have to worry about buffer size. A malicious user can crash things
by passing a massive line to the function, but we don't all have
to consider that possibility.

That is precisely the purpose of ggets(char **). It is simple and
can't crash. Take a look at:

<http://cbfalconer.home.att.net/download/>
It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 22 '07 #191
Keith Thompson said:
CBFalconer <cb********@yahoo.com> writes:
<snip>
> <http://cbfalconer.home.att.net/download/>
<snip>
I'm sure you dislike the idea of catering to [broken] systems as much
as I do, but you might consider implementing a way to (optionally)
limit the maximum line length, to avoid attempting to allocate a
gigabyte of memory if somebody feeds your program a file with a
gigabyte-long line of text.
Other suggested improvements:

* take a stream parameter, so that the function can be used on streams
other than stdin
* take a pointer to a size_t - if the pointer is non-null, populate the
size_t with the number of bytes read into the string
* allow the buffer to be re-used (which means taking another pointer to
size_t, so that the buffer size can be tracked)

My own (ostensibly similar) routine supports all these features.
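One way such an interface might look, covering all three bullet points (the name read_line and its return conventions are illustrative, not the actual routine):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line from fp into a caller-owned, reusable buffer, growing
   it as needed; *buf/*bufsize may start as NULL/0. Stores the line
   length through len if len is non-null. Returns 0 on success, EOF at
   end of input with nothing read, -1 if memory runs out (the bytes
   read so far stay in *buf, unterminated). */
static int read_line(char **buf, size_t *bufsize, size_t *len, FILE *fp)
{
    size_t n = 0;
    int c;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (n + 2 > *bufsize) {               /* room for c and '\0' */
            size_t newsize = *bufsize ? *bufsize * 2 : 64;
            char *tmp = realloc(*buf, newsize);
            if (tmp == NULL)
                return -1;
            *buf = tmp;
            *bufsize = newsize;
        }
        (*buf)[n++] = (char)c;
    }
    if (c == EOF && n == 0)
        return EOF;
    if (*bufsize == 0) {                      /* empty first line */
        *buf = malloc(1);
        if (*buf == NULL)
            return -1;
        *bufsize = 1;
    }
    (*buf)[n] = '\0';
    if (len != NULL)
        *len = n;
    return 0;
}
```

Because the buffer and its size travel with the caller, a loop over many lines reuses one allocation instead of mallocing per line.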

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 22 '07 #192
Richard Heathfield wrote:
Keith Thompson said:
>CBFalconer <cb********@yahoo.com> writes:

<snip>
>> <http://cbfalconer.home.att.net/download/ggets.zip>

<snip>
>I'm sure you dislike the idea of catering to [broken] systems as
much as I do, but you might consider implementing a way to
(optionally) limit the maximum line length, to avoid attempting
to allocate a gigabyte of memory if somebody feeds your program
a file with a gigabyte-long line of text.

Other suggested improvements:

* take a stream parameter, so that the function can be used on
streams other than stdin
* take a pointer to a size_t - if the pointer is non-null, populate
the size_t with the number of bytes read into the string
* allow the buffer to be re-used (which means taking another
pointer to size_t, so that the buffer size can be tracked)

My own (ostensibly similar) routine supports all these features.
The first is already handled, since ggets is a macro in ggets.h
that invokes fggets. The other suggestions show a difference of
philosophy between us. Both routines allow any size input to be
received, but IMHO yours requires the user to think, worry, etc.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 22 '07 #193
Keith Thompson wrote:
CBFalconer <cb********@yahoo.com> writes:
.... snip ...
>>
That is precisely the purpose of ggets(char **). It is simple and
can't crash. Take a look at:

<http://cbfalconer.home.att.net/download/>

It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.
I would resist any such change. The chances of running into such a
malloc failure are extremely low, especially since the memory is
used as soon as allocated. The idea is to avoid any limits, rather
than make special adjustments for remote possibilities. A maximum
limit would also complicate the error-returning problem.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 22 '07 #194
CBFalconer said:
Keith Thompson wrote:
<snip>
>I'm sure you dislike the idea of catering to such systems as much as
I do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte
of memory if somebody feeds your program a file with a gigabyte-long
line of text.

I would resist any such change. The chances of running into such a
malloc failure are extremely low, especially since the memory is
used as soon as allocated.
On the contrary, they're extremely high, especially since you continue
to plug your routine to every newbie that passes through clc (or at
least, so it seems!) - and it is very likely indeed that they will leak
the memory they acquire through ggets(), causing considerable strain on
the allocation system.
The idea is to avoid any limits, rather
than make special adjustments for remote possibilities.
Then why do you ignore the limit of a newbie's understanding of memory
management, which is likely to be a rather low limit?
A maximum
limit would also complicate the error-returning problem.
It didn't for me.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 22 '07 #195
CBFalconer said:
Richard Heathfield wrote:
<snip>
>Other suggested improvements [to ggets]:

* take a stream parameter, so that the function can be used on
streams other than stdin
* take a pointer to a size_t - if the pointer is non-null, populate
the size_t with the number of bytes read into the string
* allow the buffer to be re-used (which means taking another
pointer to size_t, so that the buffer size can be tracked)

My own (ostensibly similar) routine supports all these features.

The first is already handled, since ggets is a macro in ggets.h
operating fggets.
Fair enough.
The other suggestions show a difference of
philosophy between us. Both routines allow any size input to be
received, but IMHO yours requires the user to think, worry, etc.
My version does indeed require the programmer to think, but then nobody
should be writing programs without thinking.

It is your version, however, that requires the user to worry. :-)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 22 '07 #196
Richard Heathfield wrote:
CBFalconer said:
>Keith Thompson wrote:

<snip>
>>I'm sure you dislike the idea of catering to such systems as much as
I do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte
of memory if somebody feeds your program a file with a gigabyte-long
line of text.

I would resist any such change. The chances of running into such a
malloc failure are extremely low, especially since the memory is
used as soon as allocated.

On the contrary, they're extremely high, especially since you continue
to plug your routine to every newbie that passes through clc (or at
least, so it seems!) - and it is very likely indeed that they will leak
the memory they acquire through ggets(), causing considerable strain on
the allocation system.
A newbie shouldn't be using dynamically allocated memory anyway, and by the
time they are ready to do so, it might not be appropriate to label them as
unqualified newbies.
>The idea is to avoid any limits, rather
than make special adjustments for remote possibilities.
IMHO, it's not practical to avoid any limits. It's extremely unlikely that
any system can allocate to your program more than 95% of SIZE_MAX.

Aug 22 '07 #197
CBFalconer <cb********@yahoo.com> writes:
Keith Thompson wrote:
>CBFalconer <cb********@yahoo.com> writes:
... snip ...
>>>
That is precisely the purpose of ggets(char **). It is simple and
can't crash. Take a look at:

<http://cbfalconer.home.att.net/download/>

It can't crash if malloc and realloc behave properly. It initially
mallocs 112 bytes, then reallocs more space 128 bytes at a time for
long lines.

But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it. Of course, by then
it's too late to indicate the failure via the result of malloc, so the
system kills your process -- or, perhaps worse, some other process.

I'm sure you dislike the idea of catering to such systems as much as I
do, but you might consider implementing a way to (optionally) limit
the maximum line length, to avoid attempting to allocate a gigabyte of
memory if somebody feeds your program a file with a gigabyte-long line
of text.

I would resist any such change. The chances of running into such a
malloc failure are extremely low, especially since the memory is
used as soon as allocated. The idea is to avoid any limits, rather
than make special adjustments for remote possibilities. A maximum
limit would also complicate the error-returning problem.
If a malloc failure is so unlikely, why do you bother to check whether
malloc returns a null pointer?

I just tried your test program, "./tggets /dev/zero". The process
grew to over a gigabyte before it died. I don't think it actually
crashed, but it easily could have (I'm not going to try it on a system
that I share with anybody else). (/dev/zero acts as an endless source
of null characters.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 22 '07 #198
Keith Thompson <ks***@mib.org> wrote:
But, as we've discussed here before, malloc doesn't behave properly on
all systems. On some systems, malloc can return a non-null result
even if the memory isn't actually available. The memory isn't
actually allocated until you try to write to it.
Wouldn't simply switching to calloc() solve that problem, since it
allocates memory and writes to all of it?
Aug 22 '07 #199
On 2007-08-21 23:56, pete <pf*****@mindspring.com> wrote:
Harald van Dijk wrote:
>>
pete wrote:
Harald van Dijk wrote:
pete wrote:
Harald van Dijk wrote:
Is the correct way to cast the char to unsigned char, or is it to
reinterpret the char as an unsigned char? In other words,

#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(void) {
char line[100];
if (fgets(line, sizeof line, stdin) && strchr(line, '\n')) {
#ifdef MAYBE
char *p;
for (p = line; *p; p++)
*p = toupper((unsigned char) *p);
#else
unsigned char *p;
for (p = (unsigned char *) line; *p; p++)
*p = toupper(*p);
#endif
fputs(line, stdout);
}
return 0;
}

Should MAYBE be defined or undefined
for a correct program?

My feeling is that whether an implementation
uses signed magnitude or one's complement,
to represent negative integers,
shouldn't come into play with ctype functions.
I prefer a cast.

Both forms use a cast,
so I'm not completely sure which form you believe is correct.

"cast the char to unsigned char"

Thanks.
Does this also imply that if you want to have helper functions that
operate on arrays of unsigned char (that contain text),
you should not pass them a converted char *,
but you should use an array of unsigned char right
from the start?

As I said in my other post, it's complicated.
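To make the question concrete, a helper that works on unsigned char from the start might look like this (a hypothetical sketch, not code from the thread):

```c
#include <ctype.h>

/* Uppercase a NUL-terminated buffer in place.  Because the parameter
   is already unsigned char *, no per-character cast is needed to make
   the toupper call well defined. */
static void upper_buf(unsigned char *s)
{
    for (; *s; s++)
        *s = (unsigned char) toupper(*s);
}
```

A caller holding a char * would still have to convert at the boundary, e.g. upper_buf((unsigned char *) line); which is where the reinterpretation question resurfaces.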

I just remembered what it is that I like about casting the values:
It's because putchar works that way.
Yep. But putchar takes an int argument which can represent all possible
values of unsigned char [0]. However, a char may not be able to represent
all values of an unsigned char (indeed on most systems it can't), and
while an assignment from a char to an unsigned char is well defined, the
reverse isn't. So I'm not convinced that:
FILE *fp = fopen(filename, "wb");
putc(200, fp);
putc('\n', fp);
fclose(fp);
fp = fopen(filename, "rb");
char s[3];
fgets(s, sizeof(s), fp);
unsigned char u = s[0];

must result in u having the value 200, although I think each step is
conforming. The missing bit is how fgets converts the data it reads
into chars: 200 isn't representable in an 8-bit signed char, and on a
sign-magnitude or one's-complement system there are two possible ways to
do the conversion: just reinterpret the bits, or do the reverse of the
signed->unsigned conversion, and neither is the "obvious" way. I would
hope that any implementor who has the misfortune to have to target a
system which doesn't use two's complement arithmetic will sidestep the
issue by making the default char type unsigned, but that's probably too
optimistic.
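On the common two's complement platforms the round trip does recover 200. A self-contained version of the snippet above, as a sketch (using tmpfile rather than a named file):

```c
#include <stdio.h>

/* Round-trip sketch: write byte 200 with putc, read it back with
   fgets into a char buffer, then convert char -> unsigned char.
   Returns the recovered value, or -1 on I/O failure.  On two's
   complement systems this yields 200; on sign-magnitude or one's
   complement hardware the result depends on how fgets stores the
   byte, which is exactly the open question. */
static int roundtrip_200(void)
{
    FILE *fp = tmpfile();
    char s[3];
    unsigned char u;

    if (fp == NULL)
        return -1;
    putc(200, fp);
    putc('\n', fp);
    rewind(fp);
    if (fgets(s, sizeof s, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    u = (unsigned char) s[0];
    fclose(fp);
    return u;
}
```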

hp
[0] On most systems, and one can argue that this is required for hosted
implementations.
--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hj*@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
Aug 22 '07 #200
