Requesting advice how to clean up C code for validating string represents integer

P: n/a
I'm working on examples of programming in several languages, all
(except PHP) running under CGI so that I can show both the source
files and the actual running of the examples online. The first
set of examples, after decoding the HTML FORM contents, merely
verifies the text within a field to make sure it is a valid
representation of an integer, without any junk thrown in, i.e. it
must satisfy the regular expression: ^ *[-+]?[0-9]+ *$

If the contents of the field are wrong I want to diagnose as much
as reasonable what's wrong, not just say "syntax error".

Because perl and PHP include support for regular expressions, it
was obvious how to do it, and easy to accomplish:
http://www.rawbw.com/~rem/HelloPlus/...html#h4intperl
http://www.rawbw.com/~rem/HelloPlus/....html#h4intphp

Because Common Lisp has good utilities for scanning strings, mostly
using position, position-if, and position-if-not, it was equally
easy, and equally obvious, how to do it:
http://www.rawbw.com/~rem/HelloPlus/...html#h4intlisp

The Java API is missing some of the functions available in Common
Lisp, so I had to augment the API, but then it was as easy as in
Common Lisp, with nearly the same algorithm:
http://www.rawbw.com/~rem/HelloPlus/...html#h4intjava

Now we come to C: I presently have a horrible mess:
http://www.rawbw.com/~rem/HelloPlus/...4s.html#h4intc
I'm thinking of pulling out all the character-case testing into a
function that converts a character into a class-number (such as 1
for space, 2 for digit, 3 for sign, etc.), calling that all over
the place, and then using a SELECT statement on the result, which
won't change the logic of the code but might make it tidier.
Alternately I might hand-code replacements for the Lisp/Java
utilities for scanning strings, or find something in one of the C
libraries that would help, and then translate the Lisp or Java code
to C. Do any of you have any other ideas what I might do to clean
up the C code? Don't write my code for me, but just give hints what
library routines might do 90% of the work for me, or suggest
re-design of the algorithm? One thing I don't want to do is
download a REGEX package for C. I'm trying to give examples of how
to do things from scratch in C, not how to simply use somebody
else's program, even if the source for the REGEX module is
available. If something isn't in a standard library for C, then
it doesn't exist for the purpose of this project. (The only
exception I made is the module for collecting and decoding HTML
FORM contents, which is a prerequisite for this whole project.)
Feb 11 '07 #1
232 Replies


P: n/a
On Feb 11, 6:57 am, rem6...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) was asking about code that:
verifies the text within a field to make sure it is a valid
representation of an integer, without any junk thrown in, i.e. it
must satisfy the regular expression: ^ *[-+]?[0-9]+ *$

If the contents of the field are wrong I want to diagnose as much
as reasonable what's wrong, not just say "syntax error".
[snip]
Do any of you have any other ideas what I might do to clean
up the C code? Don't write my code for me, but just give hints what
library routines might do 90% of the work for me, or suggest
re-design of the algorithm?

You could try something as simple as this:

strtol( string, &end, BASE );
if( *end != '\0' )
    fprintf( stderr, "syntax error starting at '%c'\n", *end);

I'm not sure that this gives you as much syntax error
as you want, but it tells you where it occurs. (Also,
this doesn't exactly match your specification, since
this doesn't allow trailing whitespace, but that's
a trivial fix.)
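
A minimal sketch of the "trivial fix" for trailing whitespace, assuming base 10. The helper name first_bad_char is hypothetical, not something from the thread. Note that strtol skips any leading isspace() character, which is slightly more permissive than the spaces-only regular expression, and that range errors are not checked here.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: returns NULL if s holds an integer optionally
 * followed by whitespace, otherwise a pointer to the first offending
 * character (s itself when strtol found no digits at all). */
const char *first_bad_char(const char *s, long *out)
{
    char *end;
    long value = strtol(s, &end, 10);

    if (end == s)
        return s;                 /* no conversion was performed */
    while (isspace((unsigned char)*end))
        end++;                    /* tolerate trailing whitespace */
    if (*end != '\0')
        return end;               /* junk after the number */
    *out = value;
    return NULL;
}
```

A caller could print the returned pointer's character, as in the fprintf above, or compute its offset from the start of the string.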

--
Bill Pursell
Feb 11 '07 #2

P: n/a
robert maas, see http://tinyurl.com/uh3t wrote, On 11/02/07 06:57:
I'm working on examples of programming in several languages, all
(except PHP) running under CGI so that I can show both the source
files and the actually running of the examples online. The first
set of examples, after decoding the HTML FORM contents, merely
verifies the text within a field to make sure it is a valid
representation of an integer, without any junk thrown in, i.e. it
must satisfy the regular expression: ^ *[-+]?[0-9]+ *$

If the contents of the field are wrong I want to diagnose as much
as reasonable what's wrong, not just say "syntax error".
<snip>
Now we come to C: I presently have a horrible mess:
http://www.rawbw.com/~rem/HelloPlus/...4s.html#h4intc
<snip>
to C. Do any of you have any other ideas what I might do to clean
up the C code? Don't write my code for me, but just give hints what
library routines might do 90% of the work for me, or suggest
re-design of the algorithm? One thing I don't want to do is
<snip>

Good, we generally prefer to give help rather than do people's work for
them :-)

I would suggest you look at the strto* functions which are part of
standard C taking specific note of the second and third parameters,
since you want to use both. The second parameter is used to tell you the
first invalid character (or the end of the string if completely valid)
and the last parameter to specify base 10 which is what the user will
expect. These functions will even tell you if the number in the string
is out of range for the type it is converted to. Finally, since there is
no strtoi you will probably have to use strtol and then check if it is
in the range of an int before assigning it to an int.
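
The advice above might be sketched roughly as follows. The enum and the function name string_to_int are illustrative assumptions; only strtol, errno, ERANGE, INT_MIN and INT_MAX come from standard C.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

enum int_status { INT_OK, INT_NO_DIGITS, INT_TRAILING_JUNK, INT_OUT_OF_RANGE };

/* Hypothetical helper: convert s to an int and report why it failed. */
enum int_status string_to_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;                       /* strtol never clears errno itself */
    value = strtol(s, &end, 10);     /* third argument: base 10 */

    if (end == s)
        return INT_NO_DIGITS;        /* second argument did not advance */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return INT_OUT_OF_RANGE;     /* too big for long, or for int */
    if (*end != '\0')
        return INT_TRAILING_JUNK;    /* first invalid character is *end */
    *out = (int)value;
    return INT_OK;
}
```

Trailing whitespace is reported as junk in this sketch; skipping it before the final test is a small change.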
--
Flash Gordon
Feb 11 '07 #3

P: n/a
2007-02-11 <re***************@yahoo.com>,
robert maas, see http://tinyurl.com/uh3t wrote:
I'm working on examples of programming in several languages, all
(except PHP) running under CGI so that I can show both the source
files and the actually running of the examples online. The first
set of examples, after decoding the HTML FORM contents, merely
verifies the text within a field to make sure it is a valid
representation of an integer, without any junk thrown in, i.e. it
must satisfy the regular expression: ^ *[-+]?[0-9]+ *$
I'd use strtol with a base of 10.

Things to consider:
1. It doesn't care if there's junk after the numbers, but why do you?
You can always examine *endptr.
2. Won't work for converting integers greater than eleventy billion or
however much your system supports. But how do you intend to convert
them otherwise?
Feb 11 '07 #4

P: n/a
"robert maas, see http://tinyurl.com/uh3t" wrote:
>
I'm working on examples of programming in several languages, all
(except PHP) running under CGI so that I can show both the source
files and the actually running of the examples online. The first
set of examples, after decoding the HTML FORM contents, merely
verifies the text within a field to make sure it is a valid
representation of an integer, without any junk thrown in, i.e. it
must satisfy the regular expression: ^ *[-+]?[0-9]+ *$

If the contents of the field are wrong I want to diagnose as much
as reasonable what's wrong, not just say "syntax error".

Because perl and PHP include support for regular expressions, it
was obvious how to do it, and easy to accomplish:
Perl and PHP are off-topic here. Regular expressions are only
topical in reference to code to implement them. In addition, your
RE is wrong. A numeric field ends when the next character cannot
be used, not on a blank. This is easily done in C; see the
following example. Note that it leaves detection and use of +- to
the calling function, and similarly the decision about the
termination char. Note that this parses a stream.

/*--------------------------------------------------------------
 * Read an unsigned value. Signal error for overflow or no
 * valid number found. Returns 1 for error, 0 for no error, EOF
 * for EOF encountered before parsing a value.
 *
 * Skip all leading blanks on f. At completion getc(f) will
 * return the character terminating the number, which may be \n
 * or EOF among others. Barring EOF it will NOT be a digit. The
 * combination of error, 0 result, and the next getc returning
 * \n indicates that no numerical value was found on the line.
 *
 * If the user wants to skip all leading white space including
 * \n, \f, \v, \r, he should first call "skipwhite(f);"
 *
 * Peculiarity: This specifically forbids a leading '+' or '-'.
 */
int readxwd(unsigned int *wd, FILE *f)
{
   unsigned int value, digit;
   int status;
   int ch;

#define UWARNLVL (UINT_MAX / 10U)
#define UWARNDIG (UINT_MAX - UWARNLVL * 10U)

   value = 0; /* default */
   status = 1; /* default error */

   ch = ignoreblks(f);

   if (EOF == ch) status = EOF;
   else if (isdigit(ch)) status = 0; /* digit, no error */

   while (isdigit(ch)) {
      digit = ch - '0';
      if ((value > UWARNLVL) ||
          ((UWARNLVL == value) && (digit > UWARNDIG))) {
         status = 1; /* overflow */
         value -= UWARNLVL;
      }
      value = 10 * value + digit;
      ch = getc(f);
   } /* while (ch is a digit) */

   *wd = value;
   ungetc(ch, f);
   return status;
} /* readxwd */

The #includes, skipwhite, and ignoreblks functions are omitted.
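
For readers working from a string rather than a stream, the same overflow test can be sketched as below. This is not the poster's code: parse_unsigned is a hypothetical name, and unlike readxwd it stops at the first overflowing digit instead of consuming the rest of the number.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical string analogue of the overflow test in readxwd.
 * Returns 0 on success, 1 if s has no leading digit or the value
 * does not fit in an unsigned int. */
int parse_unsigned(const char *s, unsigned int *out)
{
    const unsigned int limit = UINT_MAX / 10U;   /* plays the role of UWARNLVL */
    const unsigned int last  = UINT_MAX % 10U;   /* plays the role of UWARNDIG */
    unsigned int value = 0;

    if (!isdigit((unsigned char)*s))
        return 1;                                /* no number present */
    for (; isdigit((unsigned char)*s); s++) {
        unsigned int digit = (unsigned int)(*s - '0');
        if (value > limit || (value == limit && digit > last))
            return 1;                            /* 10*value+digit would wrap */
        value = 10U * value + digit;
    }
    *out = value;
    return 0;
}
```

The check is made before multiplying, so the arithmetic itself never wraps.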

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Feb 11 '07 #5

P: n/a

"robert maas, see http://tinyurl.com/uh3t" <re*****@yahoo.com> wrote
Now we come to C: I presently have a horrible mess:
http://www.rawbw.com/~rem/HelloPlus/...4s.html#h4intc
I'm thinking of pulling out all the character-case testing into a
function that converts a character into a class-number (such as 1
for space, 2 for digit, 3 for sign, etc.), calling that all over
the place, and the using a SELECT statement on the result, which
won't change the logic of the code but might make it tidier.
Alternately I might hand-code replacements for the Lisp/Java
utilities for scanning strings, or find something in one of the C
libraries that would help, and then translate the Lisp or Java code
to C. Do any of you have any other ideas what I might do to clean
up the C code? Don't write my code for me, but just give hints what
library routines might do 90% of the work for me, or suggest
re-design of the algorithm? One thing I don't want to do is
download a REGEX package for C. I'm trying to give examples of how
to do things from scratch in C, not how to simply use somebody
else's program, even if the source for the REGEX module is
available. If something isn't in the a standard library for C, then
it doesn't exist for the purpose of this project. (The only
exception I made is the module for collecting and decoding HTML
FORM contents, which is a prerequisite for this whole project.)
The first thing is to make your interface clean.

If you want to parse from a string, take a leaf out of strtol's book.

int parseint(char *str, char **end).

Return the integer you read, and the end of the input you parsed up to. If
you cannot read an integer successfully, make *end equal str and return
INT_MIN. INT_MIN is much less likely than 0 or -1 to be confused with a real
integer if you have a lazy caller who doesn't check his end pointer
properly.

Skip leading whitespace.
Read the optional +/- character and make sure there aren't two of them.
Skip whitespace?
Read digits one by one into an unsigned integer, and multiply by ten if there
are more digits to come. Terminate if the unsigned overflows.
Check for INT_MAX or -INT_MIN if the negative flag is set. Terminate on
overflow.
Convert to a signed integer.
Your spec now says to skip trailing whitespace. Probably a bad idea, but if
the instructions say do it we must do it.
Set the end pointer to end of input on success, start of input on fail.
Return answer on success, INT_MIN on fail.
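
The steps above can be sketched as follows. This is one possible reading, not the poster's implementation: it assumes the usual two's-complement layout where INT_MIN == -INT_MAX - 1, it does not skip whitespace between the sign and the digits, and it folds the overflow check into the digit loop.

```c
#include <ctype.h>
#include <limits.h>

/* Sketch of the interface described above. On failure *end == str and
 * INT_MIN is returned; on success *end points past any trailing
 * whitespace. Assumes INT_MIN == -INT_MAX - 1. */
int parseint(char *str, char **end)
{
    char *p = str;
    int negative = 0;
    unsigned int magnitude = 0;
    unsigned int limit;            /* largest magnitude that still fits */

    *end = str;                    /* assume failure until proven otherwise */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '+' || *p == '-')
        negative = (*p++ == '-');
    if (!isdigit((unsigned char)*p))
        return INT_MIN;            /* second sign, gap, junk, or nothing */

    limit = negative ? (unsigned int)INT_MAX + 1U : (unsigned int)INT_MAX;
    for (; isdigit((unsigned char)*p); p++) {
        unsigned int digit = (unsigned int)(*p - '0');
        if (magnitude > (limit - digit) / 10U)
            return INT_MIN;        /* next step would leave the int range */
        magnitude = magnitude * 10U + digit;
    }
    while (isspace((unsigned char)*p))
        p++;
    *end = p;
    if (negative)
        return (magnitude == limit) ? INT_MIN : -(int)magnitude;
    return (int)magnitude;
}
```

Because a genuine INT_MIN input also returns INT_MIN, the caller still has to compare *end with str to tell success from failure.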
Feb 11 '07 #6

P: n/a
From: Random832 <ran...@random.yi.org>
I'd use strtol with a base of 10.
Several people suggested that, but you made some additional
comments I want to reply to, so I'm responding here.
Things to consider:
1. It doesn't care if there's junk after the numbers, but why do you?
This is for processing a HTML FORM filled out by a user, a typical
user who is a total novice at computers yet is trying all sorts of
things found on the Web. If the form asks for an integer to be
entered, but the user enters something else, like two integers, or
an algebraic formula which just happens to start with an integer,
or a floating-point value or decimal fraction, or a fraction, I
don't want to just gobble the first part and ignore all the rest,
because obviously the luser didn't understand/follow instructions.
If I just process the first part and ignore the rest, the luser
will be totally confused why he/she didn't get the intended effect.
Better that I complain about the slightest mess in the input field.
You can always examine *endptr.
Per a nice example I found on the Web:
Linkname: Bullet Proof Integer Input Using strtol()
URL: http://home.att.net/~jackklein/c/code/strtol.html
I'm indeed now checking for any diagnostics that can be obtained
just from the results returned by strtol (the actual return value,
the global error flag, and the reference pointer endptr). See end
of this message for the code as I have it now.
2. Won't work for converting integers greater than eleventy billion or
however much your system supports. But how do you intend to convert
them otherwise?
Good point. My previous idea was for the user to get just the
syntax correct for integers, and then if the result is mangled it
obviously means this particular programming language (c, c++, java)
is using fixed-length binary integers, whereas if the result is
always correct no matter how many digits are given, then the
language (lisp) is using unlimited-precision integers. But if it's
easy to diagnose explicitly, such as provided by strtol, then
perhaps I can actually tell the user when overflow happens, to make
the lesson a bit less obscure.

Anyway, using strtol, with all the possible tests on the result:

All of these produce the correct diagnosis (note 15-char buffer for input):

Type a number:555555555555555555
You typed: [55555555555555]
Number out of range.

Type a number:2147000000
Dropping EOL char from end of string.
You typed: [2147000000]
Looks good? N=2147000000

Type a number:2148000000
Dropping EOL char from end of string.
You typed: [2148000000]
Number out of range.

Type a number:
Dropping EOL char from end of string.
You typed: []
No number given.

Type a number:5x
Dropping EOL char from end of string.
You typed: [5x]
After number, extra characters on input line.

But these are not the effects I want:

Type a number:x5
Dropping EOL char from end of string.
You typed: [x5]
No number given.

Type a number:- 5
Dropping EOL char from end of string.
You typed: [- 5]
No number given.

There *is* a number given in each case, just that there's junk
before the number in the first case, and gap between sign and
number in second case. It seems I'll need to manually scan from the
start of the field to the start of the number to distinguish these
patterns (brackets indicate optional):
[white] junk [white] sign [white] digits -- junk before start of number
[white] sign white digits -- gap between sign and number
[white] sign junk digits -- junk (or gap) between sign and number
[white] sign digits -- good
[white] digits -- good
strtol doesn't seem to be helping me diagnose the cruft before the number.
Listing of source code used for the above tests:

#include <stdio.h>
#include <errno.h>

#define MAXCH 15
/* Deliberately small buffer to test buffer-full condition */

main() {
    char chars[MAXCH]; char* inres; /* Set by fgets */
    size_t len; /* Set by strlen */
    char onech;
    char* endptr; long long_var; /* Set by strtol */
    while (1) {
        fpurge(stdin);
        printf("\nType a number:");
        inres = fgets(chars, MAXCH, stdin);
        if (NULL==inres) {
            printf("*** Got NULL back, which maybe means end-of-stream?\n");
            break;
        }
        len = strlen(chars);
        /* printf("Length of string = %d\n", len); */
        if (0 >= len) {
            printf("Horrible: Input was 0 chars, not even EOL char, how??\n");
            break;
        }
        onech = chars[len-1];
        /* printf("The last character is [%c]\n", onech); */
        if ('\n' == onech) {
            printf("Dropping EOL char from end of string.\n");
            chars[len-1] = '\0';
        }
        printf("You typed: [%s]\n", inres, NULL, inres);
        errno = 0;
        long_var = strtol(chars, &endptr, 10);
        if (ERANGE == errno) {
            printf("Number out of range.\n");
        } else if (endptr==chars) {
            printf("No number given.\n");
        } else if ('\0' != *endptr) {
            printf("After number, extra characters on input line.\n");
        } else {
            printf("Looks good? N=%ld\n", long_var);
        }
        sleep(1);
    }
}
Feb 11 '07 #7

P: n/a
robert maas, see http://tinyurl.com/uh3t wrote, On 11/02/07 19:13:
>From: Random832 <ran...@random.yi.org>
I'd use strtol with a base of 10.

Several people suggested that, but you made some additional
comments I want to reply to, so I'm responding here.
>Things to consider:
1. It doesn't care if there's junk after the numbers, but why do you?
<snip comments about detecting bad input that happens to also contain a
number>
Better that I complain about the slightest mess in the input field.
That is the correct attitude for handling user input.
> You can always examine *endptr.

Per a nice example I found on the Web:
Linkname: Bullet Proof Integer Input Using strtol()
URL: http://home.att.net/~jackklein/c/code/strtol.html
I'm indeed now checking for any diagnostics that can be obtained
just from the results returned by strtol (the actual return value,
the global error flag, and the reference pointer endptr). See end
of this message for the code as I have it now.
Jack Klein knows his stuff. You have found a good reference.
>2. Won't work for converting integers greater than eleventy billion or
however much your system supports. But how do you intend to convert
them otherwise?

Good point. My previous idea was for the user to get just the
syntax correct for integers, and then if the result is mangled it
obviously means this particular programming language (c, c++, java)
is using fixed-length binary integers, whereas if the result is
always correct no matter how many digits are given, then the
language (lisp) is using unlimited-precision integers.
With C and C++ assuming that bad input will lead to obviously bad output
is not in general a good idea since in far too many situations it will
produce something that is not obviously bad.
But if it's
easy to diagnose explicitly, such as provided by strtol, then
perhaps I can actually tell the user when overflow happens, to make
the lesson a bit less obscure.
OK, that's good.

<snip>
But these are not the effects I want:

Type a number:x5
Dropping EOL char from end of string.
You typed: [x5]
No number given.

Type a number:- 5
Dropping EOL char from end of string.
You typed: [- 5]
No number given.

There *is* a number given in each case, just that there's junk
before the number in the first case, and gap between sign and
number in second case. It seems I'll need to manually scan from the
start of the field to the start of the number to distinguish these
patterns (brackets indicate optional):
[white] junk [white] sign [white] digits -- junk before start of number
[white] sign white digits -- gap between sign and number
[white] sign junk digits -- junk (or gap) between sign and number
Yes, you need to check for the above yourself if you want to report
them. strtol will only indicate that the first non-space character
was invalid, not whether there was something valid further in.
[white] sign digits -- good
[white] digits -- good
The above, of course, are indicated by strtol succeeding ;-)
strtol doesn't seem to be helping me diagnose the cruft before the number.
Listing of source code used for the above tests:

#include <stdio.h>
#include <errno.h>
#include <stdlib.h> /* For strtol. Very important since otherwise the
compiler is *required* to assume it returns an int not a long. */
#define MAXCH 15
/* Deliberately small buffer to test buffer-full condition */

main() {
Since no one has mentioned it yet I will. The above, whilst legal in the
original C standard, is bad style and no longer supported in the
new(ish) C standard that might one day become commonly implemented.
Don't use implicit int, and if you don't want parameters, be explicit about it.

int main(void) {
char chars[MAXCH]; char* inres; /* Set by fgets */
size_t len; /* Set by strlen */
char onech;
char* endptr; long long_var; /* Set by strtol */
while (1) {
I prefer 'for (;;)' but that is purely a matter of style.
fpurge(stdin);
Standard C does not have an "fpurge" function or anything similar to
what I am guessing it does.
printf("\nType a number:");
As per Jack's example you need to flush stdout (or have a \n at the end
of the above line). There is also an argument that using puts (which
outputs a newline after the specified text) or fputs would be better
since they do not scan the string for format specifiers.
inres = fgets(chars, MAXCH, stdin);
Since chars is an array rather than a pointer you could use:
inres = fgets(chars, sizeof chars, stdin);
if (NULL==inres) {
printf("*** Got NULL back, which maybe means end-of-stream?\n");
It is end of stream or an error.
break;
}
len = strlen(chars);
/* printf("Length of string = %d\n", len); */
if (0 >= len) {
len cannot be negative or even 0 here for at least three reasons. It is
of type size_t which is unsigned and also strlen returns a size_t. The
third reason is that fgets reads until it either has enough to fill the
buffer (allowing space for the nul termination), until error or end of
stream, or up to and including the newline, whichever comes first. So
given a buffer length of 2 or more it will *always* either return NULL
or it will have written a string with a strlen of at least 1. So this if
cannot be taken.
printf("Horrible: Input was 0 chars, not even EOL char, how??\n");
break;
}
onech = chars[len-1];
/* printf("The last character is [%c]\n", onech); */
if ('\n' == onech) {
printf("Dropping EOL char from end of string.\n");
chars[len-1] = '\0';
}
else {
report that the line entered was too long and then probably read the
rest of the line up to and including the next newline.
}
printf("You typed: [%s]\n", inres, NULL, inres);
errno = 0;
long_var = strtol(chars, &endptr, 10);
if (ERANGE == errno) {
printf("Number out of range.\n");
} else if (endptr==chars) {
At this point you could scan from the start of the string for the first
character that is not white space and report a different error depending
on what it is using the is* functions from ctype.h. Alternatively you
could look at using strspn or strcspn from string.h
printf("No number given.\n");
} else if ('\0' != *endptr) {
printf("After number, extra characters on input line.\n");
} else {
printf("Looks good? N=%ld\n", long_var);
}
sleep(1);
sleep is not a standard function and seems rather pointless in this program.
}
}
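
The fgets points made in the comments above (strip the newline, and treat a missing newline as an over-long line whose remainder should be read and discarded) might be sketched like this. The helper name read_line and its return convention are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (size bytes) and strip the
 * newline. Returns 0 on success, 1 if the line was too long (the excess
 * is discarded), and EOF on end of stream or read error. */
int read_line(FILE *f, char *buf, size_t size)
{
    size_t len;
    int ch;

    if (fgets(buf, (int)size, f) == NULL)
        return EOF;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }
    /* No newline stored: the buffer filled up, or the stream ended without
     * one. Look at the next character to tell these cases apart. */
    ch = getc(f);
    if (ch == '\n' || ch == EOF)
        return 0;                 /* line fit exactly, or last line unterminated */
    do {
        ch = getc(f);             /* discard the rest of the over-long line */
    } while (ch != '\n' && ch != EOF);
    return 1;
}
```

With this in place the caller can report "line too long" once and carry on with the next line, instead of seeing the tail of the old line as fresh input.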
--
Flash Gordon
Feb 11 '07 #8

P: n/a
From: Flash Gordon <s...@flash-gordon.me.uk>
sleep(1);
sleep is not a standard function and seems rather pointless in this program.
It's absolutely essential for peace of mind when dialed into a Unix
shell with VT100 emulator at 19200 baud. The first time I ran this
program, without the sleep call, and pressed ctrl-D to generate
end-of-stream on stdin, the program went into infinite read-EOS
spew-text loop, which filled up all modem buffers. I immediately
pressed ctrl-C to abort C program, and held it down for about ten
seconds, but it was too late, modem buffers were grossly full. I
then pressed ctrl-Z and held that down for several minutes, but
modem buffers were still spewing to the VT100 emulator. I then
scrolled to the top of the past-screens buffer to see if I could
save anything, but it was already too late, all the past-screens
buffer (appx. 30-40 full VT100 screensfull) had already been
overwritten by the spew. I then waited about ten minutes, watching
spew spew spew incessantly, with no way to know whether the program
had even seen my ctrl-C interrupt. Finally after ten minutes or so
I finally saw a shell prompt. I immediately put in the sleep before
any further work on the program. Now if it gets into an infinite
loop, I press ctrl-C and get instant response because there's no
ten minutes of spew already in the modem buffer.

I copied a few cleanup suggestions from your message and will be
responding about them later.
Feb 12 '07 #9

P: n/a
On 11 Feb, 06:57, rem6...@yahoo.com (robert maas, see http://tinyurl.com/uh3t) wrote:

<snip>

[the program]
verifies the text within a field to make sure it is a valid
representation of an integer, without any junk thrown in, i.e. it
must satisfy the regular expression: ^ *[-+]?[0-9]+ *$

If the contents of the field are wrong I want to diagnose as much
as reasonable what's wrong, not just say "syntax error".
<snip>
Alternately I might hand-code replacements for the Lisp/Java
utilities for scanning strings, or find something in one of the C
libraries that would help,
If it was anything other than a number then sscanf() might
be worth a look.

<snip>
--
Nick Keighley


Feb 12 '07 #10

P: n/a
robert maas, see http://tinyurl.com/uh3t wrote, On 12/02/07 04:48:
>From: Flash Gordon <s...@flash-gordon.me.uk>
>> sleep(1);
sleep is not a standard function and seems rather pointless in this program.

It's absolutely essential for peace of mind when dialed into a Unix
shell with VT100 emulator at 19200 baud. The first time I ran this
program, without the sleep call, and pressed ctrl-D to generate
end-of-stream on stdin, the program went into infinite read-EOS
spew-text loop, which filled up all modem buffers. I immediately
<snip>

I can only suggest that you had some other bug in your program at that
point or a bug in your modem. As presented, your program would not do
that: whether it detected an error or EOF, it would break out of the loop
and terminate.

Having said that, I can see that if you are hitting that sort of problem
that a delay could be useful.
I copied a few cleanup suggestions from your message and will be
responding about them later.
OK.
--
Flash Gordon
Feb 12 '07 #11

P: n/a
From: Flash Gordon <s...@flash-gordon.me.uk>
... The first time I ran this
program, without the sleep call, and pressed ctrl-D to generate
end-of-stream on stdin, the program went into infinite read-EOS
spew-text loop, which filled up all modem buffers. ...
I can only suggest that you had some other bug in your program at that
point or a bug in your modem. As presented your program would not do
that whether it detected an error or EOF it would break out of the loop
and terminate.
Not a bug. It's just that the part of the program to detect EOF wasn't yet
written, and that's the very part I was trying to develop.
Step 1: Put in a printf to see what value comes back when I press ctrl-D.
Step 2: Write code to detect that value and break out of loop.
Step 3: Test that to see whether it works.
Step 4: Remove the printf.
Unfortunately step 1 blew me out for ten minutes or so without the sleep.

Unfortunately c doesn't allow any sleep times except integers. I
looked at nanosecond sleep but it requires loading a special module
and building a special nanosecond object and then loading a number
into that object before you can then pass that object to some OO
method that does the actual sleep, a royal pain if it's just to
prevent spew from filling up modem buffers on dialups. The amount
of time I'd waste learning how to do all that would be worse than
the amount of time I waste having a full one-second sleep at each
interactive I/O transaction in the loop during the development of
this code destined for CGI where there's a completely different
logic for interactive transactions and no chance for spew hence no
need for the sleep.

Anyway, here's the latest news on my task:

While searching various clues the kind folks here sent me, I
discovered some library functions (strspn, strcspn) which are
useful for skipping across whole classes of characters or
complements of such classes, similar to the functions I implemented
in Java (explicitly) and in Common Lisp (via anonymous-function
parameters). That made it possible to translate my lisp/java
algorithms directly to c.

I decided to completely separate the code for checking general
integer syntax [white]* [sign]? [digit]+ [white]* (pseudo-regex
notation), which is independent of the programming language (except
Java where plus sign isn't allowed in integer literals or string to
parseInt), from the petty code to check whether the resultant value
is within the allowed range for this or that fixed-precision data
type in this or that programming language as implemented by this or
that vendor.

So I have one function, stringCheckInteger, which checks whether
the string is of the appropriate general format, making liberal use
of strspn and strcspn, and another function, stringIntegerTellRange,
which checks whether the string-number can be converted to an
actual number by strtoll, and if so then also checks whether it's
within ranges of the successively smaller integer data types. I
think this is my final c version for the time being.
If anyone is curious, see:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intc>
go to the second form ("re-write").
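
For readers who cannot reach that page, a syntax check built on strspn and strcspn might look roughly like the sketch below. It is not the code at the URL above: the function name check_integer_syntax and the message strings are made up for illustration, and only the space character counts as whitespace, as in the regular expression.

```c
#include <string.h>

/* Hypothetical diagnostic: checks s against ^ *[-+]?[0-9]+ *$ and returns
 * a message describing the first problem, or NULL if the syntax is valid. */
const char *check_integer_syntax(const char *s)
{
    static const char digits[] = "0123456789";
    const char *p = s + strspn(s, " ");      /* skip leading spaces */
    int has_sign = 0;
    size_t n;

    if (*p == '\0')
        return "no number given";
    if (*p == '+' || *p == '-') {
        has_sign = 1;
        p++;
    }
    n = strspn(p, digits);                   /* length of the digit run */
    if (n == 0) {
        /* Not a digit here: is there any digit later in the field? */
        if (p[strcspn(p, digits)] == '\0')
            return "no digits found";
        if (!has_sign)
            return "junk before start of number";
        return *p == ' ' ? "gap between sign and number"
                         : "junk between sign and number";
    }
    p += n;
    p += strspn(p, " ");                     /* skip trailing spaces */
    return *p == '\0' ? NULL : "extra characters after number";
}
```

Range checking is deliberately left out, matching the split described above between syntax checking and range checking.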
Feb 13 '07 #12

P: n/a
robert maas, see http://tinyurl.com/uh3t wrote, On 13/02/07 04:24:
>From: Flash Gordon <s...@flash-gordon.me.uk>
>>... The first time I ran this
program, without the sleep call, and pressed ctrl-D to generate
end-of-stream on stdin, the program went into infinite read-EOS
spew-text loop, which filled up all modem buffers. ...
I can only suggest that you had some other bug in your program at that
point or a bug in your modem. As presented your program would not do
that whether it detected an error or EOF it would break out of the loop
and terminate.

Not a bug. It's just that the part of the program to detect EOF wasn't yet
written, and that's the very part I was trying to develop.
So it still was not needed in the program you posted.
Step 1: Put in a printf to see what value comes back when I press ctrl-D.
Step 2: Write code to detect that value and break out of loop.
Step 3: Test that to see whether it works.
Step 4: Remove the printf.
Unfortunately step 1 blew me out for ten minutes or so without the sleep.
That is because it is the wrong approach:
1) read the documentation to see what the correct way to do it is
2) write the code
3) test it

Fewer steps and more likely to give you a reliable result.

If you used your method with "isspace" it might lead you to think it
returns 1 to indicate a space, then due to a library upgrade your code
could break because actually it returns any non-zero value for a space.
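
A small sketch of the documented usage, with a made-up function name. The result of isspace is tested only for being non-zero, and the argument is cast to unsigned char because the is* functions are defined only for EOF and values representable as unsigned char.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the leading whitespace characters in s. */
size_t count_leading_space(const char *s)
{
    size_t n = 0;

    /* Test for non-zero, never for == 1. The cast avoids undefined
     * behaviour when plain char is signed and holds a negative value. */
    while (isspace((unsigned char)s[n]))
        n++;
    return n;
}
```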
Unfortunately c doesn't allow any sleep times except integers. I
Wrong. C does not allow *any* sleeping. The sleep function is *not* part
of C; it is part of something else your system provides and makes
accessible from C as an extension.

<snip>
think this is my final c version for the time being.
If anyone is curious, see:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intc>
go to the second form ("re-write").
I may or may not look later.
--
Flash Gordon
Feb 13 '07 #13

P: n/a
Flash Gordon said:
> robert maas, see http://tinyurl.com/uh3t wrote, On 13/02/07 04:24:
> <snip>
>
>> Unfortunately c doesn't allow any sleep times except integers. I
>
> Wrong. C does not allow *any* sleeping.

Wrong. C does *allow* sleeping. It just doesn't *support* it.

> The [sleep] function is *not* part of C

Arguable. It's not defined by the Standard, I agree. But what is a
language, if not the set of all sentences that can be formed according
to the rules of that language? It is certainly possible to call a
function named sleep(), within the rules of C.

Incidentally, I am not arguing that sleep() is topical.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Feb 13 '07 #14

P: n/a
"robert maas, see http://tinyurl.com/uh3t" wrote:
>
> .... snip ...
>
> Not a bug. It's just that the part of the program to detect EOF
> wasn't yet written, and that's the very part I was trying to
> develop.
>
> Step 1: Put in a printf to see what value comes back when I press
> ctrl-D.

What for? You have a macro called EOF available. Use it.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

Feb 13 '07 #15

P: n/a
Richard Heathfield wrote, On 13/02/07 10:36:
> Flash Gordon said:
>> robert maas, see http://tinyurl.com/uh3t wrote, On 13/02/07 04:24:
>> <snip>
>>> Unfortunately c doesn't allow any sleep times except integers. I
>>
>> Wrong. C does not allow *any* sleeping.
>
> Wrong. C does *allow* sleeping. It just doesn't *support* it.

If you want to argue it that way the OP is still wrong, since if C
allows it then it certainly does not prevent the sleep times from being
double or anything else.

>> The [sleep] function is *not* part of C
>
> Arguable. It's not defined by the Standard, I agree. But what is a
> language, if not the set of all sentences that can be formed according
> to the rules of that language? It is certainly possible to call a
> function named sleep(), within the rules of C.

Yes, and the rules of C allow the sleep function to take a double.

> Incidentally, I am not arguing that sleep() is topical.

Indeed. You are arguing terminology and I don't have any problem with
yours. I was just continuing to use the terminology the OP used, which was
possibly wrong of me. However, my original comment about the use of
sleep was simply that it was not a standard function and seemed
pointless in the code presented; the OP appeared not to have understood
that point, judging by the talk about C only allowing integer sleep times.

It is important for the OP to realise that the sleep function s/he is
using is not one provided by the C language but one provided by his or her
specific implementation (and a number of other implementations, but not
even all implementations for common desktops).
--
Flash Gordon
Feb 13 '07 #16

P: n/a
On Sun, 11 Feb 2007 11:13:34 -0800, robert maas, wrote:
> Per a nice example I found on the Web:
> Linkname: Bullet Proof Integer Input Using strtol()
> URL: http://home.att.net/~jackklein/c/code/strtol.html

The linked code does not reflect the current C Standard:
"If the correct value is outside the range of representable values,
LONG_MIN, LONG_MAX ... is returned ... and the value of the macro
ERANGE is stored in errno."

Best regards,
Roland Pibinger
Feb 13 '07 #17

P: n/a
On Sun, 11 Feb 2007 22:26:27 +0000, Flash Gordon wrote:
> robert maas, see http://tinyurl.com/uh3t wrote, On 11/02/07 19:13:
>>     errno = 0;
>>     long_var = strtol(chars, &endptr, 10);
>>     if (ERANGE == errno) {
>>         printf("Number out of range.\n");
>>     } else if (endptr==chars) {
>
> At this point you could scan from the start of the string for the first
> character that is not white space and report a different error depending
> on what it is using the is* functions from ctype.h. Alternatively you
> could look at using strspn or strcspn from string.h

You consider leading whitespace an error?

>>         printf("No number given.\n");
>>     } else if ('\0' != *endptr) {
>>         printf("After number, extra characters on input line.\n");
>>     } else {
>>         printf("Looks good? N=%ld\n", long_var);
>>     }
>>     sleep(1);

IMO, the last part of the function should look like the following:

    errno = 0;
    long_var = strtol(chars, &endptr, 0);
    if (ERANGE == errno) {
        printf("Number out of range.\n");
    } else if (endptr==chars) {
        printf("No number or not parsable number given.\n");
    } else if ('\0' == *endptr) {
        printf("Looks good? N=%ld\n", long_var);
    } else if (endptr != chars) {
        printf("After number, extra characters on input line.\n");
    } else {
        printf("Unknown error, should never happen.\n");
    }

Best regards,
Roland Pibinger
Feb 13 '07 #18

P: n/a
Roland Pibinger wrote, On 13/02/07 14:32:
> On Sun, 11 Feb 2007 11:13:34 -0800, robert maas, wrote:
>> Per a nice example I found on the Web:
>> Linkname: Bullet Proof Integer Input Using strtol()
>> URL: http://home.att.net/~jackklein/c/code/strtol.html
>
> The linked code does not reflect the current C Standard:
> "If the correct value is outside the range of representable values,
> LONG_MIN, LONG_MAX ... is returned ... and the value of the macro
> ERANGE is stored in errno."

Looks like it allows for that to me. It includes:

    if (ERANGE == errno)
    {
        puts("number out of range\n");
    }

Admittedly it does not separate out positive and negative out of range,
but that information is mentioned in the text.
--
Flash Gordon
Feb 13 '07 #19

P: n/a
Roland Pibinger wrote, On 13/02/07 15:45:
> On Sun, 11 Feb 2007 22:26:27 +0000, Flash Gordon wrote:
>> robert maas, see http://tinyurl.com/uh3t wrote, On 11/02/07 19:13:
>>>     errno = 0;
>>>     long_var = strtol(chars, &endptr, 10);
>>>     if (ERANGE == errno) {
>>>         printf("Number out of range.\n");
>>>     } else if (endptr==chars) {
>>
>> At this point you could scan from the start of the string for the first
>> character that is not white space and report a different error depending
>> on what it is using the is* functions from ctype.h. Alternatively you
>> could look at using strspn or strcspn from string.h
>
> You consider leading whitespace an error?

Not in this case. Since the OP wanted more specific errors I suggested
scanning for the first non-whitespace character to allow identification
of the character that caused the failure.

>>>         printf("No number given.\n");
>>>     } else if ('\0' != *endptr) {
>>>         printf("After number, extra characters on input line.\n");
>>>     } else {
>>>         printf("Looks good? N=%ld\n", long_var);
>>>     }
>>>     sleep(1);
>
> IMO, the last part of the function should look like the following:
>
>     errno = 0;
>     long_var = strtol(chars, &endptr, 0);
>     if (ERANGE == errno) {
>         printf("Number out of range.\n");
>     } else if (endptr==chars) {
>         printf("No number or not parsable number given.\n");

The OP wanted to be more specific in error reporting, hence my suggesting
ways of analysing this further.

>     } else if ('\0' == *endptr) {
>         printf("Looks good? N=%ld\n", long_var);
>     } else if (endptr != chars) {

You have already trapped the case when endptr==chars above, so you know
that endptr!=chars if you reach here, so I would consider the above test
to be a sign of the coder having not understood what s/he was writing.

>         printf("After number, extra characters on input line.\n");
>     } else {
>         printf("Unknown error, should never happen.\n");

It is guaranteed not to happen!

>     }
--
Flash Gordon
Feb 13 '07 #20

P: n/a
On Tue, 13 Feb 2007 18:01:04 +0000, Flash Gordon wrote:
> Roland Pibinger wrote, On 13/02/07 15:45:
>> IMO, the last part of the function should look like the following:
>>
>>     errno = 0;
>>     long_var = strtol(chars, &endptr, 0);
>>     if (ERANGE == errno) {
>>         printf("Number out of range.\n");
>>     } else if (endptr==chars) {
>>         printf("No number or not parsable number given.\n");
>
> The OP wanted to be more specific in error reporting hence my suggesting
> ways of analysing this further.
>
>>     } else if ('\0' == *endptr) {
>>         printf("Looks good? N=%ld\n", long_var);
>>     } else if (endptr != chars) {
>
> You have already trapped the case when endptr==chars above, so you know
> that endptr!=chars if you reach here so I would consider the above test
> to be a sign of the coder having not understood what s/he was writing.

... or who wants to make explicit which condition is tested instead of
using a 'catch-all' else block. Since errno, endptr, chars and *endptr
are used in the if statements it's not so easy to correspond those
comparisons to the relevant parts of the strtol specification.

>>         printf("After number, extra characters on input line.\n");
>>     } else {
>>         printf("Unknown error, should never happen.\n");
>
> It is guaranteed not to happen!

I'll replace the line with assert(0).

Best regards,
Roland Pibinger
Feb 13 '07 #21

P: n/a
Roland Pibinger wrote, On 13/02/07 19:04:
> On Tue, 13 Feb 2007 18:01:04 +0000, Flash Gordon wrote:
>> Roland Pibinger wrote, On 13/02/07 15:45:
>>> IMO, the last part of the function should look like the following:
>>>
>>>     errno = 0;
>>>     long_var = strtol(chars, &endptr, 0);
>>>     if (ERANGE == errno) {
>>>         printf("Number out of range.\n");
>>>     } else if (endptr==chars) {
>>>         printf("No number or not parsable number given.\n");
>>
>> The OP wanted to be more specific in error reporting hence my suggesting
>> ways of analysing this further.
>>
>>>     } else if ('\0' == *endptr) {
>>>         printf("Looks good? N=%ld\n", long_var);
>>>     } else if (endptr != chars) {
>>
>> You have already trapped the case when endptr==chars above, so you know
>> that endptr!=chars if you reach here so I would consider the above test
>> to be a sign of the coder having not understood what s/he was writing.
>
> ... or who wants to make explicit which condition is tested instead of
> using a 'catch-all' else block.

Then why isn't it

    else if (endptr != chars && '\0' != *endptr && errno != ERANGE)

> Since errno, endptr, chars and *endptr
> are used in the if statements it's not so easy to correspond those
> comparisons to the relevant parts of the strtol specification.

It is very easy. It is even easier to see that you have a redundant
if, because you have already checked for the opposite condition, and your
final if only muddies the waters.

I can see no good reason to test for a COND and !COND in a simple if
chain such as this.

>>>         printf("After number, extra characters on input line.\n");
>>>     } else {
>>>         printf("Unknown error, should never happen.\n");
>>
>> It is guaranteed not to happen!
>
> I'll replace the line with assert(0).

Slightly better would be to get rid of the last if above and just use an
else and then put an appropriate assert in the else clause. However, if
you are going to assert anything at all there is still the question of why
you don't assert everything.
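
A minimal sketch of the "plain else plus assert" shape suggested above. The function name classify_number and the enum are hypothetical, introduced only for illustration, and base 10 is an assumption. The final branch is a plain else, and the assert records the condition that must hold there instead of re-testing it in an else if.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum num_status { NUM_OK, NUM_RANGE, NUM_NONE, NUM_EXTRA };

/* Hypothetical helper: classify chars using the same chain of tests. */
enum num_status classify_number(const char *chars)
{
    char *endptr;

    assert(chars != NULL);
    errno = 0;
    (void)strtol(chars, &endptr, 10);
    if (ERANGE == errno) {
        return NUM_RANGE;
    } else if (endptr == chars) {
        return NUM_NONE;
    } else if ('\0' == *endptr) {
        return NUM_OK;
    } else {
        /* Only case left: something was converted and text remains.
           The assert documents that without adding a dead branch. */
        assert(endptr != chars && '\0' != *endptr);
        return NUM_EXTRA;
    }
}
```

With NDEBUG defined the assert compiles away, so it costs nothing in a release build.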
--
Flash Gordon
Feb 13 '07 #22

P: n/a
From: Flash Gordon <s...@flash-gordon.me.uk>
>>>> ... The first time I ran this
>>>> program, without the sleep call, and pressed ctrl-D to generate
>>>> end-of-stream on stdin, the program went into infinite read-EOS
>>>> spew-text loop, which filled up all modem buffers. ...
>>> I can only suggest that you had some other bug in your program at that
>>> point or a bug in your modem. As presented your program would not do
>>> that whether it detected an error or EOF it would break out of the loop
>>> and terminate.
>> Not a bug. It's just that the part of the program to detect EOF wasn't yet
>> written, and that's the very part I was trying to develop.
> So it still was not needed in the program you posted.

That depends on how you think of the program. If it had been
intended as a standalone program to distribute to others, then the
sleep could be regarded as a "dunzell" (StarTrek TOS jargon), i.e.
a part that serves no useful function. However in fact it was just
a test rig to develop modules which would later be installed
primarily in a CGI environment (where the toplevel stdin test loop
would not be present at all). As a test rig, where I might at any
time add new buggy code that might produce infinite spew, whereby
I'd need protection from modem-buffer disaster, it was quite
appropriate for the sleep to be in the toplevel loop at all times.
What was posted was just the current version of that test rig at
the moment I posted. But in fact that sleep would be present in
*any* version of that test rig at any time after I encountered the
modem-buffer disaster and consequently took precautions against it
ever happening again in any version of that test rig or any other
test rig descended from it.

If anyone happens to like my program enough to copy it and use it
themselves, but doesn't like the sleep in it, feel free to remove
it, but then don't complain to me if you subsequently try to modify
the program in other ways and introduce a bug and fill up your
modem buffers or even worse fill up all free swap space on your PC
and crash the OS and can't re-boot. (YMMV)
>> Step 1: Put in a printf to see what value comes back when I press ctrl-D.
>> Step 2: Write code to detect that value and break out of loop.
>> Step 3: Test that to see whether it works.
>> Step 4: Remove the printf.
>> Unfortunately step 1 blew me out for ten minutes or so without the sleep.
> That is because it is the wrong approach
> 1) read the documentation to see what the correct way to do it is
> 2) write the code
> 3) test it

That's not good development technique. Documentation often is
misunderstood. If your approach is followed, your program might
have a subtle bug where you're not getting the value you thought
you're getting but you have the test written backwards or otherwise
wrong and for the cases you tested your multiple mistakes are
covering for each other making the program "work" despite being
totally wrongly written.

It's best to read the documentation (as I did, but didn't include in
the steps of actual program development, sorry if you assumed
contrary to fact), and then install both the call to whatever
library routine *and* a printf of the return value, then look at
the output to see if it conforms to how you read the documentation,
and if so then proceed to write the test on that basis.
But if the return value doesn't agree with what you thought the
documentation said, you need to consider various alternatives:
- You aren't calling the correct function because you loaded the
wrong library.
- You are calling the correct function in the wrong way (as
happened to me the first time I tried strtoll, see other thread).
- You misunderstood the documentation.

Once you are sure the function returns the value you expect in all
test cases that cover in-range and out-of-range cases as well as
carefully constructed right-at-edge-of-range cases, if any of that
makes sense for the given function, *then* it's time to write the
test to distinguish between the various classes of results as you
now *correctly* understand them based on agreement between your
reading of documentation and your live tests.

So in this case, calling fgets, I needed to test all these cases:
- Empty input: NonNull return value, Buffer contains EOL NUL
- Normal input: NonNull return value, Buffer contains chars EOL NUL
- Input that overruns buffer: NonNull return value, Buffer contains chars NUL
- Abort via end-of-stream generated via ctrl-D: NULL return value.
- Abort via ctrl-C: Program aborts to shell all by itself.
- Abort via escape (a.k.a. altmode): Goes in as garbage screwup character, avoid.
One case (buffer overrun) I really needed to see for myself,
because the documentation didn't make it clear whether fgets would
omit the NUL so it could fill the entire buffer with data to cram
it all in and not lose that last character, or truncate the data
one shorter to guarantee a NUL was there. In fact the latter
occurs. But I was prepared to force a NUL there, overwriting the
last byte, if fgets had done the first instead. One thing I *did*
have to do is check whether the last character before the NUL was
EOL or not, and clobber it to NUL (shortening string by one
character per c's NUL-terminated-byte-array convention for
emulating "strings") only if it was EOL, so the string seen by the
rest of the program would consistently *not* have the EOL
character.
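
The newline handling described above can be sketched as follows. The helper name strip_eol is hypothetical. It relies on behaviour the C standard does specify: when fgets succeeds it stores at most size-1 characters and always writes a terminating NUL, so only the optional trailing newline needs attention.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove one trailing '\n' left by fgets, if present.
   Returns 1 if a newline was removed (a complete line was read),
   0 otherwise (the line was truncated to fit the buffer, or the input
   ended without a newline). */
int strip_eol(char *buf)
{
    size_t len;

    assert(buf != NULL);
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    return 0;
}
```

A return value of 0 after a successful fgets is the cue that the rest of an over-long line is still waiting on the stream.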

Now not all those cases are actually necessary for the end purpose
of this program, developing a module intended for CGI usage, but
it's nice to know how the basic terminal-input routine of my
stdin test rig performs for *all* inputs before making extensive
use of it for *anything*. I don't want confusion later where I
don't know whether strange results are due to bug in test rig or
bug in the actual module I'm trying to develop.
>> Unfortunately c doesn't allow any sleep times except integers. ...
> Wrong. C does not allow *any* sleeping. ...

Let me re-phrase that: The sleep function provided by the library
whose header is unistd.h doesn't allow any sleep times except
integers. Now are you happy?
Feb 14 '07 #23

P: n/a
From: CBFalconer <cbfalco...@yahoo.com>
>> Step 1: Put in a printf to see what value comes back when I press
>> ctrl-D.
> What for? You have a macro called EOF available. Use it.

Even if I were to take advice, I'd *still* as a first step put in a
printf to tell me the value of EOF and also the returned value so I
can see if they really are the same when I press ctrl-D.

But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
Many of the functions described in this chapter return the value of
the macro EOF to indicate unsuccessful completion of the operation.
Since EOF is used to report both end of file and random errors, it's
often better to use the feof function to check explicitly for end of
file and ferror to check for errors.

It doesn't sound like comparing the return value with EOF is a good
way to diagnose what really happened. If I ever decide this needs
fixing, I'll fix it by checking both feof and ferror, not by
comparing with EOF. (Or compare with EOF as a first pass, then if
that matches go ahead and check both feof and ferror to see which
sub-case applies.)
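
A minimal sketch of that two-stage check, assuming line input via fgets as in the test rig. The names read_line and read_status are hypothetical. Note that fgets reports failure with a null pointer rather than EOF; feof and ferror are consulted only after the read has already failed, to tell the two sub-cases apart.

```c
#include <assert.h>
#include <stdio.h>

enum read_status { READ_OK, READ_EOF, READ_ERROR };

/* Hypothetical helper: read one line and report why reading stopped. */
enum read_status read_line(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 0);
    if (fgets(buf, size, fp) != NULL) {
        return READ_OK;            /* buf holds a NUL-terminated line */
    }
    /* The read failed: only now ask the stream which sub-case applies. */
    if (ferror(fp)) {
        return READ_ERROR;
    }
    return READ_EOF;               /* no error, so end of input */
}
```

A caller's loop would simply continue while the result is READ_OK and report the other two outcomes separately.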
Feb 14 '07 #24

P: n/a
From: rpbg...@yahoo.com (Roland Pibinger)
> The linked code does not reflect the current C Standard:
> "If the correct value is outside the range of representable values,
> LONG_MIN, LONG_MAX ... is returned ... and the value of the macro
> ERANGE is stored in errno."

Hmm, indeed I seem to recall seeing that in the specs as I was
researching this before deciding to use strtoll instead. I'll have
to take a look at that someday when I have time.
Feb 14 '07 #25

P: n/a
From: rpbg...@yahoo.com (Roland Pibinger)
> IMO, the last part of the function should look like the following:
>     errno = 0;
>     long_var = strtol(chars, &endptr, 0);
>     if (ERANGE == errno) {
>         printf("Number out of range.\n");
>     } else if (endptr==chars) {
>         printf("No number or not parsable number given.\n");
>     } else if ('\0' == *endptr) {
>         printf("Looks good? N=%ld\n", long_var);
>     } else if (endptr != chars) {
>         printf("After number, extra characters on input line.\n");
>     } else {
>         printf("Unknown error, should never happen.\n");
>     }

Before I started this task, I made a design decision that
whitespace before and/or after the number is fine, but any other
stray character not part of the [optionalSign] oneOrMoreDigits is
an error. Your advice is inconsistent with the part of the decision
whereby trailing whitespace is fine.

Part of my decision was that whitespace allowance should be
symmetric. It should be allowed before iff allowed after. strtol is
asymmetric in this respect, allowing whitespace before (and
rejecting stray non-white text before), but failing to distinguish
between trailing whitespace (OK) and trailing junk (Not OK), either
rejecting both (if caller checks to make sure the final pointer
matches end of string), or accepting both (if caller doesn't make
that check).

There's so much that strtol fails to check the way I want that
it's best not to use it at all for preliminary syntax
checking, so I ended up writing my own code. The first version
was ugly, but the second version is pretty clean, making liberal use of
strspn and strcspn, which I didn't know about until after I had
already written that ugly first version (and translated it to
equally ugly c++), then gone ahead to write clean lisp and java
versions, then written regex stuff for perl
and PHP, and finally come back to look at the ugly C to see if I
might make it less ugly.

Your advice to use strtol to do the preliminary syntax check wasn't
good, but in an indirect way it helped, because searching for
documentation for strtol accidentally turned up the documentation for
strtoll and for strspn and strcspn.
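
For readers who cannot reach the linked page, here is a minimal sketch of what a strspn-based syntax check for ^ *[-+]?[0-9]+ *$ might look like. It is not the code from the rewrite; the function name, enum and error classes are hypothetical, and it assumes the trailing newline has already been stripped and that only the space character counts as whitespace. It uses only strspn, which measures the run of characters drawn from a given set.

```c
#include <assert.h>
#include <string.h>

enum int_syntax {
    INT_OK,
    INT_EMPTY,        /* nothing but spaces */
    INT_JUNK_BEFORE,  /* first non-space is neither a sign nor a digit */
    INT_NO_DIGITS,    /* a sign that is not followed by a digit */
    INT_JUNK_AFTER    /* something other than spaces after the digits */
};

/* Hypothetical checker: syntax only, no conversion or range check. */
enum int_syntax check_int_syntax(const char *s)
{
    int had_sign = 0;
    size_t ndigits;

    assert(s != NULL);
    s += strspn(s, " ");                    /* skip leading spaces */
    if (*s == '\0') {
        return INT_EMPTY;
    }
    if (*s == '+' || *s == '-') {           /* optional sign */
        had_sign = 1;
        s++;
    }
    ndigits = strspn(s, "0123456789");      /* one or more digits */
    if (ndigits == 0) {
        return had_sign ? INT_NO_DIGITS : INT_JUNK_BEFORE;
    }
    s += ndigits;
    s += strspn(s, " ");                    /* skip trailing spaces */
    return (*s == '\0') ? INT_OK : INT_JUNK_AFTER;
}
```

Because leading and trailing spaces are skipped by the same call, the whitespace allowance is symmetric by construction; a string that passes can then be handed to strtol or strtoll for conversion and range checking.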
Feb 14 '07 #26

P: n/a
"robert maas, see http://tinyurl.com/uh3t" wrote:
>> From: CBFalconer <cbfalco...@yahoo.com>
>>> Step 1: Put in a printf to see what value comes back when I press
>>> ctrl-D.
>>
>> What for? You have a macro called EOF available. Use it.
>
> Even if I were to take advice, I'd *still* as a first step put in a
> printf to tell me the value of EOF and also the returned value so I
> can see if they really are the same when I press ctrl-D.

You NEVER need to know the value of EOF. You simply need to know
that it is negative, and outside the range of char, especially
unsigned char. This is why you usually receive chars in an int.
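
A minimal sketch of why the result is received in an int. The helper name count_bytes is hypothetical. getc returns each byte as an unsigned char value converted to int, or the negative value EOF, so an int variable can hold both and the comparison never needs the numeric value of EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining on a stream. */
long count_bytes(FILE *fp)
{
    int c;      /* int, not char: holds any unsigned char value or EOF */
    long n = 0;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF) {
        n++;
    }
    return n;
}
```

Storing the result in a char instead could make a data byte such as 0xFF compare equal to EOF on some systems, or make the loop never end on others.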
>
> But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
>
> Many of the functions described in this chapter return the value of
> the macro EOF to indicate unsuccessful completion of the operation.
> Since EOF is used to report both end of file and random errors, it's
> often better to use the feof function to check explicitly for end of
> file and ferror to check for errors.

WRONG. Those functions are to distinguish between error and
physical EOF when some input routine actually returns EOF. By the
time feof has shown up it is too late to control use of the input
data. C is unlike Pascal in this respect.

BTW, please do not strip attributions for material you quote.

Feb 14 '07 #27

P: n/a
"robert maas, see http://tinyurl.com/uh3t" wrote:
>
.... snip ...
>
> Part of my decision was that whitespace allowance should be
> symmetric. It should be allowed before iff allowed after. strtol
> is asymmetric in this respect, allowing whitespace before (and
> rejecting stray non-white text before), but failing to distinguish
> between trailing whitespace (OK) and trailing junk (Not OK),
> either rejecting both (if caller checks to make sure the final
> pointer matches end of string), or accepting both (if caller
> doesn't make that check).

Not so. The returned value of endptr simply allows the user to
make that decision for himself.

Feb 14 '07 #28

P: n/a
robert maas, see http://tinyurl.com/uh3t said:
>> From: Flash Gordon <s...@flash-gordon.me.uk>
>> [...] C does not allow *any* sleeping. ...
>
> Let me re-phrase that: The sleep function provided by the library
> whose header is unistd.h doesn't allow any sleep times except
> integers. Now are you happy?

"If something isn't in the a standard library for C, then it doesn't
exist for the purpose of this project." - robert maas, in the article
starting this thread.

<unistd.h> is not a standard header, and none of the functions for which
it is required to be included are in the standard library. Therefore,
by your own argument, the sleep function you are talking about does not
exist.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Feb 14 '07 #29

P: n/a
robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 01:41:
>> From: Flash Gordon <s...@flash-gordon.me.uk>
>>>>> ... The first time I ran this
>>>>> program, without the sleep call, and pressed ctrl-D to generate
>>>>> end-of-stream on stdin, the program went into infinite read-EOS
>>>>> spew-text loop, which filled up all modem buffers. ...
>>>> I can only suggest that you had some other bug in your program at that
>>>> point or a bug in your modem. As presented your program would not do
>>>> that whether it detected an error or EOF it would break out of the loop
>>>> and terminate.
>>> Not a bug. It's just that the part of the program to detect EOF wasn't yet
>>> written, and that's the very part I was trying to develop.
>> So it still was not needed in the program you posted.
>
> That depends on how you think of the program. If it had been

<snip>

I think of programs as presented. As presented there was no reason for
the sleep.

> If anyone happens to like my program enough to copy it and use it
> themselves, but doesn't like the sleep in it, feel free to remove
> it, but then don't complain to me if you subsequently try to modify
> the program in other ways and introduce a bug and fill up your
> modem buffers or even worse fill up all free swap space on your PC
> and crash the OS and can't re-boot. (YMMV)

None of those would give me a problem. Even if it was possible for one
of those to give me a problem I would not need the sleep function.

You might want to find out how to use a debugger on your system, then
you can step through the code when you are not sure about it as part of
your testing.

>>> Step 1: Put in a printf to see what value comes back when I press ctrl-D.
>>> Step 2: Write code to detect that value and break out of loop.
>>> Step 3: Test that to see whether it works.
>>> Step 4: Remove the printf.
>>> Unfortunately step 1 blew me out for ten minutes or so without the sleep.
>> That is because it is the wrong approach
>> 1) read the documentation to see what the correct way to do it is
>> 2) write the code
>> 3) test it

> That's not good development technique.

True, I should have included some earlier steps such as analysing the
requirements & designing the software.

> Documentation often is
> misunderstood.

My experience is that the above applies to people who think that
experimenting with a function is a good way to find out about it. It
does not in my experience apply to those who believe the best way to
find out is to read the documentation.

> If your approach is followed, your program might
> have a subtle bug where you're not getting the value you thought
> you're getting but you have the test written backwards or otherwise
> wrong and for the cases you tested your multiple mistakes are
> covering for each other making the program "work" despite being
> totally wrongly written.

That is what testing is for. You feed in as much data (in the loosest
sense) as practical, carefully crafted to do your damnedest to break the
code and thus find what is wrong with it.

You said in your post that the way to do it was basically to experiment
with the function.

> It's best to read the documentation (as I did, but didn't include in
> the steps of actual program development, sorry if you assumed
> contrary to fact),

I can only go on what you actually post.

> and then install both the call to whatever
> library routine *and* a printf of the return value, then look at
> the output to see if it conforms to how you read the documentation
> to mean, and if so then proceed to write the test on that basis.
> But if the return value doesn't agree with what you thought the
> documentation said, you need to consider various alternatives:
> - You aren't calling the correct function because you loaded the
> wrong library.
> - You are calling the correct function in the wrong way (as
> happened to me the first time I tried strtoll, see other thread).
> - You misunderstood the documentation.

Testing your program will find all of these. Well, it will if you test
it properly.
> Once you are sure the function returns the value you expect in all
> test cases that cover in-range and out-of-range cases as well as
> carefully constructed right-at-edge-of-range cases, if any of that
> makes sense for the given function, *then* it's time to write the
> test to distinguish between the various classes of results as you
> now *correctly* understand them based on agreement between your
> reading of documentation and your live tests.
>
> So in this case, calling fgets, I needed to test all these cases:
> - Empty input: NonNull return value, Buffer contains EOL NUL
> - Normal input: NonNull return value, Buffer contains chars EOL NUL
> - Input that overruns buffer: NonNull return value, Buffer contains chars NUL
> - Abort via end-of-stream generated via ctrl-D: NULL return value.
> - Abort via ctrl-C: Program aborts to shell all by itself.
> - Abort via escape (a.k.a. altmode): Goes in as garbage screwup character, avoid.

All that would have been in the test set for testing your program, so
having read the documentation and written the relevant module you would
test and see that it worked as expected, including the program
gracefully handling "garbage" input.
> One case (buffer overrun) I really needed to see for myself,
> because the documentation didn't make it clear whether fgets would
> omit the NUL so it could fill the entire buffer with data to cram
> it all in and not lose that last character, or truncate the data
> one shorter to guarantee a NUL was there. In fact the latter
> occurs.

If you cannot understand the documentation you have available, that is
the time to ask those with more experience/knowledge. Had you at that
point posted here saying that it was not clear from the documentation
you have, then someone here would have clarified it for you.

> But I was prepared to force a NUL there, overwriting the
> last byte, if fgets had done the first instead. One thing I *did*
> have to do is check whether the last character before the NUL was
> EOL or not, and clobber it to NUL (shortening string by one
> character per c's NUL-terminated-byte-array convention for
> emulating "strings") only if it was EOL, so the string seen by the
> rest of the program would consistently *not* have the EOL
> character.

So, as you cannot tell that from your documentation, how do you know that
behaviour is not specific to your implementation and might not change
when a patch is installed on the machine later today?
> Now not all those cases are actually necessary for the end purpose
> of this program, developing a module intended for CGI usage, but
> it's nice to know how the basic terminal-input routine of my
> stdin test rig performs for *all* inputs before making extensive
> use of it for *anything*. I don't want confusion later where I
> don't know whether strange results are due to bug in test rig or
> bug in the actual module I'm trying to develop.

So you test your test rig once you have written it.

>>> Unfortunately c doesn't allow any sleep times except integers. ...
>> Wrong. C does not allow *any* sleeping. ...
>
> Let me re-phrase that: The sleep function provided by the library
> whose header is unistd.h doesn't allow any sleep times except
> integers. Now are you happy?

Yes.

Understanding what is part of C and what is not is important so that you
can isolate the system specifics and know what will have to be changed
to run the program on some other system.
--
Flash Gordon
Feb 14 '07 #30

P: n/a
On Tue, 13 Feb 2007 18:32:05 -0800, robert maas wrote:
>> From: rpbg...@yahoo.com (Roland Pibinger)
>> IMO, the last part of the function should look like the following:
>>     errno = 0;
>>     long_var = strtol(chars, &endptr, 0);
>>     if (ERANGE == errno) {
>>         printf("Number out of range.\n");
>>     } else if (endptr==chars) {
>>         printf("No number or not parsable number given.\n");
>>     } else if ('\0' == *endptr) {
>>         printf("Looks good? N=%ld\n", long_var);
>>     } else if (endptr != chars) {
>>         printf("After number, extra characters on input line.\n");
>>     } else {
>>         printf("Unknown error, should never happen.\n");
>>     }
>
> Before I started this task, I made a design decision that
> whitespace before and/or after the number is fine, but any other
> stray character not part of the [optionalSign] oneOrMoreDigits is
> an error. Your advice is inconsistent with the part of the decision
> whereby trailing whitespace is fine.

Ok, in your original code you did not distinguish between (allowed)
trailing whitespace and (not allowed) extra characters:

    } else if ('\0' != *endptr) {
        printf("After number, extra characters on input line.\n");

> Part of my decision was that whitespace allowance should be
> symmetric. It should be allowed before iff allowed after. strtol is
> asymmetric in this respect, allowing whitespace before (and
> rejecting stray non-white text before), but failing to distinguish
> between trailing whitespace (OK) and trailing junk (Not OK), either
> rejecting both (if caller checks to make sure the final pointer
> matches end of string), or accepting both (if caller doesn't make
> that check).

Here is a 'symmetric' version that allows for leading and trailing
whitespace but not for 'stray non-white text':

    errno = 0;
    long_var = strtol(chars, &endptr, 0);
    if (ERANGE == errno) {
        printf("Number out of range.\n");
    } else if (endptr==chars) {
        printf("Not a (parsable) number given.\n");
    } else {
        /* Skip trailing whitespace; the cast keeps isspace well-defined
           for characters with negative values. */
        while (isspace ((unsigned char)*endptr)) {
            ++endptr;
        }
        if ('\0' == *endptr) {
            printf("Looks good? N=%ld\n", long_var);
        } else {
            printf("After number, invalid extra characters on input line.\n");
        }
    }

I hope that this is now a 100% solution. I agree that strtol is a good
example of how not to design a function interface.
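The same check can be packaged as a function that returns a status instead of printing. This is only a sketch: the names `parse_long` and `enum parse_status` are invented for illustration and appear nowhere in the thread, and it passes base 10 rather than 0 on the assumption that the field must be decimal. Note that strtol skips any leading isspace character (tabs and newlines too), which is looser than the "spaces only" regular expression in the original post.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical status codes, invented for this sketch. */
enum parse_status {
    PARSE_OK,
    PARSE_NO_NUMBER,
    PARSE_RANGE,
    PARSE_TRAILING_JUNK
};

/* Parse s as one decimal integer with optional surrounding whitespace.
   On PARSE_OK the value is stored in *out; otherwise *out is untouched. */
enum parse_status parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s)
        return PARSE_NO_NUMBER;       /* nothing was converted */
    if (errno == ERANGE)
        return PARSE_RANGE;           /* digits were there, value too big */

    /* strtol stops after the last digit; accept whitespace only. */
    while (isspace((unsigned char)*end))
        ++end;
    if (*end != '\0')
        return PARSE_TRAILING_JUNK;

    *out = value;
    return PARSE_OK;
}
```

A caller would typically switch on the returned status to choose the diagnostic message to print.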

Best regards,
Roland Pibinger
Feb 14 '07 #31

P: n/a
From: rpbg...@yahoo.com (Roland Pibinger)
Before I started this task, I made a design decision that
whitespace before and/or after the number is fine, but any other
stray character not part of the [optionalSign] oneOrMoreDigits is
an error. Your advice is inconsistent with the part of the decision
whereby trailing whitespace is fine.
Ok, in your original code you did not distinguish between (allowed)
trailing whitespace and (not allowed) extra characters:
I don't believe you've even looked at my original code.
Do you remember seeing this function definition?
/* Given a string (nul-term), and index where digits ended,
scan to very end making sure no junk, return code:
garafnum = garbage after number */
enum errcode strchkint4(char* str, int* pix) {
    char ch;
    while (1) {
        ch = str[*pix];
        if ((0 == ch) || ('\n' == ch)) {
            /* printf("At ix=%d, ch=%c, nul/eol reached.\n", *pix, ch); */
            return(0);
        }
        else if (' ' == ch) {
            /* printf("At ix=%d, ch=%c, skip white.\n", *pix, ch); */
            (*pix)++;
        }
        else {
            /* printf("At ix=%d, ch=%c, junk.\n", *pix, ch); */
            return(garafnum);
        }
    }
}
If you don't remember seeing that code, then you haven't looked at
the original code I wrote for the C implementation of this task,
because *that* is the relevant original code.
} else if ('\0' != *endptr) {
printf("After number, extra characters on input line.\n");
You're totally confused. That's not my original code at all.
Here's the chronology:
-1- Original code, such as the piece I posted above.
-2- Translation of original code to C++, which can be found here:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html#h4intcpp>
-3- Complete re-write in Common Lisp.
-4- Translation of lisp version to java.
-5- Complete re-write in perl.
-6- Translation of perl version to PHP.
-7- Getting advice to try strtol.
-8- Researching strtol, discovering strtoll which is better.
-9- Trying strtoll in test rig, having trouble.
-A- Getting advice about why strtoll didn't work for me.
-B- Fixing test rig to use strtoll correctly, but being dissatisfied
because it fails to distinguish between trailing whitespace and
trailing junk.
-C- Discovering strspn and strcspn.
-D- Translating lisp/java version to c using strspn and strcspn,
using strtoll only after the syntax check has already been
completed.
-E- Your confusion between the first version -1- using while loop
and something somewhere from -9- to the end using strtol[l].
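For readers who have not seen the scanning approach of step -D-, here is a rough sketch of what a strspn/strcspn syntax check for ^ *[-+]?[0-9]+ *$ can look like. It is not the code from the linked page: the function name `check_int_syntax` and the `enum int_syntax` codes are made up for illustration. The conversion itself (for example with strtoll) would only be attempted after this returns INT_OK.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical diagnostic codes, invented for this sketch. */
enum int_syntax {
    INT_OK,
    INT_EMPTY,        /* nothing but spaces */
    INT_JUNK_BEFORE,  /* digits exist, but something else comes first */
    INT_NO_DIGITS,    /* no digit anywhere */
    INT_JUNK_AFTER    /* non-space characters after the digits */
};

enum int_syntax check_int_syntax(const char *s)
{
    size_t i;
    size_t ndigits;

    i = strspn(s, " ");                     /* skip leading spaces */
    if (s[i] == '\0')
        return INT_EMPTY;
    if (s[i] == '+' || s[i] == '-')         /* optional sign */
        ++i;

    ndigits = strspn(s + i, "0123456789");  /* one or more digits */
    if (ndigits == 0) {
        /* No digit where one is required: look for a digit further on
           to tell "junk before number" from "no number at all". */
        size_t skip = strcspn(s + i, "0123456789");
        return s[i + skip] != '\0' ? INT_JUNK_BEFORE : INT_NO_DIGITS;
    }
    i += ndigits;

    i += strspn(s + i, " ");                /* skip trailing spaces */
    return s[i] == '\0' ? INT_OK : INT_JUNK_AFTER;
}
```

Leading and trailing spaces are handled by the same strspn call, which is the symmetry the strtol interface does not give directly.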
I agree that strtol is a good example of how not to design a
function interface.
At least we're in agreement about that one thing!

There's still the policy decision whether to show absolute
beginners how to write their own code, such as scanning for the
first character that matches or doesn't match a bag of some type,
using position-if and position-if-not in Common Lisp or strspn and
strcspn in C, or just call a magic genie which does almost what you
want but screws up in one aspect requiring a post-call fixup to
make the result 100% correct. At the moment, I prefer the scanning
method in all languages except perl and PHP, because it's
symmetric, and easily translatable between four languages rather
than special to just one add-on library of one language. In perl and
PHP I'm presently using regular expressions, a sort of "magic
genie" but without the design flaw that strtol[l] have, because (1)
they are nicely integrated into the language, no hassle to use
them, and (2) they are in fact advertised as a primary reason to
use those languages so I might as well show off such usage when I'm
comparing how to do the same task in all six languages.

On the other hand, that's slightly moot for this specific purpose,
which was merely to extract a numeric value from a HTML FORM field
string in the safest way possible, so that the numeric value could
then be used in the actual sample code fragment, which I haven't
started writing yet.

If anyone is curious about the overall project (multi-language
"cookbook" in form of matrix per one or two datatypes that each
operation/function deals with), I've finished all the built-in c
and c++ operators, and their Common Lisp equivalents, and now I'm
doing the c libraries, starting with ctype.h where I'm about
halfway finished. See toplevel "cookbook" file:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/CookTop.html>
click on chapter 3 skeleton in progress.
Feb 14 '07 #32

P: n/a
robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 20:23:
>From: rpbg...@yahoo.com (Roland Pibinger)
<snip>
>I agree that strtol is a good example of how not to design a
function interface.

At least we're in agreement about that one thing!

There's still the policy decision whether to show absolute
beginners how to write their own code, such as scanning for the
first character that matches or doesn't match a bag of some type,
using position-if and position-if-not in Common Lisp or strspn and
strcspn in C, or just call a magic genie which does almost what you
want but screws up in one aspect requiring a post-call fixup to
make the result 100% correct.
From your perspective it might "screw up" one aspect, but that is
because you are assuming the string is meant to have only one data item.
strtol and friends are designed on the basis that you might want to pass
the rest of the string to something else, so they tell you where to
start. In your case that is looking to see if the remainder is white
space or not, but sometime people might be doing other things.
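The design described here can be illustrated with a short sketch that feeds the end pointer from one strtol call into the next, so that several numbers are read from a single string. The function name `parse_longs` and its error convention (returning -1) are assumptions made for this example, not anything from the thread or from a library.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Read whitespace-separated decimal integers from s into out[0..max-1].
   Returns how many were stored, or -1 if a value is out of range, if
   there are more than max values, or if anything but whitespace is left
   after the last value. */
int parse_longs(const char *s, long *out, int max)
{
    int n = 0;
    char *end;

    for (;;) {
        long v;

        errno = 0;
        v = strtol(s, &end, 10);
        if (end == s)
            break;              /* no further number at this position */
        if (errno == ERANGE)
            return -1;
        if (n == max)
            return -1;          /* more numbers than the caller has room for */
        out[n++] = v;
        s = end;                /* continue where strtol stopped */
    }

    /* The single "whitespace then end of string" check at the very end. */
    while (isspace((unsigned char)*s))
        ++s;
    return *s == '\0' ? n : -1;
}
```

Each call consumes its own leading whitespace, so only the final item needs the explicit trailing-whitespace check.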
At the moment, I prefer the scanning
method in all languages except perl and PHP, because it's
symmetric, and easily translatable between four languages rather
than special to just one add-on library of one language. In perl and
PHP I'm presently using regular expressions, a sort of "magic
genie" but without the design flaw that strtol[l] have, because (1)
they are nicely integrated into the language, no hassle to use
them, and (2) they are in fact advertised as a primary reason to
use those languages so I might as well show off such usage when I'm
comparing how to do the same task in all six languages.

On the other hand, that's slightly moot for this specific purpose,
which was merely to extract a numeric value from a HTML FORM field
string in the safest way possible, so that the numeric value could
then be used in the actual sample code fragment, which I haven't
started writing yet.
Personally I would still go with strtol[l] and then check whether the
trailing data is white space or not.
If anyone is curious about the overall project (multi-language
"cookbook" in form of matrix per one or two datatypes that each
operation/function deals with), I've finished all the built-in c
and c++ operators, and their Common Lisp equivalents, and now I'm
doing the c libraries, starting with ctype.h where I'm about
halfway finished. See toplevel "cookbook" file:
<http://www.rawbw.com/~rem/HelloPlus/CookBook/CookTop.html>
click on chapter 3 skeleton in progress.
Looking at some of the earlier stuff you have work to do there as well.
The hello world programs in C are using implicit int for main which is
not allowed in the latest standard, the web one fails to include stdio.h
which is required (unless you want to do the work of providing your own
prototype), and one of them is a deliberately obfuscated program which
relies on ASCII which the C standard does not guarantee.

This from your "CookBook" is wrong for C95 and earlier, and since you
use implicit int all over the place you are not using C99:

| In c, each function definition is supposed to be before the first time
| it is called. That's because the compiler works forward through the
| file checking each fuction-call to make sure the function is defined,
| and generates an error message immediately when it sees a attempt to
| call a function that isn't defined.

So is this prototype you show "int g2(int n1,n2);"

There are several other errors.

I suggest you need to learn C properly before writing any kind of
"CookBook" that includes C in the languages it uses.
--
Flash Gordon
Feb 14 '07 #33

P: n/a
Flash Gordon <sp**@flash-gordon.me.uk> writes:
robert maas, see http://tinyurl.com/uh3t wrote, On 14/02/07 20:23:
[snip]
This from your "CookBook" is wrong for C95 and earlier, and since you
use implicit int all over the place you are not using C99:

| In c, each function definition is supposed to be before the first time
| it is called. That's because the compiler works forward through the
| file checking each fuction-call to make sure the function is defined,
| and generates an error message immediately when it sees a attempt to
| call a function that isn't defined.

So is this prototype you show "int g2(int n1,n2);"

There are several other errors.
Including the use of "defined" rather than "declared". A function
call requires a declaration for the called function; it doesn't
require a definition. (That's in C99; C90 allows calls without
declarations, but providing declarations, preferably prototypes, is
still an excellent idea.)
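A small sketch may make the declared/defined distinction concrete. The two mutually recursive functions below are invented for illustration: `is_odd` is only declared (by its prototype) at the point where `is_even` calls it, and its definition comes later in the file.

```c
/* Declaration (a prototype): tells the compiler the return type and the
   parameter types. No function body is needed yet. */
int is_odd(unsigned n);

/* Definition of is_even. It calls is_odd, which at this point has been
   declared but not yet defined. */
int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

/* Definition of is_odd, matching the prototype above. */
int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}
```

The same mechanism is what a header file relies on: it carries declarations, while the definitions live in a separately compiled source file.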

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Feb 15 '07 #34

P: n/a
From: Flash Gordon <s...@flash-gordon.me.uk>
you are assuming the string is meant to have only one data item.
Yes, that's the situation here, when validating the contents of a
single HTML-FORM text field, which is supposed to contain exactly
the representation of one integer using decimal notation,
optionally with whitespace around it either/both way(s).
strtol and friends are designed on the basis that you might want
to pass the rest of the string to something else, so they tell
you where to start.
So basically you make sure you've gobbled everything preceding the
item of interest, except whitespace, then you call the function,
which skips the leading whitespace and gobbles the item of
interest, leaving any trailing whitespace and any items of later
interest. So whitespace is treated in an asymmetrical manner, and
at the very end of a chain of [white]* [item]! parsing you have a
single [white]* [null] parser just to verify somebody didn't leave
more useful items that haven't been gobbled?

I'll have to remember that paradigm if and when I ever ask a user
to type in more than one item on a single line, such as if I ever
write a CGI-accessible Sudoku solver where a whole row is entered
in a single text field.

Thanks for explaining that other input paradigm, sorta like scanf
but more robust.
In your case that is looking to see if the remainder is white
space or not, but sometime people might be doing other things.
Yes. If I wanted to fit my single-item syntax-check into that
multi-item-chain paradigm, I'd have to do it like you suggested in
an earlier message. But unfortunately when it says "no number
present" it really means "no number *immediately* present at start
of line, ignoring optional whitespace". So to satisfy my spec, that
would have to be sub-cased, where if it hits the no-number
condition I'll have to scan for a digit anyway to separate the
sub-cases of junk-before-number and truly-no-number-anywhere.

For now I still like the strspn and strcspn version best for the
current application. But thanks for the explanation of the other
paradigm that I might use for another application someday.
Looking at some of the earlier stuff you have work to do there as
well. The hello world programs in C are using implicit int for
main which is not allowed in the latest standard, the web one
fails to include stdio.h which is required (unless you want to do
the work of providing your own prototype),
Let me use -Wall to fix all that ... h.c h1.c h2.c done

In cgis.c (needed for h3.c and beyond), there's a line of code that
shifts the existing value to the left 4 bits and then adds in the
four new bits obtained from the hexadecimal character in the string
it's walking. The line of code looks like this:
c = c<<4 + h;
but the gnu c compiler complains:
cgis.c:118: warning: suggest parentheses around + or - inside shift
Given that there is clearly extra spacing around the =, while the
<< is compact, it's quite clear the intention of the author was:
c = (c<<4) + h;
so it's stupid for the compiler to suggest making it instead:
c = c<<(4 + h);
Should I leave it as-is, or put parens around the shift to avoid
the stupid mis-leading warning?? (Your personal opinion, what you'd
do in my circumstance, writing code examples to share with others,
but in this case simply using somebody else's module to which I
already had to fix a bug before it'd compile.)

Fixed h3.c, all done. Thanks for the heads-up. All my code worked
fine as they were, but they are supposed to be examples for novices
to copy and try and emulate etc. so they faltered in that respect.
Take another look now if you have time.
and one of them is a deliberately obfuscated program which
relies on ASCII which the C standard does not guarantee.
Which one specifically? Cite a line of code that relies on ASCII
and I'll get the idea which section of it to study?
... you use implicit int all over the place ...
The only place I used implicit int was in return value for main,
which has now been fixed in all cgi-bin/*.c files unless I screwed
up somewhere.
So is this prototype you show "int g2(int n1,n2);"
I don't see anything wrong with that prototype. Do I need to
declare n1 and n2 separately, like this?
int g2(int n1, int n2);
There are several other errors.
Feel free to find a couple totally different errors and tell me
about them, then I'll fix them and anything else they remind me of.
I suggest you need to learn C properly before writing any kind of
"CookBook" that includes C in the languages it uses.
I already took three semester-length C classes. That's all that are
offered at De Anza College. What do you suggest for further
correction of anything I happened to get wrong after three
semesters of formal study plus various Web-based exploration
looking for specific info such as strtoll and strcspn?
Feb 15 '07 #35

P: n/a
2007-02-13 <re***************@yahoo.com>,
robert maas, see http://tinyurl.com/uh3t wrote:
>From: Flash Gordon <s...@flash-gordon.me.uk>
... The first time I ran this
program, without the sleep call, and pressed ctrl-D to generate
end-of-stream on stdin, the program went into infinite read-EOS
spew-text loop, which filled up all modem buffers. ...
I can only suggest that you had some other bug in your program at that
point or a bug in your modem. As presented, your program would not do
that: whether it detected an error or EOF, it would break out of the loop
and terminate.

Not a bug. It's just that the part of the program to detect EOF wasn't yet
written, and that's the very part I was trying to develop.
Step 1: Put in a printf to see what value comes back when I press ctrl-D.
Step 2: Write code to detect that value and break out of loop.
Step 3: Test that to see whether it works.
Step 4: Remove the printf.
Unfortunately step 1 blew me out for ten minutes or so without the sleep.
Why was there a loop at all in step 1?
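For comparison, a read loop that cannot run away on end-of-input looks roughly like the sketch below. The function name `count_inputs` is invented for this example. The key point is that the loop condition itself tests the return value of fgets, and feof/ferror are consulted only afterwards to find out why the loop ended.

```c
#include <stdio.h>

/* Read from fp with fgets until no more input can be read.
   Returns the number of successful reads, or -1 if the loop ended
   because of a read error rather than end of file. */
long count_inputs(FILE *fp)
{
    char buf[256];
    long count = 0;

    /* fgets returns NULL on end of file and on error, so the loop stops
       as soon as ctrl-D (or a real error) is seen. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        ++count;    /* a real test rig would validate buf here */
    }

    if (ferror(fp))
        return -1;  /* stopped because of an error */
    return count;   /* stopped because of end of file */
}
```

Called as `count_inputs(stdin)`, this lets many test values be typed in one run without restarting the program, and pressing ctrl-D ends the run cleanly.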
Feb 15 '07 #36

P: n/a
robert maas, see http://tinyurl.com/uh3t wrote, On 15/02/07 00:24:
>From: Flash Gordon <s...@flash-gordon.me.uk>
you are assuming the string is meant to have only one data item.
<snip>
>Looking at some of the earlier stuff you have work to do there as
well. The hello world programs in C are using implicit int for
main which is not allowed in the latest standard, the web one
fails to include stdio.h which is required (unless you want to do
the work of providing your own prototype),

Let me use -Wall to fix all that ... h.c h1.c h2.c done
You should use "-ansi -pedantic" as well, together with possibly -W.
In cgis.c (needed for h3.c and beyond), there's a line of code that
shifts the existing value to the left 4 bits and then adds in the
four new bits obtained from the hexadecimal character in the string
it's walking. The line of code looks like this:
c = c<<4 + h;
but the gnu c compiler complains:
cgis.c:118: warning: suggest parentheses around + or - inside shift
Given that there is clearly extra spacing around the =, while the
<< is compact, it's quite clear the intention of the author was:
c = (c<<4) + h;
so it's stupid for the compiler to suggest making it instead:
c = c<<(4 + h);
You consider a compiler to be stupid for following the language
specification? C, like most computing languages, does not use white
space to group expressions. I seem to recall you also cover Perl in your
"CookBook" and based on this one assumption I would say you don't know
Perl or C.

Did you actually even go to the effort of trying code before putting it
up on your web site? I think not.

<snip>
>I suggest you need to learn C properly before writing any kind of
"CookBook" that includes C in the languages it uses.

I already took three semester-length C classes. That's all that are
offered at De Anza College.
I'm sorry, but either you failed or those courses, based on your current
knowledge, appear to be almost worthless.
What do you suggest for further
correction of anything I happened to get wrong after three
semesters of formal study plus various Web-based exploration
looking for specific info such as strtoll and strcspn?
Well, in general doing web-based stuff is a bad idea unless you have a
*very* good reason to trust the specific ones you are using. Your
"CookBook" currently seems to be a prime example of why you should *not*
trust web resources.

Do the world a favour and take down your "CookBook" since you are a long
way from having enough knowledge to write it for even one language, let
alone 6.

I suggest you start looking at the comp.lang.c FAQ (Google will find it)
and buy a copy of K&R2 (the full details are in the bibliography of the
FAQ). Work through *all* the exercises in K&R2 starting with the
assumption that you do not know C since you really do not know it.
--
Flash Gordon
Feb 15 '07 #37

P: n/a
re*****@yahoo.com (robert maas, see http://tinyurl.com/uh3t) wrote:
But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>
Many of the functions described in this chapter return the value of
the macro EOF to indicate unsuccessful completion of the operation.
Since EOF is used to report both end of file and random errors, it's
often better to use the feof function to check explicitly for end of
file and ferror to check for errors.
GNU is wrong on ISO C and does not care. Film at eleven.

Richard
Feb 15 '07 #38

P: n/a
Richard Bos wrote:
re*****@yahoo.com (robert maas, see http://tinyurl.com/uh3t) wrote:
>But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>

Many of the functions described in this chapter return the value of
the macro EOF to indicate unsuccessful completion of the operation.
Since EOF is used to report both end of file and random errors, it's
often better to use the feof function to check explicitly for end of
file and ferror to check for errors.

GNU is wrong on ISO C and does not care. Film at eleven.
In what way?

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Feb 16 '07 #39

P: n/a
From: Keith Thompson <k...@mib.org>
There are several other errors.
Including the use of "defined" rather than "declared".
OK, there was one section of CookTop.html that was sloppy in the
jargon. I think I've tentatively fixed it. It's rather awkward at
present, but at least it doesn't confuse the two terms. Here's the
(backwards) diff:
% diff CookTop.html*
1787,1788c1787
< checking each fuction-call to make sure the function is declared (i.e.
< at least a prototype showing return type and formal parameters), and
---
> checking each fuction-call to make sure the function is defined, and
1790,1793c1789,1791
< function that isn't declared. It can't guess that you're calling a function
< you will be defining later in the file. Most of the time you actually define
< each function before using it. But if you really must call a
< fuction before you defie it, for example if you have two functions that
---
> function that isn't defined. It can't guess that you're calling a function
> you will be defining later in the file. But if you really must call a
> fuction before you define it, for example if you have two functions that
1795,1796c1793
< You write just a declaration for any function that needs to be called before
< it's defined. You write the type of return value,
---
> You write a function-definition template. You write the type of return value,
1811c1808
< to try to keep the declaration matching the actual function definition
---
> to try to keep the template matching the actual function definition
Thanks for the "heads-up".
Feb 16 '07 #40

P: n/a
From: Random832 <ran...@random.yi.org>
Why was there a loop at all in step 1?
Because after compiling and starting the program and typing a test
value and restarting the program and typing another test value and
restarting the program and typing another test value and restarting
the program and typing another test value, I got fed up with having
to manually re-start the program every time I wanted to type in a
new test value.
Feb 16 '07 #41

P: n/a
From: Flash Gordon <s...@flash-gordon.me.uk>
Let me use -Wall to fix all that ... h.c h1.c h2.c done
You should use "-ansi -pedantic" as well, together with possibly -W.
Why? What purpose would be served by doing that?
so it's stupid for the compiler to suggest making it instead:
c = c<<(4 + h);
You consider a compiler to be stupid for following the language
specification?
The language specification does not forbid or suggest against
shifting a value to the left to make room for adding another small
bunch of bits on the right. The code as written is perfectly valid,
a suggestion that it ought to be changed to add in the new bits
right on top of the old ones (mangling both) and *then* shifting to
the left (leaving a hole where the new bits should have been) is
not a good suggestion.
Did you actually even go to the effort of trying code before
putting it up on your web site? I think not.
The code for doing the data processing, yes. Didn't you see the
thread where I had a SLEEP call in the test rig to prevent runaway
spew if ctrl-D was pressed to generate EOS on STDIN. After I got
the code for validating string decimal representation of integer
and conversion to actual long long int datatype all working, *then*
I interface it to CGI and put it up, and tried it, and made sure it
was all working before leaving it standing for others to use.

The code for interfacing to CGI, well there's no way to test that
without putting it up on cgi-bin, where anybody might accidentally
try it while I'm right in the midst of working on it. There's no
way to avoid that. There's no way for me to run any CGI software
without making it public-available. But I make sure there's a short
period from when I first try interfacing it until it's working, and
it has a WARNING, CODE NOT YET TESTED YET ... at the boundary
between what's already working and what I'm testing at the moment,
just in case somebody else tries it in the middle of a development
period.

I'm probably going to continue the same policy in the future. Any
time I am starting to write a brand-new major algorithm that will
require a lot of work before it's ready for others to try, I'll do
it in a stdio test rig before interfacing it to CGI. But any time
I'm just adding one or two line(s) of code at a time to an existing
script I'll probably put it directly online with that warning ahead
of it. Do you have a serious problem with my policy in this matter??
I already took three semester-length C classes. That's all that are
offered at De Anza College.
I'm sorry, but either you failed
I got an "A" in every one of those classes. If you don't believe
me, come here, we'll go to the public library where there's access
to JavaScript (required for viewing transcripts), and I'll show you
my complete DeAnza transcript. If you want to call me a liar in a
public newsgroup, then fuck you bastard!!
or those courses, based on your current knowledge, appear
to be almost worthless.
You're entitled to your opinion on such matters. Perhaps you should
come here and look at my transcript to see which instructors were
teaching those classes, and then you write a formal letter to
De Anza College complaining that all those instructors are
incompetent to teach C programming classes.
Your "CookBook" currently seems to be a prime example of why you
should *not* trust web resources.
The primary purpose of my "CookBook" is to show, in several
languages in parallel how to do various common tasks, such as the
tasks provided by standard libraries in the various languages, and
eventually some of the more advanced tasks covered in the Perl and
Common Lisp cookbooks. That should accomplish several purposes:
- If a person is trying to learn a new language, and knows how to
do something in one language but needs to know how to do
something equivalent in the new language, the person can
directly search for the library function in the language already
known and thereby jump directly to the place where that function is
compared to equivalents in other languages.
- If the person wants to convert one kind of data to another kind,
the person can look in the table of contents to find either type
first and the other type as a sub-heading, and thereby have a
short section of similar functions to browse, not distracted by
other functions that deal with other combinations of data types.
- If a person is trying to pick an appropriate language for some
utility, and has an idea what specific data processing steps
would be involved in the task, the person can look up each
relevant section per processing step and get an idea how well
each language covers that step, and thereby get a general idea
how much extra work would be required, or whether it's even
feasible, in the various languages.

At present my "CookBook" is very far from completion. I have
finished including one c library, and lisp equivalents, and am
starting on two more c libraries. I still need to include the rest
of the c libraries, all the stuff in lisp that has no c equivalent,
and include java equivalents for all of that. Then someday I need
to check differences between c and c++ for this all and show the
c++ way wherever different. I also need to include all the java
stuff that's not available in c or lisp. Also someday I need to
include perl and php equivalents where different from c. And of
course include the perl/php stuff that's not in the other languages
at all. For the moment, I'm concentrating on completeness of
data/processing tasks, not much covering control structures such as
thread or inter-process communication at all. Better to be complete
(eventually, this year I hope) in one major class of processing
tasks, than to jump around willy nilly and never get any particular
class of tasks completely covered in five years. In particular, in
browsing the table of contents of the fine GNU C library document:
<http://www.aquaphoenix.com/ref/gnu_c_library/>
I noticed a large amount of stuff on pipes and sockets, which I've
decided *not* to include on this first major pass, partly because
it'd be the "straw that broke the camel's back" for my workload,
but also because it doesn't fit into the datatype matrix anyway.
I've decided to stick totally with the libraries that process data
types inside the machine, until I get that virtually all done.
Actually I'm not even sure I want to finish the library I started
exploring yesterday, the stuff with floating-point numbers. I might
decide to abort that before investing any more time with it.
I suggest you start looking at the comp.lang.c FAQ (Google will find it)
Is this the one you want me to look at? <http://c-faq.com/>

I browsed it a little, and found one apparent mistake:
<http://c-faq.com/aryptr/arraylval.html>

Q: How can an array be an lvalue, if you can't assign to it?
_________________________________________________________________

A: The term ``lvalue'' doesn't quite mean ``something you can assign
to''; a better definition is ``something that has a location (in
memory).'' [footnote] The ANSI/ISO C Standard goes on to define a
``modifiable lvalue''; an array is not a modifiable lvalue. See also
question 6.5.

In fact you *cannot* assign to an array (except if it was declared
as a formal parameter, in which case it's already degraded to a
simple pointer which *can* be assigned to). You can only assign to
an *element* of an array. For example:
int main(void) {
    char name[10] = "John";
    name[2] = 'a';   /* Valid: assigns to an element; name is now "Joan". */
    name = "Mike";   /* Not legal: assigns to the *array* itself. */
    ...
Am I correct there? Thus the question above presumes a false fact,
and the answer should right at the top point out the false premise,
not assume the false premise and issue a red herring of an answer.
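To make both halves of the FAQ answer concrete, here is a sketch with invented names (`set_name`, `array_demo`). It shows that an array is an lvalue in the sense of having a location (its address can be taken and `sizeof` sees the whole object), while changing its contents has to go through element assignment or a copy, because the array itself is not a modifiable lvalue.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into the array dst of the given size, truncating if needed
   and always leaving dst nul-terminated. */
void set_name(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Returns 1 if every step behaves as the comments claim, else 0. */
int array_demo(void)
{
    char name[10] = "John";
    char (*whole)[10] = &name;  /* the array has an address: it is an lvalue */

    name[2] = 'a';              /* element assignment: name is now "Joan" */
    if (strcmp(name, "Joan") != 0)
        return 0;

    /* name = "Mike";  would not compile: not a modifiable lvalue */
    set_name(name, sizeof name, "Mike");    /* copy the characters instead */

    return strcmp(*whole, "Mike") == 0 && sizeof name == 10;
}
```

Inside `set_name` the parameter `dst` is a plain pointer, which is why the caller has to pass `sizeof name` along with it.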

Hmmm, curious:
<http://c-faq.com/misc/returnparens.html>
Just the other day somebody corrected me because I followed the
examples/spec in K&R on pages 23, 68, and 70, where the syntax is
repeatedly stated as return(expression). But way back on page 203
it says instead return expression; (no parens), which I noticed
just now for the very first time, in response to this FAQ item. Is
that a mistake in proofreading in K&R, and if so which was correct
at the time it was written, i.e. were pages 23/68/70 all wrong, or
was page 203 wrong, at the time it was written?
and buy a copy of K&R2
I have no money to buy anything. Please provide me with a job that
pays earned income if you want to change this present condition of
my life.
Feb 16 '07 #42

P: n/a
re*****@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
>From: Flash Gordon <s...@flash-gordon.me.uk>
Let me use -Wall to fix all that ... h.c h1.c h2.c done
You should use "-ansi -pedantic" as well, together with possibly -W.

Why? What purpose would be served by doing that?
It would catch more errors.
so it's stupid for the compiler to suggest making it instead:
c = c<<(4 + h);
You consider a compiler to be stupid for following the language
specification?

The language specification does not forbid or suggest against
shifting a value to the left to make room for adding another small
bunch of bits on the right. The code as written is perfectly valid,
a suggestion that it ought to be changed to add in the new bits
right on top of the old ones (mangling both) and *then* shifting to
the left (leaving a hole where the new bits should have been) is
not a good suggestion.
Quoting what you wrote up-thread:
| In cgis.c (needed for h3.c and beyond), there's a line of code that
| shifts the existing value to the left 4 bits and then adds in the
| four new bits obtained from the hexadecimal character in the string
| it's walking. The line of code looks like this:
| c = c<<4 + h;
| but the gnu c compiler complains:
| cgis.c:118: warning: suggest parentheses around + or - inside shift
| Given that there is clearly extra spacing around the =, while the
| << is compact, it's quite clear the intention of the author was:
| c = (c<<4) + h;
| so it's stupid for the compiler to suggest making it instead:
| c = c<<(4 + h);

You have misunderstood what that expression means. I think somebody
already explained it to you; I'll try again.

The spacing around operators is *ignored* by the compiler, though it
can be useful for legibility and to make your intent clear to the
reader. You wrote:
c = c<<4 + h;
which you apparently wanted to be evaluated as
c = (c<<4) + h;
but it actually *means*
c = c << (4 + h);
because the "+" operator binds more tightly than the "<<" operator.
gcc was kind enough (and clever enough) to warn you about this.

The spacing might convey your intention to a human reader; it does not
convey anything to the compiler.

If you don't believe me, try this program (10 and 17 are just
arbitrary values chosen to cause the expression to give different
results depending on the grouping):

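#include <stdio.h>
int main(void)
{
    int c = 10;
    int h = 17;

    /* Prints only if the unparenthesized form groups as (c<<4) + h. */
    if (c<<4 + h == (c<<4) + h) {
        printf("c<<4 + h == (c<<4) + h\n");
    }

    /* Prints only if the unparenthesized form groups as c<<(4 + h). */
    if (c<<4 + h == c<<(4 + h)) {
        printf("c<<4 + h == c<<(4 + h)\n");
    }
    return 0;
}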

Another example, that might be clearer:
x+y * z
*looks* like it should mean
(x + y) * z
but it actually means
x + (y * z)

If you find yourself using spacing to indicate grouping in an
expression, I suggest you use parentheses instead.
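
As an illustration (not taken from cgis.c, whose source isn't shown in this thread), here is a minimal sketch of accumulating two hexadecimal characters into one value, with the shift parenthesized explicitly. The names hex_digit_value and decode_hex_pair are hypothetical, chosen only for this example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: numeric value of one hex digit, or -1 if invalid. */
int hex_digit_value(int ch)
{
    static const char digits[] = "0123456789abcdef";
    const char *p;

    if (ch == '\0')
        return -1;              /* strchr would match the terminator */
    p = strchr(digits, tolower((unsigned char)ch));
    return p ? (int)(p - digits) : -1;
}

/* Hypothetical helper: decode two hex characters, such as the "41"
   in "%41". Returns 0..255, or -1 if either character is invalid. */
int decode_hex_pair(const char *s)
{
    int c = 0;
    int i;

    for (i = 0; i < 2; i++) {
        int h = hex_digit_value((unsigned char)s[i]);
        if (h < 0)
            return -1;
        c = (c << 4) + h;       /* parentheses make the grouping explicit */
    }
    return c;
}
```

With this sketch, decode_hex_pair("41") yields 65. Writing the line as c = c<<4 + h; would instead shift c by (4 + h) bits, which is exactly what the gcc warning is pointing out.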

[...]
The code for interfacing to CGI, well there's no way to test that
without putting it up on cgi-bin, where anybody might accidentally
try it while I'm right in the midst of working on it. There's no
way to avoid that.
[...]

<OFF-TOPIC>
There are a number of ways to avoid that; some of them may not be
available to you, depending on the resources to which you have access.

If you're able to set up your own web server, you can probably
configure it so that nobody else can access it, and experiment to your
heart's content.

If that's not possible, and all you can do is install your code in
cgi-bin, you can try installing it with a name that nobody is likely
to stumble across. You can exercise the code because you know its
name, but nobody else can.
</OFF-TOPIC>

If you have questions about CGI, try asking them in
comp.infosystems.www.authoring.cgi.

[snip]
I already took three semester-length C classes. That's all that are
offered at De Anza College.
I'm sorry, but either you failed

I got an "A" in every one of those classes. If you don't believe
me, come here, we'll go to the public library where there's access
to JavaScript (required for viewing transcripts), and I'll show you
my complete DeAnza transcript. If you want to call me a liar in a
public newsgroup, then fuck you bastard!!
Calm down; nobody called you a liar. And consider watching your
language; there's no point in needlessly offending people.
>or those courses based on your current knowledge or they appear
to be almost worthless.
[...]
>I suggest you start looking at the comp.lang.c FAQ (Google will find it)

Is this the one you want me to look at? <http://c-faq.com/>

I browsed it a little, and found one apparent mistake:
<http://c-faq.com/aryptr/arraylval.html>

Q: How can an array be an lvalue, if you can't assign to it?
___________________________________________________________________

A: The term ``lvalue'' doesn't quite mean ``something you can assign
to''; a better definition is ``something that has a location (in
memory).'' [footnote] The ANSI/ISO C Standard goes on to define a
``modifiable lvalue''; an array is not a modifiable lvalue. See also
question 6.5.

In fact you *cannot* assign to an array (except if it was declared
as a formal parameter, in which case it's already degraded to a
simple pointer which *can* be assigned to). You can only assign to
an *element* of an array.
[...]

Of course. Read the FAQ again, more carefully; it doesn't say or
imply that you can assign to an array. The FAQ is perfectly correct.

[...]
Am I correct there? Thus the question above presumes a false fact,
and the answer should right at the top point out the false premise,
not assume the false premise and issue a red herring of an answer.
The letter 'l' in the word "lvalue" originally referred to the *left*
side of an assignment. The idea was that an "lvalue" was an
expression that can appear on the left side of an assignment, and an
"rvalue" was an expression that can appear on the right side of an
assignment. This terminology predates C, and those meanings may have
been appropriate for earlier, simpler languages. In C, the meaning of
"lvalue" has changed to include any expression that designates an
object, whether it can be assigned to or not (and the term "rvalue"
has been largely dropped).

The question is based on a misconception. Someone who's familiar with
the historical meaning of "lvalue" is likely to be confused by the
fact that an array can be an lvalue, but can't appear on the left side
of an assignment. The whole point of the answer is to correct that
misconception.
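
A small sketch may make the distinction concrete. It assumes nothing beyond standard C; the function names count_until_zero and array_demo are invented for this example. It shows an array used as an lvalue (its address is taken), an element assigned to, and a parameter declared with array syntax being modified because it is really a pointer.

```c
#include <stddef.h>

/* A parameter declared with array syntax is adjusted to a pointer,
   so inside the function it is a modifiable lvalue. */
size_t count_until_zero(const int a[])
{
    size_t n = 0;

    while (*a != 0) {
        a++;                    /* legal: a is really "const int *" here */
        n++;
    }
    return n;
}

size_t array_demo(void)
{
    int arr[4] = { 1, 2, 3, 0 };
    int (*whole)[4] = &arr;     /* legal: arr is an lvalue, so & applies */

    /* arr = something;  would not compile: arr is an lvalue,
       but not a modifiable lvalue. */
    arr[0] = 7;                 /* legal: an element is a modifiable lvalue */
    (*whole)[1] = 8;            /* same object, reached through the pointer */
    return count_until_zero(arr);
}
```

Here array_demo() returns 3, the number of nonzero elements before the terminating 0.
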
Hmmm, curious:
<http://c-faq.com/misc/returnparens.html>
Just the other day somebody corrected me because I followed the
examples/spec in K&R on pages 23, 68, and 70, where the syntax is
repeatedly stated as return(expression). But way back on page 203
it says instead return expression; (no parens), which I noticed
just now for the very first time, in response to this FAQ item. Is
that a mistake in proofreading in K&R, and if so which was correct
at the time it was written, i.e. were pages 23/68/70 all wrong, or
was page 203 wrong, at the time it was written?
The examples are all correct. The parentheses are *optional*. Both
return(42);
and
return 42;
are perfectly legal.
>and buy a copy of K&R2

I have no money to buy anything. Please provide me with a job that
pays earned income if you want to change this present condition of
my life.
I'm sorry if your financial situation is unfavorable, but that
certainly isn't anybody else's responsibility. Complaining here about
things we can't help you with wastes your time and ours.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Feb 16 '07 #43

P: n/a
CBFalconer <cb********@yahoo.com> wrote:
Richard Bos wrote:
re*****@yahoo.com (robert maas, see http://tinyurl.com/uh3t) wrote:
But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>

Many of the functions described in this chapter return the value of
the macro EOF to indicate unsuccessful completion of the operation.
Since EOF is used to report both end of file and random errors, it's
often better to use the feof function to check explicitly for end of
file and ferror to check for errors.
GNU is wrong on ISO C and does not care. Film at eleven.

In what way?
In that it's _not_ better to use the feof() function to check for eof.
feof() is good as an aid to distinguish between eof and error _once
you've already checked for EOF_. IOW, it's used with EOF, not better
than.
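
A minimal sketch of that ordering, assuming only standard <stdio.h>: test the return value against EOF first, and only afterwards ask ferror() or feof() why the loop stopped. The function name count_bytes is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes readable from fp.
   Returns the count, or -1 if reading stopped because of an error
   rather than end-of-file. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int ch;                     /* int, not char, so EOF stays distinct */

    while ((ch = getc(fp)) != EOF)
        n++;

    /* getc returned EOF; now find out why. */
    if (ferror(fp))
        return -1;
    /* No error, so the end-of-file indicator is set: feof(fp) is true. */
    return n;
}
```

Calling feof() before attempting the read would not help, since the end-of-file indicator is only set after a read has actually hit the end.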

Richard
Feb 16 '07 #44

P: n/a
Richard Bos wrote:
CBFalconer <cb********@yahoo.com> wrote:
>Richard Bos wrote:
>>re*****@yahoo.com (robert maas wrote:

But: <http://www.gnu.org/software/libc/manual/html_node/EOF-and-Errors.html>

Many of the functions described in this chapter return the
value of the macro EOF to indicate unsuccessful completion
of the operation. Since EOF is used to report both end of
file and random errors, it's often better to use the feof
function to check explicitly for end of file and ferror to
check for errors.

GNU is wrong on ISO C and does not care. Film at eleven.

In what way?

In that it's _not_ better to use the feof() function to check for
eof. feof() is good as an aid to distinguish between eof and
error _once you've already checked for EOF_. IOW, it's used with
EOF, not better than.
While I agree with your statement above, how does that make GNU
wrong? feof shows the file is at EOF, but not that a read etc.
failed. If it is not at EOF a read may succeed, or may fail due to
reaching EOF, or may fail due to i/o error. I think you are
objecting to the fact that they don't state explicitly that these
calls should be used to resolve the cause of receiving an EOF
signal. We don't know the context of the above quote without going
to the original, which I haven't. 'better' may simply mean better
than assuming receiving EOF means the file is at eof.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Feb 16 '07 #45

P: n/a
robert maas wrote:
>From: Flash Gordon <s...@flash-gordon.me.uk>
.... snip ...
>
>and buy a copy of K&R2

I have no money to buy anything. Please provide me with a job that
pays earned income if you want to change this present condition of
my life.
Then you might be well advised to listen to at least some of the
advice you are receiving rather than going off in the wilderness
with random insults. Your knowledge shows gaping holes, and by
your own statements that can only be due to the lack of quality in
your education and/or failure to listen. Your performance here
makes the latter more likely.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Feb 16 '07 #46

P: n/a
Keith Thompson wrote:
re*****@yahoo.com (robert maas, see http://tinyurl.com/uh3t) writes:
Flash Gordon wrote:
<snip>
I suggest you start looking at the comp.lang.c FAQ (Google will find it)
Is this the one you want me to look at? <http://c-faq.com/>

I browsed it a little, and found one apparent mistake:
<http://c-faq.com/aryptr/arraylval.html>

Q: How can an array be an lvalue, if you can't assign to it?
___________________________________________________________________

A: The term ``lvalue'' doesn't quite mean ``something you can assign
to''; a better definition is ``something that has a location (in
memory).'' [footnote] The ANSI/ISO C Standard goes on to define a
``modifiable lvalue''; an array is not a modifiable lvalue. See also
question 6.5.

In fact you *cannot* assign to an array (except if it was declared
as a formal parameter, in which case it's already degraded to a
simple pointer which *can* be assigned to). You can only assign to
an *element* of an array.
[...]

Of course. Read the FAQ again, more carefully; it doesn't say or
imply that you can assign to an array. The FAQ is perfectly correct.

[...]
Am I correct there? Thus the question above presumes a false fact,
and the answer should right at the top point out the false premise,
not assume the false premise and issue a red herring of an answer.

The letter 'l' in the word "lvalue" originally referred to the *left*
side of an assignment. The idea was that an "lvalue" was an
expression that can appear on the left side of an assignment, and an
"rvalue" was an expression that can appear on the right side of an
assignment. This terminology predates C, and those meanings may have
been appropriate for earlier, simpler languages. In C, the meaning of
"lvalue" has changed to include any expression that designates an
object, whether it can be assigned to or not (and the term "rvalue"
has been largely dropped).
If so, is there a reason to retain that term at all, and not use a
more generic term like expression?

<snip>

Feb 16 '07 #47

P: n/a
Keith Thompson wrote:
The letter 'l' in the word "lvalue" originally referred to the *left*
side of an assignment. The idea was that an "lvalue" was an
expression that can appear on the left side of an assignment,
Actually no. The idea was that an lvalue was the /value/ that was
obtained by evaluating an expression which was to be assigned
to, ie, traditionally [but not exclusively] on the left-hand-side
of an assignment.
and an "rvalue" was an expression that can appear on the right
side of an assignment.
Similarly, "rvalue" meant the /value/ obtained by evaluating an
expression for its "ordinary", not-being-assigned-to, value,
traditionally the right-hand-side of an assignment.

In the assignment `L := R`, we find L's lvalue and R's rvalue,
and then do some assignment magic which puts the rvalue "into"
the lvalue, which means [absent side-effects ...] that L's
rvalue is now [maybe some conversion of] what R's rvalue was.

Part of the reason for introducing this distinction was to
formalise why the variable `a` in `a := a + 1` means two
different things in the two different places: the left-hand
`a` is evaluated for its lvalue and the right-hand one for
its rvalue. For a variable this typically means evaluating
its lvalue and then dereferencing that.

In some languages, literals have lvalues, so the assignment
`1 := 2` is legal. Depending on the language semantics, `1`
may have a single lvalue, or a different one each time it
is evaluated. (The rvalue of `1` might or might not use its
lvalue.) While for assignment this looks like the rabid
and hungry sabre-toothed tiger, it makes more sense for
parameter-passing ...
This terminology predates C,
having been introduced or popularised by, if I recall,
Christopher Strachey, in the late 60's-early 70's; it
turns up (ditto) in his /Fundamental Concepts in Programming
Languages/ which a quick google doesn't find (references,
yes, text, no). My paper copy is somewhere at home.
and those meanings may have been appropriate for earlier,
simpler languages.
In fact they can work for modern languages -- many of which
are /simpler/ in these respects than some earlier languages.
In C, the meaning of "lvalue" has changed to include any
expression that designates an object, whether it can be
assigned to or not
That still fits inside the original formulation: the lvalue
is the value you get /by evaluating on the left/; you may
then be able to store into (through?) it, or not.

(I agree there's a shift to calling the /expression/ the
lvalue, rather than its /value/ the lvalue. I shall
spare you what some people might be moved to call a
"hissy fit" about this.)
(and the term "rvalue" has been largely dropped).
In favour of "value", isn't it?

C has (at least) three modes for expression evaluation:
lvalue, (r)value, and what one might call "svalue",
evaluation as the operand of `sizeof`.
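
A short sketch of two of those modes, assuming an ordinary (non-variable-length) operand type; the names bump and sizeof_demo are invented for illustration. The operand of sizeof is inspected only for its type, so the call inside it never runs, while a plain assignment uses the same name once as the object to store into and once for its current value.

```c
#include <stddef.h>

/* Hypothetical helper with a visible side effect. */
int bump(int *p)
{
    return ++*p;
}

int sizeof_demo(void)
{
    int i = 1;
    size_t n = sizeof(bump(&i));    /* operand not evaluated: i is still 1 */

    if (n != sizeof(int))
        return -1;                  /* not expected: bump returns int */

    i = i + 1;  /* left i: the object to store into;
                   right i: its current value */
    return i;
}
```

sizeof_demo() returns 2, not 3, because bump is never actually called.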

--
Chris "electric hedgehog" Dollin
"Our future looks secure, but it's all out of our hands"
- Magenta, /Man and Machine/

Feb 16 '07 #48

P: n/a
"santosh" <sa*********@gmail.com> writes:
Keith Thompson wrote:
[...]
>The letter 'l' in the word "lvalue" originally referred to the *left*
side of an assignment. The idea was that an "lvalue" was an
expression that can appear on the left side of an assignment, and an
"rvalue" was an expression that can appear on the right side of an
assignment. This terminology predates C, and those meanings may have
been appropriate for earlier, simpler languages. In C, the meaning of
"lvalue" has changed to include any expression that designates an
object, whether it can be assigned to or not (and the term "rvalue"
has been largely dropped).

If so, is there a reason to retain that term at all, and not use a
more generic term like expression?
The C standard has done just that. There are exactly two occurrences
of the word "rvalue" in the C99 standard. One is in a footnote in
6.3.2.1:

[...]
What is sometimes called "rvalue" is in this International
Standard described as the "value of an expression".

The other is the index entry referring to this footnote.

C90 has the same wording.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Feb 16 '07 #49

P: n/a
Chris Dollin <ch**********@hp.com> writes:
Keith Thompson wrote:
>The letter 'l' in the word "lvalue" originally referred to the *left*
side of an assignment. The idea was that an "lvalue" was an
expression that can appear on the left side of an assignment,

Actually no. The idea was that an lvalue was the /value/ that was
obtained by evaluating an expression which was to be assigned
to, ie, traditionally [but not exclusively] on the left-hand-side
of an assignment.
Ah, that makes sense.
>and an "rvalue" was an expression that can appear on the right
side of an assignment.

Similarly, "rvalue" meant the /value/ obtained by evaluating an
expression for its "ordinary", not-being-assigned-to, value,
traditionally the right-hand-side of an assignment.
And that explains why that footnote says that an rvalue is the *value*
of the expression, while an lvalue has come to refer to the expression
itself. Thanks for the clarification.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Feb 16 '07 #50

232 Replies
