In article
<news:c7******************************@localhost.t alkaboutprogramming.com>
Verrigan <ve******@direcway.com> wrote:
I knew I was doing something wrong.
You still are, although -- in the fashion of undefined behavior --
it appears to be working at the moment anyway (hence probably just
waiting for some "important" moment to crash :-) ).
static int get_command_args(char *msg, char *args[])
{
char *p;
int i = 0;
p = args[i];
What is the value of i? (Answer: 0. We just set it to 0 right
here.) What is the value of args[i], i.e., args[0]? (Answer
depends on caller, obviously, who supplied args. Presumably the
value of "args" is valid, but args[0] had better hold some actual
value as well, because we just copied to to "p".)
Let us take a look at the caller for a moment so we can figure out
what the value of args[0] is.
static int parse_command(int sock, char *msg)
{
char *args[BUFFER_LEN];
So, in parse_command(), args has type "array (size BUFFER_LEN) of
pointer to char". If BUFFER_LEN is (say) 1000, that makes it
"array 1000 of pointer to char" -- 1000 elements, each of type
"pointer to char", and each one uninitialized.
Since "args" is an ordinary automatic variable, it is uninitialized
and thus full of garbage. args[0] is garbage, args[1] is garbage,
args[2] is garbage, and so on up through args[999] (still assuming
BUFFER_LEN is 1000).
if(get_command_args(msg, args) == -1)
return -1;
This passes the array "args" by value to the function get_command_args()
-- the place where we are trying to figure out what its args[0]
is. By The Rule about arrays and pointers in C (see
<http://web.torek.net/torek/c/pa.html>), the "value" of the array
is a pointer to its first element. Ultimately, args[i] in
get_command_args winds up meaning the same thing as args[i] here
in parse_command().
Thus, we can finally answer the question: "What is the value of
args[i], i.e., args[0], that we are going to copy into p in
get_command_args()?" The answer is: garbage. args[0] has never
been set to anything. We copy the trash value, whatever it is,
into "p". If we are really lucky, just copying the trash causes
an immediate error, so that we find the bug before anything else
goes wrong. Given what you have said elsewhere, we appear to be
unlucky: not only does copying trash into p not stop the program
immediately, but get_command_args() even runs to completion.
Let us imagine that "p" happens to point to some write-able memory
somewhere, such as the user's bank balance :-) , and press on:
while(*msg && !(*msg == END_CHAR)) {
What are the values of "msg" and "*msg"? Those depend on
parse_command() too, but it just gets them from somewhere else.
We will have to assume that they are something sensible (and
I am sure they are; I have seen plenty of code like this, and
this part is generally gotten right).
We might as well assume that *msg != 0 and *msg != END_CHAR, so
that we enter the loop. (If not we will just set *p to '\0',
clobbering the user's bank balance. You were not planning to
use that money, were you? :-) )
if(*msg == ARG_CHAR) {
Again, this depends on *msg. If, on the first trip through the
loop, *msg is ARG_CHAR, we will do this now; and likely *msg will
be ARG_CHAR at least one or two times inside the loop and we will
do this at least once or twice, after doing the "else" below a
few times as well.
*p = '\0';
Clobber the user's bank balance (on the first ARG_CHAR anyway),
then:
i++;
change i to 1 (on the first ARG_CHAR anyway; change it to 2 on
the 2nd ARG_CHAR, and so on)...
if(!(args[i] = malloc(sizeof(args[i])))) {
Here is another "guaranteed bug".
Calls to malloc should generally have one of two forms:
ptr = malloc(sizeof *ptr); /* to get one object */
or: ptr = malloc(n * sizeof *ptr); /* to get N objects */
This call looks like neither. Rather, it resembles:
ptr = malloc(sizeof ptr); /* missing "*"; maybe also need "n *" */
or in this case:
args[i] = malloc(sizeof args[i]); /* still missing a "*" */
So, this asks malloc() for "sizeof args[i]" bytes -- probably 2 or
4 but perhaps even 8 now, all depending on sizeof(char *) on your
particular box.
All of this is wrapped inside a test for malloc() failing (which
is good!), and if malloc fails we have a funny-looking thing that
must be a macro (from the all-caps):
PERROR(malloc);
return -1;
}
(Of course, by returning -1, we may lose some memory -- there could
have been some earlier, succeeding malloc()s -- but if the program
itself is going to exit soon, this is just a little sloppy, as long
as the system releases all memory on program exit.)
If the malloc() succeeds, on the other hand, we copy the result
into "p" for additional trips around the while loop, then advance
over the ARG_CHAR in *msg:
p = args[i];
*msg++;
} else {
This is the "*msg is not an ARG_CHAR" case:
*p++ = *msg++;
Here we copy the character from *msg to *p, advancing over the
copied character in both. Again, p is either somewhere in the
user's bank balance (which we are still faithfully clobbering) when
i == 0, or is based on the result of malloc() (i != 0).
If i != 0 (so that p *is* based on a malloc() call), how many chars
have we copied into the pointer we got from malloc()? We can in
fact find out: args[i] and p were initially the same, and p gets
incremented after each "char" is copied into *p, so the number of
"char"s copied so far -- either before or after "*p++ = *msg++" --
must be the same as (p - args[i]). This will count up: 0, 1, 2,
3, ..., as we copy "char"s from *msg++ to *p++. Each time *msg is
not an ARG_CHAR, we copy another one.
How many "char"s *can* we copy before we "copy too many"? The
answer depends on sizeof(char *), because we only asked malloc()
for sizeof(char *) bytes.[%] How many "char"s *will* we copy?
That depends on how many there are between each ARG_CHAR.
Note that each one will have a terminating '\0' added, either
on the earlier "*p = '\0'" line, or the one about to come up:
}
}
*p = '\0';
This finishes off the last one (or, if i==0, finishes clobbering
the user's bank balance).
-----
[%footnote
Remember that, in C:
A byte is a char and a char is a byte
Eight bits is common but nine is in sight
Digital sig-i-nal processors? Whew!
There you may find that you've got thirty-two!
Each "C byte" is CHAR_BIT bits long, and CHAR_BIT is allowed to
be more than 8. It has to be *at least* 8. Most systems make
it exactly 8, because that works very well.]
-----
return i++;
}
This "return" is a peculiar statement. "i++" says "find the current
value of i, and schedule i+1 to be written back to i some time
before the next sequence point." In other words, increment i, but
also hang on to the original value, as the value of the entire
expression. We then return the saved/original value. The act of
returning destroys the variable -- so why did we increment it?
This is not *wrong*, it is just peculiar. If get_command_args()
has made two strings for args[0] and args[1] (so that i==1), it
returns 1, not 2. If it has made just one, it returns 0, not 1.
It is impossible for it to make none -- it always writes a '\0'
byte. (It happens to fail to set args[0] at all, but this is a
separate bug, which presumably we might fix by having it malloc()
something for args[0] as well.)
So, the two big problems here are:
- args[0] is never malloc()ed at all, and
- args[i], i != 0, *is* malloc()ed, but the number of bytes
(aka "char"s) malloc()ed is not based on the number of bytes
that will be written.
A possible bug, but perhaps it is not a bug at all but rather part
of the design, is that some args[i]s may be empty strings. For
instance, if msg[2] is ARG_CHAR and msg[3] is *also* ARG_CHAR,
this winds up represented in the args[] array as an empty string.
Another "medium-likely" bug is that get_command_args() returns 0
even when given a completely empty string. Remember that returning
0 means args[0] is valid -- we return one less than the number of
strings we made. Thus, the empty string in "msg" represents one,
instead of zero, "args". Probably it should represent no args at
all, and get_command_args() should, e.g., return 2 when args[0]
and args[1] are both valid.
One other possible bug is the dreaded "buffer overflow". The
get_command_args() function has no idea how many args[i]'s it
can write on. The caller -- parse_command(), in this case --
provided BUFFER_LEN of them, without telling get_command_args()
that it did in fact provide BUFFER_LEN of them. This may well
be OK, because we know that get_command_args() can only write
at most "k" args[] elements, where k is the number of non-'\0'
bytes in msg[]. (That is, k == strlen(msg) as originally passed
in.) If k is guaranteed not to exceed BUFFER_LEN, this cannot
cause an array-bounds error.
Still, it might be better to re-design get_command_args() a bit so
that it knows how many array elements it can write on. Or,
alternatively, it could malloc() the array "args" itself, as well
as each element args[i]. (In this case, "args" will have type
"char **" instead of "char *[N]" for some constant N, but -- due
to The Rule about arrays and pointers in C -- these can be used
*as if* they were interchangeable, other than the malloc() and
free() that get the space for the "array".)
Finally, there is one other "design note", as it were: when
get_command_args() succeeds, its caller eventually needs to free
each args[i] element individually, making one call to free() for
each call get_command_args() made to malloc(). It may (or may
not) be possible and desirable to change this. In particular,
it may be possible to use *no* calls to malloc() at all, and it
is certainly possible (but maybe not desirable -- a lot depends
on code not shown here) to use just one call.
What if, instead of copying each "piece" of "msg" one piece at
a time into separately-malloc()ed memory, we copy the entire
thing into *one* malloc()ed area? This copy can then be whacked
upon as much as we like, without damaging the original. This
leads to the following routine:
/*
* Break up the provided string into a sequence of (possibly
* empty) strings separated by ARG_CHARs. For instance,
* "zero,one,,three" might become "zero" then "one" then ""
* then "three" (assuming ARG_CHAR is ',').
*
* The array args[] must be large enough to hold the result,
* but the result will not exceed strlen(msg).
*
* Failure (to obtain space) is represented by returning -1.
* An empty string in "msg" is represented by returning 0.
* Only nonempty strings result in args[] arrays.
*
* When all done, the caller must call free(args[0]).
*/
static int get_command_args(char *msg, char **args, size_t argsize) {
size_t len = strlen(msg);
char *mem, *p;
size_t i;
/* this test is optional; without it, "" is one arg */
if (len == 0)
return 0; /* empty string = no arguments, no malloc */
mem = malloc(len + 1); /* +1 for '\0' */
if (mem == NULL) {
/* you can print a message here if you wish */
return -1; /* failure; no memory allocated at all */
}
memcpy(mem, msg, len + 1); /* copy the '\0' too */
for (i = 0, p = mem;;) {
if (i >= argsize)
panic("get_command_args: i >= argsize");
/*
* At this point we have a (possibly empty) string that
* starts at "p" and goes to the next (perhaps immediate)
* ARG_CHAR. Save the start and advance p to the ARG_CHAR.
* If there are no more ARG_CHARs, p becomes NULL and we
* exit the loop.
*/
args[i++] = p;
p = strchr(p, ARG_CHAR);
if (p == NULL)
break;
*p++ = '\0'; /* clobber ARG_CHAR and move forward */
}
return i;
}
(I used the library strchr() function, rather than a small loop,
to advance p. As a result I have to check for NULL instead of
*p==0.)
Note that the caller must now pass the number of elements in the
array "args":
char *args[BUFFER_LEN];
...
n = get_command_args(msg, args, BUFFER_LEN);
or:
n = get_command_args(msg, args, sizeof args / sizeof *args);
and if the code would be about to overwrite an invalid args[i], it
calls panic() (an "internal error detected" routine, which you must
supply).
The call to malloc() above does not have the "comp.lang.c-recommended"
form:
p = malloc(sizeof *p);
or: p = malloc(n * sizeof *p);
What happened to the "sizeof"? The answer is: I was lazy and left
it out. We know that sizeof(char) is 1, and "mem" has type "char *"
so "sizeof *mem" is "sizeof(char)" which is 1. Some people will
prefer to write:
mem = malloc((len + 1) * sizeof *mem);
which is also quite correct.
Finally, you might ask: "How can we get rid of even the one remaining
malloc()?" Well, that depends on whether we can overwrite the
original "msg" array, and point into it, instead of into malloc()ed
memory. As should be fairly obvious -- at least if one draws this
diagram (note, fixed-width font required):
msg:
+---+---+---+---+---+---+---+---+---+---+---+
| o | n | e | , | t | w | o | , | , | 3 |\0 |
+---+---+---+---+---+---+---+---+---+---+---+
args:
+-----+-----+-----+-----+
| * | * | * | * |
+--|--+--|--+--|--+--|--+
| | \___ \________________
/ \________ \_____________ \
| \ \ |
v | | |
mem: v v v
+---+---+---+---+---+---+---+---+---+---+---+
| o | n | e |\0 | t | w | o |\0 |\0 | 3 |\0 |
+---+---+---+---+---+---+---+---+---+---+---+
-- the only reason we copied "msg" in the first place was to be
able to replace each ARG_CHAR (',' above) with '\0'. What if we
just replace it directly in msg[]?
Whether this is suitable depends on the lifetime of the original
"msg" buffer, and whether you plan to re-use it while still wanting
args[0] through args[3] to be valid. But it is often an option,
and it does avoid the (now lone) call to malloc().
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it
http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.