Segmentation Fault...

Verrigan

I'm trying to pass an array to a function, and change the values of that
array, but I can't seem to figure it out.. Please help me figure out what
I'm doing wrong... Any help would be appreciated... Even if it's "You can't
change the value of a passed array." :) Thanks!

Code:
static int get_command_args(char *msg, char **args)
{
char *p;
int i, loops;

i = 0;
loops = 0;
p = args[i];
while(*msg && !(*msg == END_CHAR)) {
loops++;
fprintf(stdout, "%d\n", loops);
*p++ = *msg++;
if(*msg == ARG_CHAR) {
i++;
*p = '\0';
p = args[i];
*msg++;
}
}
return i;
}

static int parse_command(int sock, char *msg)
{
char *args[10];
char buf[BUFFER_LEN];

get_command_args(msg, args);
...

gdb debugging info:
0x0804922e in get_command_args (msg=0xbfffe530
"login§verrigan§82jsheud¶\n", args=0xbfffe0c0) at interface.c:65
65 *p++ = *msg++;
(gdb) back
#0 0x0804922e in get_command_args (msg=0xbfffe530
"login§verrigan§82jsheud¶\n", args=0xbfffe0c0) at interface.c:65
#1 0x08049280 in parse_command (sock=9, msg=0xbfffe530
"login§verrigan§82jsheud¶\n") at interface.c:81

Nov 14 '05 #1

Subscribe Reply

1773

Jack Klein

On Sat, 30 Oct 2004 00:05:39 -0400, "Verrigan" <ve******@direcway.com>
wrote in comp.lang.c:

I'm trying to pass an array to a function, and change the values of that
array, but I can't seem to figure it out.. Please help me figure out what
I'm doing wrong... Any help would be appreciated... Even if it's "You can't
change the value of a passed array." :) Thanks!

Code:
static int get_command_args(char *msg, char **args)
{
char *p;
int i, loops;

i = 0;
loops = 0;
p = args[i];
First, there is almost always a way to write a loop without a
pre-initialization like the one you wrote above.

Second, this assignment requires that args[i] contain a valid pointer
value.

Third, if your data is now what you expect and contains too many
'ARG_CHARA', your program will walk past the end of the 'args' array
and produce undefined behavior. This type of sloppy handling of
buffers is the cause of many security defects in C programs.
while(*msg && !(*msg == END_CHAR)) {
loops++;
fprintf(stdout, "%d\n", loops);
*p++ = *msg++;
if(*msg == ARG_CHAR) {
i++;
*p = '\0';
p = args[i];
*msg++;
}
}
return i;
}

static int parse_command(int sock, char *msg)
{
char *args[10];
Here is where you 'args' as an uninitialized array of ten pointers to
char. The array does not contain valid pointers. Defining pointers,
separately or in arrays, does not create anything for them to point to
nor does it point them at anything.

So inside your function when you execute 'p = args[i]' statements, you
are assigning uninitialized values to 'p'. Then you try to write to
memory through the uninitialized pointer, causing undefined behavior.
char buf[BUFFER_LEN];

get_command_args(msg, args);
..

gdb debugging info:
0x0804922e in get_command_args (msg=0xbfffe530
"login§verrigan§82jsheud¶\n", args=0xbfffe0c0) at interface.c:65
65 *p++ = *msg++;
(gdb) back
#0 0x0804922e in get_command_args (msg=0xbfffe530
"login§verrigan§82jsheud¶\n", args=0xbfffe0c0) at interface.c:65
#1 0x08049280 in parse_command (sock=9, msg=0xbfffe530
"login§verrigan§82jsheud¶\n") at interface.c:81

You need to understand more about pointers, how to define them,
initialize them, and make sure they point to the proper type and
amount of memory for what you plan to do with them. Consult a book on
C programming.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Nov 14 '05 #2

Karthik Kumar

Verrigan wrote:

I'm trying to pass an array to a function, and change the values of that
array, but I can't seem to figure it out.. Please help me figure out what
I'm doing wrong... Any help would be appreciated... Even if it's "You can't
change the value of a passed array." :) Thanks!

When you post to the n.g. please post compilable code .
Find my comments inline ..
Code:
static int parse_command(int sock, char *msg)
{
char *args[10];
But you have not allocated memory for
the members of array of pointers to char . (args).
char buf[BUFFER_LEN];
Do not post unnecessary code. It distracts the attention.

get_command_args(msg, args);
..
static int get_command_args(char *msg, char **args)
{
char *p;
int i, loops;

i = 0;
loops = 0;
p = args[i];
You are assigning a pointer. Still at this point,
args[i] pointer has not yet been allocated.
while(*msg && !(*msg == END_CHAR)) {
loops++;
fprintf(stdout, "%d\n", loops);
*p++ = *msg++;
And you are dereferencing it here.
Seems like you are trying to get the tokens of a string.
Why would not you use strtok, if that is the case ?

if(*msg == ARG_CHAR) {
i++;
*p = '\0';
p = args[i];
*msg++;
}
}
return i;
}
gdb debugging info:
0x0804922e in get_command_args (msg=0xbfffe530
"login§verrigan§82jsheud¶\n", args=0xbfffe0c0) at interface.c:65
65 *p++ = *msg++;
(gdb) back
#0 0x0804922e in get_command_args (msg=0xbfffe530
"login§verrigan§82jsheud¶\n", args=0xbfffe0c0) at interface.c:65
#1 0x08049280 in parse_command (sock=9, msg=0xbfffe530
"login§verrigan§82jsheud¶\n") at interface.c:81

--
Karthik. http://akktech.blogspot.com .
' Remove _nospamplz from my email to mail me. '

Nov 14 '05 #3

Verrigan

I knew I was doing something wrong. I had been working on other functions
in the code for way too long, and let this one get to me. :)

Here is what I ended up with, and it works as expected.

...
static int get_command_args(char *msg, char *args[])
{
char *p;
int i = 0;

p = args[i];
while(*msg && !(*msg == END_CHAR)) {
if(*msg == ARG_CHAR) {
*p = '\0';
i++;
if(!(args[i] = malloc(sizeof(args[i])))) {
PERROR(malloc);
return -1;
}
p = args[i];
*msg++;
} else {
*p++ = *msg++;
}
}
*p = '\0';
return i++;
}

static int parse_command(int sock, char *msg)
{
char *args[BUFFER_LEN];

if(get_command_args(msg, args) == -1)
return -1;
/* Put the rest of the code here. */
}
...

Thanks again!

Nov 14 '05 #4

Chris Croughton

On Sat, 30 Oct 2004 18:47:22 -0400, Verrigan
<ve******@direcway.com> wrote:

I knew I was doing something wrong. I had been working on other functions
in the code for way too long, and let this one get to me. :)

Here is what I ended up with, and it works as expected.
That is, crashes with illegal address access?
..
static int get_command_args(char *msg, char *args[])
{
char *p;
int i = 0;

p = args[i];
while(*msg && !(*msg == END_CHAR)) {
if(*msg == ARG_CHAR) {
What if the first character in your 'msg' is an ARG_CHAR?
*p = '\0';
So you are writing into an unknown address (p = args[i] which hasn't
been set up yet if the first character in 'msg' is ARG_CHAR);
i++;
if(!(args[i] = malloc(sizeof(args[i])))) {
You are allocating a 4 byte (probably) string? Type of args[i] is
char*, size of that is 4 bytes on 32-bit machines. At this point you
don't know how many characters are in each argument.
PERROR(malloc);
return -1;
}
p = args[i];
*msg++;
} else {
*p++ = *msg++;
}
}
*p = '\0';
return i++;
}

And how about if there are multiple ARG_CHARs in a row, do you want them
taken as separate arguments or should they be ignored (like whitespace
in command lines)?

What if there are more arguments than you expect?

Chris C

Nov 14 '05 #5

Chris Torek

In article
<news:c7******************************@localhost.t alkaboutprogramming.com>
Verrigan <ve******@direcway.com> wrote:

I knew I was doing something wrong.
You still are, although -- in the fashion of undefined behavior --
it appears to be working at the moment anyway (hence probably just
waiting for some "important" moment to crash :-) ).
static int get_command_args(char *msg, char *args[])
{
char *p;
int i = 0;

p = args[i];
What is the value of i? (Answer: 0. We just set it to 0 right
here.) What is the value of args[i], i.e., args[0]? (Answer
depends on caller, obviously, who supplied args. Presumably the
value of "args" is valid, but args[0] had better hold some actual
value as well, because we just copied to to "p".)

Let us take a look at the caller for a moment so we can figure out
what the value of args[0] is.
static int parse_command(int sock, char *msg)
{
char *args[BUFFER_LEN];

So, in parse_command(), args has type "array (size BUFFER_LEN) of
pointer to char". If BUFFER_LEN is (say) 1000, that makes it
"array 1000 of pointer to char" -- 1000 elements, each of type
"pointer to char", and each one uninitialized.

Since "args" is an ordinary automatic variable, it is uninitialized
and thus full of garbage. args[0] is garbage, args[1] is garbage,
args[2] is garbage, and so on up through args[999] (still assuming
BUFFER_LEN is 1000).
if(get_command_args(msg, args) == -1)
return -1;
This passes the array "args" by value to the function get_command_args()
-- the place where we are trying to figure out what its args[0]
is. By The Rule about arrays and pointers in C (see
<http://web.torek.net/torek/c/pa.html>), the "value" of the array
is a pointer to its first element. Ultimately, args[i] in
get_command_args winds up meaning the same thing as args[i] here
in parse_command().

Thus, we can finally answer the question: "What is the value of
args[i], i.e., args[0], that we are going to copy into p in
get_command_args()?" The answer is: garbage. args[0] has never
been set to anything. We copy the trash value, whatever it is,
into "p". If we are really lucky, just copying the trash causes
an immediate error, so that we find the bug before anything else
goes wrong. Given what you have said elsewhere, we appear to be
unlucky: not only does copying trash into p not stop the program
immediately, but get_command_args() even runs to completion.

Let us imagine that "p" happens to point to some write-able memory
somewhere, such as the user's bank balance :-) , and press on:
while(*msg && !(*msg == END_CHAR)) {
What are the values of "msg" and "*msg"? Those depend on
parse_command() too, but it just gets them from somewhere else.
We will have to assume that they are something sensible (and
I am sure they are; I have seen plenty of code like this, and
this part is generally gotten right).

We might as well assume that *msg != 0 and *msg != END_CHAR, so
that we enter the loop. (If not we will just set *p to '\0',
clobbering the user's bank balance. You were not planning to
use that money, were you? :-) )
if(*msg == ARG_CHAR) {
Again, this depends on *msg. If, on the first trip through the
loop, *msg is ARG_CHAR, we will do this now; and likely *msg will
be ARG_CHAR at least one or two times inside the loop and we will
do this at least once or twice, after doing the "else" below a
few times as well.
*p = '\0';
Clobber the user's bank balance (on the first ARG_CHAR anyway),
then:
i++;
change i to 1 (on the first ARG_CHAR anyway; change it to 2 on
the 2nd ARG_CHAR, and so on)...
if(!(args[i] = malloc(sizeof(args[i])))) {
Here is another "guaranteed bug".

Calls to malloc should generally have one of two forms:

ptr = malloc(sizeof *ptr); /* to get one object */
or: ptr = malloc(n * sizeof *ptr); /* to get N objects */

This call looks like neither. Rather, it resembles:

ptr = malloc(sizeof ptr); /* missing "*"; maybe also need "n *" */

or in this case:

args[i] = malloc(sizeof args[i]); /* still missing a "*" */

So, this asks malloc() for "sizeof args[i]" bytes -- probably 2 or
4 but perhaps even 8 now, all depending on sizeof(char *) on your
particular box.

All of this is wrapped inside a test for malloc() failing (which
is good!), and if malloc fails we have a funny-looking thing that
must be a macro (from the all-caps):
PERROR(malloc);
return -1;
}
(Of course, by returning -1, we may lose some memory -- there could
have been some earlier, succeeding malloc()s -- but if the program
itself is going to exit soon, this is just a little sloppy, as long
as the system releases all memory on program exit.)

If the malloc() succeeds, on the other hand, we copy the result
into "p" for additional trips around the while loop, then advance
over the ARG_CHAR in *msg:
p = args[i];
*msg++;
} else {
This is the "*msg is not an ARG_CHAR" case:
*p++ = *msg++;
Here we copy the character from *msg to *p, advancing over the
copied character in both. Again, p is either somewhere in the
user's bank balance (which we are still faithfully clobbering) when
i == 0, or is based on the result of malloc() (i != 0).

If i != 0 (so that p *is* based on a malloc() call), how many chars
have we copied into the pointer we got from malloc()? We can in
fact find out: args[i] and p were initially the same, and p gets
incremented after each "char" is copied into *p, so the number of
"char"s copied so far -- either before or after "*p++ = *msg++" --
must be the same as (p - args[i]). This will count up: 0, 1, 2,
3, ..., as we copy "char"s from *msg++ to *p++. Each time *msg is
not an ARG_CHAR, we copy another one.

How many "char"s *can* we copy before we "copy too many"? The
answer depends on sizeof(char *), because we only asked malloc()
for sizeof(char *) bytes.[%] How many "char"s *will* we copy?
That depends on how many there are between each ARG_CHAR.

Note that each one will have a terminating '\0' added, either
on the earlier "*p = '\0'" line, or the one about to come up:
}
}
*p = '\0';
This finishes off the last one (or, if i==0, finishes clobbering
the user's bank balance).
-----
[%footnote

Remember that, in C:

A byte is a char and a char is a byte
Eight bits is common but nine is in sight
Digital sig-i-nal processors? Whew!
There you may find that you've got thirty-two!

Each "C byte" is CHAR_BIT bits long, and CHAR_BIT is allowed to
be more than 8. It has to be *at least* 8. Most systems make
it exactly 8, because that works very well.]
-----
return i++;
}

This "return" is a peculiar statement. "i++" says "find the current
value of i, and schedule i+1 to be written back to i some time
before the next sequence point." In other words, increment i, but
also hang on to the original value, as the value of the entire
expression. We then return the saved/original value. The act of
returning destroys the variable -- so why did we increment it?

This is not *wrong*, it is just peculiar. If get_command_args()
has made two strings for args[0] and args[1] (so that i==1), it
returns 1, not 2. If it has made just one, it returns 0, not 1.
It is impossible for it to make none -- it always writes a '\0'
byte. (It happens to fail to set args[0] at all, but this is a
separate bug, which presumably we might fix by having it malloc()
something for args[0] as well.)

So, the two big problems here are:

- args[0] is never malloc()ed at all, and
- args[i], i != 0, *is* malloc()ed, but the number of bytes
(aka "char"s) malloc()ed is not based on the number of bytes
that will be written.

A possible bug, but perhaps it is not a bug at all but rather part
of the design, is that some args[i]s may be empty strings. For
instance, if msg[2] is ARG_CHAR and msg[3] is *also* ARG_CHAR,
this winds up represented in the args[] array as an empty string.

Another "medium-likely" bug is that get_command_args() returns 0
even when given a completely empty string. Remember that returning
0 means args[0] is valid -- we return one less than the number of
strings we made. Thus, the empty string in "msg" represents one,
instead of zero, "args". Probably it should represent no args at
all, and get_command_args() should, e.g., return 2 when args[0]
and args[1] are both valid.

One other possible bug is the dreaded "buffer overflow". The
get_command_args() function has no idea how many args[i]'s it
can write on. The caller -- parse_command(), in this case --
provided BUFFER_LEN of them, without telling get_command_args()
that it did in fact provide BUFFER_LEN of them. This may well
be OK, because we know that get_command_args() can only write
at most "k" args[] elements, where k is the number of non-'\0'
bytes in msg[]. (That is, k == strlen(msg) as originally passed
in.) If k is guaranteed not to exceed BUFFER_LEN, this cannot
cause an array-bounds error.

Still, it might be better to re-design get_command_args() a bit so
that it knows how many array elements it can write on. Or,
alternatively, it could malloc() the array "args" itself, as well
as each element args[i]. (In this case, "args" will have type
"char **" instead of "char *[N]" for some constant N, but -- due
to The Rule about arrays and pointers in C -- these can be used
*as if* they were interchangeable, other than the malloc() and
free() that get the space for the "array".)

Finally, there is one other "design note", as it were: when
get_command_args() succeeds, its caller eventually needs to free
each args[i] element individually, making one call to free() for
each call get_command_args() made to malloc(). It may (or may
not) be possible and desirable to change this. In particular,
it may be possible to use *no* calls to malloc() at all, and it
is certainly possible (but maybe not desirable -- a lot depends
on code not shown here) to use just one call.

What if, instead of copying each "piece" of "msg" one piece at
a time into separately-malloc()ed memory, we copy the entire
thing into *one* malloc()ed area? This copy can then be whacked
upon as much as we like, without damaging the original. This
leads to the following routine:

/*
* Break up the provided string into a sequence of (possibly
* empty) strings separated by ARG_CHARs. For instance,
* "zero,one,,three" might become "zero" then "one" then ""
* then "three" (assuming ARG_CHAR is ',').
*
* The array args[] must be large enough to hold the result,
* but the result will not exceed strlen(msg).
*
* Failure (to obtain space) is represented by returning -1.
* An empty string in "msg" is represented by returning 0.
* Only nonempty strings result in args[] arrays.
*
* When all done, the caller must call free(args[0]).
*/
static int get_command_args(char *msg, char **args, size_t argsize) {
size_t len = strlen(msg);
char *mem, *p;
size_t i;

/* this test is optional; without it, "" is one arg */
if (len == 0)
return 0; /* empty string = no arguments, no malloc */

mem = malloc(len + 1); /* +1 for '\0' */
if (mem == NULL) {
/* you can print a message here if you wish */
return -1; /* failure; no memory allocated at all */
}

memcpy(mem, msg, len + 1); /* copy the '\0' too */

for (i = 0, p = mem;;) {
if (i >= argsize)
panic("get_command_args: i >= argsize");

/*
* At this point we have a (possibly empty) string that
* starts at "p" and goes to the next (perhaps immediate)
* ARG_CHAR. Save the start and advance p to the ARG_CHAR.
* If there are no more ARG_CHARs, p becomes NULL and we
* exit the loop.
*/
args[i++] = p;
p = strchr(p, ARG_CHAR);
if (p == NULL)
break;
*p++ = '\0'; /* clobber ARG_CHAR and move forward */
}

return i;
}

(I used the library strchr() function, rather than a small loop,
to advance p. As a result I have to check for NULL instead of
*p==0.)

Note that the caller must now pass the number of elements in the
array "args":

char *args[BUFFER_LEN];
...
n = get_command_args(msg, args, BUFFER_LEN);

or:

n = get_command_args(msg, args, sizeof args / sizeof *args);

and if the code would be about to overwrite an invalid args[i], it
calls panic() (an "internal error detected" routine, which you must
supply).

The call to malloc() above does not have the "comp.lang.c-recommended"
form:

p = malloc(sizeof *p);
or: p = malloc(n * sizeof *p);

What happened to the "sizeof"? The answer is: I was lazy and left
it out. We know that sizeof(char) is 1, and "mem" has type "char *"
so "sizeof *mem" is "sizeof(char)" which is 1. Some people will
prefer to write:

mem = malloc((len + 1) * sizeof *mem);

which is also quite correct.

Finally, you might ask: "How can we get rid of even the one remaining
malloc()?" Well, that depends on whether we can overwrite the
original "msg" array, and point into it, instead of into malloc()ed
memory. As should be fairly obvious -- at least if one draws this
diagram (note, fixed-width font required):

msg:
+---+---+---+---+---+---+---+---+---+---+---+
| o | n | e | , | t | w | o | , | , | 3 |\0 |
+---+---+---+---+---+---+---+---+---+---+---+

args:
+-----+-----+-----+-----+
| * | * | * | * |
+--|--+--|--+--|--+--|--+
| | \___ \________________
/ \________ \_____________ \
| \ \ |
v | | |
mem: v v v
+---+---+---+---+---+---+---+---+---+---+---+
| o | n | e |\0 | t | w | o |\0 |\0 | 3 |\0 |
+---+---+---+---+---+---+---+---+---+---+---+

-- the only reason we copied "msg" in the first place was to be
able to replace each ARG_CHAR (',' above) with '\0'. What if we
just replace it directly in msg[]?

Whether this is suitable depends on the lifetime of the original
"msg" buffer, and whether you plan to re-use it while still wanting
args[0] through args[3] to be valid. But it is often an option,
and it does avoid the (now lone) call to malloc().
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Nov 14 '05 #6

Similar topics