On Mon, 29 Dec 2003 13:35:50 -0000, "Ian Todd"
<ha********@REMOVEyahooME.com> wrote:
Hi,
I am trying to read in a list of data from a file. Each line has a string in
its first column. This is what I want to read. I could start by saying
char[1000][] to read in 1000 lines to the array (I think!!).
But I want to use malloc. Each string is at most 50 characters long, and
there may be zero to thousands of lines. How do I actually start the array?
I have seen char **array etc. At first I tried char *array[50] but I think
that gives 50 pointers to chars :(. How do I use malloc to set space for the
array once I know how many lines there are? Is it
array=malloc(lines*sizeof(char *)) ??
Finally, I want to print each of the strings using printf. How do I access
say the 30th line?
I have successfully done this a few times with doubles, ints etc, but now
with strings it seems a 'two dimensional' problem and I'm confused with the
pointer aspect of it.
Can anyone start me off on a simple solution?
"A string is a contiguous sequence of characters terminated by and
including the first null character" (n869, section 7.1.1, paragraph
1). While not always technically precise, we tend to refer to this
sequence as an array. You imply that you want to store the data in an
array of strings. So it is a 2-d situation as you surmise.
There are two popular solutions, an "array of pointers" and a "pointer
to an array." Even though they are significantly different, the two
share a common syntax in referring to the individual strings and to
the characters that make up the strings. While this common syntax
simplifies the language, it also causes some pervasive confusion.
If T is an object type, then T* is type pointer to T. If we define
T* p;
then p is a pointer to T and when initialized with a non-NULL value
points to exactly one object of type T. However, we frequently allow
p to point to the first of many objects of type T, with the objects in
a contiguous sequence. We then treat p as if it were an array of T
and use p[i] to refer to the i-th object in the array.
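
As a minimal sketch of that idiom, with int standing in for T (the
helper name make_tens and the fill values are made up for
illustration):

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n ints and fill them with 0, 10, 20, ... */
int *make_tens(size_t n)
{
    int *p = malloc(n * sizeof *p);   /* p points to the first of n ints */
    size_t i;

    if (p == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        p[i] = (int)(i * 10);         /* p[i] means the same as *(p + i) */
    return p;                         /* the caller must free() this */
}
```

After int *p = make_tens(4); the expressions p[3] and *(p + 3) both
refer to the fourth int, even though p itself is only a pointer to int.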
For the "array of pointers" approach:
Define a pointer to pointer to char (char** pp;).
Allocate space for some number of pointers to char
(pp = malloc(N * sizeof *pp);). Note that pp now points to the first
of N pointers to char. Even though these pointers are currently
uninitialized, we can still talk of pp as an array of N pointers or
pointing to such an array.
Read the next string from the file into a buffer.
Determine the length of the string (strlen does not count the
terminating null character).
Allocate enough space to hold the string and its terminating null
(pp[i] = malloc(length + 1);). Note that pp[i] now points to the first of
*length + 1* char. Even though these char are currently uninitialized, we
can still talk of pp[i] as an array of *length + 1* char or pointing to
such an array.
Copy the string from the buffer to the memory pointed to by pp[i]
or, almost equivalently, copy the string from the buffer to the array
pp[i].
Loop back and read the next string.
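
The steps above could be sketched roughly as follows. The names
copy_string and read_strings, the 256-char line buffer, and the use of
sscanf with %50s to pick out the first column are all assumptions made
for the example, not the only way to do it. The allocation is
strlen + 1 bytes so that the terminating null fits.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate exactly enough space for the string in buf and copy it there. */
char *copy_string(const char *buf)
{
    size_t length = strlen(buf);          /* does not count the '\0' */
    char *s = malloc(length + 1);         /* +1 for the terminating null */

    if (s != NULL)
        memcpy(s, buf, length + 1);       /* copies the '\0' as well */
    return s;
}

/* Hypothetical reader: store the first column of up to n lines of fp.
   On success *count is the number of strings stored. */
char **read_strings(FILE *fp, size_t n, size_t *count)
{
    char line[256];                       /* assumed long enough for a line */
    char word[51];                        /* 50 characters plus the '\0' */
    char **pp = malloc(n * sizeof *pp);   /* n uninitialized pointers */
    size_t i = 0;

    if (pp == NULL)
        return NULL;
    while (i < n && fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%50s", word) != 1)
            continue;                     /* blank line: nothing to store */
        pp[i] = copy_string(word);
        if (pp[i] == NULL)
            break;                        /* out of memory: keep what we have */
        i++;
    }
    *count = i;
    return pp;
}
```

With the result in pp, the 30th string (if there are that many) is
pp[29], so printf("%s\n", pp[29]); prints it. When finished, free each
pp[i] and then pp itself.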
Things to note:
The size of each string is independent of the other strings.
If viewed in the common left justified tabular form, the right ends of
the strings would not line up. This is often referred to as a jagged
array.
The value in pp is an address.
The value at that address is another address. This value is
the first of N such addresses. You refer to any particular value with
the expression pp[i].
Each pp[i] is the address of a char. This char is the first
char in a contiguous sequence of char terminated by a null character.
Therefore, this sequence is a string. We can say pp[i] points to the
string. Furthermore, we can refer to any particular character in
the i-th string with the expression pp[i][j].
If you ever determine that the number of strings exceeds N,
you can use realloc to cause pp to point to a larger area capable of
holding more pointers. This is why pp is sometimes referred to as a
dynamic 2-d array.
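
One way the realloc step might look, as a sketch only: the helper name
grow_pointers, the starting capacity of 16 and the doubling policy are
assumptions chosen for the example. The result goes into a temporary so
that the old block is not lost if realloc fails.

```c
#include <stdlib.h>

/* Hypothetical helper: make room for more pointers to char.
   Returns the new block, or NULL (leaving pp valid) on failure. */
char **grow_pointers(char **pp, size_t *capacity)
{
    size_t new_cap = (*capacity == 0) ? 16 : *capacity * 2;
    char **tmp = realloc(pp, new_cap * sizeof *tmp);

    if (tmp == NULL)
        return NULL;          /* pp still points to the old, intact block */
    *capacity = new_cap;
    return tmp;               /* the existing pointers were carried over */
}
```

The strings themselves do not move; only the block of pointers to them
is reallocated, so every pp[i] stored earlier is still valid in the new
block.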
For the "pointer to an array" approach:
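Determine the maximum length of any string (you said 50). Each array
must also hold the terminating null character, so an array of 50 char
holds a string of at most 49 characters; use 51 if a string can really
be 50 characters long.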
Define a pointer to an array of *maximum* char
(char (*pa)[50];). Note that this is truly a pointer to an array, not
a pointer to the first of a sequence that we are treating as an array.
When we say a char * points to an array or string, we are using verbal
shorthand to avoid the cumbersome expression in the previous sentence
and we are being a little imprecise. When we say pa points to an
array, we are being very precise.
Allocate space for some number of such arrays
(pa = malloc(N * sizeof *pa);). Note that pa now points to the first
of N objects. Even though these objects are uninitialized, we can
still talk of pa as an array of N objects or as pointing to such an
array. The fact is that each object is an array of 50 char. So we
talk of pa as an array of N arrays of 50 char. We can use pa[i] to
refer to the i-th object so pa[i] is the i-th array of 50 char.
Read the next string from the file into a buffer.
Copy the string from the buffer to the array pa[i].
Loop back and read the next string.
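
A rough sketch of these steps follows. The names ROWSIZE, row and
read_fixed, the 256-char line buffer, and the use of sscanf to copy the
first column are assumptions made for the example. The sketch sizes
each array at 51 rather than 50 so that a 50-character string plus its
terminating null fits, and it uses a typedef only so that the function
can return the pointer conveniently.

```c
#include <stdio.h>
#include <stdlib.h>

#define ROWSIZE 51              /* assumed: 50 characters plus the '\0' */

typedef char row[ROWSIZE];      /* row * is the same type as char (*)[ROWSIZE] */

/* Hypothetical reader: store the first column of up to n lines of fp.
   On success *count is the number of strings stored. */
row *read_fixed(FILE *fp, size_t n, size_t *count)
{
    char line[256];                                /* the read buffer */
    char (*pa)[ROWSIZE] = malloc(n * sizeof *pa);  /* n arrays of ROWSIZE char */
    size_t i = 0;

    if (pa == NULL)
        return NULL;
    while (i < n && fgets(line, sizeof line, fp) != NULL) {
        /* %50s copies at most 50 characters plus '\0' into the array pa[i] */
        if (sscanf(line, "%50s", pa[i]) == 1)
            i++;                                   /* blank lines are skipped */
    }
    *count = i;
    return pa;
}
```

Here the 30th string is pa[29], printed with printf("%s\n", pa[29]);
and a single free(pa) releases everything, since all the strings live
in one block.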
Things to note:
Each string is housed in an array of 50 char.
The value in pa is an address.
The object at that address is an array of 50 char. This
object is the first of N such objects. You refer to any particular
array as pa[i]. You can refer to a particular character in this array
with pa[i][j].
If you ever determine that the number of strings exceeds N,
you can use realloc to cause pa to point to a larger area capable of
holding more arrays. This is why pa is sometimes referred to as a
dynamic 2-d array. Unlike pp above, which is dynamic in both
dimensions, pa is dynamic only in the first dimension. The second
dimension is fixed (at 50 in this example).
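
To illustrate growing only the first dimension, here is a small
sketch. The names ROWSIZE, row and resize_rows are invented for the
example, and the row width of 51 is an assumption that leaves room for
a 50-character string plus its terminating null.

```c
#include <stdlib.h>

#define ROWSIZE 51              /* assumed: 50 characters plus the '\0' */

typedef char row[ROWSIZE];      /* row * is the same type as char (*)[ROWSIZE] */

/* Hypothetical helper: resize the block to hold new_n arrays (new_n > 0).
   Returns the new block, or NULL (leaving pa valid) on failure. */
row *resize_rows(row *pa, size_t new_n)
{
    /* Only the number of arrays changes; sizeof *tmp is always ROWSIZE. */
    row *tmp = realloc(pa, new_n * sizeof *tmp);

    return tmp;
}
```

Because the strings are stored inside the block, realloc may move them
along with it. Any pointer into the old block, such as a saved pa[i],
must be recomputed from the returned pointer.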