By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
431,780 Members | 1,550 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 431,780 IT Pros & Developers. It's quick & easy.

Memory Allocation Problem, please help

P: n/a
Hi,

I've written the code that follows, and I use the function add_word(),
it seems to work fine
*before* increase_arrays() is called that uses realloc() to allocate
more memory to words. But *after* calling increase_arrays(), I
received segmentation fault. I tried to step it through gdb, and I
found out that after calling increase_arrays(), words[0]'s original
value is modified, and if I tried to access it, I get <address 0x11
out of bound>. It's there something wrong with the way I used
realloc()? Please help me out, my limited knowledge of C can carry as
far as here, and I don't think I can figure this out by myself. (I've
checked the C Faq, but still can't figure it out, now I see why people
says that memory management is a pain.). Thanks in advance.

#define INC_SIZE 3 /* number of elements the arrays should
increase.*/

static int array_size; /* size of arrays words, word_counts */
static int total_words; /* total number of words found */
static char **words; /* an array to of words occured */
static int *word_counts; /* number of the corresponding word in arrays
words appeared */

/* Increases the storage for the arrays words, word_counts
* and adjust related variables
*/
static void increase_arrays(){
int i;
int new_size = array_size + INC_SIZE;
char **tmp = realloc(*words, sizeof(char*)*(new_size+1));
int *tmp1 = realloc(word_counts, sizeof(int)*new_size);
if(!(*tmp) || ! tmp1){
fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
exit(1);
}
memset(tmp1 + array_size, 0, sizeof(int)*INC_SIZE);
*words = *tmp;
for(i = array_size; i < new_size; i++)
words[i] = NULL;
word_counts = tmp1;
array_size = new_size;
}

/* return the index of word in the array words on success,
* -1 if word does not exist in the array.
*/
static int word_index(const char *word){
int index;
for(index = 0; index < total_words; index++){
if(!strcmp(words[index], word)) /* matches*/
return index;
}
/* word does not exist */
return -1;
}

/* add word to the array words if it does not exist,
* else words is not modified.
* adjust related the word_counts, and other related
* variables.
*/
static void add_word(const char *word){
int index;
int length;
index = word_index(word);
if(index == -1){
/* word does not exist*/
if(total_words >= array_size)
increase_arrays();
length = strlen(word)+1;
words[total_words]=calloc(length, sizeof(char*));
if(words[total_words]){
strcpy(words[total_words], word);
word_counts[total_words] = 1;
total_words++;
}
else{
fprintf(stderr, "Fatal Error:failed to allocate memory.\n");
exit(1);
}
}
else{
/* word exists */
word_counts[index]++;
}
}

May 12 '07 #1
Share this Question
Share on Google+
9 Replies


P: n/a
we********@gmail.com said:
Hi,

I've written the code that follows, and I use the function add_word(),
it seems to work fine
*before* increase_arrays() is called that uses realloc() to allocate
more memory to words. But *after* calling increase_arrays(), I
received segmentation fault.
I looked at your code for about half a minute, and couldn't understand
it. Clearly you don't understand it either (otherwise it would be
working). I conclude that you're trying to do too much in one go.

Modularise.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
May 12 '07 #2

P: n/a
On 12 May 2007 02:19:03 -0700, "we********@gmail.com"
<we********@gmail.comwrote:
>Hi,

I've written the code that follows, and I use the function add_word(),
it seems to work fine
*before* increase_arrays() is called that uses realloc() to allocate
more memory to words. But *after* calling increase_arrays(), I
received segmentation fault. I tried to step it through gdb, and I
found out that after calling increase_arrays(), words[0]'s original
value is modified, and if I tried to access it, I get <address 0x11
out of bound>. It's there something wrong with the way I used
realloc()? Please help me out, my limited knowledge of C can carry as
far as here, and I don't think I can figure this out by myself. (I've
checked the C Faq, but still can't figure it out, now I see why people
says that memory management is a pain.). Thanks in advance.

#define INC_SIZE 3 /* number of elements the arrays should
increase.*/

static int array_size; /* size of arrays words, word_counts */
static int total_words; /* total number of words found */
static char **words; /* an array to of words occured */
static int *word_counts; /* number of the corresponding word in arrays
words appeared */

/* Increases the storage for the arrays words, word_counts
* and adjust related variables
*/
static void increase_arrays(){
int i;
int new_size = array_size + INC_SIZE;
char **tmp = realloc(*words, sizeof(char*)*(new_size+1));
You want to realloc words, not *words. * words is the same as
words[0]. It is the pointer to the first word. You want to
reallocate the array of pointers.
> int *tmp1 = realloc(word_counts, sizeof(int)*new_size);
if(!(*tmp) || ! tmp1){
*tmp is the value of the first pointer that words used to point to.
Not the value returned by realloc. If realloc actually failed, tmp
would be NULL and *tmp would invoke undefined behavior. You want
if (!tmp || !tmp1)
> fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
exit(1);
Use EXIT_FAILURE, not 1.
> }
memset(tmp1 + array_size, 0, sizeof(int)*INC_SIZE);
*words = *tmp;
words = tmp;
> for(i = array_size; i < new_size; i++)
words[i] = NULL;
word_counts = tmp1;
array_size = new_size;
}

/* return the index of word in the array words on success,
* -1 if word does not exist in the array.
*/
static int word_index(const char *word){
int index;
for(index = 0; index < total_words; index++){
if(!strcmp(words[index], word)) /* matches*/
return index;
}
/* word does not exist */
return -1;
}

/* add word to the array words if it does not exist,
* else words is not modified.
* adjust related the word_counts, and other related
* variables.
*/
static void add_word(const char *word){
int index;
int length;
index = word_index(word);
if(index == -1){
/* word does not exist*/
if(total_words >= array_size)
increase_arrays();
length = strlen(word)+1;
words[total_words]=calloc(length, sizeof(char*));
if(words[total_words]){
strcpy(words[total_words], word);
word_counts[total_words] = 1;
total_words++;
}
else{
fprintf(stderr, "Fatal Error:failed to allocate memory.\n");
exit(1);
}
}
else{
/* word exists */
word_counts[index]++;
}
}

Remove del for email
May 12 '07 #3

P: n/a
<we********@gmail.comha scritto nel messaggio
news:11**********************@y80g2000hsf.googlegr oups.com...
I've written the code that follows, and I use the function add_word(),
it seems to work fine
*before* increase_arrays() is called that uses realloc() to allocate
more memory to words. But *after* calling increase_arrays(), I
received segmentation fault. I tried to step it through gdb, and I
found out that after calling increase_arrays(), words[0]'s original
value is modified, and if I tried to access it, I get <address 0x11
out of bound>. It's there something wrong with the way I used
realloc()? Please help me out, my limited knowledge of C can carry as
far as here, and I don't think I can figure this out by myself. (I've
checked the C Faq, but still can't figure it out, now I see why people
says that memory management is a pain.). Thanks in advance.
I won't bother to correct such a mess, I'm just signaling random
errors I'm finding.
#define INC_SIZE 3 /* number of elements the arrays should
increase.*/

static int array_size; /* size of arrays words, word_counts */
static int total_words; /* total number of words found */
static char **words; /* an array to of words occured */
static int *word_counts; /* number of the corresponding word in arrays
words appeared */

/* Increases the storage for the arrays words, word_counts
* and adjust related variables
*/
static void increase_arrays(){
int i;
int new_size = array_size + INC_SIZE;
char **tmp = realloc(*words, sizeof(char*)*(new_size+1));
what are you doing? tmp points to a char*, while *words points to
a char. I think you meant realloc(words, ...)
int *tmp1 = realloc(word_counts, sizeof(int)*new_size);
if(!(*tmp) || ! tmp1){
*tmp is uninitialized. You want to check tmp itself.
fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
exit(1);
Use exit(EXIT_FAILURE), it is portable.
}
memset(tmp1 + array_size, 0, sizeof(int)*INC_SIZE);
*words = *tmp;
for(i = array_size; i < new_size; i++)
words[i] = NULL;
word_counts = tmp1;
array_size = new_size;
}

/* return the index of word in the array words on success,
* -1 if word does not exist in the array.
*/
static int word_index(const char *word){
int index;
for(index = 0; index < total_words; index++){
if(!strcmp(words[index], word)) /* matches*/
return index;
}
/* word does not exist */
return -1;
}

/* add word to the array words if it does not exist,
* else words is not modified.
* adjust related the word_counts, and other related
* variables.
*/
static void add_word(const char *word){
int index;
int length;
index = word_index(word);
if(index == -1){
/* word does not exist*/
if(total_words >= array_size)
increase_arrays();
length = strlen(word)+1;
words[total_words]=calloc(length, sizeof(char*));
if(words[total_words]){
strcpy(words[total_words], word);
word_counts[total_words] = 1;
total_words++;
}
else{
fprintf(stderr, "Fatal Error:failed to allocate memory.\n");
exit(1);
}
}
else{
/* word exists */
word_counts[index]++;
}
}

May 12 '07 #4

P: n/a
"we********@gmail.com" <we********@gmail.comwrites:
Hi,

I've written the code that follows, and I use the function add_word(),
it seems to work fine
*before* increase_arrays() is called that uses realloc() to allocate
more memory to words. But *after* calling increase_arrays(), I
received segmentation fault. I tried to step it through gdb, and I
found out that after calling increase_arrays(), words[0]'s original
value is modified, and if I tried to access it, I get <address 0x11
out of bound>. It's there something wrong with the way I used
realloc()? Please help me out, my limited knowledge of C can carry as
far as here, and I don't think I can figure this out by myself. (I've
checked the C Faq, but still can't figure it out, now I see why people
says that memory management is a pain.). Thanks in advance.
Full marks for:
(1) Checking the FAQ
(2) Having a reasonable stab at finding the problem (you actually did
find it, you just did not see that you did!).
(3) Checking your allocation return values.
(4) Not casting the return from calloc/realloc,

I've added some comments unrelated to the original problem.
#define INC_SIZE 3 /* number of elements the arrays should
increase.*/

static int array_size; /* size of arrays words, word_counts */
static int total_words; /* total number of words found */
static char **words; /* an array to of words occured */
static int *word_counts; /* number of the corresponding word in arrays
words appeared */

/* Increases the storage for the arrays words, word_counts
* and adjust related variables
*/
When you have two arrays that grow in lock-step like this it is often
easier to have one array whose elements are a struct:

struct word_count {
char *word;
int count;
};

static struct word_count *word_counts;
static void increase_arrays(){
int i;
int new_size = array_size + INC_SIZE;
It is often better to use a multiplicative growth strategy (you need
to remember to deal with the initial size being zero, though).
char **tmp = realloc(*words, sizeof(char*)*(new_size+1));
I am puzzled by the +1 here. Are you being cautious? It is much
better to be *sure* what size you need a write that.
int *tmp1 = realloc(word_counts, sizeof(int)*new_size);
if(!(*tmp) || ! tmp1){
fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
exit(1);
}
memset(tmp1 + array_size, 0, sizeof(int)*INC_SIZE);
*words = *tmp;
Here is you primary problem. You wanted to say "words = tmp;".
*words is the same as word[0] which is why you saw it change
unexpectedly.
for(i = array_size; i < new_size; i++)
words[i] = NULL;
word_counts = tmp1;
array_size = new_size;
}

/* return the index of word in the array words on success,
* -1 if word does not exist in the array.
*/
static int word_index(const char *word){
int index;
for(index = 0; index < total_words; index++){
if(!strcmp(words[index], word)) /* matches*/
return index;
}
/* word does not exist */
return -1;
}

/* add word to the array words if it does not exist,
* else words is not modified.
* adjust related the word_counts, and other related
* variables.
*/
static void add_word(const char *word){
int index;
int length;
index = word_index(word);
if(index == -1){
/* word does not exist*/
if(total_words >= array_size)
increase_arrays();
length = strlen(word)+1;
words[total_words]=calloc(length, sizeof(char*));
You want to allocate length * sizeof char (not char *). By since char
* is at least as big as char you got away with it.

I'd write:

words[total_words] = malloc(length);

because you don't need to zero the data (you replace is at once) and
sizeof char is 1 *by definition*.

A tiny point: I bet 9 out of 10 C programmers would write length =
strlen(word) and then malloc(length + 1).
if(words[total_words]){
strcpy(words[total_words], word);
word_counts[total_words] = 1;
total_words++;
}
else{
fprintf(stderr, "Fatal Error:failed to allocate memory.\n");
exit(1);
}
}
else{
/* word exists */
word_counts[index]++;
}
}
A small quibble: you *should* have posted a whole program (with
headers and main).

--
Ben.
May 12 '07 #5

P: n/a
On May 12, 6:33 pm, Ben Bacarisse <ben.use...@bsb.me.ukwrote:
"weidong...@gmail.com" <weidong...@gmail.comwrites:
Hi,
I've written the code that follows, and I use the function add_word(),
it seems to work fine
*before* increase_arrays() is called that uses realloc() to allocate
more memory to words. But *after* calling increase_arrays(), I
received segmentation fault. I tried to step it through gdb, and I
found out that after calling increase_arrays(), words[0]'s original
value is modified, and if I tried to access it, I get <address 0x11
out of bound>. It's there something wrong with the way I used
realloc()? Please help me out, my limited knowledge of C can carry as
far as here, and I don't think I can figure this out by myself. (I've
checked the C Faq, but still can't figure it out, now I see why people
says that memory management is a pain.). Thanks in advance.

Full marks for:
(1) Checking the FAQ
(2) Having a reasonable stab at finding the problem (you actually did
find it, you just did not see that you did!).
(3) Checking your allocation return values.
(4) Not casting the return from calloc/realloc,

I've added some comments unrelated to the original problem.
#define INC_SIZE 3 /* number of elements the arrays should
increase.*/
static int array_size; /* size of arrays words, word_counts */
static int total_words; /* total number of words found */
static char **words; /* an array to of words occured */
static int *word_counts; /* number of the corresponding word in arrays
words appeared */
/* Increases the storage for the arrays words, word_counts
* and adjust related variables
*/

When you have two arrays that grow in lock-step like this it is often
easier to have one array whose elements are a struct:

struct word_count {
char *word;
int count;

};

static struct word_count *word_counts;
static void increase_arrays(){
int i;
int new_size = array_size + INC_SIZE;

It is often better to use a multiplicative growth strategy (you need
to remember to deal with the initial size being zero, though).
char **tmp = realloc(*words, sizeof(char*)*(new_size+1));

I am puzzled by the +1 here. Are you being cautious? It is much
better to be *sure* what size you need a write that.
int *tmp1 = realloc(word_counts, sizeof(int)*new_size);
if(!(*tmp) || ! tmp1){
fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
exit(1);
}
memset(tmp1 + array_size, 0, sizeof(int)*INC_SIZE);
*words = *tmp;

Here is you primary problem. You wanted to say "words = tmp;".
*words is the same as word[0] which is why you saw it change
unexpectedly.
for(i = array_size; i < new_size; i++)
words[i] = NULL;
word_counts = tmp1;
array_size = new_size;
}
/* return the index of word in the array words on success,
* -1 if word does not exist in the array.
*/
static int word_index(const char *word){
int index;
for(index = 0; index < total_words; index++){
if(!strcmp(words[index], word)) /* matches*/
return index;
}
/* word does not exist */
return -1;
}
/* add word to the array words if it does not exist,
* else words is not modified.
* adjust related the word_counts, and other related
* variables.
*/
static void add_word(const char *word){
int index;
int length;
index = word_index(word);
if(index == -1){
/* word does not exist*/
if(total_words >= array_size)
increase_arrays();
length = strlen(word)+1;
words[total_words]=calloc(length, sizeof(char*));

You want to allocate length * sizeof char (not char *). By since char
* is at least as big as char you got away with it.

I'd write:

words[total_words] = malloc(length);

because you don't need to zero the data (you replace is at once) and
sizeof char is 1 *by definition*.

A tiny point: I bet 9 out of 10 C programmers would write length =
strlen(word) and then malloc(length + 1).
if(words[total_words]){
strcpy(words[total_words], word);
word_counts[total_words] = 1;
total_words++;
}
else{
fprintf(stderr, "Fatal Error:failed to allocate memory.\n");
exit(1);
}
}
else{
/* word exists */
word_counts[index]++;
}
}

A small quibble: you *should* have posted a whole program (with
headers and main).

--
Ben.

Thanks for the advice, which has been the most helpful, and I've
finally got my modified code running. (see code that follows) I am
wondering how I can use realloc() to reallocate memory for an array of
pointer to character string? I tried to use this in the previous code:
char **character_array = malloc(sizeof(char*)*ARRAY_SIZE);

/* This is used in the increase_array()*/
void increase_array(){
char **new_character_array = realloc(character_array,
sizeof(char*)*(ARRAY_SIZE*2));
character_array = new_character_array;
}

I changed the original *character_array = *new_character_array to
character_array = new_character_array. But
I still got segmentation fault, so, how should the above code should
be written? Thanks again.

/* Entropy, a program to calculate the entropy of a file with the
formula:
----
H(S) = \ (P )*log ( 1/(P ))
/ i 2 i
----
i

i.e. H(S) = sum of P[i]*log(base=2, 1/P[i]) for i in 0 to
sizeof(S)

build with -lm
*/
#define INC_SIZE 3 /* number of elements the arrays should
increase.*/
#define MAX_CHAR_LINE 400 /* Maximum number of char in one line */

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* this can be made to a structure.*/
typedef struct word_tag{
char *word;
int count;
}word;

/* This time, I use a struct, instead of separate arrays. */
struct{
word *words;
int used;
int total;
}array;

static int total_word_count;

static double log_base_2(double i);
static double entropy(double probability, int total_occurance);
static void init_arrays();
static void increase_arrays();
static int word_index(const char *word);
static void add_word(const char *word);

/* calculate the logarithm of i to the base of 2.
* log2() maybe used in C99 implementation, but this conforms to C89
standard.*/
static double log_base_2(double i){
return log(i)/log(2);
}

/* calculate the entropy of the one word. */
static double entropy(double probability, int total_occurance){
return total_occurance * probability * log_base_2(1/probability);
}

/* Initialise the arrays words, word_counts, and the variables
array_size
* and total_words.
*/
static void init_arrays(){
int i;
array.words = malloc(sizeof(array)*INC_SIZE);
if(!array.words){
fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
exit(1);
}
for(i = 0; i < INC_SIZE; i++){
array.words[i].word = NULL;
array.words[i].count = 0;
}
array.total = INC_SIZE;
array.used = 0;
total_word_count = 0;
}

/* Increases the storage for the arrays words, word_counts
* and adjust related variables
*/
static void increase_arrays(){
int i;
int new_size = array.total + INC_SIZE;
/* an array of pointer to char* with size new_size*/
word *tmp_words = realloc(array.words, sizeof(word)*new_size);
if(!tmp_words){
fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
exit(1);
}
array.words = tmp_words;
for(i = array.total; i < new_size; i++)
array.words[i].word = NULL;
array.total = new_size;
}

/* return the index of word in the array words on success,
* -1 if word does not exist in the array.
*/
static int word_index(const char *word){
int index;
for(index = 0; index < array.used; index++){
if(!strcmp(array.words[index].word, word)) /* matches*/
return index;
}
/* word does not exist */
return -1;
}

/* add word to the array words if it does not exist,
* else words is not modified.
* adjust related the word_counts, and other related
* variables.
*/
static void add_word(const char *word){
int index;
int length;
index = word_index(word);
if(index == -1){
/* word does not exist*/
if(array.used >= array.total)
increase_arrays();
length = strlen(word);
(array.words[array.used]).word=malloc(length+1);
if(array.words[array.used].word){
strcpy(array.words[array.used].word, word);
array.words[array.used].count = 1;
array.used +=1;
}
else{
fprintf(stderr, "Fatal Error:failed to allocate memory.\n");
exit(1);
}
}
else{
/* word exists */
array.words[index].count++;
}
total_word_count++;
}

int main(int argc, char *argv[]){
FILE *infile;
int i = 1;
int trace = 1;
double total_entropy = 0.0f;
if(argc < 2){
argv[1] = "../entropy.c";
argc = 2;
/*
printf("Usage:\n\t%s filename\n", argv[0]);
exit(1);
*/
}

infile = fopen(argv[1], "r");
if(!infile){
fprintf(stderr, "Failed to open file: %s. %s\n", argv[0],
strerror(errno));
exit(1);
}
init_arrays();
while(!feof(infile)){
char buffer[MAX_CHAR_LINE];
char *c;
char *token;
memset(buffer, 0, MAX_CHAR_LINE);
fgets(buffer, MAX_CHAR_LINE, infile);
/* replace \n \t with space to work with strtok()*/
while((c = strchr(buffer, '\n')))
*c = ' ';
while((c = strchr(buffer, '\t')))
*c = ' ';
token = strtok(buffer, " ");
while(token){
add_word(token);
token = strtok(NULL, " ");
}
}
fclose(infile);
for(i = 0; i < array.used; i++){
double itsEntropy, probability;
probability = (double)array.words[i].count / total_word_count;
itsEntropy = entropy(probability, array.words[i].count);
if(trace)
printf("%50s %10f\n", array.words[i].word, itsEntropy);
total_entropy += itsEntropy;
}
printf("total entropy: %f\n", total_entropy);
return 0;
}

May 12 '07 #6

P: n/a
static char **words; /* an array to of words occured */

In your declaration, words is a pointer to a pointer to a char. I
would have written

static char* *words;

words is a pointer to an array of (char*)s. If you have say five char*
stored in words [0], words [1], words [2], words [3] and words [4],
then you reallocate words. For example (missing the error handling)

words = realloc (words, 6 * sizeof (char *));

If you want to change the third pointer from 10 to 20 characters, you
would do (missing error handling)

words [2] = realloc (words [2], 20);

May 12 '07 #7

P: n/a
"christian.bau" <ch***********@cbau.wanadoo.co.ukwrites:
>static char **words; /* an array to of words occured */

In your declaration, words is a pointer to a pointer to a char. I
would have written

static char* *words;

words is a pointer to an array of (char*)s.
[...]

I wouldn't; I'd write "static char **words;".

We're not going to resolve the style issue here, but I don't see that
putting a space between the '*' characters implies that words points
to an array of char* rather than to just a single char*.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
May 12 '07 #8

P: n/a

<we********@gmail.comha scritto nel messaggio
news:11**********************@p77g2000hsh.googlegr oups.com...
/* calculate the logarithm of i to the base of 2.
* log2() maybe used in C99 implementation, but this conforms to C89
standard.*/
static double log_base_2(double i){
return log(i)/log(2);
}
Maybe #define log_base_2(i) (log((i))/log(2)) would do that and
won't have function call overhead.
int main(int argc, char *argv[]){
FILE *infile;
int i = 1;
int trace = 1;
double total_entropy = 0.0f;
Why on earth do you use f to make the costant a float if you use it
to initialize a double?
May 12 '07 #9

P: n/a
On 12 May 2007 10:09:25 -0700, "we********@gmail.com"
<we********@gmail.comwrote:

snip 150 obsolete lines
>Thanks for the advice, which has been the most helpful, and I've
finally got my modified code running. (see code that follows) I am
If you are going to post a completely new program, there is no need to
quote 150 lines of obsolete and irrelevant code.

snip questions about old code
>/* Entropy, a program to calculate the entropy of a file with the
formula:
----
H(S) = \ (P )*log ( 1/(P ))
/ i 2 i
----
i
Even in a monospaced font, this didn't come out right.
i.e. H(S) = sum of P[i]*log(base=2, 1/P[i]) for i in 0 to
sizeof(S)

build with -lm
*/
#define INC_SIZE 3 /* number of elements the arrays should
increase.*/
#define MAX_CHAR_LINE 400 /* Maximum number of char in one line */

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* this can be made to a structure.*/
typedef struct word_tag{
char *word;
int count;
}word;

/* This time, I use a struct, instead of separate arrays. */
struct{
word *words;
int used;
int total;
It doesn't hurt to place these variables inside this struct but it
doesn't buy you anything either. In this program, the only effect is
to increase the typing load.
>}array;
Using words which have a common and intuitive meaning in a completely
different way only leads to confusion. array is not an array. It is
a single instance of an anonymous (having no tag) struct.
>
static int total_word_count;

static double log_base_2(double i);
static double entropy(double probability, int total_occurance);
static void init_arrays();
If the function takes no arguments, then specify it as such:
static void init_arrays(void);
>static void increase_arrays();
static int word_index(const char *word);
static void add_word(const char *word);

/* calculate the logarithm of i to the base of 2.
* log2() maybe used in C99 implementation, but this conforms to C89
standard.*/
static double log_base_2(double i){
return log(i)/log(2);
}

/* calculate the entropy of the one word. */
static double entropy(double probability, int total_occurance){
return total_occurance * probability * log_base_2(1/probability);
}

/* Initialise the arrays words, word_counts, and the variables
array_size
* and total_words.
*/
static void init_arrays(){
int i;
array.words = malloc(sizeof(array)*INC_SIZE);
This is the wrong size. You don't want sizeof(array); you may have
confused yourself into thinking array was the array. You want
sizeof(word). When you are allocating memory and assigning the
address to a pointer, the allocated memory should always have the same
type as what the pointer points to. array.words points to a word.
Therefore, however many objects you are allocating space for will each
have type word and will each occupy sizeof(word) bytes.

Make life easy on yourself. Use the following form for malloc
pointer_name = malloc(object_count * sizeof *pointer_name);
> if(!array.words){
fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
exit(1);
This is not portable. Use EXIT_FAILURE instead of 1 to maximize the
number of people here who can help diagnose your code.
> }
for(i = 0; i < INC_SIZE; i++){
array.words[i].word = NULL;
array.words[i].count = 0;
}
array.total = INC_SIZE;
array.used = 0;
total_word_count = 0;
}

/* Increases the storage for the arrays words, word_counts
* and adjust related variables
*/
static void increase_arrays(){
int i;
int new_size = array.total + INC_SIZE;
/* an array of pointer to char* with size new_size*/
You don't have, and don't want, and array of pointer to char*. You
have a single pointer to word which points to the first element of a
dynamically allocated array of struct. Did using the variable name
"array" confuse you again?
> word *tmp_words = realloc(array.words, sizeof(word)*new_size);
You have it correct here?
> if(!tmp_words){
fprintf(stderr, "Fatal Error: failed to allocate memory.\n");
Use a different error message than the one you used for malloc. You
might also include the value of new_size so you get a clue about the
limits of your system.
> exit(1);
}
array.words = tmp_words;
You have now successfully expanded your array of struct.
> for(i = array.total; i < new_size; i++)
This will loop through each of the new elements of the array.
> array.words[i].word = NULL;
Neither one matters but in init_array you went to the trouble to
initialize both members of each struct in the array and here you only
initialize one.
> array.total = new_size;
}

/* return the index of word in the array words on success,
* -1 if word does not exist in the array.
*/
static int word_index(const char *word){
int index;
for(index = 0; index < array.used; index++){
if(!strcmp(array.words[index].word, word)) /* matches*/
return index;
}
/* word does not exist */
return -1;
}

/* add word to the array words if it does not exist,
* else words is not modified.
* adjust related the word_counts, and other related
* variables.
*/
static void add_word(const char *word){
int index;
int length;
index = word_index(word);
if(index == -1){
/* word does not exist*/
if(array.used >= array.total)
increase_arrays();
length = strlen(word);
(array.words[array.used]).word=malloc(length+1);
if(array.words[array.used].word){
strcpy(array.words[array.used].word, word);
array.words[array.used].count = 1;
array.used +=1;
}
else{
fprintf(stderr, "Fatal Error:failed to allocate memory.\n");
exit(1);
}
}
else{
/* word exists */
array.words[index].count++;
}
total_word_count++;
}

int main(int argc, char *argv[]){
FILE *infile;
int i = 1;
int trace = 1;
double total_entropy = 0.0f;
if(argc < 2){
argv[1] = "../entropy.c";
If argc is less than 2, argv[1] may not exist. In any event, you
never use the new values of argv[1] or argc (below) so why bother?
> argc = 2;
/*
printf("Usage:\n\t%s filename\n", argv[0]);
exit(1);
*/
}

infile = fopen(argv[1], "r");
if(!infile){
fprintf(stderr, "Failed to open file: %s. %s\n", argv[0],
argv[0] is not the name of the file. argv[1] is.
>strerror(errno));
fopen() is not required to set errno.
> exit(1);
}
init_arrays();
while(!feof(infile)){
This does not do what you want. It will cause you to attempt to
process the last line of the file twice. I'm pretty sure it is an
unintended side effect but the memset below insures that the second
attempt to process the last line recognizes end of string immediately.
> char buffer[MAX_CHAR_LINE];
char *c;
char *token;
memset(buffer, 0, MAX_CHAR_LINE);
Why? You immediately fill buffer with data from the file and fgets is
guaranteed to properly terminate the string.
> fgets(buffer, MAX_CHAR_LINE, infile);
You should always check the result of file I/O. This is where you
would find out that you are at end of file and there is no more data.
> /* replace \n \t with space to work with strtok()*/
while((c = strchr(buffer, '\n')))
*c = ' ';
while((c = strchr(buffer, '\t')))
*c = ' ';
token = strtok(buffer, " ");
while(token){
add_word(token);
token = strtok(NULL, " ");
You could eliminate the two while loops above by using " \n\t" as your
token string instead of " ".
> }
}
fclose(infile);
for(i = 0; i < array.used; i++){
double itsEntropy, probability;
probability = (double)array.words[i].count / total_word_count;
itsEntropy = entropy(probability, array.words[i].count);
if(trace)
printf("%50s %10f\n", array.words[i].word, itsEntropy);
Since your variables are doubles, you might as well use %lf and take
advantage of the extra precision.
> total_entropy += itsEntropy;
}
printf("total entropy: %f\n", total_entropy);
return 0;
}

Remove del for email
May 12 '07 #10

This discussion thread is closed

Replies have been disabled for this discussion.