pkirk25 wrote:
I wonder if anyone has time to write a small example program based on
this data or to critique my own effort?
A file called Realm List.html contains the following data:
Bladefist-Horde
Nordrassil-Horde
Draenor-Alliance
Nordrassil-Alliance
Nordrassil-Neutral
Draenor-Horde
Moonglade-Horde
Shadowmoon-Horde
Moonglade-Alliance
This is not a typical format for a .html file. It looks like straight
text.
I need an algorithm that takes from this list the names of Realms that
we have both Alliance and Horde data for and to list them in
alphabetical order. My desired format of this list will be:
Draenor-Alliance
Draenor-Horde
Moonglade-Alliance
Moonglade-Horde
Nordrassil-Alliance
Nordrassil-Horde
So you want to sort the lines of text alphabetically right?
I'm committed to using ANSI C with the K&R book as my guide. It says a
string is an array of chars.
In straight C strings are just a rats nest of pain and suffering. C
handles string within arrays of chars, however, it doesn't do a lot to
help you manage the array of chars to begin with. I.e., the problems
of memory management are basically yours to solve.
Right now I can open the file, I can use strtok to split each line but
I can't work out how to create an array of these strings?
If you were looking at a language like Perl or Python this kind of
thing is just 4 or 5 lines of code. What you want is some way to do
something similar in C.
By using more advanced strings (i.e., not the crap that the language
comes with), this is possible in the C language. I have written a
string library called "The Better String Library" which is perfectly
suited for this. Using this library leads you to:
#include <stdio.h>
#include <stdlib.h>
#include "bstrlib.h"
/* The string comparison function for qsort */
static int lineCmp (const bstring * s0, const bstring * s1) {
return bstrcmp (*s0, *s1); /* Deal with the extra indirection */
}
int main () {
FILE * fp = fopen ("Realm List.html", "r");
if (fp) {
bstring b = bread ((bNread) fread, fp); /* Read the whole input */
struct bstrList * sl = bsplit (b, '\n'); /* Build the array of
lines */
fclose (fp);
if (sl) {
qsort (sl->entry, sl->qty, sizeof (bstring), lineCmp); /* Sort
the lines */
fp = fopen ("Realm List.html", "w"); /* Overwrite the input file
*/
if (fp) {
int i;
for (i=0; i < sl->qty; i++) { /* Write each line */
fprintf (fp, "%s\n", (char *) sl->entry[i]->data);
}
fclose (fp); /* Commit the file */
}
bstrListDestroy (sl); /* Clean up string list */
}
bdestroy (b); /* Clean up input string */
}
return 0;
}
Not quite as compact as you can get from other modern languages, but a
whole hell of a lot better than what you can achieve in straight C.
Notice how there are no mysterious constants on array declarations that
usually either over-commits memory or under allocates it. You can
download the library here:
http://bstring.sf.net/
FILE *realmList;
if((realmList=f open("Realm List.html","w+" )) == NULL)
"w+" mode means append to current file in write mode. So you will end
up just increasing the size of the input file, rather than overwriting
it with its sorted contents.
{
printf("Couldn' t open \"Realm List.html\" for output\n");
return 1;
}
/*
List Realms and count of each
*/
char strLine[244];
Where does the number 244 come from? (Hint: you don't know, and
neither do I, nor does K&R nor anyone else.)
while (fgets(strLine, 244, realmList))
{
char *tmp = NULL;
char token[] = "-";
tmp = strtok(strLine, token);
This tokenization is not useful. Lines are split by '\n', however this
is already taken care of by the fgets(). With lexical sorting, the
hyphened format of your input data is intrinsically taken into account.
Your concern here is to *store* the line somewhere, not parse it. The
problem is that as you run through the loop, you are going to overwrite
the contents of strLine each time -- i.e., you can only hold storage
for one line at a time. What you need to is to create new storage for
each line, and retain a copy of each line in that storage. You then
have the additional problem that you need to hold the entire collection
of allocated lines in such a way as to come back and be able to sort
them.
C has a function for sorting called "qsort" but that requires that your
lines be stuffed in an array. Perhaps we can have some more "magical
numbers":
char * lineArray[23726]; /* I literally mashed the keyboard for that
number */
then somewhere along that way we could use a counter to store the
results of each read -- but we still need storage for each line. So we
learn about "malloc" and we can try to cobble together a solution:
char strLine[244], * lineArray[23726];
int counter = 0;
if (NULL != (realmList = fopen ("Realm List.html", "r"))) {
while (fgets(strLine, 244, realmList)) {
char * storage = (char *) malloc (1+strlen (strLine));
if (storage) {
strcpy (storage, strLine);
lineArray[counter] = storage; /* This may overflow */
counter++;
} else {
printf ("We ran out of memory\n");
return 1;
}
}
fclose (realmList);
qsort (lineArray, counter, sizeof (char *), pstrcmp);
/* Write the file ... */
/* Free the memory (if you want to be a good citizen) */
}
And you are going to have to make a pstrcmp() function which takes a
pair of char **'s and calls strcmp on the once dereferenced parameters.
The problem with these kinds of typical C solutions is that you are
limited to lines of length at most 243 (before truncation and a
resulting input corruption happens), and you can hold at most 23726
lines (before the program overflows and causes undefined behaviour).
So all that work, and still the solution is just not satisfactory. (We
could avoid the overflow by putting a test on the value of counter as
it is incremented and just break out of the loop -- but this just leads
to another truncation condition.)
The solution that uses Bstrlib given above has no such limitation --
you are only limited by available memory. The Bstrlib version will
never give you erroneous results -- errors (in Bstrlib running out of
memory is the only real error) are deterministical ly detectable and you
can abort the whole process as the errors are detected, or use some
other strategy to proceed if you like.
--
Paul Hsieh
http://www.pobox.com/~qed/ http://bstring.sf.net/