saraca means ashoka tree wrote:
The following code is the heart of a program that I wrote to extract
html tags from a webpage. How efficient is my code? Is there still a
possible way to optimize the code? Am I using everything as per the
text book? I am just apprehensive whether this may break or may cause a
memory leak. Any chance of that?
[code snipped; see up-thread]
Before worrying about efficiency, worry about correctness.
You use malloc() and realloc() without checking for failure,
the way you use realloc() will cause a memory leak if realloc()
ever fails, the insertion of '\0' can run off the end of your
allocated region, an empty tag "<>" will leave you with a
non-string lacking the terminal '\0', and a '<' without a
matching '>' will send your code completely off the rails.
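
To make the realloc() point concrete, here is a minimal sketch of
the usual idiom: assign the result to a temporary, and overwrite
your own pointer only on success. The helper name grow_buffer() and
its interface are my invention for illustration, not something
taken from your code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: resize *bufp to new_size bytes (new_size > 0).
 * Returns 1 on success, 0 on failure.  On failure *bufp is left
 * untouched, so the caller still owns the old block and can free() it. */
int grow_buffer(char **bufp, size_t new_size)
{
    char *tmp;

    assert(bufp != NULL && new_size > 0);
    tmp = realloc(*bufp, new_size);
    if (tmp == NULL)
        return 0;       /* old block is still valid; nothing has leaked */
    *bufp = tmp;        /* only now is it safe to drop the old pointer */
    return 1;
}
```

Writing "p = realloc(p, n)" instead throws away your only copy of
the old pointer when realloc() returns NULL, and that is the leak.
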
Once you've fixed these five bugs (and any others I didn't
happen to spot in your badly-indented code), you can start
measuring the performance of your program to see whether any
efficiency improvements are needed. Keep in mind that if it
takes you one hour to improve the speed by one millisecond,
you must run the program 3.6 million times just to break even.
If efficiency improvements are needed (as they well may be;
your code as it stands is far from tight), here are four
suggestions. Note that the C language itself has no notion of
"efficiency," so the actual effect of these suggestions will
vary from platform to platform. As a practical matter, all
four are likely to improve matters, but this is not guaranteed.
Again, you must measure.
Suggestion #1: Learn how to use the strchr() function,
because it can probably locate the '<' and '>' characters
faster than you can. Don't reinvent the wheel.
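
A rough sketch of what Suggestion #1 might look like. The function
name next_tag() and its interface are hypothetical, chosen only for
illustration. Note that it also deals with the unmatched '<' case
by giving up rather than running past the end of the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the next "<...>" pair at or after p.
 * On success, returns a pointer to the '<' and stores in *len the
 * number of characters strictly between '<' and '>'.
 * Returns NULL if there is no '<', or a '<' with no '>' after it. */
const char *next_tag(const char *p, size_t *len)
{
    const char *lt, *gt;

    assert(p != NULL && len != NULL);
    lt = strchr(p, '<');
    if (lt == NULL)
        return NULL;
    gt = strchr(lt + 1, '>');
    if (gt == NULL)
        return NULL;            /* unmatched '<': stop here */
    *len = (size_t)(gt - (lt + 1));
    return lt;
}
```

To walk a whole buffer, call it again starting at (result + *len + 2),
which is the character just after the '>'.
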
Suggestion #2: If all you need to do is print out the
substrings between '<' and '>', print them directly from the
source buffer and get rid of the malloc() and realloc() calls.
Learn how to use the "%.*s" format specification, or learn how
to use fwrite().
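
Here is one way Suggestion #2 could go, as a sketch only. The name
print_tags() and the one-tag-per-line output format are my
assumptions, not anything from your program. It shows both "%.*s"
and fwrite(); in practice you would pick one.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the text between each '<' and the next
 * '>' to out, one tag per line, straight from the source buffer.
 * Returns the number of tags written.  No malloc(), no copying. */
size_t print_tags(const char *src, FILE *out)
{
    size_t count = 0;
    const char *lt, *gt;

    assert(src != NULL && out != NULL);
    while ((lt = strchr(src, '<')) != NULL &&
           (gt = strchr(lt + 1, '>')) != NULL) {
        size_t len = (size_t)(gt - (lt + 1));

        if (len <= INT_MAX) {
            /* The precision supplied through '*' must be an int. */
            fprintf(out, "%.*s\n", (int)len, lt + 1);
        } else {
            /* fwrite() takes a size_t, so it has no such limit. */
            fwrite(lt + 1, 1, len, out);
            putc('\n', out);
        }
        count++;
        src = gt + 1;           /* resume just after the '>' */
    }
    return count;
}
```
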
Suggestion #3: If your real program needs to store the
substrings somewhere instead of just printing them out, don't
allocate memory until you've located the closing '>' and know
how much space you'll need. This avoids wasting memory when
you get a short substring, and avoids the overhead of realloc()
when you get a long one.
Suggestion #4: Learn how to use the memcpy() function,
because it can probably copy characters from the big string
to your destination area faster than you can. (It will
almost certainly do better than your current practice of
storing most destination positions twice!) Don't reinvent
the wheel.
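
Suggestions #3 and #4 fit together naturally, so here is a single
sketch of both: find the closing '>' first, allocate exactly once,
then copy with memcpy(). The name dup_first_tag() is hypothetical,
and your real program may well want a different interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly malloc()ed, '\0'-terminated
 * copy of the text between the first '<' in src and the next '>'.
 * Returns NULL if there is no complete tag or if malloc() fails.
 * The caller must free() the result. */
char *dup_first_tag(const char *src)
{
    const char *lt, *gt;
    size_t len;
    char *copy;

    assert(src != NULL);
    lt = strchr(src, '<');
    if (lt == NULL)
        return NULL;
    gt = strchr(lt + 1, '>');
    if (gt == NULL)
        return NULL;                /* unmatched '<' */

    len = (size_t)(gt - (lt + 1));
    copy = malloc(len + 1);         /* exact size, +1 for the '\0' */
    if (copy == NULL)
        return NULL;
    memcpy(copy, lt + 1, len);      /* each byte is stored exactly once */
    copy[len] = '\0';               /* still correct when len == 0 ("<>") */
    return copy;
}
```

Because the length is known before the allocation, there is nothing
to realloc(), and the terminator always lands inside the block.
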
--
Er*********@sun .com