How's my code?

The following code is the heart of a program that I wrote to extract
HTML tags from a web page. How efficient is my code? Is there still a
way to optimize it? Am I doing everything by the book? I am apprehensive
that it may break or cause a memory leak. Is there any chance of that?

#define TOKN_SIZE 256

void tagfinder() {

char ch, *tokens;
int i, j ,len;
i=j=0;
//scan buffer holds the webpage as a string.
len = strlen(scan_buffer);

while(i < len) {
ch = scan_buffer[i++];
if(ch == '<') {
tokens = malloc(TOKN_SIZE*sizeof(char));
j=0;
while(ch != '>') {
ch = scan_buffer[i++];
if(j >= TOKN_SIZE)
tokens = realloc(tokens, (j+TOKN_SIZE) * sizeof(char));
if(ch != '>') {
tokens[j++] = ch;
tokens[j] = '\0';
}

}// end of while(ch != '>')
printf("%s\n",tokens);
free(tokens);
}//end of if(ch == '<')
}//end of while(len > 0)

}

Nov 14 '05 #1
saraca means ashoka tree wrote:
The following code is the heart of a program that I wrote to extract
HTML tags from a web page. How efficient is my code? Is there still a
way to optimize it? Am I doing everything by the book? I am apprehensive
that it may break or cause a memory leak. Is there any chance of that?
[code snipped; see up-thread]


Before worrying about efficiency, worry about correctness.
You use malloc() and realloc() without checking for failure,
the way you use realloc() will cause a memory leak if realloc()
ever fails, the insertion of '\0' can run off the end of your
allocated region, an empty tag "<>" will leave you with a
non-string lacking the terminal '\0', and a '<' without a
matching '>' will send your code completely off the rails.

Once you've fixed these five bugs (and any others I didn't
happen to spot in your badly-indented code), you can start
measuring the performance of your program to see whether any
efficiency improvements are needed. Keep in mind that if it
takes you one hour to improve the speed by one millisecond,
you must run the program 3.6 million times just to break even.

If efficiency improvements are needed (as they well may be;
your code as it stands is far from tight), here are four
suggestions. Note that the C language itself has no notion of
"efficiency," so the actual effect of these suggestions will
vary from platform to platform. As a practical matter, all
four are likely to improve matters, but this is not guaranteed.
Again, you must measure.

Suggestion #1: Learn how to use the strchr() function,
because it can probably locate the '<' and '>' characters
faster than you can. Don't reinvent the wheel.

Suggestion #2: If all you need to do is print out the
substrings between '<' and '>', print them directly from the
source buffer and get rid of the malloc() and realloc() calls.
Learn how to use the "%.*s" format specification, or learn how
to use fwrite().

Suggestion #3: If your real program needs to store the
substrings somewhere instead of just printing them out, don't
allocate memory until you've located the closing '>' and know
how much space you'll need. This avoids wasting memory when
you get a short substring, and avoids the overhead of realloc()
when you get a long one.

Suggestion #4: Learn how to use the memcpy() function,
because it can probably copy characters from the big string
to your destination area faster than you can. (It will
almost certainly do better than your current practice of
storing most destination positions twice!) Don't reinvent
the wheel.
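The four suggestions above might be sketched roughly as follows. This is a minimal illustration, not the original poster's program: the function names next_tag, print_tags and copy_tag are made up for the example, and the page is assumed to be an ordinary NUL-terminated string with tags shorter than INT_MAX characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Suggestion #1: let strchr() find the delimiters.
   Locates the next "<...>" in s. On success, stores the start of the
   tag body and its length, and returns a pointer just past the '>'.
   Returns NULL if no complete tag is left (including a '<' with no '>'). */
const char *next_tag(const char *s, const char **body, size_t *len)
{
    assert(s != NULL && body != NULL && len != NULL);

    const char *open = strchr(s, '<');
    if (open == NULL)
        return NULL;
    const char *close = strchr(open + 1, '>');
    if (close == NULL)
        return NULL;                    /* unmatched '<': stop cleanly */
    *body = open + 1;
    *len = (size_t)(close - open - 1);  /* 0 for an empty tag "<>" */
    return close + 1;
}

/* Suggestion #2: print each tag straight from the source buffer.
   "%.*s" takes an int precision and prints at most that many chars.
   Returns the number of tags printed. */
int print_tags(const char *buf)
{
    const char *body;
    size_t len;
    int count = 0;

    while ((buf = next_tag(buf, &body, &len)) != NULL) {
        printf("%.*s\n", (int)len, body);
        count++;
    }
    return count;
}

/* Suggestions #3 and #4: allocate once the length is known, then
   memcpy(). Returns a terminated copy the caller must free(), or NULL
   if malloc() fails. */
char *copy_tag(const char *body, size_t len)
{
    assert(body != NULL);

    char *tag = malloc(len + 1);
    if (tag == NULL)
        return NULL;
    memcpy(tag, body, len);
    tag[len] = '\0';                    /* written exactly once */
    return tag;
}
```

Calling print_tags(scan_buffer) would replace the whole original loop; copy_tag() is only needed if the tags have to outlive the buffer.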

--
Er*********@sun.com

Nov 14 '05 #2

On Tue, 21 Dec 2004, saraca means ashoka tree wrote:

The following code is the heart of a program that I wrote to extract
html tags from a webpage. How efficient is my code ?. Is there still
possible way to optimize the code[?]
Of course.
Am I using everything as per the text book[?] I am just apprehensive
whether this may break or may cause a memory leak. Any chance for it[?]


Re-post your code with some indentation, and maybe someone will take
the trouble to look at it. Right now, it's completely unreadable.
(If your problem is that you're trying to post hard tabs to Usenet...
don't! I recommend
http://www.contrib.andrew.cmu.edu/~a...ftware/detab.c
, for obvious reasons. ;) Run 'detab -R -4 myprogram.c' and re-post.)

Make sure the text you're posting actually compiles, by the way.
What you posted in this message doesn't compile in either C90 or C99,
the languages discussed in this newsgroup. I strongly recommend you
don't use '//'-style comments in C code you intend to show anyone;
they tend to do Bad Things like

// this is a long comment that overflows the line and turns into a
syntax error

and every so often (though less and less frequently, thankfully) we see

file://this comment was mangled by a Windows-based news client

-Arthur
Nov 14 '05 #3
saraca means ashoka tree wrote:
The following code is the heart of a program that I wrote to extract
HTML tags from a web page. How efficient is my code?
Better would be to not use malloc at all, and just remember
a pointer to the token start, and its length (or perhaps,
the offset into scan_buffer of the start and end).

The only reason to malloc would be if you needed to
destroy scan_buffer but keep the tokens, or if you
needed to pass the token to a function that cannot
handle a length-counted string (printf is not one of
those functions).
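As a rough sketch of the pointer-plus-length idea above: the struct span type and the functions find_tags, span_equals and print_span are hypothetical names invented for this example, and the spans are only valid for as long as the page buffer itself is.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: a tag seen as a view into the page buffer,
   not as a copy of it. */
struct span {
    const char *start;  /* first character after '<' */
    size_t      len;    /* number of characters before '>' */
};

/* Record up to max tags from buf into out[]. No allocation happens;
   every span points into buf. Returns the number of spans stored. */
size_t find_tags(const char *buf, struct span *out, size_t max)
{
    assert(buf != NULL && (out != NULL || max == 0));

    size_t n = 0;
    const char *open;
    while (n < max && (open = strchr(buf, '<')) != NULL) {
        const char *close = strchr(open + 1, '>');
        if (close == NULL)
            break;                      /* unmatched '<' */
        out[n].start = open + 1;
        out[n].len = (size_t)(close - open - 1);
        n++;
        buf = close + 1;
    }
    return n;
}

/* Compare a span against a NUL-terminated name without copying it. */
int span_equals(struct span s, const char *name)
{
    size_t n = strlen(name);
    return s.len == n && memcmp(s.start, name, n) == 0;
}

/* printf handles a length-counted string through "%.*s"
   (assumes s.len fits in an int). */
void print_span(struct span s)
{
    printf("%.*s\n", (int)s.len, s.start);
}
```

The trade-off is the one described above: if scan_buffer is freed or overwritten, every span becomes dangling, and that is the point at which copying starts to make sense.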

#define TOKN_SIZE 256

void tagfinder() {

char ch, *tokens;
int i, j ,len;
i=j=0;
//scan buffer holds the webpage as a string.
len = strlen(scan_buffer);
You forgot to include stdlib.h and string.h
while(i < len) {
ch = scan_buffer[i++];
if(ch == '<') {
tokens = malloc(TOKN_SIZE*sizeof(char));
sizeof(char) is always 1.
You need to check malloc's return value. It will return NULL
if you have run out of memory.
j=0;
while(ch != '>') {
ch = scan_buffer[i++];
if(j >= TOKN_SIZE)
tokens = realloc(tokens, (j+TOKN_SIZE) * sizeof(char));
You need to check realloc's return value.
Also, if you run out of memory then realloc will return NULL
and leave the original buffer allocated. Assigning that NULL to
'tokens' loses the only pointer to the buffer, which is the leak.
So to avoid leaks in the case of a memory shortage you need to do
something like:
temp = realloc(.......);
if (!temp) { free(tokens); exit(EXIT_FAILURE); }
tokens = temp;

This is also bad design because once j reaches TOKN_SIZE you
realloc on every character, growing the buffer by one char each
time. So if your token was 1024 in length then you will end up
doing about 1024 - TOKN_SIZE reallocations. You could at least
track the capacity and grow it by TOKN_SIZE only when it is full.
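One possible shape for the realloc advice above, as a sketch only: struct strbuf, strbuf_push and strbuf_finish are invented names, CHUNK stands in for TOKN_SIZE, and the caller is assumed to start from a zero-initialized struct and to free() the data pointer when done.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK 256   /* plays the role of TOKN_SIZE */

/* Hypothetical growable buffer; start it as "struct strbuf b = {0};". */
struct strbuf {
    char  *data;
    size_t len;     /* characters stored so far */
    size_t cap;     /* bytes currently allocated */
};

/* Append one character, growing by CHUNK only when the buffer is full.
   Returns 0 on success, -1 if realloc() fails. On failure the old
   buffer is still valid and still owned by the caller, so nothing
   leaks. */
int strbuf_push(struct strbuf *b, char ch)
{
    assert(b != NULL);

    if (b->len + 1 >= b->cap) {         /* keep room for the final '\0' */
        char *temp = realloc(b->data, b->cap + CHUNK);
        if (temp == NULL)
            return -1;                  /* b->data was not overwritten */
        b->data = temp;
        b->cap += CHUNK;
    }
    b->data[b->len++] = ch;
    return 0;
}

/* Write the terminating '\0' once, after the token is complete.
   Also covers the empty token "<>", where nothing was ever pushed.
   Returns the string, or NULL if the allocation fails. */
char *strbuf_finish(struct strbuf *b)
{
    assert(b != NULL);

    if (b->data == NULL) {
        b->data = malloc(CHUNK);
        if (b->data == NULL)
            return NULL;
        b->cap = CHUNK;
    }
    b->data[b->len] = '\0';
    return b->data;
}
```

With this layout a 1024-character token costs a handful of reallocations instead of several hundred, and the buffer always has room for the terminator.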

Even better would be to count the length of the token
before you do any allocations at all. Then you only
need to allocate once (a memory allocation is typically far
slower than scanning a token twice).
if(ch != '>') {
tokens[j++] = ch;
tokens[j] = '\0';
}
You overflowed 'tokens'. For example, if j == TOKN_SIZE-1
then tokens[j++]=ch sets the last character to ch,
and then tokens[j]=0 writes past the end of the buffer.

This is inefficient anyway because you write a \0 every
time. You should only write the \0 once, after the
token is finished.

}// end of while(ch != '>')
printf("%s\n",tokens);
free(tokens);
}//end of if(ch == '<')
}//end of while(len > 0)

}


Nov 14 '05 #4
I am not certain whether the program only exists to extract and store
tags, or if this is just a single piece of a web-browser or such.
If it is part of a larger application, I would recommend not storing
the strings. If you set up constants:

#define NO_TAG -1
#define TAG_HTML 0
#define TAG_HEAD 1
#define TAG_BODY 2
#define TAG_P 3

or such, then you can store your information in a tree structure that
you can enumerate, switching on each tag:

int currentTag;
while ((currentTag = getNext(tagTree)) != NO_TAG) {
/* tagTree might be some sort of tree structure that retains the
hierarchy of the tags */

switch (currentTag) {
case TAG_HTML:
....code...
break;
case TAG_HEAD:
....code...

etc. ...
}
}

which will be far easier to deal with than strings and faster when it
comes to actually doing something based on what kind of tag it is.
Otherwise you will need to be performing string compares everywhere
through your code.
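To get from the text between '<' and '>' to one of those constants, the string compare has to happen once somewhere. A minimal sketch of that single lookup follows; the tag_table array and the tag_id function are assumptions made up for this example, and the comparison ignores case because HTML tag names are case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NO_TAG   -1
#define TAG_HTML  0
#define TAG_HEAD  1
#define TAG_BODY  2
#define TAG_P     3

/* Hypothetical lookup table: lower-case tag names and their constants. */
static const struct {
    const char *name;
    int         id;
} tag_table[] = {
    { "html", TAG_HTML },
    { "head", TAG_HEAD },
    { "body", TAG_BODY },
    { "p",    TAG_P    },
};

/* Map the len characters at name (not necessarily NUL-terminated) to a
   tag constant, ignoring case. Returns NO_TAG for unknown names. */
int tag_id(const char *name, size_t len)
{
    assert(name != NULL);

    for (size_t i = 0; i < sizeof tag_table / sizeof tag_table[0]; i++) {
        const char *known = tag_table[i].name;
        if (strlen(known) != len)
            continue;
        size_t k = 0;
        while (k < len && tolower((unsigned char)name[k]) == known[k])
            k++;
        if (k == len)
            return tag_table[i].id;
    }
    return NO_TAG;
}
```

After this one conversion, the rest of the program can switch on the integer as shown above and never compare tag strings again.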

-Chris Williams

Nov 14 '05 #5
Try using Crystal REVS for C (www.sgvsarc.com) to take care of
indentation and formatting. Its flowcharts will help you review your
code.

Nov 14 '05 #6
"Jason Taylor" <ja***********@yahoo.com> writes:
Try using Crystal REVS for C (www.sgvsarc.com) to take care of
indentation and formatting. Its flowcharts will help you review your
code.


If you don't mind paying $399 for a single license (slightly
discounted for multiple licenses) -- and it seems to be for Windows
only.

Jason, you wouldn't happen to work for the company that sells this
thing, would you?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #7
Keith, Why make a personal remark?

I just happen to be a very satisfied user of Crystal REVS for C. I
think it's a great product!

Allow me to ask a few simple questions:

Have you tried using it? You can download a free eval copy.

Its flowcharts and automatic formatting have saved me a lot of time.

Would you rather do all the low-level editing manually? Would you
rather have code that is hard to read?

Would you rather spend more time to understand a function instead of
using flowcharts?

How much do you value your time? How does $399 compare?

Did you take a look at Crystal FLOW for C? It costs $219 - single
quantity.

A Windows only product seems fine to me. There must be at least a few
hundred thousand individuals who design with C/C++ primarily in
Windows.

Nov 14 '05 #8
"Jason Taylor" <ja***********@yahoo.com> writes:
Keith, Why make a personal remark?
Because you came into this newsgroup (where you haven't posted
previously as far as I can tell) and posted what appear to be
advertisements for a commercial product. I've seen a lot of
advertisements posing as testimonials from satisfied customers.
I just happen to be a very satisfied user of Crystal REVS for C. I
think it's a great product!
Ok, I'll take your word for it. Even so, the topicality of your posts
is questionable, but I'm not going to make a big deal out of it.
Allow me to ask a few simple questions:

Have you tried using it? You can download a free eval copy.
No. Since I don't develop C software under Windows, I don't have much
use for it.
Its flowcharts and automatic formatting have saved me a lot of time.
I don't like flowcharts, but de gustibus et cetera.
Would you rather do all the low-level editing manually? Would you
rather have code that is hard to read?


Yes and no, respectively.

[...]

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #9
On 24 Dec 2004 12:01:01 -0800, in comp.lang.c , "Jason Taylor"
<ja***********@yahoo.com> wrote:
A Windows only product seems fine to me.
But off-topic here - this is a platform-neutral group.
There must be at least a few
hundred thousand individuals who design with C/C++ primarily in
Windows.


So what? There's a billion who speak various dialects of Indian, but that's
not topical here either.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Nov 14 '05 #10
