C program to count occurrences of substrings in strings

JD
Hi guys

I'm trying to write a program that counts the occurrences of HTML tags
in a text file. This is what I have so far:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MB 1048576

int CountString(char *, char *);

int main(int argc, char **argv)
{
    char buf[MB];
    FILE *f;
    char *name;
    char *p;
    int lines;
    int count[6] = {0, 0, 0, 0, 0, 0};
    int i;

    if (argc == 1) {
        printf("You need to specify a file on the command line\n");
        return 0;
    }

    name = argv[1];

    if ((f = fopen(name, "r")) == NULL) {
        printf("Couldn't open '%s' for reading!\n", name);
        return 1;
    }

    lines = 0;
    i = 0;

    while (fgets(buf, MB, f) != NULL) {

        lines++;
        if ((p = strrchr(buf, '\n')) != NULL) { *p = '\0'; }
        /* printf("%s\n", buf); */

        count[0] += CountString(buf, "<table");
        count[1] += CountString(buf, "</table>");
        count[2] += CountString(buf, "<tr");
        count[3] += CountString(buf, "</tr>");
        count[4] += CountString(buf, "<td");
        count[5] += CountString(buf, "</td>");
    }

    for (i = 0; i < 6; i++) {
        printf("count[%d] = %d\n", i, count[i]);
    }

    fclose(f);
    return 0;
}

int CountString(char *buf, char *str)
{
    int length = strlen(str);
    char *p = buf;
    int count = 0;

    while (strlen(p) >= length) {
        if (strncmp(buf, str, length) == 0) { count++; }
        p++;
    }

    return count;
}

If I run it on this test page:
<html>
<head>
<title>Test</title>
</head>
<body>

<table width="100%" border="1" cellspacing="0" cellpadding="0">
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
</table>
</body>
</html>

It gives:

count[0] = 59
count[1] = 1
count[2] = 0
count[3] = 0
count[4] = 0
count[5] = 0

Which is clearly not correct. Can anyone give me any pointers as to what
I'm doing wrong?

Thanks
Nov 15 '05 #1
JD wrote:
int CountString(char *buf, char *str)
{
    int length = strlen(str);
    char *p = buf;
    int count = 0;

    while (strlen(p) >= length) {
        if (strncmp(buf, str, length) == 0) { count++; }
        p++;
    }

    return count;
}


You might want to try:

int CountString(char *buf, char *str)
{
    int length = strlen(str);
    char *p = buf;
    int count = 0;

    while (strlen(p) >= length) {
        if (strncmp(p, str, length) == 0) { count++; }
        p++;
    }

    return count;
}

You only advance 'p'; 'buf' stays at the same location in the string,
so strncmp(buf, ...) keeps testing the same position on every pass
through the loop.
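To make the effect concrete, here is a small sketch that puts the two versions side by side. The names count_buggy and count_fixed are made up for this illustration and are not from the original program. With the comparison stuck at 'buf', a line that begins with the search string is counted once per loop pass, and a line that begins with anything else is never counted.

```c
#include <string.h>

/* Hypothetical name: the loop from the question, comparing at buf. */
int count_buggy(const char *buf, const char *str)
{
    size_t length = strlen(str);
    const char *p = buf;
    int count = 0;

    while (strlen(p) >= length) {
        /* buf never moves, so this tests the start of the line every time */
        if (strncmp(buf, str, length) == 0) { count++; }
        p++;
    }
    return count;
}

/* Hypothetical name: the same loop, comparing at the moving pointer p. */
int count_fixed(const char *buf, const char *str)
{
    size_t length = strlen(str);
    const char *p = buf;
    int count = 0;

    while (strlen(p) >= length) {
        if (strncmp(p, str, length) == 0) { count++; }
        p++;
    }
    return count;
}
```

For the 19-character line "<tr><td>x</td></tr>" and the search string "<tr", the loop runs 17 times, so count_buggy returns 17 while count_fixed returns 1. Neither version guards against an empty search string, so don't pass one.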
I personally prefer:

    while (strlen(buf) >= length) {
        if (strncmp(buf, str, length) == 0) { count++; }
        buf++;
    }

The 'p' pointer is not strictly necessary.
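As a possible alternative (not something proposed in the thread), the standard strstr function can do the searching, which avoids calling strlen on the remaining text on every pass. This is only a sketch: the name count_occurrences is made up here, and returning 0 for an empty search string is an assumption about what you would want.

```c
#include <string.h>

/* Hypothetical helper: counts occurrences of needle in haystack,
 * including overlapping ones, like the strncmp loop above. */
int count_occurrences(const char *haystack, const char *needle)
{
    int count = 0;
    const char *p;

    /* Assumption: treat an empty needle as "no matches" rather than
     * letting it match at every position. */
    if (needle[0] == '\0') {
        return 0;
    }

    /* Restart the search one character past each match so that
     * overlapping matches are still found. */
    for (p = strstr(haystack, needle); p != NULL; p = strstr(p + 1, needle)) {
        count++;
    }

    return count;
}
```

In the original main() this would be called the same way, e.g. count[4] += count_occurrences(buf, "<td");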

Nov 15 '05 #2
