I have to process some data in C that is given to me as a char * array. I
have a fairly large number of substrings (well, they're not actually
printable, but let's treat them as strings) that I have to search for in the
data.
I need to keep a count of how often each of these substrings occurs in my
original data and then print it out at the end.
This is a fairly mundane task, but since I have so many substrings, it's a
pain having to #define them all. Can anybody suggest an efficient way of
doing this task?
BTW, I have to use C. If I could use Perl, I'd be in heaven.
cheers, 9 1781
C3 wrote: I have to process some data in C that is given to me as a char * array. I have a fairly large number of substrings (well, they're not actually printable, but let's treat them as strings) that I have to search for in the data.
I need to keep a count of how often each of these substrings occurs in my original data and then print it out at the end.
This is a fairly mundane task, but since I have so many substrings, it's a pain having to #define them all. Can anybody suggest an efficient way of doing this task?
BTW, I have to use C. If I could use Perl, I'd be in heaven.
/* BEGIN new.c output */
The substring "string", occurs 4 times.
The substring "C", occurs 3 times.
The substring "kibo", occurs 0 times.
The substring "hen", occurs 1 times.
The substring "of", occurs 5 times.
/* END new.c output */
/* BEGIN new.c */
#include <stdio.h>
#include <string.h>
int main(void)
{
struct substring {
char *string;
long unsigned count;
} substring[] = {{"string"},{"C"},{"kibo"},{"hen"},{"of"}},
*str_ptr;
char *data[] = {
"I have to process some data in C that is given to me "
"as a char * array.\n I have a fairly large number of "
"substrings\n(well, they're not actually printable, "
"but let's treat them as strings)\nthat I have to search "
"for in the data.",
"I need to keep a count of how often each of these "
"substrings occurs in my original data and then print it "
"out at the end.",
"This is a fairly mundane task, but since I have so many "
"substrings, it's a pain having to #define them all.\n"
"Can anybody suggest an efficient way of doing this task?\n",
"BTW, I have to use C.\n"
"If I could use Perl, I'd be in heaven.\ncheers,"
};
size_t nstring, ndata;
char **data_ptr, *ptr;
long unsigned count;
nstring = sizeof substring / sizeof *substring;
for (str_ptr = substring; nstring-- != 0; ++str_ptr) {
count = 0;
data_ptr = data;
ndata = sizeof data / sizeof *data;
while (ndata-- != 0) {
ptr = *data_ptr;
ptr = strstr(*data_ptr, str_ptr -> string);
while (ptr != NULL) {
++count;
ptr = strstr(ptr + 1, str_ptr -> string);
}
++data_ptr;
}
str_ptr -> count = count;
}
puts("/* BEGIN new.c output */\n");
nstring = sizeof substring / sizeof *substring;
for (str_ptr = substring; nstring-- != 0; ++str_ptr) {
printf("The substring \"%s\", occurs %lu times.\n",
str_ptr -> string, str_ptr -> count);
}
puts("\n/* END new.c output */");
return 0;
}
/* END new.c */
--
pete
This has the guts of what I'm after, but I need to give it a fairly long
list of substrings (byte sequences), which can't be done dynamically due to
the struct.
I have already read my substrings into char[] arrays, now I just need to do
the substring matching. Any help you could give me would be appreciated.
cheers,
C3 wrote: This has the guts of what I'm after, but I need to give it a fairly long list of substrings (byte sequences), which can't be done dynamically due to the struct.
I have already read my substrings into char[] arrays, now I just need to do the substring matching. Any help you could give me would be appreciated.
Write a new function that uses function strstr to do the matching. This
new function will have parameters that will point to the substr array,
a parameter that will describe the size of the array, and a pointer to
the data string that you wish to search. Depending on how you want
to store the results of the search, other parameters may be needed.
Here is an example of a function, PrintResults, that will simply print
the results of the search. (Case issue avoided for simplicity)
#include <stdio.h>
#include <string.h>
void PrintResults(char *substr[], size_t cnt,const char *string);
int main(void)
{
char *substr[] = {"george","bill","adam","a"};
char *teststr = "this is george and this is adam.\n"
"there are many georges and some \n"
"adams, but george is the best because\n"
"he is the mostest. george is the king\n"
"of the jungle";
printf("In the string:\n\n\"%s\"\n\nSearch found:\n",teststr);
PrintResults(substr,(sizeof substr)/(sizeof *substr),teststr);
return 0;
}
void PrintResults(char *substr[], size_t cnt,const char *string)
{
size_t i;
unsigned subcnt;
const char *s1, *s2;
if(!substr || !string || *string == '\0') return;
for(i = 0; i < cnt;i++)
{
subcnt = 0;
if(*substr[i] != '\0')
for(s1 = string;(s2 = strstr(s1,substr[i])); s1 = (s2+1))
subcnt++;
printf("%u occurrances of \"%s\"\n",subcnt,substr[i]);
}
return;
}
--
Al Bowers
Tampa, Fl USA
mailto: xa******@myrapidsys.com (remove the x to send email) http://www.geocities.com/abowers822/
C3 wrote: This has the guts of what I'm after, but I need to give it a fairly long list of substrings (byte sequences), which can't be done dynamically due to the struct.
I have already read my substrings into char[] arrays, now I just need to do the substring matching. Any help you could give me would be appreciated.
What is 'this'? What is 'the struct'? You can compare strings
with strcmp().
--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
C3 wrote: This has the guts of what I'm after, but I need to give it a fairly long list of substrings (byte sequences), which can't be done dynamically due to the struct.
I have already read my substrings into char[] arrays, now I just need to do the substring matching. Any help you could give me would be appreciated.
cheers,
Look at lex (which generates C code)
Unfortunately, it appears this won't work for another reason. My substrings
may contain the null character. I have the length, so it shouldn't be a
problem.
On Mon, 04 Oct 2004 10:08:31 +1000, C3 wrote: Unfortunately, it appears this won't work for another reason. My substrings may contain the null character. I have the length, so it shouldn't be a problem.
I was going to ask about that. I'm glad you figured it out yourself.
I don't know how big the data sets are or how many substrings you have,
but you could just go through the data one character at a time, and for
each byte call memcmp() once for each substring.
char *data, **substring;
long i, j, nsubstrings, *count, data_length, *substring_length;
/* allocate space and populate substrings and so on */
....
for (j=0; j<data_length; j++)
{
for(i=0; i<nsubstrings; i++)
if (memcmp(data[j], substring[i], substring_length[i]) == 0))
++count[i];
}
This is just a sketch of an idea to be taken for what it is worth.
In particular, it might be better to use size_t for some of the types.
--Mac
Thanks for the suggestion. I guess there's no Perlesque q&d way to do what I
need.
cheers,
Thank you. I've always believed this to be the case, for oh so very obvious
reasons. How did you know? :-)
cheers, char *substr[] = {"george","bill","adam","a"}; char *teststr = "this is george and this is adam.\n" "there are many georges and some \n" "adams, but george is the best because\n" "he is the mostest. george is the king\n" "of the jungle"; This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Michi |
last post by:
I was wondering what the best solution is for making large numbers of
TEXT (or BLOB?) fields searchable. For example, if I have a forum,
what is the best way to be able to search for specific...
|
by: Will McGugan |
last post by:
Hi,
Is there a simple way of replacing a large number of substrings in a
string? I was hoping that str.replace could take a dictionary and use it
to replace the occurrences of the keys with the...
|
by: David Wender |
last post by:
I want to create a dataview with a sort on multiple columns. However,
when I use FindRows, I only want to search some of the columns, not
all. Is this possible? I have not been able to make it...
|
by: girish |
last post by:
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, gives me , , , , , ] etc....
|
by: Matik |
last post by:
Hey,
First, sorry if this post appear twice, because, I can not find my post
recently send, trying to post it once again.
I'm out of ideas, so, I thought, will search help here again :(
I'm...
|
by: Tyler |
last post by:
Hello All:
After trying to find an open source alternative to Matlab (or IDL), I
am currently getting acquainted with Python and, in particular SciPy,
NumPy, and Matplotlib. While I await the...
|
by: theCancerus |
last post by:
Hi All,
I am not sure if this is the right place to ask this question but i am
very sure you may have faced this problem, i have already found some
post related to this but not the answer i am...
|
by: boss1 |
last post by:
i have a text file like below: named "rez.text" and
i wnt to transfer each column into oracle database which have the same column name as views here.Every time it will goto database atomatecally...
|
by: Costa |
last post by:
I am looking for a c/c++ text search engine library that supports:
- free text searching - not only beginning of words but substrings as
well
- wildcard searching - I want strings such as...
|
by: taylorcarr |
last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
|
by: aa123db |
last post by:
Variable and constants
Use var or let for variables and const fror constants.
Var foo ='bar';
Let foo ='bar';const baz ='bar';
Functions
function $name$ ($parameters$) {
}
...
|
by: ryjfgjl |
last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers,...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
| |