Connecting Tech Pros Worldwide Forums | Help | Site Map

# of occurences of string in another string

Jason Gleason
Guest
 
Posts: n/a
#1: Nov 16 '05
What's the most efficient way to get the number of occurences of a certain
string in another string..for instance i'm using the following code right
now...

private int CharacterCounter(String text,String Character)

{

int count = 0;

for (int i = 0; i < text.Length; i++)

{

if(text.Substring(i,1) == Character)

{

count++;

}

}

return count;

}

The problem with this way is it's not the fastest doing big strings multiple
times over the course of a program run. Is there an easier/faster way to do
it using regular expressions?







Justin Rogers
Guest
 
Posts: n/a
#2: Nov 16 '05

re: # of occurences of string in another string


Regular expressions are going to be slower.

I wrote something a while back that stores the offset of all string
occurences within another string. You can see that here:
http://weblogs.asp.net/justin_rogers.../14/89545.aspx
The relavent code is in SplitByString and appears below as well.

while(index < testString.Length) {
int indexOf = testString.IndexOf(split, index);
if ( indexOf != -1 ) {
offsets[offset++] = indexOf;
index = (indexOf+1);
} else {
index = testString.Length;
}
}

Now, what about fixing that code up to just count?


int index = 0;
while(index < testString.Length) {
int indexOf = testString.IndexOf(split, index);
if ( indexOf != -1 ) {
offsets[offset++] = indexOf;
index = (indexOf+1);
} else {
index = testString.Length;
}
}



"Jason Gleason" <jason.gleason@gensurvey.com> wrote in message
news:%23RwZgkJLEHA.1144@TK2MSFTNGP12.phx.gbl...[color=blue]
> What's the most efficient way to get the number of occurences of a certain
> string in another string..for instance i'm using the following code right
> now...
>
> private int CharacterCounter(String text,String Character)
>
> {
>
> int count = 0;
>
> for (int i = 0; i < text.Length; i++)
>
> {
>
> if(text.Substring(i,1) == Character)
>
> {
>
> count++;
>
> }
>
> }
>
> return count;
>
> }
>
> The problem with this way is it's not the fastest doing big strings multiple
> times over the course of a program run. Is there an easier/faster way to do
> it using regular expressions?
>
>
>
>
>
>[/color]


Justin Rogers
Guest
 
Posts: n/a
#3: Nov 16 '05

re: # of occurences of string in another string


Okay, I hit the send before done button again...

int index = 0;
int count = 0;

while(index < testString) {
int indexOf = testString.IndexOf(splitString, index);
if ( indexOf != -1 ) {
count++; index = (indexOf + splitString.Length);
} else { index = testString.Length; }
}

That should get you the number of occurences correctly.
Also note that an issue in my original code didn't take into
account split string length for purposes of offseting. That
makes a big difference in the output of something like (match
all occurences of (aa) in (aaaa). Normally that should be two,
but my old method would have returned three. I guess that is
a highly ambiguous case.


--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

"Justin Rogers" <Justin@games4dotnet.com> wrote in message
news:OsvrSYKLEHA.340@TK2MSFTNGP11.phx.gbl...[color=blue]
> Regular expressions are going to be slower.
>
> I wrote something a while back that stores the offset of all string
> occurences within another string. You can see that here:
> http://weblogs.asp.net/justin_rogers.../14/89545.aspx
> The relavent code is in SplitByString and appears below as well.
>
> while(index < testString.Length) {
> int indexOf = testString.IndexOf(split, index);
> if ( indexOf != -1 ) {
> offsets[offset++] = indexOf;
> index = (indexOf+1);
> } else {
> index = testString.Length;
> }
> }
>
> Now, what about fixing that code up to just count?
>
>
> int index = 0;
> while(index < testString.Length) {
> int indexOf = testString.IndexOf(split, index);
> if ( indexOf != -1 ) {
> offsets[offset++] = indexOf;
> index = (indexOf+1);
> } else {
> index = testString.Length;
> }
> }
>
>
>
> "Jason Gleason" <jason.gleason@gensurvey.com> wrote in message
> news:%23RwZgkJLEHA.1144@TK2MSFTNGP12.phx.gbl...[color=green]
> > What's the most efficient way to get the number of occurences of a certain
> > string in another string..for instance i'm using the following code right
> > now...
> >
> > private int CharacterCounter(String text,String Character)
> >
> > {
> >
> > int count = 0;
> >
> > for (int i = 0; i < text.Length; i++)
> >
> > {
> >
> > if(text.Substring(i,1) == Character)
> >
> > {
> >
> > count++;
> >
> > }
> >
> > }
> >
> > return count;
> >
> > }
> >
> > The problem with this way is it's not the fastest doing big strings multiple
> > times over the course of a program run. Is there an easier/faster way to do
> > it using regular expressions?
> >
> >
> >
> >
> >
> >[/color]
>
>[/color]


Jimi
Guest
 
Posts: n/a
#4: Nov 16 '05

re: # of occurences of string in another string


> Jason Gleasonwrote:
Is there an easier/faster way to do[color=blue]
> it using regular expressions?[/color]

Regular expressions are great fun for your spare time, but in addition
to being cryptic and unmaintainable, they'll kill the performance of
your app.

mikeb
Guest
 
Posts: n/a
#5: Nov 16 '05

re: # of occurences of string in another string


Jason Gleason wrote:[color=blue]
> What's the most efficient way to get the number of occurences of a certain
> string in another string..for instance i'm using the following code right
> now...
>
> private int CharacterCounter(String text,String Character)
>
> {
>
> int count = 0;
>
> for (int i = 0; i < text.Length; i++)
>
> {
>
> if(text.Substring(i,1) == Character)
>
> {
>
> count++;
>
> }
>
> }
>
> return count;
>
> }
>
> The problem with this way is it's not the fastest doing big strings multiple
> times over the course of a program run. Is there an easier/faster way to do
> it using regular expressions?
>[/color]

If you're searching for exact matches, regex's will not buy you anything
over a well-designed string search. if you're going to search for
patterns, regex's are a good tool.

If you want to count instances of a single character in a string (like
your example) you might be able to tune it a bit (for example, "text[i]"
is probably faster than "text.Substring(i,1)"), but your basic algorithm
is probably fine - if you're going to search the text string only once.

If you're going to count character instances in the same text string
multiple times, it would probably pay to run through the string once,
tallying up all the counts for the various characters into an array
that's indexed by the character value. Subsequent character counts are
just an indexed array access after that.

If you want to change the search from simply counting instances of a
particular character to counting actual instances of strings, you'll
probably want to do some research on the Boyer-Moore string search
algorithm. The framework's String.IndexOf() method might be a good
starting point for you, but I don't think it'll be as fast as a
hand-coded Boyer-Moore, since IndexOf() takes into account culture
information.

Do a net search on Boyer-Moore - it's a not-too-complex, but very, very
good algorithm for performing exact string matches.

--
mikeb
Closed Thread