C# String Comparison, IndexOf and Related

BILL
Guest
 
Posts: n/a
#1: Nov 16 '05
Hi Everyone,

I've been looking through these .NET groups and can't find the exact
answer I want, so I'm asking.

Can someone let me know the best way (you feel) to search a C# string
for an occurrence of a CASE INSENSITIVE substring, returning the found
position? I'm also talking about larger strings to search, ~50K-500K.
Here's what I have so far:

* ToUpper/ToLower and IndexOf would be quite slow, right? Strings are
immutable, so each call copies the whole string, and these strings are
large to begin with.

* RegEx could be the answer, but I'm not sure pattern matching would
be the right solution for this problem

* Any unsafe code, Boyer-Moore using pointers or inline assembly (if
that's possible), would seem the best, but well, it's unsafe code

* I've found a MapTable example here in the C# newsgroup (thanks,
MapTable person), and think this might be the best solution
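
For reference, the Boyer-Moore idea in the third bullet does not strictly require unsafe code. Below is a minimal sketch of the simpler Horspool variant in safe C#. It assumes that folding each character with char.ToUpperInvariant is an acceptable definition of "case insensitive" (it is not culture-aware), and the class and method names are hypothetical, not taken from the thread or from the MapTable example:

```csharp
using System;
using System.Collections.Generic;

public static class HorspoolSearch
{
    // Returns the index of the first case-insensitive match of pattern
    // in text, or -1 if there is none. Case folding is per character.
    public static int IndexOfIgnoreCase(string text, string pattern)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (pattern == null) throw new ArgumentNullException("pattern");

        int m = pattern.Length;
        if (m == 0) return 0;
        if (m > text.Length) return -1;

        // Fold the short pattern once; the large text is folded one
        // character at a time, so no copy of the text is ever made.
        char[] p = new char[m];
        for (int i = 0; i < m; i++) p[i] = char.ToUpperInvariant(pattern[i]);

        // Bad-character table: for every pattern char except the last,
        // the distance from its rightmost occurrence to the pattern end.
        Dictionary<char, int> shift = new Dictionary<char, int>();
        for (int i = 0; i < m - 1; i++) shift[p[i]] = m - 1 - i;

        int pos = 0;
        while (pos <= text.Length - m)
        {
            // Compare the current window right to left.
            int j = m - 1;
            while (j >= 0 && char.ToUpperInvariant(text[pos + j]) == p[j]) j--;
            if (j < 0) return pos;

            // Skip ahead based on the text char under the pattern's last slot.
            char last = char.ToUpperInvariant(text[pos + m - 1]);
            int skip;
            pos += shift.TryGetValue(last, out skip) ? skip : m;
        }
        return -1;
    }
}
```

A call such as HorspoolSearch.IndexOfIgnoreCase(bigText, "needle") would be the entry point. Whether this beats the built-in methods on 50K-500K strings is something to measure rather than assume.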

Any help is appreciated, thanks in advance!
BILL
Lateralus [MCAD]
Guest
 
Posts: n/a
#2: Nov 16 '05

Bill,
I did some tests. I created a 5 MB file, read it with a StreamReader, and
assigned all of the text from the file to a string object. I called ToLower
followed by IndexOf, and it returned the index of the specified substring
immediately. I also used some of the globalization classes that allow you
to call IndexOf with an IgnoreCase option. That also returned the index
immediately. I don't have any timing numbers, but while debugging it
literally stepped over the line of code doing the comparison with no pause
whatsoever.

Here is the globalization code. I used a very simple text comparison below.

using System.Globalization;

CultureInfo culture = new CultureInfo("en-US");

// Culture-aware, case-insensitive search; returns -1 when there is no match.
int index = culture.CompareInfo.IndexOf("this is a TEST", "test", CompareOptions.IgnoreCase);
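
Since no timing numbers were recorded, here is a rough sketch of how the two approaches could be timed with Stopwatch. It is only an illustration under assumptions: the haystack is synthetic filler text rather than the 5 MB file, the invariant culture stands in for "en-US", and the class and method names are made up for this example:

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Text;

public static class SearchTiming
{
    // Builds filler text of at least `length` chars and appends the
    // needle at the very end, which is the worst case for a forward search.
    public static string BuildHaystack(int length, string needle)
    {
        StringBuilder sb = new StringBuilder(length + needle.Length + 32);
        while (sb.Length < length) sb.Append("lorem ipsum dolor sit amet ");
        sb.Append(needle);
        return sb.ToString();
    }

    // Approach 1: lower-case a copy of both strings, then search ordinally.
    public static int ViaToLower(string text, string value)
    {
        return text.ToLowerInvariant()
                   .IndexOf(value.ToLowerInvariant(), StringComparison.Ordinal);
    }

    // Approach 2: let CompareInfo ignore case without copying the text.
    public static int ViaCompareInfo(string text, string value)
    {
        return CultureInfo.InvariantCulture.CompareInfo
                          .IndexOf(text, value, CompareOptions.IgnoreCase);
    }

    // Runs one search and reports the elapsed wall-clock milliseconds.
    public static long TimeMs(Func<int> search, out int index)
    {
        Stopwatch sw = Stopwatch.StartNew();
        index = search();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

To use it, build a haystack with BuildHaystack(500000, "NeEdLe") and pass each search to TimeMs as a lambda. A single run includes JIT time, so repeating each measurement a few times and discarding the first gives a fairer picture.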

HTH
--
Lateralus [MCAD]

BILL
Guest
 
Posts: n/a
#3: Nov 16 '05

Thanks Lateralus - although I was a bit skeptical of the results,
after doing similar tests, I think I've changed my thinking on the
matter. I ran some IndexOf/ToUpper and related code on a few older
boxes I have here (e.g., 500 MHz AMD, 512 MB) and didn't see any real
performance degradation either.

So, here's my question to everyone: if I'm not looking to do
heavy-duty work with these strings, I think I'm best off using the .NET
methods, right? The original question might have resulted from my being
trained as an anal-C++-guy, if so ... sorry all :)

Lateralus [MCAD]
Guest
 
Posts: n/a
#4: Nov 16 '05

Bill,
I can understand where you're coming from. Whenever our applications needed
heavy string manipulation on large amounts of data, we would always write the
DLL in C++. There is nothing scientific about my next statement because I
never ran any "true" tests. We had a C++ DLL that would manipulate large
strings up to 10 MB in size. When it was rewritten in C#, we didn't notice any
degradation in speed, besides its first execution, since the code gets
JIT-compiled the first time it runs. So basically I've found that on the
systems I've worked on there is no need to turn to C++ as there was in the
past. Of course there are going to be times when you will need to, but for
this one I think you're OK with C#.

--
Lateralus [MCAD]

BILL
Guest
 
Posts: n/a
#5: Nov 16 '05

Lateralus - Thanks! It's hard to leave my C++/MASM behind, but you're
->absolutely<- right, I'll attack these problems when needed now. Any
different opinions on this thread are always welcome, but I think I've
found my answer...
BILL

Jon Skeet [C# MVP]
Guest
 
Posts: n/a
#6: Nov 16 '05

In addition to the previous comments, you may wish to consider using
CompareInfo.IndexOf (source, value, CompareOptions.IgnoreCase)

You can get a CompareInfo reference from a CultureInfo - you could use
the current culture (CultureInfo.CurrentCulture) or the invariant one
(CultureInfo.InvariantCulture).
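
A small sketch of that suggestion, wrapping CompareInfo.IndexOf so the culture is chosen explicitly. The class and method names are hypothetical. The last helper uses string.IndexOf with StringComparison.OrdinalIgnoreCase, which assumes .NET 2.0 or later and is an addition to what was suggested above:

```csharp
using System;
using System.Globalization;

public static class CultureSearch
{
    // Case-insensitive search using the comparison rules of a given culture.
    public static int IndexOfIgnoreCase(string source, string value, CultureInfo culture)
    {
        if (culture == null) throw new ArgumentNullException("culture");
        return culture.CompareInfo.IndexOf(source, value, CompareOptions.IgnoreCase);
    }

    // Same result on every machine, regardless of the user's regional settings.
    public static int IndexOfIgnoreCaseInvariant(string source, string value)
    {
        return IndexOfIgnoreCase(source, value, CultureInfo.InvariantCulture);
    }

    // Follows the current user's culture, which suits text shown to that user.
    public static int IndexOfIgnoreCaseCurrent(string source, string value)
    {
        return IndexOfIgnoreCase(source, value, CultureInfo.CurrentCulture);
    }

    // Culture-independent alternative (assumes .NET 2.0 or later).
    public static int IndexOfIgnoreCaseOrdinal(string source, string value)
    {
        return source.IndexOf(value, StringComparison.OrdinalIgnoreCase);
    }
}
```

The invariant or ordinal variants are usually the safer choice for identifiers, file contents, and protocol text, because the current-culture variant can give different answers on machines with different regional settings.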

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too