
C# String Comparison, IndexOf and Related

Hi Everyone,

I've been looking through these .NET groups and can't find the exact
answer I want, so I'm asking.

Can someone let me know the best way (you feel) to search a C# string
for an occurrence of a CASE INSENSITIVE substring, returning the found
position? I'm speaking of larger strings to search as well, ~50K-500K.
Here's what I have so far:

* ToUpper/ToLower and IndexOf would be quite slow, right? as strings
are immutable and these search strings are larger to begin with.

* RegEx could be the answer, but I'm not sure pattern matching would
be the right solution for this problem

* Any unsafe code, Boyer-Moore using pointers or inline assembly (if
that's possible), would seem the best, but well, it's unsafe code

* I've found a MapTable example here in the C# newsgroup (thanks maptable
person), and think this might be the best solution
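
For comparison, here is a minimal sketch of the first two options in the list above (ToUpper plus IndexOf, and Regex). The class and method names are made up for illustration, and using the invariant culture for the upper-casing is an assumption, not something stated in the post.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class CaseInsensitiveSearchSketch
{
    // Option 1: upper-case both strings, then do an ordinal search.
    // Note that this allocates a full upper-cased copy of the source string.
    public static int IndexOfViaToUpper(string source, string value)
    {
        return source.ToUpperInvariant()
                     .IndexOf(value.ToUpperInvariant(), StringComparison.Ordinal);
    }

    // Option 2: a regex, with the search text escaped so it is matched literally.
    public static int IndexOfViaRegex(string source, string value)
    {
        Match match = Regex.Match(source, Regex.Escape(value), RegexOptions.IgnoreCase);
        return match.Success ? match.Index : -1;
    }
}
```

Both methods return -1 when the substring is not found, which matches what IndexOf itself does.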

Any help is appreciated, thanks in advance!
BILL
Nov 16 '05 #1
5 Replies


Bill,
I did some tests. I created a 5 MB file and loaded it with a
StreamReader, then assigned all of the text from the file to a string object.
I called ToLower followed by IndexOf, and it returned the index of the
specified substring immediately. I also used the globalization classes that
allow you to call IndexOf with an IgnoreCase option. That also returned the
index immediately. I don't have any timing numbers, but during debugging it
stepped over the line of code doing the comparison with no noticeable pause.

Here is the globalization code. I used a very simple text comparison below.

CultureInfo culture = new CultureInfo("en-us");

int index = culture.CompareInfo.IndexOf("this is a TEST", "test", System.Globalization.CompareOptions.IgnoreCase);
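
Since no timings were recorded for the tests described above, here is a rough sketch of how the two calls could be timed with Stopwatch. The class name, the 500,000-character filler, and the placement of the match at the very end of the string are all assumptions chosen for illustration; actual numbers will depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Text;

// Hypothetical timing harness, for illustration only.
static class SearchTimingSketch
{
    // Builds a large string of filler with the search text only at the very end.
    public static string BuildHaystack(int fillerLength, string needle)
    {
        return new StringBuilder(fillerLength + needle.Length)
            .Append('x', fillerLength)
            .Append(needle)
            .ToString();
    }

    public static void Run()
    {
        string haystack = BuildHaystack(500000, "TEST");
        CompareInfo compareInfo = new CultureInfo("en-US").CompareInfo;

        // Approach 1: lower-case a copy of the whole string, then search it.
        Stopwatch watch = Stopwatch.StartNew();
        int viaToLower = haystack.ToLower().IndexOf("test");
        watch.Stop();
        Console.WriteLine("ToLower + IndexOf:   index {0}, {1} ms", viaToLower, watch.ElapsedMilliseconds);

        // Approach 2: culture-aware search that ignores case, with no copy.
        watch = Stopwatch.StartNew();
        int viaCompareInfo = compareInfo.IndexOf(haystack, "test", CompareOptions.IgnoreCase);
        watch.Stop();
        Console.WriteLine("CompareInfo.IndexOf: index {0}, {1} ms", viaCompareInfo, watch.ElapsedMilliseconds);
    }
}
```

Calling SearchTimingSketch.Run() a few times and ignoring the first run should give a fairer picture, since the first call includes JIT compilation.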

HTH
--
Lateralus [MCAD]

Nov 16 '05 #2

Thanks Lateralus - although I was a bit skeptical of the results,
after doing similar tests, I think I've changed my thinking on the
matter. I ran some IndexOf/ToUpper and related code on a few older
boxes I have here (e.g., 500 MHz AMD, 512 MB) and didn't see any real
performance degradation either.

So - here's my question to everyone- if I'm not looking to do
heavy-duty work with these strings I think I'm best off using .NET
methods. The original question might have resulted from my being
trained as an anal-C++-guy, if so ... sorry all :)

Nov 16 '05 #3

Bill,
I can understand where you're coming from. Whenever our applications needed
heavy string manipulation on large amounts of data, we would always write the
DLL in C++. There is nothing scientific about my next statement because I
never ran any "true" tests. We had a C++ DLL that would manipulate large
strings up to 10 MB in size. When it was rewritten in C# we didn't notice any
degradation in speed, besides its first execution, since the code gets
JIT-compiled the first time it runs. So basically I found that on the systems
I've worked on there is no need to turn to C++ as there was in the past. Of
course there are going to be times when you will need to, but for this one I
think you're OK with C#.

--
Lateralus [MCAD]

Nov 16 '05 #4

Lateralus - Thanks! It's hard to leave my C++/MASM behind, but you're
->absolutely<- right, I'll attack these problems when needed now. Any
different opinions on this thread are always welcome, but I think I've
found my answer...
BILL
Nov 16 '05 #5

In addition to the previous comments, you may wish to consider using
CompareInfo.IndexOf (source, value, CompareOptions.IgnoreCase)

You can get a CompareInfo reference from a CultureInfo - you could use
the current culture (CultureInfo.CurrentCulture) or the invariant one
(CultureInfo.InvariantCulture).
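
A minimal sketch of the suggestion above, with the culture passed in so either choice can be tried. The class and method names are hypothetical. The second method is an addition not mentioned in the post: string.IndexOf also has an overload taking StringComparison.OrdinalIgnoreCase, which may be worth considering when linguistic, culture-aware matching is not needed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, for illustration only.
static class CultureChoiceSketch
{
    // Culture-aware, case-insensitive search using the given culture's CompareInfo.
    public static int IndexOfIgnoreCase(string source, string value, CultureInfo culture)
    {
        return culture.CompareInfo.IndexOf(source, value, CompareOptions.IgnoreCase);
    }

    // Alternative: ordinal (non-linguistic) case-insensitive search.
    public static int IndexOfOrdinalIgnoreCase(string source, string value)
    {
        return source.IndexOf(value, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, CultureChoiceSketch.IndexOfIgnoreCase(text, "test", CultureInfo.InvariantCulture) uses the invariant culture, while passing CultureInfo.CurrentCulture follows the user's settings. The two can disagree for some languages, so the choice depends on whether the text is meant for people or for machines.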

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #6
