On Sun, 23 May 2004 09:46:10 +0200, "Alexander Muylaert"
<Re********@pandora.be> wrote:
Does anyone know a good starting point about high speed string processing in
C#?
What I need is a very fast routine for a case insensitive "contains". e ==
E == é == ë == ...
There are some algorithms in Cormen's (et. al) "Algorithms",
and in Knuth's "The Art of Computer Programming: Searching and
Sorting". You can also find the same algorithms in books on compiler
design. But these deal specifically with matching the string, and not
the replacement and case insensitivity.
I'd recommend using a separate buffer for what the user is
typing in. Every time a character is pressed, convert it to lower
case, and change your funky characters to the regular ones (é to e).
Also I wouldn't try to find each string in what the user typed, rather
I'd try to find what the user typed in the list of strings starting at
different offsets. Having the list of strings sorted, and doing a
binary search will make that a manageable task.
Reverse all of the strings in your list, and search backwards
through what the user typed. E.G. when the user types 'r' you only
have to search through your list for strings that end with 'r'. This
way you never have to back up.
That should be fast enough, but if it's not, sort the list by
length first, then alphabetically. Keep a list of indexes into the
list as to where the first word of each length is, then in your binary
search only search the strings which match the length you're looking
for.
500000 is a lot of strings. But, in a binary search, at most
you have to look at 19 strings to find a match, or determine there is
no match. Which is very manageable for processing while a user is
typing.
If you want to go one step further, use a separate thread for
matching the string. People tend to pause and burst when typing.