473,378 Members | 1,482 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,378 software developers and data experts.

Search a String

S
Any idea on how I would be able to do a search within C# that does
ranges or words
For example

I want to search for Chicken in the string
string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0

The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

string s1 = "This is Great Chicken";
string s2 = "Chickin";

return ???

or

Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Person enter the search phrase misspelled the word Chicken;

string s1 = "This is Great Chickin";
string s2 = "Chicken";

Oct 2 '08 #1
14 2941
On Oct 2, 2:26*pm, S <scottremi...@yahoo.comwrote:
Any idea on how I would be able to do a search within C# that does
ranges or words

For example

I want to search for Chicken in the string

string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0

The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

string s1 = "This is Great Chicken";
string s2 = "Chickin";

return ???

or

Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Person enter the search phrase misspelled the word Chicken;

string s1 = "This is Great Chickin";
string s2 = "Chicken";
Check out the Regex namespace.

If you search something such as "Chickin" in a string which does not
contain this token, such as "This is Great Chicken", what would you
expect to return? I wouldn't expect it to return anything. Maybe you
wanna do something like google's key words suggestion if user entry
has some typos?

Oct 2 '08 #2
On Oct 2, 2:51*pm, Author <gnewsgr...@gmail.comwrote:
On Oct 2, 2:26*pm, S <scottremi...@yahoo.comwrote:
Any idea on how I would be able to do a search within C# that does
ranges or words
For example
I want to search for Chicken in the string
string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0
The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);
string s1 = "This is Great Chicken";
string s2 = "Chickin";
return ???
or
Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";
Person enter the search phrase misspelled the word Chicken;
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Check out the Regex namespace.
Correction: I meant the System.Text.RegularExpressions Namespace
http://msdn.microsoft.com/en-us/libr...pressions.aspx

Regex is a class of this namespace.
If you search something such as "Chickin" in a string which does not
contain this token, such as "This is Great Chicken", what would you
expect to return? *I wouldn't expect it to return anything. *Maybe you
wanna do something like google's key words suggestion if user entry
has some typos?

Oct 2 '08 #3
On Thu, 02 Oct 2008 11:26:38 -0700, S <sc**********@yahoo.comwrote:
Any idea on how I would be able to do a search within C# that does
ranges or words
There are a variety of techniques for dealing with partial matching as you
describe.

I've actually implemented a very simple approach myself in the past, in
which I do a sort of character-by-character merge between the target text
and the search word, finding the spot in the target text where the search
word fits the best with the least number of excluded characters. This
would work perfectly for the examples you gave, but whether that would
address your more general goals I can't say.

If that sounds promising, I might be able to dig up the code and post it
here. It's pretty inefficient, but it's small and simple so what the
heck? :)

You might also Google for "soundex". There's even a Wikipedia article on
the topic:
http://en.wikipedia.org/wiki/Soundex

That's a commonly used technique for matching words based on phonetics and
if you want a solution that is based more on language than on typography,
it might be a better approach.

A Google search on "dictionary spell check algorithm" also turned up a
number of promising links.

Pete
Oct 2 '08 #4
On Oct 2, 2:26 pm, S <scottremi...@yahoo.comwrote:
Any idea on how I would be able to do a search within C# that does
ranges or words

For example

I want to search for Chicken in the string

string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0

The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

string s1 = "This is Great Chicken";
string s2 = "Chickin";

return ???

or

Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Person enter the search phrase misspelled the word Chicken;

string s1 = "This is Great Chickin";
string s2 = "Chicken";
Hi,

I do not how to do it in code, I'm pretty sure that no version of the
framework provides such a feature.
In T-SQL you have this feature though take a look for SOUNDEX function
Oct 2 '08 #5
On Oct 2, 2:51 pm, Author <gnewsgr...@gmail.comwrote:
On Oct 2, 2:26 pm, S <scottremi...@yahoo.comwrote:
Any idea on how I would be able to do a search within C# that does
ranges or words
For example
I want to search for Chicken in the string
string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0
The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);
string s1 = "This is Great Chicken";
string s2 = "Chickin";
return ???
or
Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";
Person enter the search phrase misspelled the word Chicken;
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Check out the Regex namespace.

If you search something such as "Chickin" in a string which does not
contain this token, such as "This is Great Chicken", what would you
expect to return? I wouldn't expect it to return anything. Maybe you
wanna do something like google's key words suggestion if user entry
has some typos?


I do not know how to use a Regex in this case.can you elaborate?
Oct 2 '08 #6
"S" <sc**********@yahoo.comwrote in message
news:de**********************************@64g2000h sm.googlegroups.com...
The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);
The answer is that what you're looking for is not at all simple. I'd
recommend you do a Google search to see if you can find sample code for a
spell checker.
Oct 2 '08 #7
Jeff Johnson kirjoitti:
"S" <sc**********@yahoo.comwrote in message
news:de**********************************@64g2000h sm.googlegroups.com...
>The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

The answer is that what you're looking for is not at all simple. I'd
recommend you do a Google search to see if you can find sample code for a
spell checker.

Spelling checker does not help. For example the text contains "Cut" and
the user searches for "Cat". Both are spelled right, so the search gives
no result.

But, with addition to already mentioned Soundex, you might like to check
Levenshtein distance
(http://en.wikipedia.org/wiki/Levenshtein_distance). It tells how
similar two texts are; i.e. how many simple edit commands are needed to
make the second text from the first. For example distance between
Chicken and Chikin is one.

About Soundex: I have not studied it much, but I have a feeling Knuth
built it for english words. How well does it apply to other languages
or names?
-
Arto Viitanen
Oct 3 '08 #8
"Arto Viitanen" <ar***********@microteam.fiwrote in message
news:48***********************@news.tdc.fi...
>>The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

The answer is that what you're looking for is not at all simple. I'd
recommend you do a Google search to see if you can find sample code for a
spell checker.
Spelling checker does not help. For example the text contains "Cut" and
the user searches for "Cat". Both are spelled right, so the search gives
no result.
I agree, IF you say, "Hey, Mr. Spell Checker algorithm, tell me what's
misspelled in this sentence." However, if you are in control of the code,
I'd think you'd be able to say, "Hey, Mr. Spell Checker algorithm, make a
list of all the probable misspellings of 'chicken' and tell me if this
string contains any of them." See where I'm going?
But, with addition to already mentioned Soundex, you might like to check
Levenshtein distance
(http://en.wikipedia.org/wiki/Levenshtein_distance). It tells how
similar two texts are; i.e. how many simple edit commands are needed to
make the second text from the first. For example distance between
Chicken and Chikin is one.
Sounds good.
About Soundex: I have not studied it much, but I have a feeling Knuth
built it for english words. How well does it apply to other languages
or names?
I've heard people say that Soundex is an ancient crutch and shouldn't be
relied upon too much.
Oct 3 '08 #9
Peter Duniho wrote:
On Thu, 02 Oct 2008 11:26:38 -0700, S <sc**********@yahoo.comwrote:
>Any idea on how I would be able to do a search within C# that does
ranges or words

There are a variety of techniques for dealing with partial matching as
you describe.

I've actually implemented a very simple approach myself in the past, in
which I do a sort of character-by-character merge between the target
text and the search word, finding the spot in the target text where the
search word fits the best with the least number of excluded characters.
This would work perfectly for the examples you gave, but whether that
would address your more general goals I can't say.
[SNIP]
Pete
That sounds interesting. Is there any chance of seeing it as a
CodeProject article? :D
Oct 3 '08 #10
On Thu, 02 Oct 2008 22:21:06 -0700, Arto Viitanen
<ar***********@microteam.fiwrote:
[...]
Spelling checker does not help. For example the text contains "Cut" and
the user searches for "Cat". Both are spelled right, so the search gives
no result.
A spell checker itself wouldn't help. But if you apply the same algorithm
that a spell checker would, but with a dictionary that only has the word
you're specifically looking for at the moment, presumably those algorithms
would spit out some reasonable assessment of "closeness" for the word as
compared to the text being searched.

After all, for a spell checker to evaluate what the "best match" is for
misspelled words, it has to be able to compute some metric so that
possible matches can be ordered according to "closeness".

For all I know, the mentioned "Levenshtein distance" computation is in
fact something that a spell checker might use. But regardless, it seems
both would be good starting points for the question.

Pete
Oct 3 '08 #11
On Fri, 03 Oct 2008 12:05:12 -0700, Fernando A. Gómez F.
<fe****************@gmail.comwrote:
[...]
> I've actually implemented a very simple approach myself in the past,
in which I do a sort of character-by-character merge between the target
text and the search word, finding the spot in the target text where the
search word fits the best with the least number of excluded
characters. This would work perfectly for the examples you gave, but
whether that would address your more general goals I can't say.
[SNIP]
>Pete

That sounds interesting. Is there any chance of seeing it as a
CodeProject article? :D
Nope. But I'll look for the code and try to remember to post it here.

After that, someone else can post it to CodeProject if they like.

Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute force
worked great. I haven't looked at the Levenshtein article yet, but I
suspect that the algorithm described there is similar to what I did, but
with some actual thought and cleverness applied to it so that it has a
reasonable computational cost. :)

Pete
Oct 3 '08 #12
Peter Duniho wrote:
On Fri, 03 Oct 2008 12:05:12 -0700, Fernando A. Gómez F.
<fe****************@gmail.comwrote:
>[...]
>> I've actually implemented a very simple approach myself in the past,
in which I do a sort of character-by-character merge between the
target text and the search word, finding the spot in the target text
where the search word fits the best with the least number of excluded
characters. This would work perfectly for the examples you gave, but
whether that would address your more general goals I can't say.
[SNIP]
>>Pete

That sounds interesting. Is there any chance of seeing it as a
CodeProject article? :D

Nope. But I'll look for the code and try to remember to post it here.

After that, someone else can post it to CodeProject if they like.
That would be nice, thank you.
Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute force
worked great. I haven't looked at the Levenshtein article yet, but I
suspect that the algorithm described there is similar to what I did, but
with some actual thought and cleverness applied to it so that it has a
reasonable computational cost. :)
I see. However perhaps it could be improved. It would be interesting, at
least, to see the approach you used, so if you have the time and mood,
I'd be most grateful if you could post such code.

Thanks. Best Regards.
Oct 3 '08 #13
On Fri, 03 Oct 2008 16:48:27 -0700, Fernando A. Gómez F.
<fe****************@gmail.comwrote:
[...]
>Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute force
worked great. I haven't looked at the Levenshtein article yet, but I
suspect that the algorithm described there is similar to what I did,
but with some actual thought and cleverness applied to it so that it
has a reasonable computational cost. :)

I see. However perhaps it could be improved. It would be interesting, at
least, to see the approach you used, so if you have the time and mood,
I'd be most grateful if you could post such code.
Here you go (see below). Note that in my original problem, I was
comparing two complete strings and producing a metric that indicated how
similar the strings were, rather than searching for one string in
another. But you can easily modify this approach by using the same basic
technique, but only looking at substrings of the string being searched.

Though, actually...now that I look at this implementation again, I think
you could probably feed it a string to be searched and a string to look
for, and it would work okay. It could even be modified to return the
character index of the starting point of the best match for the string
being looked for. You'd want to limit how far it looks ahead, maybe
simply by delimiting attempted matches, or otherwise take into account
gaps within the matched characters, because otherwise it will consider
itself to have successfully matched the string even if each character it
found was actually just located in a different word. But other than that,
it's probably pretty close to what could work.

The basic idea below is to start out doing a basic string compare, but
when reaching a character that doesn't match, scan forward in each string
to try to match the current character in the other string (for two scans
each mis-match). If a matching character can be found, continue trying to
match characters again, looking for the point at which a comparison of the
remaining characters returns the highest "score". In this context, the
score is the number of characters that the algorithm could match before
running out of characters in one of the strings.

You'll note that the algorithm basically enumerates all of the possible
ways that the strings could match. There's essentially a binary search
tree that bifurcates each time there's a character mis-match, which means
that the cost of the algorithm has a worst-case cost that is exponential
by the length of the longest search. In practice, it will hit the
worst-case scenario only rarely, but it's at least worth understanding,
especially since you don't have to get very far into the algorithm for
even an average-case scenario to be relatively expensive as compared to
some other algorithm that might be designed more optimally.

Like I said, I was dealing with such short strings, I made no attempt at
all to be clever. It worked very well for my needs at the time, but I
make no promises about anyone else's. :)

Pete

static int ScoreStrings(string str1, string str2)
{
int ich = 0, cchMax = Math.Min(str1.Length, str2.Length);
int scoreMax;

// Scan the strings, stopping at the first difference
while (ich < cchMax && str1[ich] == str2[ich])
{
ich++;
}

// If we've exhausted one of the strings, that's the best we're
// going to do
if (ich == cchMax)
{
return ich;
}

// For each string, scan forward looking for the current
character
// in the other. Check each possible match, and use the best
score
// we find.

scoreMax = Math.Max(ich,
Math.Max(ScoreStringsScan(str1.Substring(ich), str2, ich),
ScoreStringsScan(str2.Substring(ich), str1, ich)));

return scoreMax;
}

static int ScoreStringsScan(string str1, string str2, int ichStart)
{
int scoreMax = 0;

for (int ichScan = ichStart; ichScan < str2.Length; ichScan++)
{
if (str1[0] == str2[ichScan])
{
int score = ScoreStrings(str1,
str2.Substring(ichScan));

if (score scoreMax)
{
scoreMax = score;
}
}
}

return scoreMax;
}
Oct 4 '08 #14
Peter Duniho wrote:
On Fri, 03 Oct 2008 16:48:27 -0700, Fernando A. Gómez F.
<fe****************@gmail.comwrote:
>[...]
>>Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute
force worked great. I haven't looked at the Levenshtein article yet,
but I suspect that the algorithm described there is similar to what I
did, but with some actual thought and cleverness applied to it so
that it has a reasonable computational cost. :)

I see. However perhaps it could be improved. It would be interesting,
at least, to see the approach you used, so if you have the time and
mood, I'd be most grateful if you could post such code.

Here you go (see below). Note that in my original problem, I was
comparing two complete strings and producing a metric that indicated how
similar the strings were, rather than searching for one string in
another. But you can easily modify this approach by using the same
basic technique, but only looking at substrings of the string being
searched.

Though, actually...now that I look at this implementation again, I think
you could probably feed it a string to be searched and a string to look
for, and it would work okay. It could even be modified to return the
character index of the starting point of the best match for the string
being looked for. You'd want to limit how far it looks ahead, maybe
simply by delimiting attempted matches, or otherwise take into account
gaps within the matched characters, because otherwise it will consider
itself to have successfully matched the string even if each character it
found was actually just located in a different word. But other than
that, it's probably pretty close to what could work.

The basic idea below is to start out doing a basic string compare, but
when reaching a character that doesn't match, scan forward in each
string to try to match the current character in the other string (for
two scans each mis-match). If a matching character can be found,
continue trying to match characters again, looking for the point at
which a comparison of the remaining characters returns the highest
"score". In this context, the score is the number of characters that
the algorithm could match before running out of characters in one of the
strings.

You'll note that the algorithm basically enumerates all of the possible
ways that the strings could match. There's essentially a binary search
tree that bifurcates each time there's a character mis-match, which
means that the cost of the algorithm has a worst-case cost that is
exponential by the length of the longest search. In practice, it will
hit the worst-case scenario only rarely, but it's at least worth
understanding, especially since you don't have to get very far into the
algorithm for even an average-case scenario to be relatively expensive
as compared to some other algorithm that might be designed more optimally.

Like I said, I was dealing with such short strings, I made no attempt at
all to be clever. It worked very well for my needs at the time, but I
make no promises about anyone else's. :)

Pete
Thanks a lot Pete, it looks interesting.

Regards,
Fernando.
Oct 6 '08 #15

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

10
by: Anand Pillai | last post by:
To search a word in a group of words, say a paragraph or a web page, would a string search or a regexp search be faster? The string search would of course be, if str.find(substr) != -1:...
4
by: Ken Fine | last post by:
I'm looking to find or create an ASP script that will take a string, examine it for a search term, and if it finds the search term in the string, return the highlighted search term along with the...
22
by: Phlip | last post by:
C++ers: Here's an open ended STL question. What's the smarmiest most templated way to use <string>, <algorithms> etc. to turn this: " able search baker search charlie " into this: " able...
32
by: tshad | last post by:
Can you do a search for more that one string in another string? Something like: someString.IndexOf("something1","something2","something3",0) or would you have to do something like: if...
0
by: Hriday | last post by:
Hi there, I am working on a web application in ASP.NET My web server and AD machine are in the same domain but located on diffrent phisical machine, I am not able to search user's info by the...
1
by: Eric | last post by:
Hi: I have two files. I search pattern ":" from emails text file and save email contents into a database. Another search pattern " field is blank. Please try again.", vbExclamation + vbOKOnly...
4
by: BenCoo | last post by:
Hello, In a Binary Search Tree I get the error : Object must be of type String if I run the form only with the "Dim bstLidnummer As New BinarySearchTree" it works fine. Thanks for any...
0
by: | last post by:
I have a question about spawning and displaying subordinate list controls within a list control. I'm also interested in feedback about the design of my search application. Lots of code is at the...
0
by: JamesOo | last post by:
I have the code below, but I need to make it searchable in query table, below code only allowed seach the table which in show mdb only. (i.e. have 3 table, but only can search either one only,...
0
Debadatta Mishra
by: Debadatta Mishra | last post by:
Introduction In this article I will provide you an approach to manipulate an image file. This article gives you an insight into some tricks in java so that you can conceal sensitive information...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.