473,799 Members | 3,245 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Search a String

S
Any idea on how I would be able to do a search within C# that does
ranges or words
For example

I want to search for Chicken in the string
string s1 = "This is Great Chicken";
string s2 = "Chicken";
return s1.IndexOf(s2) >= 0

The trick is how would I search for: Person doing the search
misspelled the word Chickin (Happens all the time);

string s1 = "This is Great Chicken";
string s2 = "Chickin";

return ???

or

Worst yet
string s1 = "This is Great Chickin";
string s2 = "Chicken";

Person enter the search phrase misspelled the word Chicken;

string s1 = "This is Great Chickin";
string s2 = "Chicken";

Oct 2 '08
14 2981
On Thu, 02 Oct 2008 22:21:06 -0700, Arto Viitanen
<ar***********@ microteam.fiwro te:
[...]
Spelling checker does not help. For example the text contains "Cut" and
the user searches for "Cat". Both are spelled right, so the search gives
no result.
A spell checker itself wouldn't help. But if you apply the same algorithm
that a spell checker would, but with a dictionary that only has the word
you're specifically looking for at the moment, presumably those algorithms
would spit out some reasonable assessment of "closeness" for the word as
compared to the text being searched.

After all, for a spell checker to evaluate what the "best match" is for
misspelled words, it has to be able to compute some metric so that
possible matches can be ordered according to "closeness" .

For all I know, the mentioned "Levenshtei n distance" computation is in
fact something that a spell checker might use. But regardless, it seems
both would be good starting points for the question.

Pete
Oct 3 '08 #11
On Fri, 03 Oct 2008 12:05:12 -0700, Fernando A. Gómez F.
<fe************ ****@gmail.comw rote:
[...]
> I've actually implemented a very simple approach myself in the past,
in which I do a sort of character-by-character merge between the target
text and the search word, finding the spot in the target text where the
search word fits the best with the least number of excluded
characters. This would work perfectly for the examples you gave, but
whether that would address your more general goals I can't say.
[SNIP]
>Pete

That sounds interesting. Is there any chance of seeing it as a
CodeProject article? :D
Nope. But I'll look for the code and try to remember to post it here.

After that, someone else can post it to CodeProject if they like.

Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute force
worked great. I haven't looked at the Levenshtein article yet, but I
suspect that the algorithm described there is similar to what I did, but
with some actual thought and cleverness applied to it so that it has a
reasonable computational cost. :)

Pete
Oct 3 '08 #12
Peter Duniho wrote:
On Fri, 03 Oct 2008 12:05:12 -0700, Fernando A. Gómez F.
<fe************ ****@gmail.comw rote:
>[...]
>> I've actually implemented a very simple approach myself in the past,
in which I do a sort of character-by-character merge between the
target text and the search word, finding the spot in the target text
where the search word fits the best with the least number of excluded
characters. This would work perfectly for the examples you gave, but
whether that would address your more general goals I can't say.
[SNIP]
>>Pete

That sounds interesting. Is there any chance of seeing it as a
CodeProject article? :D

Nope. But I'll look for the code and try to remember to post it here.

After that, someone else can post it to CodeProject if they like.
That would be nice, thank you.
Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute force
worked great. I haven't looked at the Levenshtein article yet, but I
suspect that the algorithm described there is similar to what I did, but
with some actual thought and cleverness applied to it so that it has a
reasonable computational cost. :)
I see. However perhaps it could be improved. It would be interesting, at
least, to see the approach you used, so if you have the time and mood,
I'd be most grateful if you could post such code.

Thanks. Best Regards.
Oct 3 '08 #13
On Fri, 03 Oct 2008 16:48:27 -0700, Fernando A. Gómez F.
<fe************ ****@gmail.comw rote:
[...]
>Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute force
worked great. I haven't looked at the Levenshtein article yet, but I
suspect that the algorithm described there is similar to what I did,
but with some actual thought and cleverness applied to it so that it
has a reasonable computational cost. :)

I see. However perhaps it could be improved. It would be interesting, at
least, to see the approach you used, so if you have the time and mood,
I'd be most grateful if you could post such code.
Here you go (see below). Note that in my original problem, I was
comparing two complete strings and producing a metric that indicated how
similar the strings were, rather than searching for one string in
another. But you can easily modify this approach by using the same basic
technique, but only looking at substrings of the string being searched.

Though, actually...now that I look at this implementation again, I think
you could probably feed it a string to be searched and a string to look
for, and it would work okay. It could even be modified to return the
character index of the starting point of the best match for the string
being looked for. You'd want to limit how far it looks ahead, maybe
simply by delimiting attempted matches, or otherwise take into account
gaps within the matched characters, because otherwise it will consider
itself to have successfully matched the string even if each character it
found was actually just located in a different word. But other than that,
it's probably pretty close to what could work.

The basic idea below is to start out doing a basic string compare, but
when reaching a character that doesn't match, scan forward in each string
to try to match the current character in the other string (for two scans
each mis-match). If a matching character can be found, continue trying to
match characters again, looking for the point at which a comparison of the
remaining characters returns the highest "score". In this context, the
score is the number of characters that the algorithm could match before
running out of characters in one of the strings.

You'll note that the algorithm basically enumerates all of the possible
ways that the strings could match. There's essentially a binary search
tree that bifurcates each time there's a character mis-match, which means
that the cost of the algorithm has a worst-case cost that is exponential
by the length of the longest search. In practice, it will hit the
worst-case scenario only rarely, but it's at least worth understanding,
especially since you don't have to get very far into the algorithm for
even an average-case scenario to be relatively expensive as compared to
some other algorithm that might be designed more optimally.

Like I said, I was dealing with such short strings, I made no attempt at
all to be clever. It worked very well for my needs at the time, but I
make no promises about anyone else's. :)

Pete

static int ScoreStrings(st ring str1, string str2)
{
int ich = 0, cchMax = Math.Min(str1.L ength, str2.Length);
int scoreMax;

// Scan the strings, stopping at the first difference
while (ich < cchMax && str1[ich] == str2[ich])
{
ich++;
}

// If we've exhausted one of the strings, that's the best we're
// going to do
if (ich == cchMax)
{
return ich;
}

// For each string, scan forward looking for the current
character
// in the other. Check each possible match, and use the best
score
// we find.

scoreMax = Math.Max(ich,
Math.Max(ScoreS tringsScan(str1 .Substring(ich) , str2, ich),
ScoreStringsSca n(str2.Substrin g(ich), str1, ich)));

return scoreMax;
}

static int ScoreStringsSca n(string str1, string str2, int ichStart)
{
int scoreMax = 0;

for (int ichScan = ichStart; ichScan < str2.Length; ichScan++)
{
if (str1[0] == str2[ichScan])
{
int score = ScoreStrings(st r1,
str2.Substring( ichScan));

if (score scoreMax)
{
scoreMax = score;
}
}
}

return scoreMax;
}
Oct 4 '08 #14
Peter Duniho wrote:
On Fri, 03 Oct 2008 16:48:27 -0700, Fernando A. Gómez F.
<fe************ ****@gmail.comw rote:
>[...]
>>Note that I wasn't kidding when I said it's not efficient code. The
application I needed it for involved very short strings and brute
force worked great. I haven't looked at the Levenshtein article yet,
but I suspect that the algorithm described there is similar to what I
did, but with some actual thought and cleverness applied to it so
that it has a reasonable computational cost. :)

I see. However perhaps it could be improved. It would be interesting,
at least, to see the approach you used, so if you have the time and
mood, I'd be most grateful if you could post such code.

Here you go (see below). Note that in my original problem, I was
comparing two complete strings and producing a metric that indicated how
similar the strings were, rather than searching for one string in
another. But you can easily modify this approach by using the same
basic technique, but only looking at substrings of the string being
searched.

Though, actually...now that I look at this implementation again, I think
you could probably feed it a string to be searched and a string to look
for, and it would work okay. It could even be modified to return the
character index of the starting point of the best match for the string
being looked for. You'd want to limit how far it looks ahead, maybe
simply by delimiting attempted matches, or otherwise take into account
gaps within the matched characters, because otherwise it will consider
itself to have successfully matched the string even if each character it
found was actually just located in a different word. But other than
that, it's probably pretty close to what could work.

The basic idea below is to start out doing a basic string compare, but
when reaching a character that doesn't match, scan forward in each
string to try to match the current character in the other string (for
two scans each mis-match). If a matching character can be found,
continue trying to match characters again, looking for the point at
which a comparison of the remaining characters returns the highest
"score". In this context, the score is the number of characters that
the algorithm could match before running out of characters in one of the
strings.

You'll note that the algorithm basically enumerates all of the possible
ways that the strings could match. There's essentially a binary search
tree that bifurcates each time there's a character mis-match, which
means that the cost of the algorithm has a worst-case cost that is
exponential by the length of the longest search. In practice, it will
hit the worst-case scenario only rarely, but it's at least worth
understanding, especially since you don't have to get very far into the
algorithm for even an average-case scenario to be relatively expensive
as compared to some other algorithm that might be designed more optimally.

Like I said, I was dealing with such short strings, I made no attempt at
all to be clever. It worked very well for my needs at the time, but I
make no promises about anyone else's. :)

Pete
Thanks a lot Pete, it looks interesting.

Regards,
Fernando.
Oct 6 '08 #15

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

10
39359
by: Anand Pillai | last post by:
To search a word in a group of words, say a paragraph or a web page, would a string search or a regexp search be faster? The string search would of course be, if str.find(substr) != -1: domything() And the regexp search assuming no case restriction would be,
4
2681
by: Ken Fine | last post by:
I'm looking to find or create an ASP script that will take a string, examine it for a search term, and if it finds the search term in the string, return the highlighted search term along with the words that surround it. In other words, I want the search term highlighted and shown in an excerpt of the context in which it appears. Any suggestions or pointers? This behavior is most often seen as part of a search engine. In my case, I want...
22
11124
by: Phlip | last post by:
C++ers: Here's an open ended STL question. What's the smarmiest most templated way to use <string>, <algorithms> etc. to turn this: " able search baker search charlie " into this: " able replace baker replace charlie "
32
14905
by: tshad | last post by:
Can you do a search for more that one string in another string? Something like: someString.IndexOf("something1","something2","something3",0) or would you have to do something like: if ((someString.IndexOf("something1",0) >= 0) || ((someString.IndexOf("something2",0) >= 0) ||
0
1098
by: Hriday | last post by:
Hi there, I am working on a web application in ASP.NET My web server and AD machine are in the same domain but located on diffrent phisical machine, I am not able to search user's info by the following code if I pass only path as parameter to DirectoyEntry class. The same code is working if I pass all the three parameter as path,userId and password. but I have to use windows authenticatoin and can't get password from Windows...
1
2727
by: Eric | last post by:
Hi: I have two files. I search pattern ":" from emails text file and save email contents into a database. Another search pattern " field is blank. Please try again.", vbExclamation + vbOKOnly Me.txtEmail.SetFocus Exit Sub End If Me.txtStatusBar.Value = "Parsing..." strEmail = Me.txtEmail.Value
4
2172
by: BenCoo | last post by:
Hello, In a Binary Search Tree I get the error : Object must be of type String if I run the form only with the "Dim bstLidnummer As New BinarySearchTree" it works fine. Thanks for any help on this, Benny
0
2083
by: | last post by:
I have a question about spawning and displaying subordinate list controls within a list control. I'm also interested in feedback about the design of my search application. Lots of code is at the end of this message, but I will start with an overview of the problem. I've made a content management solution for my work with a decently structured relational database system. The CMS stores articles. The CMS also stores related items --...
0
2730
by: JamesOo | last post by:
I have the code below, but I need to make it searchable in query table, below code only allowed seach the table which in show mdb only. (i.e. have 3 table, but only can search either one only, cannot serch by combine 3 table) Example I have the query table below, how do I make the code to seach based on the query from this: SELECT Product.ID, Product.Description, Quantity.Quantity, Quantity.SeialNo, Quantity.SupplierID,...
0
10785
Debadatta Mishra
by: Debadatta Mishra | last post by:
Introduction In this article I will provide you an approach to manipulate an image file. This article gives you an insight into some tricks in java so that you can conceal sensitive information inside an image, hide your complete image as text ,search for a particular image inside a directory, minimize the size of the image. However this is not a new concept, there is a concept called Steganography which enables to conceal your secret...
0
9687
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9541
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10484
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10251
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10228
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9072
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7565
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
1
4141
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2938
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.