# String matching (Intelligent)

Hey, I need a string comparison algorithm that can compare two strings and come up with a number, e.g. a percentage, indicating how alike the strings are. I could of course just compare the number of occurrences of each letter, but then "aab" would equal "aba". That is not good enough, so the algorithm also has to consider the closeness of letters etc. I hope I can find a "finished" or skeleton algorithm, since I am convinced that others who have the time could implement this much better than me. Optimally I want something like this: `double closeness = SuperComparere("My string", "My strong"); // returns eg. 93%` Has anyone heard of something like this, and does anyone know sources for solutions? Kind regards, Anders Sep 26 '06 #1
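To make the objection concrete, here is a minimal Python sketch of the letter-counting approach Anders rules out. The function name `letter_count_similarity` is hypothetical, made up for illustration. Because it only compares letter counts, it scores "aab" and "aba" as identical.

```python
from collections import Counter


def letter_count_similarity(a: str, b: str) -> float:
    """Hypothetical helper: order-blind similarity in [0, 1].

    Counts how many letters the two strings share (multiset
    intersection) and ignores positions entirely, which is the
    weakness described in the question.
    """
    if not a and not b:
        return 1.0
    shared = sum((Counter(a) & Counter(b)).values())  # shared letters
    return 2 * shared / (len(a) + len(b))


print(letter_count_similarity("aab", "aba"))  # 1.0
```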
"anderskj1" wrote: [snip] I haven't done anything similar myself, but http://en.wikipedia.org/wiki/Fuzzy_string_searching might be useful to you. -- Jon Skeet - http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too Sep 26 '06 #2
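As one example of a ready-made option, Python's standard library ships `difflib.SequenceMatcher`, whose `ratio()` returns a similarity between 0 and 1. Note that it is not edit distance: it repeatedly finds the longest common blocks, in the spirit of Ratcliff/Obershelp "gestalt pattern matching". The wrapper name `difflib_closeness` is hypothetical, and scaling to a percentage is an assumption to match the requested output.

```python
from difflib import SequenceMatcher


def difflib_closeness(a: str, b: str) -> float:
    """Hypothetical wrapper: similarity as a percentage (0..100).

    ratio() is 2*M/T, where M is the number of characters in the
    matching blocks and T is the combined length of both strings.
    """
    return 100.0 * SequenceMatcher(None, a, b).ratio()


print(round(difflib_closeness("My string", "My strong"), 1))  # 88.9
```

Here the matching blocks are "My str" and "ng" (8 characters out of 18 in total), hence 2 * 8 / 18 ≈ 88.9%.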

What you are referring to is a rather difficult topic. I think (but am not sure) that many SQL DBMSs provide something similar, but they need to index the searchable data every once in a while. (PS: found it on Wikipedia, http://en.wikipedia.org/wiki/SQL_Ser...l_Text_Search). Anyway, the only thing I can really contribute to this thread is that I think you should look for existing code (.NET or third-party) that does this, as it's a difficult task. Perhaps the Regex libraries provide some help. Good luck, and let us know what you choose to do. Sep 26 '06 #3

This is a common operation when working with DNA/protein sequences. There are many free tools/algorithms you should be able to steal inspiration from. One starting point could be: http://en.wikipedia.org/wiki/Sequence_alignment /Per Sep 26 '06 #4
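For a flavour of the sequence-alignment approach Per mentions, here is a minimal sketch of the Needleman-Wunsch global alignment score. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions for illustration; real bioinformatics tools typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch).

    score[i][j] holds the best score for aligning a[:i] with b[:j].
    The default scores are illustrative assumptions, not a standard.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap         # a[i-1] against a gap
            left = score[i][j - 1] + gap       # b[j-1] against a gap
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


print(needleman_wunsch("My string", "My strong"))  # 7
```

Eight matching characters and one mismatch give 8 - 1 = 7; unlike letter counting, the score drops when characters appear in a different order.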

Hi, take a look on Google for Levenshtein (hope that's spelt correctly) distance calculation; from memory it sounds remarkably close to what you're after. Cheers, Martin Sep 26 '06 #5
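A minimal sketch of the Levenshtein distance Martin suggests: the minimum number of single-character insertions, deletions and substitutions that turn one string into the other. Turning it into a percentage by dividing by the longer string's length is an assumed convention (others exist), and the helper name `levenshtein_closeness` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row short
    prev = list(range(len(b) + 1))  # distances from "" to each b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_closeness(a: str, b: str) -> float:
    """Hypothetical helper: 100 means identical, 0 means nothing shared.

    Normalizing by the longer length is one common choice, not the only one.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))                           # 3
print(round(levenshtein_closeness("My string", "My strong"), 1))  # 88.9
```

"My string" and "My strong" differ by one substitution in nine characters, so this normalization gives 1 - 1/9 ≈ 88.9%. It also separates the question's problem case: "aab" and "aba" are at distance 2, not 0.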

