By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
429,557 Members | 1,406 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,557 IT Pros & Developers. It's quick & easy.

String matching algorythm

P: 3
Hello,

I'm writting a program in C++.
One part of my program is to compare two strings and as a result I have to have a number (range:0-1) which represents a similarity between those two strings.

I don't need a solution in C++, I just need a hint which algorythm to use to implement this concept.
I searched Web, but there is a million of algorythms and I don't know which to use.

Thanx for any help!!!
Aug 9 '07 #1
Share this Question
Share on Google+
5 Replies


Expert 10K+
P: 11,448
Hello,

I'm writting a program in C++.
One part of my program is to compare two strings and as a result I have to have a number (range:0-1) which represents a similarity between those two strings.

I don't need a solution in C++, I just need a hint which algorythm to use to implement this concept.
I searched Web, but there is a million of algorythms and I don't know which to use.

Thanx for any help!!!
Google for "gestalt matching". This algorithm is constructed around the LCS
algorithm (Largest Common Substring). It's a fun algorithm to implement.

kind regards,

Jos
Aug 9 '07 #2

P: 3
Hello,

I'm writting a program in C++.
One part of my program is to compare two strings and as a result I have to have a number (range:0-1) which represents a similarity between those two strings.

I don't need a solution in C++, I just need a hint which algorythm to use to implement this concept.
I searched Web, but there is a million of algorythms and I don't know which to use.

Thanx for any help!!!
If someone has a C++ solution, I wouldn't mind :P
Aug 9 '07 #3

Expert 10K+
P: 11,448
If someone has a C++ solution, I wouldn't mind :P
If you had read the posting guidelines (top right 'Help' link) you would've known
that we don't spoonfeed complete boilerplate code here. We're very willing to
help you when you get stuck trying to implement it *yourself*.

kind regards,

Jos
Aug 9 '07 #4

P: 3
Google for "gestalt matching". This algorithm is constructed around the LCS
algorithm (Largest Common Substring). It's a fun algorithm to implement.

kind regards,

Jos

That's great start.
But gestalt matching matches the patterns. I need a simple string matcher that compares common prefixes and suffixes of literals.

For example, I compared 'Birthdate' (column name from first table) and 'Date' (column name from second table) and gestalt matching gave 100% similarity.
But obviously these two literals does not represent same thing. It should be about 66%.

Do you know something suitable for this concept?
Aug 9 '07 #5

Expert 10K+
P: 11,448
That's great start.
But gestalt matching matches the patterns. I need a simple string matcher that compares common prefixes and suffixes of literals.
We're talking different gestalt matchers then: for two words X and Y it tries to
find a longest substring S such that X = LxSRx and Y = LySRy. the number
2*|S|/(|X|+|Y|) where |Z| denotes the length of a string Z is a measure for how
'identical' the two strings are. Pure gestalt matching uses a recursive application
where Lx, Ly and Rx, Ry are taken in consideration as well and the measure
of being 'identical' is added to the total.

Two unequal strings never give a 100% gestalt match.

kind regards,

Jos
Aug 9 '07 #6

Post your reply

Sign in to post your reply or Sign up for a free account.