String Comparison

Hello,

DB2 V8 FP 11.

Given two strings, I need a UDF that compares them and returns the
percentage of matching characters.

For example:

ABCDEFGHIJ
ACBDEFGHIJ

The strings are 80% alike.

I can think of an easy UDF that compares the strings byte by byte and
returns the percentage of characters that differ. I'm just wondering
whether there is a built-in function that does this, or a better way.
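For reference, the position-by-position comparison described above might look like the following minimal Python sketch. The function name `positional_match_percent` is hypothetical, and the sketch reports the percentage of matching positions rather than differing ones; treating positions beyond the shorter string as mismatches is an assumption, not something stated in the question.

```python
def positional_match_percent(a: str, b: str) -> float:
    """Percentage of positions at which two strings hold the same character.

    Assumption: positions past the end of the shorter string count as
    mismatches, so the longer length is used as the denominator.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    # zip stops at the shorter string; only aligned positions can match
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / longest


print(positional_match_percent("ABCDEFGHIJ", "ACBDEFGHIJ"))   # 80.0
print(positional_match_percent("ABCDEFGHIJ", "XABCDEFGHIJ"))  # 0.0
```

The second call shows the weakness of this approach: one extra leading character shifts every position, so two nearly identical strings score 0%.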

Thanks in advance,

May 12 '06 #1
Try DIFFERENCE, which compares the sound of two strings.

May 12 '06 #2
Thanks for the hint.

I saw this function before posting. Unfortunately, it does not meet my
requirements. For example, according to the docs:

SELECT EMPNO, LASTNAME FROM EMPLOYEE
WHERE SOUNDEX(LASTNAME) = SOUNDEX('Loucesy')

EMPNO LASTNAME
------ ---------------
000110 LUCCHESSI

But for my application, 'Loucesy' and 'LUCCHESSI' are very different
strings, even though their sounds are similar.

Any other ideas?

Thanks

May 12 '06 #3
Michel Esber wrote:
> Thanks for the hint.
>
> I saw this function before posting. Unfortunately, it does not meet my
> requirements. For example and according to the docs:
>
> SELECT EMPNO, LASTNAME FROM EMPLOYEE
> WHERE SOUNDEX(LASTNAME) = SOUNDEX('Loucesy')
>
> EMPNO LASTNAME
> ------ ---------------
> 000110 LUCCHESSI
>
> But for my application, 'Loucesy' and 'LUCCHESSI' are very different
> strings, even though their sounds are similar.
>
> Any other ideas?

I *think* what you're looking for is the "distance" between strings;
i.e. the number of changes one must make to get from one string to
another. The Levenshtein Distance algorithm provides a way to calculate
this. See:

Levenshtein Distance Article
http://en.wikipedia.org/wiki/Levenshtein_distance

Example implementations in several languages
http://en.wikisource.org/wiki/Levenshtein_distance

This algorithm returns 2 for the distance between "ABCDEFGHIJ" and
"ACBDEFGHIJ" (indicating that 2 alterations, e.g. an insertion and a
deletion, or two substitutions, have to be made to get from one to the
other). There are refinements of the Levenshtein Distance algorithm,
such as Damerau-Levenshtein distance, that count swapping two adjacent
characters as a single operation and would return 1 for this pair.
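As a rough illustration of the distance calculation, here is a minimal Python sketch of the standard dynamic-programming approach. The function name `edit_distance` and the `allow_swap` flag are hypothetical; with `allow_swap=True` the sketch uses the restricted "optimal string alignment" form of the transposition refinement, which is an assumption about which refinement is meant.

```python
def edit_distance(a: str, b: str, allow_swap: bool = False) -> int:
    """Levenshtein distance between a and b.

    With allow_swap=True, swapping two adjacent characters also counts as
    one edit (the "optimal string alignment" variant).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] = number of edits needed to turn a[:i] into b[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (allow_swap and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # adjacent transposition counted as a single edit
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


print(edit_distance("ABCDEFGHIJ", "ACBDEFGHIJ"))                   # 2
print(edit_distance("ABCDEFGHIJ", "ACBDEFGHIJ", allow_swap=True))  # 1
```

The full table costs time and memory proportional to the product of the two lengths, which is one reason a row-by-row SQL formulation tends to be awkward.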

To get a percentage similarity you could do something fairly crude like
comparing the distance to the length of the string (the longer one, if
the lengths differ), e.g.:

100 * (len - distance) / len

Which in this case would give 100 * (10 - 2) / 10 = 80%.
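The percentage formula can be sketched in Python as below. The function name `similarity_percent` is hypothetical, and using the longer of the two lengths as `len` is an assumption for strings of unequal length; the distance is passed in so that any edit-distance routine can supply it.

```python
def similarity_percent(distance: int, len_a: int, len_b: int) -> float:
    """Crude similarity: 100 * (len - distance) / len.

    Assumption: `len` is the longer of the two string lengths, which keeps
    the result between 0 and 100 for a Levenshtein-style distance.
    """
    longest = max(len_a, len_b)
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (longest - distance) / longest


# "ABCDEFGHIJ" vs "ACBDEFGHIJ": both have length 10
print(similarity_percent(2, 10, 10))  # 80.0 with plain Levenshtein distance
print(similarity_percent(1, 10, 10))  # 90.0 if a swap counts as one edit
```

Note that the choice of distance variant changes the answer: counting a swap as a single operation gives 90% for the example pair rather than the 80% asked for.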

Unfortunately, looking at the implementations, the algorithm is
probably quite hard to implement efficiently in an SQL UDF. You'd
likely be better off implementing it as an external UDF in C or Java
(there are C++ and Java implementations at the link above, as well as
Lisp, Python, Ruby, Perl, Haskell, etc.).

HTH,

Dave.

May 12 '06 #4
Dave, that is exactly what I needed.

Thanks a lot.

May 12 '06 #5
