473,698 Members | 2,339 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Use of Soundex fields in access

Hi All,

I have database with names on which I want to use the soundex option.
So I have created two seperate fields for the Lastname and Firstname in
which
I save the Soundex version of a new name I save in the database.
I have the soundex code with the 6 numeric option. So I save for
example in the field
LastnameSE = 600192 and in the FirstnameSE = 545910.

Should these Soundex fields be numeric or text fields?

Marco

Jul 30 '06
32 8593
OK,

Lyle thanks for the response.
Any comments on my other post?

http://groups.google.nl/group/comp.d...02848020326d6a

Lyle Fairfield schreef:
vo***********@g mail.com wrote in news:1154363487 .281052.20630
@i3g2000cwc.goo glegroups.com:
you don't think this is soundex?

It's not the soundex of those who wish to believe there is a TRUE and HOLY
"SOUNDEX".

It is a soundex for those who think of soundex as term for a number of
functions, including used defined soundex functions which may be written
with a particular purpose for a particular database in mind, which may be
useful in identifying names with different spellings as similar in sound. I
am one of those and if anyone thinks I am trespassing on this great and
good definition of Soundex, protected and pure in the bowels of the United
States National Archives and Records Administration, and the US Patent
Office, then I am entirely willing to use the term "phonetic algorithm" or
"DoTheWordsSoun dTheSame alogorithm" in its place.

On the point of your original question, if you save 123456 as text you will
use six or twelve bytes depending upon your implemetation of unicode. If
you use a long integer you will use four. I'd go for the four, on the basis
of size, speed and a congenital disposition towards pissing off pedants.
\--
Lyle Fairfield
Jul 31 '06 #21
In article <11************ *********@b28g2 000cwb.googlegr oups.com>,
vo***********@g mail.com says...
OK,

Lyle thanks for the response.
Any comments on my other post?
Too bad it will not compile for me on Access 2003.
This is where it chokes: Dim DM_Map As DM_Structure
It does not know what DM_Structure is. Where is it defined?

I visited the National Archives about 1988 to view passenger lists.
They handed everyone a brochure explaining Soundex. All the films
were in Soundex order. So this is not the classical Soundex, but falls
into the class of soundex-like pattern matching algorithms. Just my
characterizatio n, anyway.

Mike Gramelspacher
Jul 31 '06 #22
So would I better use another soundex algo then?

Marco

Mike Gramelspacher schreef:
In article <11************ *********@b28g2 000cwb.googlegr oups.com>,
vo***********@g mail.com says...
OK,

Lyle thanks for the response.
Any comments on my other post?

Too bad it will not compile for me on Access 2003.
This is where it chokes: Dim DM_Map As DM_Structure
It does not know what DM_Structure is. Where is it defined?

I visited the National Archives about 1988 to view passenger lists.
They handed everyone a brochure explaining Soundex. All the films
were in Soundex order. So this is not the classical Soundex, but falls
into the class of soundex-like pattern matching algorithms. Just my
characterizatio n, anyway.

Mike Gramelspacher
Jul 31 '06 #23
In article <11************ **********@75g2 000cwc.googlegr oups.com>,
vo***********@g mail.com says...
So would I better use another soundex algo then?

Marco
It probably depends which algorithm works best for the names you need to
encode. I think I read that Soundex was patented around 1918 and was
used for the U.S. Census. The microfilms I saw were indexed by Soundex,
so obviously you need to use the same algorithm to encode the name for
which you are searching as was used to encode the microfilms. Soundex
is not the best for all names. Others have been developed over the
years. I also have code for the Metaphone Algorithm.

Regarding my question for obtaining the Deitch-Mokotoff code, you
nevered answered. Where can a person download or buy an implementation
of Deitch-Mokotoff?

Mike Gramelspacher
Jul 31 '06 #24
You can have it no problem,

if you could just send me the Metaphome Algoritm in return, thanks:

'************** *************** *********
' Name: SoundEx - Daitch-Mokotoff algorithm, 6 character result
' Description:
' Encodes an alphabetic name to a six character Daitch-Mokotoff code
' following the Daitch-Mokotoff (D-M) rules available at the sites
' listed in the source code. The D-M algorithm resolves some
' deficiencies that occur in the older Miracode/Soundex system (also
' known as the "Russell"/NARA system - used by the US Census Bureau).
' The benefits include: 1) Six meaningful letter sounds (versus four
' so that Peters is different from Peterson). 2) The initial letter
' is also sound encoded. 3) More sound variations (10 basic codes
' versus seven and double code sounds). 4) Improves sound matching
' for Jewish, Slavic, and Germanic names.
'
' By: Greg Julius, Copyright 2000, Gr*********@cyc onsult.com
' The author would appreciate getting bug reports/fixes and
' any improvements made to this code.
'
' Permission to use and modify is given, please give the author
' credit and pass along the modifications.
'
' Usage: strSoundExResul t = SoundEx(strStri ngToEncode)
' Inputs: An aphabetic string, usually representing a person/place
name
' Returns: Six digit D-M code string
'
' Requires: Code runs in Visual Basic Module
'
' Side Effects: None known.
'
' Searching on the internet finds these sites. Variously they explain
' some history on the D-M Sound Encoding, How D-M coding works, show
the
' D-M sound table, and provide some examples.
' http://www.everton.com/oe3-10/soundex.htm
' http://www.jewishgen.org/infofiles/soundex.txt
' http://www.avotaynu.com/soundex.html
' http://www.gcis.net/cjhs/aguideto.htm
'
' The following web-based D-M calculators to test SoundEx results,
' Some errors in the samples were found on two of the above sites.
' http://jgsr.net/database/DM6.cgi (only D-M soundex)
' http://www.jewishgen.org/jos (both D-M and NARA/Russell)
'
' Reference:
' Steuart, Bradley W. The Soundex Daitch-Mokotoff Reference Guide. 2 v.
' Precision Indexing, 1994. Provides Soundex codes to over 125,000
surnames.

Option Explicit

Private Type DM_Structure
DM_String As String
DM_Matchlen As Integer
DM_Start As String
DM_Vowel As String
DM_Other As String
End Type

Private DM_Table As Collection
Private DM_TableLoaded As Boolean ' false until initialized.
'

Public Function SOUNDEX(strToEn code As String) As String
' Usage: SoundEx = SoundEx(strToEn code)
' Purpose: Return six character D-M encoding of input name
' Inputs:
' strToEncode - An aphabetic string, usually representing a
person/place name
' Returns (Function):
' SoundEx - Six digit D-M code string
Dim intEncodeStrLen As Integer
Dim intDMArray() As Integer
Dim strToEncodelen As Integer
Dim i As Integer
Dim strEncodedStrin g As String
Dim strLastCode As String
Dim DM_Map As DM_Structure

Call LoadDMTable ' Load the DM_Table - only loaded once

' Clean the incoming name. Upper case, nothing but letters
strToEncode = RemoveNotChars( Trim(UCase(strT oEncode)),
"ABCDEFGHIJKLMN OPQRSTUVWXYZ")
intEncodeStrLen = Len(strToEncode )

strEncodedStrin g = ""
strLastCode = ""
'Potentially search the whole string for meaningful sounds. Stop
after 6 are found
For i = 1 To intEncodeStrLen
If Len(strEncodedS tring) >= 6 Then
SOUNDEX = Left(strEncoded String, 6)
Exit Function
End If

'Lookup in the DM_Table
Call FindDMMatch(i, strToEncode, DM_Map)
If DM_Map.DM_Match len = -1 Then
'Should not happen if table is complete and the input is clean
MsgBox "No Match found via DM lookup", vbOKOnly, "SoundEx
Error"
SOUNDEX = "000000"
Exit Function
End If

'Depending upon where the found sound is, encode from the DM_Map
If i = 1 Then
'Start of string, use the start value if valid
Call AddToEncodedStr ing(strEncodedS tring, strLastCode,
DM_Map.DM_Start )
ElseIf DM_Map.DM_Match len = 2 And DM_Map.DM_Other = "-1" _
And InStr("AEIOUJY" , Mid(strToEncode , i +
DM_Map.DM_Match len, 1)) <0 Then
'A vowel pair preceeding another vowel, use the vowel value if
valid
Call AddToEncodedStr ing(strEncodedS tring, strLastCode,
DM_Map.DM_Vowel )
ElseIf DM_Map.DM_Strin g = "H" _
And InStr("AEIOUJY" , Mid(strToEncode , i +
DM_Map.DM_Match len, 1)) <0 Then
'An H preceeding another vowel, use the vowel value if valid
Call AddToEncodedStr ing(strEncodedS tring, strLastCode,
DM_Map.DM_Vowel )
Else
'Use all other case value
Call AddToEncodedStr ing(strEncodedS tring, strLastCode,
DM_Map.DM_Other )
End If
i = i + DM_Map.DM_Match len - 1 ' adjust indexed based upon
matched string length

Next i

SOUNDEX = Left(strEncoded String & "000000", 6) ' Ensure the string
is at least 6 long

End Function

Private Sub AddToEncodedStr ing(strToAddTo As String, strLastCode As
String, strToAdd As String)
' Usage: Called by SoundEx Function
' Purpose: Append sound to encoded string if rules permit
' Inputs:
' strToAdd - Encoded sound value from DM_Map
' Returns (Modified Parameters):
' strToAddTo - Passed string containing sounds encoded so far
' strLastCode - Passed string containing last sound passed to this
routine
If strToAdd = strLastCode Then ' Drop duplicate sounds
Exit Sub
End If

strLastCode = strToAdd

If strToAdd = "-1" Then ' Value from table means ignore sound
Exit Sub
End If

strToAddTo = strToAddTo & strToAdd ' Append new sound

End Sub

Private Sub LoadDMTable()
' Usage: Called by SoundEx Function
' Purpose: Generate DM_Table values
' Inputs: None - values generated by routine
' Returns (Module level):
' DM_Table - Collection of items. The key of which is the
sound to encode
' and the item data is the DM_Map structure
values
' DM_TableLoaded - Boolean value to flag if the DM_Table has been
loaded.
If DM_TableLoaded = True Then
Exit Sub ' Already loaded
End If

Set DM_Table = New Collection
'Load each of the D-M sounds and their rules to the DM_Table
LoadDMElements ("AI,AJ,AY;0 ;1;-1")
LoadDMElements ("AU;0;7;-1")
LoadDMElements ("A;0;-1;-1")
LoadDMElements ("B;7;7;7")
LoadDMElements ("CHS;5;54;5 4")
LoadDMElements ("CH;5;5;5")
LoadDMElements ("CK;5;5;5")
LoadDMElements ("CZ,CS,CSZ,CZS ;4;4;4")
LoadDMElements ("C;4;4;4")
LoadDMElements ("DRZ,DRS;4;4;4 ")
LoadDMElements ("DS,DSH,DSZ;4; 4;4")
LoadDMElements ("DZ,DZH,DZS;4; 4;4")
LoadDMElements ("D,DT;3;3;3 ")
LoadDMElements ("EI,EJ,EY;0 ;1;-1")
LoadDMElements ("EU;1;1;-1")
LoadDMElements ("E;0;-1;-1")
LoadDMElements ("FB;7;7;7")
LoadDMElements ("F;7;7;7")
LoadDMElements ("G;5;5;5")
LoadDMElements ("H;5;5;-1")
LoadDMElements ("IA,IE,IO,IU;1 ;-1;-1")
LoadDMElements ("I;0;-1;-1")
LoadDMElements ("J;1;-1;-1")
LoadDMElements ("KS;5;54;54 ")
LoadDMElements ("KH;5;5;5")
LoadDMElements ("K;5;5;5")
LoadDMElements ("L;8;8;8")
LoadDMElements ("MN;-1;66;66")
LoadDMElements ("M;6;6;6")
LoadDMElements ("NM;-1;66;66") ' not a duplicate! look carefully NM
vs MN
LoadDMElements ("N;6;6;6")
LoadDMElements ("OI,OJ,OY;0 ;1;-1")
LoadDMElements ("O;0;-1;-1")
LoadDMElements ("P,PF,PH;7;7;7 ")
LoadDMElements ("Q;5;5;5")
LoadDMElements ("RZ,RS;4;4; 4")
LoadDMElements ("R;9;9;9")
LoadDMElements ("SCHTSCH,SCHTS H,SCHTCH;2;4;4" )
LoadDMElements ("SCH;4;4;4" )
LoadDMElements ("SHTCH,SHCH,SH TSH;2;4;4")
LoadDMElements ("SHT,SCHT,SCHD ;2;43;43")
LoadDMElements ("SH;4;4;4")
LoadDMElements ("STCH,STSCH,SC ;2;4;4")
LoadDMElements ("STRZ,STRS,STS H;2;4;4")
LoadDMElements ("ST;2;43;43 ")
LoadDMElements ("SZCZ,SZCS;2;4 ;4")
LoadDMElements ("SZT,SHD,SZD,S D;2;43;43")
LoadDMElements ("SZ;4;4;4")
LoadDMElements ("S;4;4;4")
LoadDMElements ("TCH,TTCH,TTSC H;4;4;4")
LoadDMElements ("TH;3;3;3")
LoadDMElements ("TRZ,TRS;4;4;4 ")
LoadDMElements ("TSCH,TSH;4;4; 4")
LoadDMElements ("TS,TTS,TTSZ,T C;4;4;4")
LoadDMElements ("TZ,TTZ,TZS,TS Z;4;4;4")
LoadDMElements ("T;3;3;3")
LoadDMElements ("UI,UJ,UY;0 ;1;-1")
LoadDMElements ("U,UE;0;-1;-1")
LoadDMElements ("V;7;7;7")
LoadDMElements ("W;7;7;7")
LoadDMElements ("X;5;54;54" )
LoadDMElements ("Y;1;-1;-1")
LoadDMElements ("ZDZ,ZDZH,ZHDZ H;2;4;4")
LoadDMElements ("ZD,ZHD;2;43;4 3")
LoadDMElements ("ZH,ZS,ZSCH,ZS H;4;4;4")
LoadDMElements ("Z;4;4;4")

DM_TableLoaded = True ' Flag to not load again

End Sub

Private Sub LoadDMElements( strLoadString As String)
' Usage: Called by LoadDMTable Subroutine
' Purpose: Parse and Add DM_Table items for passed D-M sound
' Inputs:
' strLoadString
' Returns (Module level):
' DM_Table - Collection of items. The key of which is the
sound to encode
' and the item data is the DM_Map structure
values
Dim strItemPart As String
Dim strItemKey As String
Dim strKeyParts As String
Dim intPosition As Integer

'Separate the passed sound into its two parts
intPosition = InStr(1, strLoadString, ";")
If intPosition = 0 Then
MsgBox "invalid parameter to LoadDMElements: " & strLoadString
Exit Sub
End If

strKeyParts = Left(strLoadStr ing, intPosition - 1) & ","
strItemPart = Mid(strLoadStri ng, intPosition + 1)

'Add the Item Part (sound values) for each letter combination
Do While True
intPosition = InStr(1, strKeyParts, ",")
If intPosition = 0 Then
Exit Sub
End If

strItemKey = Left(strKeyPart s, intPosition - 1)
strKeyParts = Mid(strKeyParts , intPosition + 1)
DM_Table.Add strItemPart, strItemKey

Loop ' Do While True

End Sub

Private Sub FindDMMatch(int StartMatchPos As Integer, strToTest As
String, dmLocalDM As DM_Structure)
' Usage: Called by SoundEx Function
' Purpose: Find largest matching DM_Table entry at the indicated
position
' of the input name string
' Populate the passed DM_Structure with data from the
DM_Table
' Inputs:
' DM_Table - Module level table of letter combinations and
sound values
' intStartMatchPo s - Place in the passed string to start looking
for letter combinations
' strToTest - String passed containing the name to encode
' Returns (Modified Parameters):
' dmLocalDM - Structure to contain data from the DM_Table
Dim strMatchString As String
Dim i As Integer
Dim strItemData As String
Dim intPosition As Integer

For i = Min(7, Len(strToTest) - intStartMatchPo s + 1) To 1 Step -1
strMatchString = Mid(strToTest, intStartMatchPo s, i)

strItemData = ""
On Error Resume Next ' trap error that happens when item does
not match
strItemData = DM_Table.Item(s trMatchString)
On Error GoTo 0 ' turn off error handling

If strItemData <"" Then ' Parse into DM Map structure
dmLocalDM.DM_St ring = strMatchString
dmLocalDM.DM_Ma tchlen = i

intPosition = InStr(1, strItemData, ";")
dmLocalDM.DM_St art = Left(strItemDat a, intPosition - 1)
strItemData = Mid(strItemData , intPosition + 1)

intPosition = InStr(1, strItemData, ";")
dmLocalDM.DM_Vo wel = Left(strItemDat a, intPosition - 1)
strItemData = Mid(strItemData , intPosition + 1)

dmLocalDM.DM_Ot her = strItemData

Exit Sub ' DMMatched, so return

End If
Next i

dmLocalDM.DM_Ma tchlen = -1 ' Should not happen if table is well
formed!!
MsgBox "String not found in DM table." & vbCr & _
"String: '" & strToTest & "'" & vbCr & _
"at position: " & intStartMatchPo s, vbOKOnly, "FindDMMatc h
Error"

End Sub

Public Function Min(lNumber1 As Long, lNumber2 As Long) As Long
' Usage: Min = Min(lNumber1, lNumber2)
' Purpose: Return minimum of two input numbers
' Inputs:
' lNumber1, lNumber2 - Arbitrary numbers to compare
' Returns (Function):
' Min - Smaller of the two arbitrary numbers passed
If lNumber1 < lNumber2 Then
Min = lNumber1
Else
Min = lNumber2
End If

End Function

Public Function RemoveNotChars( strToCleanUp As String, strCharsToKeep
As String, Optional varStartPositio n) As String
' Usage: RemoveNotChars = RemoveNotChars( strToCleanUp,
strCharsToKeep)
' Purpose: Remove characters in passed string that are not in the
keep string
' Inputs:
' strToCleanUp - String to clean up
' strCharsToKeep - String identifying all characters to keep
' Returns (Function):
' RemoveNotChars - String with all 'not keepable' characters
removed
Dim strBuildString As String
Dim strTestChar As String
Dim intStartPos As Integer
Dim i As Integer

RemoveNotChars = ""

' If no string to clean up, or keep string is empty, then an empty
string is returned
If Len(strToCleanU p) = 0 Or Len(strCharsToK eep) = 0 Then
Exit Function
End If

' Initialize return string in light of starting position
If Not IsMissing(varSt artPosition) Then
If Not IsNumeric(varSt artPosition) Or varStartPositio n <= 0 Then
MsgBox "StartPosit ion must be numeric, greater than zero",
vbOKOnly, "RemoveChar s Error"
Exit Function
End If
intStartPos = varStartPositio n
strBuildString = Left(strToClean Up, varStartPositio n - 1)
Else
intStartPos = 1
strBuildString = ""
End If

For i = intStartPos To Len(strToCleanU p)
strTestChar = Mid(strToCleanU p, i, 1)
If InStr(strCharsT oKeep, strTestChar) <0 Then
strBuildString = strBuildString & strTestChar ' add onto end
End If
Next i

RemoveNotChars = strBuildString

End Function

Public Function SOUNDEXARAB(Sur name As String) As String

Dim Result As String, c As String * 1
Dim Location As Integer

Surname = UCase(Surname)
' remove Çá from the word
'************** *************** *************** ********
If Left(Surname, 2) = "Çá" Then
Surname = Mid(Surname, 3)
End If
'************** *************** *************** ********

' get the code for each character in the word
'************** *************** *************** ********
Result = ""
For Location = 1 To Len(Surname)
Result = Result & Category(Mid(Su rname, Location, 1))
Next Location
'************** *************** *************** ********

'Remove the repeated character
'************** *************** *************** ********
Location = 1
Do While Location < Len(Result)
If Mid(Result, Location, 1) = Mid(Result, Location + 1, 1)
Then
Result = Left(Result, Location) & Mid(Result, Location
+ 2)
Else
Location = Location + 1
End If
Loop
'************** *************** *************** ********

'
'************** *************** *************** ********
If Category(Left(R esult, 1)) = Mid(Result, 2, 1) Then
Result = Left(Result, 1) & Mid(Result, 3)
End If
'************** *************** *************** ********

'remove the unkown characeter
'************** *************** *************** *********
For Location = 1 To Len(Result)
If Mid(Result, Location, 1) = "/" Then
Result = Left(Result, Location - 1) & Mid(Result,
Location + 1)
End If
Next
'************** *************** *************** *********

'get the first 4 haracters
'************** *************** *************** *********
Select Case Len(Result)
Case 4
SOUNDEXARAB = Result
Case Is < 4
SOUNDEXARAB = Result & String(4 - Len(Result), "0")
Case Is 4
SOUNDEXARAB = Left(Result, 4)
End Select
'************** *************** *************** *********
End Function

Private Function Category(c) As String

Select Case True
Case c Like "[åÍ]"
Category = "1"
Case c Like "[ËÓÕ]"
Category = "2"
Case c Like "[ßÞ]"
Category = "3"
Case c Like "[ÐÒ]"
Category = "4"
Case c Like "[ÚÁ]"
Category = "5"
Case c Like "[ÖÙØ]"
Category = "6"
Case c Like "[áäÑ]"
Category = "7"
Case c Like "[ÛÎ]"
Category = "8"
Case c Like "[Ô]"
Category = "9"
Case c Like "[ÊÏÌ]"
Category = "A"
Case c Like "[í]"
Category = "B"
Case c Like "[ãÈ]"
Category = "C"
Case c Like "[Ý]"
Category = "D"

Case Else
Category = ""

End Select
End Function
Marco

Mike Gramelspacher schreef:
In article <11************ **********@75g2 000cwc.googlegr oups.com>,
vo***********@g mail.com says...
So would I better use another soundex algo then?

Marco

It probably depends which algorithm works best for the names you need to
encode. I think I read that Soundex was patented around 1918 and was
used for the U.S. Census. The microfilms I saw were indexed by Soundex,
so obviously you need to use the same algorithm to encode the name for
which you are searching as was used to encode the microfilms. Soundex
is not the best for all names. Others have been developed over the
years. I also have code for the Metaphone Algorithm.

Regarding my question for obtaining the Deitch-Mokotoff code, you
nevered answered. Where can a person download or buy an implementation
of Deitch-Mokotoff?

Mike Gramelspacher
Jul 31 '06 #25
vo***********@g mail.com wrote in
news:11******** *************@i 3g2000cwc.googl egroups.com:
you don't think this is soundex?
[code deleted]

I didn't see this CASE SELECT anywhere:

Case "B", "F", "P", "V"
code = 1
Case "C", "G", "J", "K", "Q", "S", "X", "Z"
code = 2
Case "D", "T"
code = 3
Case "L"
code = 4
Case "M", "N"
code = 5
Case "R"
code = 6
Case Else
code = 0

If it's not encoding characters according to that rule, it's not
Soundex. It may be something *like* Soundex, but it ain't Soundex.

Oh, the first letter must be encoded as the first letter of the
result, which your function categorically is not doing (if it were
there'd be no question in the first place, since you'd have to use a
text field).

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 31 '06 #26
Lyle Fairfield <ly***********@ aim.comwrote in
news:Xn******** *************** **********@216. 221.81.119:
vo***********@g mail.com wrote in news:1154363487 .281052.20630
@i3g2000cwc.goo glegroups.com:
>you don't think this is soundex?

It's not the soundex of those who wish to believe there is a TRUE
and HOLY "SOUNDEX".
Words mean something, Lyle. If you search Google for a Soundex
function, you're going to find hundreds of examples of a particular
substitution algorithm that is referred to as Soundex. Some db
engines provide Soundex built in, and that is the traditional
Soundex.

If the OP had said "Soundex-like" instead of "Soundex" there'd be no
post from me pointing out a mis-use of terminology.
It is a soundex for those who think of soundex as term for a
number of functions, . . .
I see you are using lower case. . .
. . . including used defined soundex functions which may be
written with a particular purpose for a particular database in
mind, which may be useful in identifying names with different
spellings as similar in sound. . . .
There can certainly be language variations for Soundex, such that
the substitution table is different.
. . . I
am one of those and if anyone thinks I am trespassing on this
great and good definition of Soundex, protected and pure in the
bowels of the United States National Archives and Records
Administration, and the US Patent Office, then I am entirely
willing to use the term "phonetic algorithm" or
"DoTheWordsSoun dTheSame alogorithm" in its place.
I use Soundex, Soundex2, and a host of other substitution functions
that implement different substitution algorithms. All of them can't
be Soundex.
On the point of your original question, if you save 123456 as text
you will use six or twelve bytes depending upon your implemetation
of unicode. If you use a long integer you will use four. I'd go
for the four, on the basis of size, speed and a congenital
disposition towards pissing off pedants.
If his soundex-like function returns leading zeroes, he can't store
it as a numeric value.

And I think that encoding the first letter by the same rules as the
other letters is pretty much useless.

If I'm not mistaken, his function can return zero before the last
character, which the real Soundex cannot. As well, if I"m
remembering correctly, it can return multiple zeroes in a row.

So, there are at least 3 major differences between his function and
Soundex. I'd be perfectly content to call Soundex and Soundex2 the
same, since the only difference is the substitution table and the
length (Soundex2 encodes to 9 numbers and to a length of 6
characters). But once you change the basic assumptions about
repeated characters and choose to encode a 0 somewhere before the
end of the string, then you're really using something very
different.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 31 '06 #27
OK David,

point taken but what interests me is, is it any better than Soundex or
haven't I
been paying attention?

I'm just looking for the best thing to search me database....

Marco

David W. Fenton schreef:
Lyle Fairfield <ly***********@ aim.comwrote in
news:Xn******** *************** **********@216. 221.81.119:
vo***********@g mail.com wrote in news:1154363487 .281052.20630
@i3g2000cwc.goo glegroups.com:
you don't think this is soundex?
It's not the soundex of those who wish to believe there is a TRUE
and HOLY "SOUNDEX".

Words mean something, Lyle. If you search Google for a Soundex
function, you're going to find hundreds of examples of a particular
substitution algorithm that is referred to as Soundex. Some db
engines provide Soundex built in, and that is the traditional
Soundex.

If the OP had said "Soundex-like" instead of "Soundex" there'd be no
post from me pointing out a mis-use of terminology.
It is a soundex for those who think of soundex as term for a
number of functions, . . .

I see you are using lower case. . .
. . . including used defined soundex functions which may be
written with a particular purpose for a particular database in
mind, which may be useful in identifying names with different
spellings as similar in sound. . . .

There can certainly be language variations for Soundex, such that
the substitution table is different.
. . . I
am one of those and if anyone thinks I am trespassing on this
great and good definition of Soundex, protected and pure in the
bowels of the United States National Archives and Records
Administration, and the US Patent Office, then I am entirely
willing to use the term "phonetic algorithm" or
"DoTheWordsSoun dTheSame alogorithm" in its place.

I use Soundex, Soundex2, and a host of other substitution functions
that implement different substitution algorithms. All of them can't
be Soundex.
On the point of your original question, if you save 123456 as text
you will use six or twelve bytes depending upon your implemetation
of unicode. If you use a long integer you will use four. I'd go
for the four, on the basis of size, speed and a congenital
disposition towards pissing off pedants.

If his soundex-like function returns leading zeroes, he can't store
it as a numeric value.

And I think that encoding the first letter by the same rules as the
other letters is pretty much useless.

If I'm not mistaken, his function can return zero before the last
character, which the real Soundex cannot. As well, if I"m
remembering correctly, it can return multiple zeroes in a row.

So, there are at least 3 major differences between his function and
Soundex. I'd be perfectly content to call Soundex and Soundex2 the
same, since the only difference is the substitution table and the
length (Soundex2 encodes to 9 numbers and to a length of 6
characters). But once you change the basic assumptions about
repeated characters and choose to encode a 0 somewhere before the
end of the string, then you're really using something very
different.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Aug 1 '06 #28
In article <11************ **********@s13g 2000cwa.googleg roups.com>,
vo***********@g mail.com says...
You can have it no problem,

if you could just send me the Metaphome Algoritm in return, thanks:
OK, here it is. I have seen places on the Internet where the advantages
and disadvantages of various Soundex-like algorithms are discussed. It
just depends on the types of names that need to be encoded.

Option Compare Database
Option Explicit

'Metaphone algorithm translated from C to Delphi by Tom White <w...
@intellex.com>
'Translated to Visual Basic by Dave White 9/10/01
'
'v1.1 fixes a few bugs
'
' Checks length of string before removing trailing S (>1)
' PH used to translate to H, now translates to F
'
'Original C version by Michael Kuhn <rhlab!mk...@uu net.uu.net>
'
'
Function InStrC(ByVal SearchIn As String, _
ByVal SoughtCharacter s As String) As Integer
'--- Returns the position of the first character in SearchIn that is
contained
'--- in the string SoughtCharacter s. Returns 0 if none found.
Dim i As Integer

On Error Resume Next
SoughtCharacter s = UCase(SoughtCha racters)
SearchIn = UCase(SearchIn)
For i = 1 To Len(SearchIn)
If InStr(SoughtCha racters, Mid(SearchIn, i, 1)) 0 Then
InStrC = i: Exit Function
End If
Next i
InStrC = 0
End Function

Function Metaphone(ByVal A As Variant) As String
Dim b, c, d, e As String
Dim inp, outp As String
Dim vowels, frontv, varson, dbl As String
Dim excppair, nxtltr As String
Dim T, ii, jj, lng, lastchr As Integer
Dim curltr, prevltr, nextltr, nextltr2, nextltr3 As String
Dim vowelafter, vowelbefore, frontvafter, silent, hard As Integer
Dim alphachr As String

On Error Resume Next
If IsNull(A) Then A = ""
A = CStr(A)
inp = UCase(A)
vowels = "AEIOU"
frontv = "EIY"
varson = "CSPTG"
dbl = "." 'Lets us allow certain letters to be doubled
excppair = "AGKPW"
nxtltr = "ENNNR"
alphachr = "ABCDEFGHIJKLMN OPQRSTUVWXYZ"

'--Remove non-alpha characters
outp = ""
For T = 1 To Len(inp)
If InStr(alphachr, Mid(inp, T, 1)) 0 Then outp = outp + Mid(inp,
T, 1)
Next T

inp = outp: outp = ""

If Len(inp) = 0 Then Metaphone = "": Exit Function

'--Check rules at beginning of word
If Len(inp) 1 Then
b = Mid(inp, 1, 1)
c = Mid(inp, 2, 1)
ii = InStr(excppair, b)
jj = InStr(nxtltr, c)
If ii = jj And ii 0 Then
inp = Mid(inp, 2, Len(inp) - 1)
End If
End If

If Mid(inp, 1, 1) = "X" Then Mid(inp, 1, 1) = "S"

If Mid(inp, 1, 2) = "WH" Then inp = "W" + Mid(inp, 3)

If right(inp, 1) = "S" Then inp = left(inp, Len(inp) - 1)

ii = 0
Do
ii = ii + 1
'--Main Loop!
silent = False
hard = False
curltr = Mid(inp, ii, 1)
vowelbefore = False
prevltr = " "
If ii 1 Then
prevltr = Mid(inp, ii - 1, 1)
If InStrC(prevltr, vowels) 0 Then vowelbefore = True
End If

If ((ii = 1) And (InStrC(curltr, vowels) 0)) Then
outp = outp + curltr
GoTo ContinueMainLoo p
End If

vowelafter = False
frontvafter = False
nextltr = " "
If ii < Len(inp) Then
nextltr = Mid(inp, ii + 1, 1)
If InStrC(nextltr, vowels) 0 Then vowelafter = True
If InStrC(nextltr, frontv) 0 Then frontvafter = True
End If

'--Skip double letters EXCEPT ones in variable double
If InStrC(curltr, dbl) = 0 Then
If curltr = nextltr Then GoTo ContinueMainLoo p
End If

nextltr2 = " "
If Len(inp) - ii 1 Then
nextltr2 = Mid(inp, ii + 2, 1)
End If

nextltr3 = " "
If (Len(inp) - ii) 2 Then
nextltr3 = Mid(inp, ii + 3, 1)
End If

Select Case curltr
Case "B":
silent = False
If (ii = Len(inp)) And (prevltr = "M") Then silent = True
If Not (silent) Then outp = outp + curltr
Case "C":
If Not ((ii 2) And (prevltr = "S") And frontvafter) Then
If ((ii 1) And (nextltr = "I") And (nextltr2 = "A"))
Then
outp = outp + "X"
Else
If frontvafter Then
outp = outp + "S"
Else
If ((ii 2) And (prevltr = "S") And (nextltr =
"H")) Then
outp = outp + "K"
Else
If nextltr = "H" Then
If ((ii = 1) And (InStrC(nextltr 2, vowels) = 0))
Then
outp = outp + "K"
Else
outp = outp + "X"
End If
Else
If prevltr = "C" Then
outp = outp + "C"
Else
outp = outp + "K"
End If
End If
End If
End If
End If
End If
Case "D":
If ((nextltr = "G") And (InStrC(nextltr 2, frontv) 0))
Then
outp = outp + "J"
Else
outp = outp + "T"
End If
Case "G":
silent = False
If ((ii < Len(inp)) And (nextltr = "H") And _
(InStrC(nextltr 2, vowels) = 0)) Then
silent = True
End If
If ((ii = Len(inp) - 4) And (nextltr = "N") And _
(nextltr2 = "E") And (nextltr3 = "D")) Then
silent = True
ElseIf ((ii = Len(inp) - 2) And (nextltr = "N")) Then
silent = True
End If
If (prevltr = "D") And frontvafter Then silent = True
If prevltr = "G" Then
hard = True
End If

If Not (silent) Then
If frontvafter And (Not (hard)) Then
outp = outp + "J"
Else
outp = outp + "K"
End If
End If

Case "H":
silent = False
If InStrC(prevltr, varson) 0 Then silent = True
If vowelbefore And (Not (vowelafter)) Then silent = True
If Not silent Then outp = outp + curltr

Case "F", "J", "L", "M", "N", "R": outp = outp + curltr

Case "K": If prevltr <"C" Then outp = outp + curltr

Case "P": If nextltr = "H" Then outp = outp + "F" Else outp = outp +
"P"

Case "Q": outp = outp + "K"

Case "S":
If ((ii 2) And (nextltr = "I") And _
((nextltr2 = "O") Or nextltr2 = "A")) Then
outp = outp + "X"
End If
If (nextltr = "H") Then
outp = outp + "X"
Else
outp = outp + "S"
End If

Case "T":
If ((ii 0) And (nextltr = "I") And _
((nextltr2 = "O") Or (nextltr2 = "A"))) Then
outp = outp + "X"
End If
If nextltr = "H" Then
If ((ii 1) Or (InStrC(nextltr 2, vowels) 0)) Then
outp = outp + "0"
Else
outp = outp + "T"
End If
ElseIf Not ((ii < Len(inp) - 3) And _
(nextltr = "C") And (nextltr2 = "H")) Then
outp = outp + "T"
End If

Case "V": outp = outp + "F"

Case "W", "Y"
If (ii < Len(inp) - 1) And vowelafter Then outp = outp + curltr

Case "X": outp = outp + "KS"

Case "Z": outp = outp + "S"

End Select

ContinueMainLoo p:
Loop Until (ii Len(inp))

Metaphone = outp

End Function

Aug 1 '06 #29
vo***********@g mail.com wrote in
news:11******** *************@b 28g2000cwb.goog legroups.com:
point taken but what interests me is, is it any better than
Soundex or haven't I
been paying attention?
Well, it depends on your purposes. I would find a traditional
Soundex that also encodes the first letter to be completely useless,
as Soundex itself already casts the net too wide.

I find Soundex2 much more useful in finding duplicates.

The point is you want as few false positives as possible, and
encoding the first letter by the same rules as the others will lead
to the possibility of more false positives.

On the other hand, an algorithm that does something like replace
initial PH with F would make some sense to me.

So, it all depends on the specifics of your particular algorithm,
the exact implementation and the data you're evaluating. Don't
underplay the last of those -- a relatively minor proportion of data
from a language with different pronunciation rules can completely
negate the usefulness of an algorithm that works just fine with a
more homogeneous data set.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Aug 1 '06 #30

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
3342
by: Ricky Romaya | last post by:
Hi, I'm curious about soundex. All I know that it's a way for making spelling- error-tolerant word matching. What I want to know is whether the soundex algorithm are made exclusively for english language, or can it be used for any arbitrary language with satisfactory performance (by 'satisfactory performance' I meant that it can detect at least 80% spelling-errors). What about PHP soundex support? TIA
0
1311
by: Mickey Mouse | last post by:
My MySql experience is limited to simple updates and selects. I am creating a genealogical site based on Old Church birth and christening records. Currently I have over 70,000 entries in the database and I want to link the parents together in order to see all their children. Currently I have the following fields FatherGiven, FatherLastName, MotherGiven, MotherLastName and I want to create a field called FRN which if the Soundex of each of...
3
4530
by: arthur benedetti white | last post by:
has anybody already developed a server side soundex function?
3
6103
by: Vinay Jain | last post by:
Hi I want to write soundex query for example: select * from student where name soundex 'vinay'; In psql it gives error at soundex. Regards Vinay -- Vinay Jain Dissertation Project Trainee DAKE Division
2
4797
LacrosseB0ss
by: LacrosseB0ss | last post by:
Just wondering if anyone out in query-land has used "Soundex" or "Difference" before to find field values that "sound the same". Similar to LIKE but very different. I was shocked when I heard about it, so I've been playing around. Access doesn't seem to like it. Am I just incompetant? Thanks - LB (me using Northwind to goof around with) qrySoundexPlaying: SELECT Customers.* FROM Customers
0
8674
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8603
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9157
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
8895
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8861
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6518
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
4369
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4619
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3046
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.