Use of Soundex fields in Access

Hi All,

I have a database with names on which I want to use the soundex option.
So I have created two separate fields, for the Lastname and the Firstname,
in which I save the Soundex version of each new name I add to the database.
I use the soundex code with the 6-digit numeric option. So, for example, I
save LastnameSE = 600192 and FirstnameSE = 545910.

Should these Soundex fields be numeric or text fields?

Marco

Jul 30 '06 #1
numeric
--
ja**************@telusTELUS.net
remove uppercase letters for true email
http://www.geocities.com/jacksonmacd/ for info on MS Access security
Jul 30 '06 #2
Does it matter if it is a text field?
Or would you have to do calculations on the fields?

Marco

Jul 30 '06 #3
I don't think you would do any calculations -- just comparisons. So if
the Soundex function that you are using returns a numeric value, then
you should store its results in a numeric field. If the soundex
function returns a text value, then store it in a text field. Access
would probably adapt automatically if you used the wrong type, but you
would just be asking it to do additional, unnecessary work.
--
ja**************@telusTELUS.net
remove uppercase letters for true email
http://www.geocities.com/jacksonmacd/ for info on MS Access security
Jul 30 '06 #4
vo***********@gmail.com wrote:
> I have a database with names on which I want to use the soundex
> option. So I have created two separate fields for the Lastname and
> Firstname in which I save the Soundex version of a new name I save
> in the database. I have the soundex code with the 6 numeric option.
> So I save for example in the field
> LastnameSE = 600192 and in the FirstnameSE = 545910.

If it returns a number, it's not actually Soundex.

> Should these Soundex fields be numeric or text fields?

I can't see that it makes much difference. The only reason it would
matter is if there can be leading zeros, which a numeric field would
drop.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 30 '06 #5
jacksonmacd <ja***************@telus.net> wrote:
> numeric

Never.

Soundex is an alphanumeric code. Therefore it must be stored in
a text field, if stored at all.

From http://vbnet.mvps.org/ (search for soundex):

Every Soundex code consists of a letter and three numbers, such
as B-536 (which also happens to represent names like 'Bender').

Secondly, unless you add, subtract, multiply or divide a
number, it should be stored as a text string.

Thirdly, attributes such as soundex which can be readily
calculated should not be stored at all, but returned by a
function from the stored lastname or firstname when needed.
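The "letter and three numbers" shape described above can be checked with VBA's Like operator, which makes it obvious why such a code cannot live in a Number field. A minimal sketch; the function name IsClassicSoundex is hypothetical, and it assumes the hyphen in "B-536" is only a display convention and is not stored:

```vba
' Returns True when strCode has the classic Soundex shape:
' one letter followed by exactly three digits (e.g. "B536").
' Because the first character is a letter, a code like this can only
' be stored in a Text field, not a Number field.
Public Function IsClassicSoundex(ByVal strCode As String) As Boolean
    ' In a Like pattern, [A-Z] matches one letter and # matches one digit.
    IsClassicSoundex = (strCode Like "[A-Z]###")
End Function
```

With this check, "B536" passes, while a six-digit D-M code such as "600192" (or the display form "B-536") does not.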


--
Bob Quintal

PA is y I've altered my email address.

--
Posted via a free Usenet account from http://www.teranews.com

Jul 30 '06 #6
> Soundex is an alphanumeric code. Therefore it must be stored in
> a text field, if stored at all.

Is this a good thing?

Am I correct that most common adaptations of the algorithm will produce

Fairfield -> F614
Pharefield -> P614

while if we used all numerics we would have
1614
for both?

I suppose that in the 19th century the F and P were very helpful for
manual or paper searches. But today, many Access users and developers
have progressed into the latter stages of the twentieth century.
Perhaps that alpha differentiation is no longer so helpful.

Of course, there may be some great reason for the initial alpha
character. I shall be happy to hear it.
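The all-numeric idea can be sketched by swapping the leading letter of a classic code for the digit that letter would receive elsewhere in the code. B, F, P and V all map to 1, which is why both names above collapse to 1614. The function name is hypothetical, and mapping the uncoded letters (vowels, H, W, Y) to 0 is an assumption of this sketch, not part of any published variant:

```vba
' Replaces the first letter of a classic Soundex code (e.g. "F614")
' with the digit that letter gets in the body of the code, so that
' "F614" and "P614" both become "1614".
' Letters that are normally not coded (A, E, I, O, U, H, W, Y) map
' to "0" here; that choice is an assumption of this sketch.
Public Function NumericSoundex(ByVal strCode As String) As String
    Dim strDigit As String

    Select Case UCase$(Left$(strCode, 1))
        Case "B", "F", "P", "V": strDigit = "1"
        Case "C", "G", "J", "K", "Q", "S", "X", "Z": strDigit = "2"
        Case "D", "T": strDigit = "3"
        Case "L": strDigit = "4"
        Case "M", "N": strDigit = "5"
        Case "R": strDigit = "6"
        Case Else: strDigit = "0"
    End Select

    NumericSoundex = strDigit & Mid$(strCode, 2)
End Function
```

The trade-off is visible immediately: the two codes become identical, so any search on the numeric form returns both spellings (and every other name with the same digits).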

Jul 30 '06 #7
"Lyle Fairfield" <ly***********@aim.com> wrote:
>> Soundex is an alphanumeric code. Therefore it must be stored
>> in a text field, if stored at all.
>
> Is this a good thing?

There are much better phonetic coding systems that have been
developed. But in any case something calculable should not be
stored.

> Am I correct that most common adaptations of the algorithm
> will produce
>
> Fairfield -> F614
> Pharefield -> P614

Yes. That is the definition of the Soundex code. But the other
adaptations have their own names. You might argue that Soundex, like
Kleenex, has become a generic descriptor, but we're getting off the
topic.

> while if we used all numerics we would have
> 1614
> for both?

Or adaptations that convert the Ph to an F, yielding F614 for
both variants. I'm sure you've seen Mark Twain's plan for the
improvement of spelling.
http://www.netfunny.com/rhf/jokes/87/2094.10.html
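An adaptation of the kind mentioned above, rewriting a leading "PH" as "F" before the name is encoded, is just a pre-processing step. A minimal sketch of one such rule; the function name is hypothetical, and real adaptations usually apply several rules, not just this one:

```vba
' Pre-processes a surname before it is passed to a Soundex function:
' trims and upper-cases it, then rewrites a leading "PH" as "F", so
' that "Pharefield" and "Fairfield" both start with F and can both
' encode to F614 under classic Soundex.
Public Function NormalizeLeadingPH(ByVal strName As String) As String
    strName = UCase$(Trim$(strName))
    If Left$(strName, 2) = "PH" Then
        strName = "F" & Mid$(strName, 3)
    End If
    NormalizeLeadingPH = strName
End Function
```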

> I suppose that in the 19th century the F and P were very
> helpful for manual or paper searches. But today, many Access
> users and developers have progressed into the latter stages of
> the twentieth century. Perhaps that alpha differentiation is
> no longer so helpful.
>
> Of course, there may be some great reason for the initial
> alpha character. I shall be happy to hear it.

Yes, so would I. However, it doesn't change my opinion that text
is the correct field type.

--
Bob Quintal

PA is y I've altered my email address.

Jul 30 '06 #8
My soundex only returns numbers. I use the Daitch-Mokotoff algorithm, 6
character result

Marco

Jul 30 '06 #9
In article <11*********************@m73g2000cwd.googlegroups.com>,
vo***********@gmail.com says...
> My soundex only returns numbers. I use the Daitch-Mokotoff algorithm, 6
> character result
>
> Marco

Marco,

Where can a person obtain a Visual Basic implementation of the
Daitch-Mokotoff algorithm? Most of what I do is genealogy related.

Mike Gramelspacher
Jul 30 '06 #10
vo***********@gmail.com wrote:
> My soundex only returns numbers. I use the Daitch-Mokotoff
> algorithm, 6 character result

That's not Soundex, which has a fixed definition, along with
Soundex2.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 30 '06 #11
Bob Quintal <rq******@sPAmpatico.ca> wrote:
> Thirdly, attributes such as soundex which can be readily
> calculated should not be stored at all, but returned in a
> function from the stored lastname or firstname when needed.

Not if you're going to use it for de-duping.

Consider:

A table of people with 350K records.

Before you add a new record, you want to test the name the user
wants to enter against the Soundex and Soundex2 values of the
existing data. Would you then do 4 X 350K calculations to compare
to, or would you calculate the Soundex and Soundex2 values on the
new name and then use a WHERE clause on indexed stored Soundex and
Soundex2 fields to find the matches?

I can't see how anyone who has ever used Soundex/Soundex2 for this
purpose could ever advocate anything *but* storing the data.
Anything else will be far, far too slow to be usable in a real-world
application.
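The approach described above can be sketched in DAO: compute the code once for the incoming name, and let an index on the stored column answer the lookup. This assumes a reference to the Microsoft DAO object library, and the table tblPeople, Text column LastnameSE and index name idxLastnameSE are all hypothetical:

```vba
' Assumes a table tblPeople with a Text column LastnameSE holding the
' stored code for each LastName. All object names are hypothetical.

' One-time setup: index the stored code column so lookups can use it.
Public Sub CreateSoundexIndex()
    CurrentDb.Execute _
        "CREATE INDEX idxLastnameSE ON tblPeople (LastnameSE);", dbFailOnError
End Sub

' Returns how many existing people have the same stored code as the
' name being entered. strNewCode is computed once, before the lookup,
' so no per-row function calls are needed.
Public Function CountSoundexMatches(ByVal strNewCode As String) As Long
    Dim db As DAO.Database
    Dim qdf As DAO.QueryDef
    Dim rs As DAO.Recordset

    Set db = CurrentDb
    ' A temporary ("") parameter query avoids building SQL by concatenation.
    Set qdf = db.CreateQueryDef("", _
        "PARAMETERS pCode Text ( 255 ); " & _
        "SELECT Count(*) AS N FROM tblPeople WHERE LastnameSE = pCode;")
    qdf.Parameters("pCode").Value = strNewCode

    Set rs = qdf.OpenRecordset(dbOpenSnapshot)
    CountSoundexMatches = rs!N
    rs.Close
    qdf.Close
End Function
```

The WHERE clause compares a stored, indexed value against a constant, which is exactly the case Jet can satisfy from the index instead of evaluating a function on every row.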

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 30 '06 #12
"Lyle Fairfield" <ly***********@aim.com> wrote:
>> Soundex is an alphanumeric code. Therefore it must be stored in
>> a text field, if stored at all.
>
> Is this a good thing?
>
> Am I correct that most common adaptations of the algorithm will
> produce
>
> Fairfield -> F614
> Pharefield -> P614
>
> while if we used all numerics we would have
> 1614
> for both?
>
> I suppose that in the 19th century the F and P were very helpful
> for manual or paper searches. But today, many Access users and
> developers have progressed into the latter stages of the twentieth
> century. Perhaps that alpha differentiation is no longer so
> helpful.
>
> Of course, there may be some great reason for the initial alpha
> character. I shall be happy to hear it.

I believe the assumption is that the first letter is the least
likely to be mis-typed or have variations.

I already find Soundex implemented with the first letter to be too
loose a match, so I also use Soundex2: not in place of Soundex, but as
an adjunct to it, since Soundex2 provides much more useful matches.

If you encode the first letter as a number you'll end up with almost
useless data, as you'll have way too many matches.

Lyle, have you ever used Soundex? If not, then that might explain
why you don't seem to understand how it works.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 30 '06 #13
Bob Quintal <rq******@sPAmpatico.ca> wrote:
> But in any case something calculable should not be
> stored.

I can't see how anyone who has ever seriously used Soundex on data
sets of any size whatsoever could advocate *not* storing the Soundex
values. What the hell are you using them for if you're not storing
them?

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 30 '06 #14
"David W. Fenton" <XX*******@dfenton.com.invalid> wrote:
> Lyle, have you ever used Soundex? If not, then that might explain
> why you don't seem to understand how it works.

I worked in FoxPro for many years before Access existed. FoxPro has a
built-in Soundex function. So I suppose I have used Soundex for perhaps
eighteen years.

When I needed Soundex in Access I wrote my own Soundex function:
http://groups.google.ca/group/comp.d...a1693c6a1a48a5
As noted there, if we give aChar module-wide scope it is quite fast for
repeated calls.
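The point about module-wide scope is that the letter-to-digit lookup is built once rather than on every call. A minimal sketch of that idea applied to classic American Soundex (a letter plus three digits, where vowels separate equal codes but H and W do not). This is not the function at the link above, which is truncated; the names mCodes, LoadCodes and ClassicSoundex are hypothetical, and the block is meant to be its own standard module:

```vba
Option Explicit

' Module-level lookup: built on the first call, reused afterwards.
Private mCodes(65 To 90) As String   ' indexed by Asc("A") .. Asc("Z")
Private mLoaded As Boolean

Private Sub LoadCodes()
    Dim strMap As String
    Dim i As Integer
    ' Digit for each letter A..Z. "0" marks vowels and Y (not coded, but
    ' they separate equal digits); "-" marks H and W (skipped entirely).
    strMap = "0123012-02245501262301-202"
    For i = 1 To 26
        mCodes(64 + i) = Mid$(strMap, i, 1)
    Next i
    mLoaded = True
End Sub

' Classic Soundex: first letter kept as a letter, then three digits.
' Returns "" if the input contains no letters A-Z.
Public Function ClassicSoundex(ByVal strName As String) As String
    Dim i As Integer
    Dim intAsc As Integer
    Dim strCode As String
    Dim strPrev As String
    Dim strResult As String

    If Not mLoaded Then LoadCodes
    strName = UCase$(strName)

    ' Find the first letter; it becomes the first character of the code.
    For i = 1 To Len(strName)
        intAsc = Asc(Mid$(strName, i, 1))
        If intAsc >= 65 And intAsc <= 90 Then Exit For
    Next i
    If i > Len(strName) Then Exit Function

    strResult = Chr$(intAsc)
    strPrev = mCodes(intAsc)   ' a following letter with the same digit is dropped

    For i = i + 1 To Len(strName)
        intAsc = Asc(Mid$(strName, i, 1))
        If intAsc >= 65 And intAsc <= 90 Then
            strCode = mCodes(intAsc)
            If strCode = "-" Then
                ' H or W: skip without resetting strPrev
            ElseIf strCode = "0" Then
                strPrev = "0"   ' a vowel separates two equal digits
            ElseIf strCode <> strPrev Then
                strResult = strResult & strCode
                strPrev = strCode
                If Len(strResult) = 4 Then Exit For
            End If
        End If
    Next i

    ClassicSoundex = Left$(strResult & "000", 4)
End Function
```

Because mCodes and mLoaded live at module level, only the first call pays for building the table; later calls, for example from a query over many rows, just index into the array.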

Perhaps I do understand how it works. I just don't happen to agree that
Soundex is something permanent, agreed upon and static. It may be a
useful concept in some situations, but a clever programmer/developer
will mould it to the needs of the application.
Jul 30 '06 #15
"David W. Fenton" <XX*******@dfenton.com.invalid> wrote:
> Bob Quintal <rq******@sPAmpatico.ca> wrote:
>> Thirdly, attributes such as soundex which can be readily
>> calculated should not be stored at all, but returned in a
>> function from the stored lastname or firstname when needed.
>
> Not if you're going to use it for de-duping.
>
> Consider:
>
> A table of people with 350K records.
>
> Before you add a new record, you want to test the name the
> user wants to enter against the Soundex and Soundex2 values of
> the existing data. Would you then do 4 X 350K calculations to
> compare to, or would you calculate the Soundex and Soundex2
> values on the new name and then use a WHERE clause on indexed
> stored Soundex and Soundex2 fields to find the matches?
>
> I can't see how anyone who has ever used Soundex/Soundex2 for
> this purpose could ever advocate anything *but* storing the
> data. Anything else will be far, far too slow to be usable in
> a real-world application.

Ok, I see your point. Having worked many years ago with soundex
in a FoxBase+ application, where it is possible and efficient to
create an index on a calculated value, I keep forgetting that
SQL doesn't have that functionality, and searching through 350K
records without an index does not make sense.
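Since Jet cannot index an expression, the workaround is to store the code and recompute it whenever the name changes. A minimal sketch for a bound Access form; the controls LastName and LastnameSE are hypothetical, and it assumes a SOUNDEX function such as the Daitch-Mokotoff one posted later in this thread is available in a standard module:

```vba
' In the form's class module. Recomputes the stored code every time a
' record is saved through this form, so the indexed LastnameSE column
' stays in step with LastName. Control names are hypothetical.
Private Sub Form_BeforeUpdate(Cancel As Integer)
    ' Nz turns a Null LastName into "" before it is encoded.
    Me!LastnameSE = SOUNDEX(Nz(Me!LastName, ""))
End Sub
```

Records added or changed through append or update queries bypass the form, so the same calculation would need to be applied there as well.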

--
Bob Quintal

PA is y I've altered my email address.

Jul 30 '06 #16
Bob Quintal <rq******@sPAmpatico.ca> wrote:
> Ok, I see your point. Having worked many years ago with soundex
> in a FoxBase+ application, where it is possible and efficient to
> create an index on a calculated value, I keep forgetting that
> SQL doesn't have that functionality, and searching through 350K
> records without an index does not make sense.

It was grand, eh? And that's not even taking into account conditional
indexes (WHERE LastName = "J") or indexes on the value of related fields
in other tables. No, REALLY, those of you reading this: you could (maybe
still can?)!

--
Lyle Fairfield
Jul 31 '06 #17
Lyle Fairfield <ly***********@aim.com> wrote:
> Bob Quintal <rq******@sPAmpatico.ca> wrote:
>> Ok, I see your point. Having worked many years ago with
>> soundex in a FoxBase+ application, where it is possible and
>> efficient to create an index on a calculated value, I keep
>> forgetting that SQL doesn't have that functionality, and
>> searching through 350K records without an index does not make
>> sense.
>
> It was grand, eh? And that's not even taking into account
> conditional indexes (WHERE LastName = "J") or indexes on the
> value of related fields in other tables. No, REALLY, those of you
> reading this: you could (maybe still can?)!

Things don't always improve.

--
Bob Quintal

PA is y I've altered my email address.

Jul 31 '06 #18
you don't think this is soundex?

Public Function SOUNDEX(strToEncode As String) As String
' Usage: SoundEx = SoundEx(strToEncode)
' Purpose: Return six character D-M encoding of input name
' Inputs:
' strToEncode - An alphabetic string, usually representing a
'               person/place name
' Returns (Function):
' SoundEx - Six digit D-M code string
Dim intEncodeStrLen As Integer
Dim intDMArray() As Integer
Dim strToEncodelen As Integer
Dim I As Integer
Dim strEncodedString As String
Dim strLastCode As String
Dim DM_Map As DM_Structure

Call LoadDMTable ' Load the DM_Table - only loaded once

' Clean the incoming name. Upper case, nothing but letters
strToEncode = RemoveNotChars(Trim(UCase(strToEncode)), _
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
intEncodeStrLen = Len(strToEncode)

strEncodedString = ""
strLastCode = ""
'Potentially search the whole string for meaningful sounds.
'Stop after 6 are found
For I = 1 To intEncodeStrLen
If Len(strEncodedString) >= 6 Then
SOUNDEX = Left(strEncodedString, 6)
Exit Function
End If

'Lookup in the DM_Table
Call FindDMMatch(I, strToEncode, DM_Map)
If DM_Map.DM_Matchlen = -1 Then
'Should not happen if table is complete and the input is clean
MsgBox "No Match found via DM lookup", vbOKOnly, "SoundEx Error"
SOUNDEX = "000000"
Exit Function
End If

'Depending upon where the found sound is, encode from the DM_Map.
'"Before a vowel" also requires that a next letter exists, because
'InStr returns 1 (not 0) when the string searched for is empty.
If I = 1 Then
'Start of string, use the start value if valid
Call AddToEncodedString(strEncodedString, strLastCode, DM_Map.DM_Start)
ElseIf DM_Map.DM_Matchlen = 2 And DM_Map.DM_Other = "-1" _
    And I + DM_Map.DM_Matchlen <= intEncodeStrLen _
    And InStr("AEIOUJY", Mid(strToEncode, I + DM_Map.DM_Matchlen, 1)) <> 0 Then
'A vowel pair preceding another vowel, use the vowel value if valid
Call AddToEncodedString(strEncodedString, strLastCode, DM_Map.DM_Vowel)
ElseIf DM_Map.DM_String = "H" _
    And I + DM_Map.DM_Matchlen <= intEncodeStrLen _
    And InStr("AEIOUJY", Mid(strToEncode, I + DM_Map.DM_Matchlen, 1)) <> 0 Then
'An H preceding a vowel, use the vowel value if valid
Call AddToEncodedString(strEncodedString, strLastCode, DM_Map.DM_Vowel)
Else
'Use all other case value
Call AddToEncodedString(strEncodedString, strLastCode, DM_Map.DM_Other)
End If
I = I + DM_Map.DM_Matchlen - 1 ' adjust index based upon matched string length

Next I

SOUNDEX = Left(strEncodedString & "000000", 6) ' Ensure the string is at least 6 long

End Function
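The function above also calls RemoveNotChars, and FindDMMatch in the full module posted later in the thread calls Min. Neither helper is shown in the thread, and VBA has no built-in Min function. A minimal sketch of both, with signatures inferred from the calls; treat them as assumptions, not the original author's code:

```vba
' Returns strIn with every character that does not appear in strKeep
' removed, e.g. "O'BRIEN" with strKeep = A-Z gives "OBRIEN".
' Signature inferred from the call in SOUNDEX (an assumption).
Public Function RemoveNotChars(ByVal strIn As String, _
                               ByVal strKeep As String) As String
    Dim i As Long
    Dim strCh As String
    Dim strOut As String

    For i = 1 To Len(strIn)
        strCh = Mid$(strIn, i, 1)
        If InStr(1, strKeep, strCh, vbBinaryCompare) > 0 Then
            strOut = strOut & strCh
        End If
    Next i
    RemoveNotChars = strOut
End Function

' Smaller of two numbers; needed by FindDMMatch.
Public Function Min(ByVal a As Long, ByVal b As Long) As Long
    If a < b Then Min = a Else Min = b
End Function
```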

Marco
Jul 31 '06 #19
vo***********@gmail.com wrote:
> you don't think this is soundex?

It's not the soundex of those who wish to believe there is a TRUE and HOLY
"SOUNDEX".

It is a soundex for those who think of soundex as a term for a number of
functions, including user-defined soundex functions which may be written
with a particular purpose for a particular database in mind, and which may
be useful in identifying names with different spellings as similar in
sound. I am one of those, and if anyone thinks I am trespassing on this
great and good definition of Soundex, protected and pure in the bowels of
the United States National Archives and Records Administration and the US
Patent Office, then I am entirely willing to use the term "phonetic
algorithm" or "DoTheWordsSoundTheSame algorithm" in its place.

On the point of your original question, if you save 123456 as text you will
use six or twelve bytes depending upon your implementation of Unicode. If
you use a long integer you will use four. I'd go for the four, on the basis
of size, speed and a congenital disposition towards pissing off pedants.
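Storing the six-digit D-M code in a Long Integer field needs a conversion on the way in and a zero-padded conversion on the way out, because a Number field drops leading zeros. A minimal sketch; the names DMCodeToLong and LongToDMCode are hypothetical:

```vba
' Converts a six-digit D-M code string (e.g. "600192") to a Long for a
' Long Integer field. Raises error 5 for anything that is not exactly
' six digits.
Public Function DMCodeToLong(ByVal strCode As String) As Long
    If Not (strCode Like "######") Then
        Err.Raise 5, "DMCodeToLong", "Not a six-digit code: " & strCode
    End If
    DMCodeToLong = CLng(strCode)
End Function

' Converts a stored Long back to the six-character code, restoring any
' leading zeros that numeric storage drops (64000 -> "064000").
Public Function LongToDMCode(ByVal lngCode As Long) As String
    LongToDMCode = Format$(lngCode, "000000")
End Function
```

Comparisons between two stored codes can stay numeric. The padding only matters when a code is displayed or compared with a text value.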
--
Lyle Fairfield
Jul 31 '06 #20
OK,

Lyle, thanks for the response.
Any comments on my other post?

http://groups.google.nl/group/comp.d...02848020326d6a

Jul 31 '06 #21
In article <11*********************@b28g2000cwb.googlegroups.com>,
vo***********@gmail.com says...
> OK,
>
> Lyle thanks for the response.
> Any comments on my other post?

Too bad it will not compile for me on Access 2003.
This is where it chokes: Dim DM_Map As DM_Structure
It does not know what DM_Structure is. Where is it defined?

I visited the National Archives about 1988 to view passenger lists.
They handed everyone a brochure explaining Soundex. All the films
were in Soundex order. So this is not the classical Soundex, but falls
into the class of soundex-like pattern matching algorithms. Just my
characterization, anyway.

Mike Gramelspacher
Jul 31 '06 #22
So would I be better off using another soundex algorithm, then?

Marco

Jul 31 '06 #23
In article <11**********************@75g2000cwc.googlegroups.com>,
vo***********@gmail.com says...
> So would I better use another soundex algo then?
>
> Marco

It probably depends on which algorithm works best for the names you need
to encode. I think I read that Soundex was patented around 1918 and was
used for the U.S. Census. The microfilms I saw were indexed by Soundex,
so obviously you need to use the same algorithm to encode the name for
which you are searching as was used to encode the microfilms. Soundex
is not the best for all names. Others have been developed over the
years. I also have code for the Metaphone Algorithm.

Regarding my question about obtaining the Daitch-Mokotoff code, you
never answered. Where can a person download or buy an implementation
of Daitch-Mokotoff?

Mike Gramelspacher
Jul 31 '06 #24
You can have it, no problem.

If you could just send me the Metaphone algorithm in return, thanks:

'**************************************
' Name: SoundEx - Daitch-Mokotoff algorithm, 6 character result
' Description:
' Encodes an alphabetic name to a six character Daitch-Mokotoff code
' following the Daitch-Mokotoff (D-M) rules available at the sites
' listed in the source code. The D-M algorithm resolves some
' deficiencies that occur in the older Miracode/Soundex system (also
' known as the "Russell"/NARA system - used by the US Census Bureau).
' The benefits include: 1) Six meaningful letter sounds (versus four
' so that Peters is different from Peterson). 2) The initial letter
' is also sound encoded. 3) More sound variations (10 basic codes
' versus seven and double code sounds). 4) Improves sound matching
' for Jewish, Slavic, and Germanic names.
'
' By: Greg Julius, Copyright 2000, Gr*********@cyconsult.com
' The author would appreciate getting bug reports/fixes and
' any improvements made to this code.
'
' Permission to use and modify is given, please give the author
' credit and pass along the modifications.
'
' Usage: strSoundExResult = SoundEx(strStringToEncode)
' Inputs: An alphabetic string, usually representing a person/place
'         name
' Returns: Six digit D-M code string
'
' Requires: Code runs in Visual Basic Module
'
' Side Effects: None known.
'
' Searching on the internet finds these sites. Variously they explain
' some history of D-M sound encoding, how D-M coding works, show the
' D-M sound table, and provide some examples.
' http://www.everton.com/oe3-10/soundex.htm
' http://www.jewishgen.org/infofiles/soundex.txt
' http://www.avotaynu.com/soundex.html
' http://www.gcis.net/cjhs/aguideto.htm
'
' Use the following web-based D-M calculators to test SoundEx results.
' Some errors in the samples were found on two of the above sites.
' http://jgsr.net/database/DM6.cgi (only D-M soundex)
' http://www.jewishgen.org/jos (both D-M and NARA/Russell)
'
' Reference:
' Steuart, Bradley W. The Soundex Daitch-Mokotoff Reference Guide. 2 v.
' Precision Indexing, 1994. Provides Soundex codes to over 125,000
' surnames.

Option Explicit

Private Type DM_Structure
DM_String As String
DM_Matchlen As Integer
DM_Start As String
DM_Vowel As String
DM_Other As String
End Type

Private DM_Table As Collection
Private DM_TableLoaded As Boolean ' false until initialized.
'

Public Function SOUNDEX(strToEncode As String) As String
' Usage: SoundEx = SoundEx(strToEncode)
' Purpose: Return six character D-M encoding of input name
' Inputs:
' strToEncode - An alphabetic string, usually representing a
'               person/place name
' Returns (Function):
' SoundEx - Six digit D-M code string
Dim intEncodeStrLen As Integer
Dim intDMArray() As Integer
Dim strToEncodelen As Integer
Dim i As Integer
Dim strEncodedString As String
Dim strLastCode As String
Dim DM_Map As DM_Structure

Call LoadDMTable ' Load the DM_Table - only loaded once

' Clean the incoming name. Upper case, nothing but letters
strToEncode = RemoveNotChars(Trim(UCase(strToEncode)), _
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
intEncodeStrLen = Len(strToEncode)

strEncodedString = ""
strLastCode = ""
'Potentially search the whole string for meaningful sounds.
'Stop after 6 are found
For i = 1 To intEncodeStrLen
If Len(strEncodedString) >= 6 Then
SOUNDEX = Left(strEncodedString, 6)
Exit Function
End If

'Lookup in the DM_Table
Call FindDMMatch(i, strToEncode, DM_Map)
If DM_Map.DM_Matchlen = -1 Then
'Should not happen if table is complete and the input is clean
MsgBox "No Match found via DM lookup", vbOKOnly, "SoundEx Error"
SOUNDEX = "000000"
Exit Function
End If

'Depending upon where the found sound is, encode from the DM_Map.
'"Before a vowel" also requires that a next letter exists, because
'InStr returns 1 (not 0) when the string searched for is empty.
If i = 1 Then
'Start of string, use the start value if valid
Call AddToEncodedString(strEncodedString, strLastCode, DM_Map.DM_Start)
ElseIf DM_Map.DM_Matchlen = 2 And DM_Map.DM_Other = "-1" _
    And i + DM_Map.DM_Matchlen <= intEncodeStrLen _
    And InStr("AEIOUJY", Mid(strToEncode, i + DM_Map.DM_Matchlen, 1)) <> 0 Then
'A vowel pair preceding another vowel, use the vowel value if valid
Call AddToEncodedString(strEncodedString, strLastCode, DM_Map.DM_Vowel)
ElseIf DM_Map.DM_String = "H" _
    And i + DM_Map.DM_Matchlen <= intEncodeStrLen _
    And InStr("AEIOUJY", Mid(strToEncode, i + DM_Map.DM_Matchlen, 1)) <> 0 Then
'An H preceding a vowel, use the vowel value if valid
Call AddToEncodedString(strEncodedString, strLastCode, DM_Map.DM_Vowel)
Else
'Use all other case value
Call AddToEncodedString(strEncodedString, strLastCode, DM_Map.DM_Other)
End If
i = i + DM_Map.DM_Matchlen - 1 ' adjust index based upon matched string length

Next i

SOUNDEX = Left(strEncodedString & "000000", 6) ' Ensure the string is at least 6 long

End Function

Private Sub AddToEncodedString(strToAddTo As String, strLastCode As String, _
    strToAdd As String)
' Usage: Called by SoundEx Function
' Purpose: Append sound to encoded string if rules permit
' Inputs:
' strToAdd - Encoded sound value from DM_Map
' Returns (Modified Parameters):
' strToAddTo - Passed string containing sounds encoded so far
' strLastCode - Passed string containing last sound passed to this
'               routine
If strToAdd = strLastCode Then ' Drop duplicate sounds
Exit Sub
End If

strLastCode = strToAdd

If strToAdd = "-1" Then ' Value from table means ignore sound
Exit Sub
End If

strToAddTo = strToAddTo & strToAdd ' Append new sound

End Sub

Private Sub LoadDMTable()
' Usage: Called by SoundEx Function
' Purpose: Generate DM_Table values
' Inputs: None - values generated by routine
' Returns (Module level):
' DM_Table - Collection of items. The key of which is the sound to encode
'            and the item data is the DM_Map structure values
' DM_TableLoaded - Boolean value to flag if the DM_Table has been loaded.
If DM_TableLoaded = True Then
Exit Sub ' Already loaded
End If

Set DM_Table = New Collection
'Load each of the D-M sounds and their rules to the DM_Table
LoadDMElements ("AI,AJ,AY;0;1;-1")
LoadDMElements ("AU;0;7;-1")
LoadDMElements ("A;0;-1;-1")
LoadDMElements ("B;7;7;7")
LoadDMElements ("CHS;5;54;54")
LoadDMElements ("CH;5;5;5")
LoadDMElements ("CK;5;5;5")
LoadDMElements ("CZ,CS,CSZ,CZS;4;4;4")
LoadDMElements ("C;4;4;4")
LoadDMElements ("DRZ,DRS;4;4;4")
LoadDMElements ("DS,DSH,DSZ;4;4;4")
LoadDMElements ("DZ,DZH,DZS;4;4;4")
LoadDMElements ("D,DT;3;3;3")
LoadDMElements ("EI,EJ,EY;0;1;-1")
LoadDMElements ("EU;1;1;-1")
LoadDMElements ("E;0;-1;-1")
LoadDMElements ("FB;7;7;7")
LoadDMElements ("F;7;7;7")
LoadDMElements ("G;5;5;5")
LoadDMElements ("H;5;5;-1")
LoadDMElements ("IA,IE,IO,IU;1;-1;-1")
LoadDMElements ("I;0;-1;-1")
LoadDMElements ("J;1;-1;-1")
LoadDMElements ("KS;5;54;54")
LoadDMElements ("KH;5;5;5")
LoadDMElements ("K;5;5;5")
LoadDMElements ("L;8;8;8")
LoadDMElements ("MN;-1;66;66")
LoadDMElements ("M;6;6;6")
LoadDMElements ("NM;-1;66;66") ' not a duplicate! look carefully NM vs MN
LoadDMElements ("N;6;6;6")
LoadDMElements ("OI,OJ,OY;0;1;-1")
LoadDMElements ("O;0;-1;-1")
LoadDMElements ("P,PF,PH;7;7;7")
LoadDMElements ("Q;5;5;5")
LoadDMElements ("RZ,RS;4;4;4")
LoadDMElements ("R;9;9;9")
LoadDMElements ("SCHTSCH,SCHTSH,SCHTCH;2;4;4")
LoadDMElements ("SCH;4;4;4")
LoadDMElements ("SHTCH,SHCH,SHTSH;2;4;4")
LoadDMElements ("SHT,SCHT,SCHD;2;43;43")
LoadDMElements ("SH;4;4;4")
LoadDMElements ("STCH,STSCH,SC;2;4;4")
LoadDMElements ("STRZ,STRS,STSH;2;4;4")
LoadDMElements ("ST;2;43;43")
LoadDMElements ("SZCZ,SZCS;2;4;4")
LoadDMElements ("SZT,SHD,SZD,SD;2;43;43")
LoadDMElements ("SZ;4;4;4")
LoadDMElements ("S;4;4;4")
LoadDMElements ("TCH,TTCH,TTSCH;4;4;4")
LoadDMElements ("TH;3;3;3")
LoadDMElements ("TRZ,TRS;4;4;4")
LoadDMElements ("TSCH,TSH;4;4;4")
LoadDMElements ("TS,TTS,TTSZ,TC;4;4;4")
LoadDMElements ("TZ,TTZ,TZS,TSZ;4;4;4")
LoadDMElements ("T;3;3;3")
LoadDMElements ("UI,UJ,UY;0;1;-1")
LoadDMElements ("U,UE;0;-1;-1")
LoadDMElements ("V;7;7;7")
LoadDMElements ("W;7;7;7")
LoadDMElements ("X;5;54;54")
LoadDMElements ("Y;1;-1;-1")
LoadDMElements ("ZDZ,ZDZH,ZHDZH;2;4;4")
LoadDMElements ("ZD,ZHD;2;43;43")
LoadDMElements ("ZH,ZS,ZSCH,ZSH;4;4;4")
LoadDMElements ("Z;4;4;4")

DM_TableLoaded = True ' Flag to not load again

End Sub

Private Sub LoadDMElements(strLoadString As String)
' Usage: Called by LoadDMTable Subroutine
' Purpose: Parse and Add DM_Table items for passed D-M sound
' Inputs:
' strLoadString
' Returns (Module level):
' DM_Table - Collection of items. The key of which is the sound to encode
' and the item data is the DM_Map structure values
Dim strItemPart As String
Dim strItemKey As String
Dim strKeyParts As String
Dim intPosition As Integer

'Separate the passed sound into its two parts
intPosition = InStr(1, strLoadString, ";")
If intPosition = 0 Then
MsgBox "invalid parameter to LoadDMElements: " & strLoadString
Exit Sub
End If

strKeyParts = Left(strLoadString, intPosition - 1) & ","
strItemPart = Mid(strLoadString, intPosition + 1)

'Add the Item Part (sound values) for each letter combination
Do While True
intPosition = InStr(1, strKeyParts, ",")
If intPosition = 0 Then
Exit Sub
End If

strItemKey = Left(strKeyParts, intPosition - 1)
strKeyParts = Mid(strKeyParts, intPosition + 1)
DM_Table.Add strItemPart, strItemKey

Loop ' Do While True

End Sub

Private Sub FindDMMatch(intStartMatchPos As Integer, strToTest As String, dmLocalDM As DM_Structure)
' Usage: Called by SoundEx Function
' Purpose: Find largest matching DM_Table entry at the indicated position
' of the input name string
' Populate the passed DM_Structure with data from the DM_Table
' Inputs:
' DM_Table - Module level table of letter combinations and sound values
' intStartMatchPos - Place in the passed string to start looking for letter combinations
' strToTest - String passed containing the name to encode
' Returns (Modified Parameters):
' dmLocalDM - Structure to contain data from the DM_Table
Dim strMatchString As String
Dim i As Integer
Dim strItemData As String
Dim intPosition As Integer

For i = Min(7, Len(strToTest) - intStartMatchPos + 1) To 1 Step -1
strMatchString = Mid(strToTest, intStartMatchPos, i)

strItemData = ""
On Error Resume Next ' trap error that happens when item does not match
strItemData = DM_Table.Item(strMatchString)
On Error GoTo 0 ' turn off error handling

If strItemData <> "" Then ' Parse into DM Map structure
dmLocalDM.DM_String = strMatchString
dmLocalDM.DM_Matchlen = i

intPosition = InStr(1, strItemData, ";")
dmLocalDM.DM_Start = Left(strItemData, intPosition - 1)
strItemData = Mid(strItemData, intPosition + 1)

intPosition = InStr(1, strItemData, ";")
dmLocalDM.DM_Vowel = Left(strItemData, intPosition - 1)
strItemData = Mid(strItemData, intPosition + 1)

dmLocalDM.DM_Other = strItemData

Exit Sub ' DMMatched, so return

End If
Next i

dmLocalDM.DM_Matchlen = -1 ' Should not happen if table is well formed!!
MsgBox "String not found in DM table." & vbCr & _
"String: '" & strToTest & "'" & vbCr & _
"at position: " & intStartMatchPos, vbOKOnly, "FindDMMatch Error"

End Sub

Public Function Min(lNumber1 As Long, lNumber2 As Long) As Long
' Usage: Min = Min(lNumber1, lNumber2)
' Purpose: Return minimum of two input numbers
' Inputs:
' lNumber1, lNumber2 - Arbitrary numbers to compare
' Returns (Function):
' Min - Smaller of the two arbitrary numbers passed
If lNumber1 < lNumber2 Then
Min = lNumber1
Else
Min = lNumber2
End If

End Function

Public Function RemoveNotChars(strToCleanUp As String, strCharsToKeep As String, Optional varStartPosition) As String
' Usage: RemoveNotChars = RemoveNotChars(strToCleanUp, strCharsToKeep)
' Purpose: Remove characters in passed string that are not in the keep string
' Inputs:
' strToCleanUp - String to clean up
' strCharsToKeep - String identifying all characters to keep
' varStartPosition - (Optional) position to start cleaning; characters before it are kept as-is
' Returns (Function):
' RemoveNotChars - String with all 'not keepable' characters removed
Dim strBuildString As String
Dim strTestChar As String
Dim intStartPos As Integer
Dim i As Integer

RemoveNotChars = ""

' If no string to clean up, or keep string is empty, then an empty string is returned
If Len(strToCleanUp) = 0 Or Len(strCharsToKeep) = 0 Then
Exit Function
End If

' Initialize return string in light of starting position
If Not IsMissing(varStartPosition) Then
If Not IsNumeric(varStartPosition) Or varStartPosition <= 0 Then
MsgBox "StartPosition must be numeric, greater than zero", vbOKOnly, "RemoveChars Error"
Exit Function
End If
intStartPos = varStartPosition
strBuildString = Left(strToCleanUp, varStartPosition - 1)
Else
intStartPos = 1
strBuildString = ""
End If

For i = intStartPos To Len(strToCleanUp)
strTestChar = Mid(strToCleanUp, i, 1)
If InStr(strCharsToKeep, strTestChar) <> 0 Then
strBuildString = strBuildString & strTestChar ' add onto end
End If
Next i

RemoveNotChars = strBuildString

End Function

Public Function SOUNDEXARAB(Surname As String) As String

Dim Result As String, c As String * 1
Dim Location As Integer

Surname = UCase(Surname)
' remove from the word
'************************************************* ***
If Left(Surname, 2) = "" Then
Surname = Mid(Surname, 3)
End If
'************************************************* ***

' get the code for each character in the word
'************************************************* ***
Result = ""
For Location = 1 To Len(Surname)
Result = Result & Category(Mid(Surname, Location, 1))
Next Location
'************************************************* ***

'Remove the repeated character
'************************************************* ***
Location = 1
Do While Location < Len(Result)
If Mid(Result, Location, 1) = Mid(Result, Location + 1, 1) Then
Result = Left(Result, Location) & Mid(Result, Location + 2)
Else
Location = Location + 1
End If
Loop
'************************************************* ***

'
'************************************************* ***
If Category(Left(Result, 1)) = Mid(Result, 2, 1) Then
Result = Left(Result, 1) & Mid(Result, 3)
End If
'************************************************* ***

'remove the unknown character
'************************************************* ****
Result = Replace(Result, "/", "") ' strip every "/" in one pass
'************************************************* ****

'get the first 4 characters
'************************************************* ****
Select Case Len(Result)
Case 4
SOUNDEXARAB = Result
Case Is < 4
SOUNDEXARAB = Result & String(4 - Len(Result), "0")
Case Is > 4
SOUNDEXARAB = Left(Result, 4)
End Select
'************************************************* ****
End Function

Private Function Category(c) As String

Select Case True
Case c Like "[]"
Category = "1"
Case c Like "[]"
Category = "2"
Case c Like "[]"
Category = "3"
Case c Like "[]"
Category = "4"
Case c Like "[]"
Category = "5"
Case c Like "[]"
Category = "6"
Case c Like "[]"
Category = "7"
Case c Like "[]"
Category = "8"
Case c Like "[]"
Category = "9"
Case c Like "[]"
Category = "A"
Case c Like "[]"
Category = "B"
Case c Like "[]"
Category = "C"
Case c Like "[]"
Category = "D"

Case Else
Category = ""

End Select
End Function
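The D-M part of the listing packs each table row into a string such as "KS;5;54;54": the letter combinations before the first semicolon, then three codes. As far as I can read the listing, LoadDMElements splits on the first ";" and files the remaining codes under each combination, and FindDMMatch tries the longest substring first (7 letters down to 1). Below is a minimal Python sketch of those two steps, assuming the three codes mean start of name / before a vowel / any other position (the fields DM_Start, DM_Vowel and DM_Other suggest this) and that -1 means "no code". The names load_dm_table and find_dm_match are hypothetical, and only a handful of rows are included.

```python
from typing import Dict, Optional, Tuple

Codes = Tuple[int, int, int]  # (at start of name, before a vowel, anywhere else); -1 = no code


def load_dm_table(load_strings) -> Dict[str, Codes]:
    """Parse strings like "KS;5;54;54" into {combination: (start, vowel, other)}."""
    table: Dict[str, Codes] = {}
    for entry in load_strings:
        keys, _, codes = entry.partition(";")
        if not codes:
            raise ValueError(f"invalid load string: {entry!r}")
        start, vowel, other = (int(part) for part in codes.split(";"))
        for key in keys.split(","):  # every combination shares the same codes
            table[key] = (start, vowel, other)
    return table


def find_dm_match(table: Dict[str, Codes], name: str, pos: int) -> Optional[Tuple[str, Codes]]:
    """Return the longest table key (at most 7 letters) starting at index pos."""
    for length in range(min(7, len(name) - pos), 0, -1):
        candidate = name[pos:pos + length]
        if candidate in table:
            return candidate, table[candidate]
    return None  # the VBA version shows a MsgBox here instead


TABLE = load_dm_table([
    "SCHTSCH,SCHTSH,SCHTCH;2;4;4",
    "SCH;4;4;4",
    "SH;4;4;4",
    "S;4;4;4",
    "KS;5;54;54",
    "K;5;5;5",
])

print(find_dm_match(TABLE, "SCHULZ", 0))  # -> ('SCH', (4, 4, 4))
```

Trying the longest candidate first is what keeps "SCH" from being coded as "S" followed by "CH".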
Marco

Mike Gramelspacher wrote:
In article <11**********************@75g2000cwc.googlegroups.com>,
vo***********@gmail.com says...
So would I be better off using another soundex algorithm, then?

Marco

It probably depends on which algorithm works best for the names you need to
encode. I think I read that Soundex was patented around 1918 and was
used for the U.S. Census. The microfilms I saw were indexed by Soundex,
so obviously you need to use the same algorithm to encode the name for
which you are searching as was used to encode the microfilms. Soundex
is not the best for all names. Others have been developed over the
years. I also have code for the Metaphone Algorithm.

Regarding my question about obtaining the Daitch-Mokotoff code, you
never answered. Where can a person download or buy an implementation
of Daitch-Mokotoff?

Mike Gramelspacher
Jul 31 '06 #25
vo***********@gmail.com wrote in
news:11*********************@i3g2000cwc.googlegroups.com:
> you don't think this is soundex?
[code deleted]

I didn't see this SELECT CASE anywhere:

Case "B", "F", "P", "V"
code = 1
Case "C", "G", "J", "K", "Q", "S", "X", "Z"
code = 2
Case "D", "T"
code = 3
Case "L"
code = 4
Case "M", "N"
code = 5
Case "R"
code = 6
Case Else
code = 0

If it's not encoding characters according to that rule, it's not
Soundex. It may be something *like* Soundex, but it ain't Soundex.

Oh, the first letter must be encoded as the first letter of the
result, which your function categorically is not doing (if it were
there'd be no question in the first place, since you'd have to use a
text field).
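For comparison, a minimal Python sketch of the traditional Soundex David describes: his substitution table, the first letter kept as a letter, a repeated code written only once, and the result padded or cut to four characters. Treating H and W as transparent (they neither get a digit nor separate two equal digits) is the usual American Soundex rule and is my addition; the table above does not show it. The grouping at the end only illustrates what a stored code column such as LastnameSE is for.

```python
# David's substitution table; every other letter (vowels, H, W, Y) gets no digit.
_TABLE = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        _TABLE[letter] = digit


def soundex(name: str) -> str:
    """Traditional Soundex: first letter plus three digits, e.g. 'R163'."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]  # ignore non A-Z characters
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = _TABLE.get(first, "")  # a following letter with the same code is skipped
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate two letters with the same code
        code = _TABLE.get(letter, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # vowels and Y reset this, so equal codes on either side both count
    return (first + "".join(digits) + "000")[:4]


# Group names by code, the way an indexed code field would let you search.
names = ["Robert", "Rupert", "Rubin", "Ashcraft", "Tymczak", "Pfister"]
groups = {}
for n in names:
    groups.setdefault(soundex(n), []).append(n)
print(groups)
```

Since the result begins with a letter, it has to be stored in a text field.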

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 31 '06 #26
Lyle Fairfield <ly***********@aim.com> wrote in
news:Xn*********************************@216.221.81.119:
> vo***********@gmail.com wrote in news:1154363487.281052.20630
> @i3g2000cwc.googlegroups.com:
>> you don't think this is soundex?
>
> It's not the soundex of those who wish to believe there is a TRUE
> and HOLY "SOUNDEX".

Words mean something, Lyle. If you search Google for a Soundex
function, you're going to find hundreds of examples of a particular
substitution algorithm that is referred to as Soundex. Some db
engines provide Soundex built in, and that is the traditional
Soundex.

If the OP had said "Soundex-like" instead of "Soundex" there'd be no
post from me pointing out a misuse of terminology.

> It is a soundex for those who think of soundex as a term for a
> number of functions, . . .

I see you are using lower case. . .

> . . . including user-defined soundex functions which may be
> written with a particular purpose for a particular database in
> mind, which may be useful in identifying names with different
> spellings as similar in sound. . . .

There can certainly be language variations for Soundex, such that
the substitution table is different.

> . . . I
> am one of those and if anyone thinks I am trespassing on this
> great and good definition of Soundex, protected and pure in the
> bowels of the United States National Archives and Records
> Administration, and the US Patent Office, then I am entirely
> willing to use the term "phonetic algorithm" or
> "DoTheWordsSoundTheSame algorithm" in its place.

I use Soundex, Soundex2, and a host of other substitution functions
that implement different substitution algorithms. All of them can't
be Soundex.

> On the point of your original question, if you save 123456 as text
> you will use six or twelve bytes depending upon your implementation
> of unicode. If you use a long integer you will use four. I'd go
> for the four, on the basis of size, speed and a congenital
> disposition towards pissing off pedants.

If his soundex-like function returns leading zeroes, he can't store
it as a numeric value.
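David's point about leading zeroes is easy to check. A minimal Python sketch: the codes 600192 and 545910 are the ones from the original question, and 012345 is a made-up code with a leading zero. A digit string survives a round trip through a number only when it has no leading zero. On size, every six-digit value does fit in an Access Long Integer, which is a signed 32-bit number with a maximum of 2,147,483,647.

```python
LONG_MAX = 2**31 - 1  # largest value an Access Long Integer (signed 32-bit) can hold


def survives_numeric_storage(code: str) -> bool:
    """True if the code reads back unchanged after a round trip through an integer."""
    return str(int(code)) == code


for code in ("600192", "545910", "012345"):
    print(code, "->", int(code), "round trip ok:", survives_numeric_storage(code))

# Leading zeroes can be restored on display only if every code has the same length.
print(f"{int('012345'):06d}")  # zero-padded back to "012345"
print(int("999999") <= LONG_MAX)  # every six-digit code fits in a Long Integer
```

So a numeric field works only for an all-digit code of fixed length that you pad again on display; anything that starts with a letter needs text.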

And I think that encoding the first letter by the same rules as the
other letters is pretty much useless.

If I'm not mistaken, his function can return zero before the last
character, which the real Soundex cannot. As well, if I'm
remembering correctly, it can return multiple zeroes in a row.

So, there are at least 3 major differences between his function and
Soundex. I'd be perfectly content to call Soundex and Soundex2 the
same, since the only difference is the substitution table and the
length (Soundex2 encodes to 9 numbers and to a length of 6
characters). But once you change the basic assumptions about
repeated characters and choose to encode a 0 somewhere before the
end of the string, then you're really using something very
different.
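David's structural points can be turned into a quick sanity check. A minimal Python sketch (the function name looks_like_soundex is hypothetical): a traditional Soundex code is one letter followed by three digits from 1 to 6, and 0 appears only as padding at the end. So R100 passes, while R010, or an all-numeric code such as 600192 from the original question, does not. It checks only the shape of a code, not whether it was computed correctly.

```python
import re

# One letter, then digits 1-6, then zero or more padding zeroes.
_SOUNDEX_SHAPE = re.compile(r"[A-Z][1-6]*0*")


def looks_like_soundex(code: str) -> bool:
    """True if code has the four-character shape of a traditional Soundex code."""
    return len(code) == 4 and _SOUNDEX_SHAPE.fullmatch(code) is not None


for code in ("R163", "L000", "R100", "R010", "600192"):
    print(code, looks_like_soundex(code))
```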

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Jul 31 '06 #27
OK David,

Point taken, but what interests me is: is it any better than Soundex, or
haven't I been paying attention?

I'm just looking for the best way to search my database....

Marco

Aug 1 '06 #28
In article <11**********************@s13g2000cwa.googlegroups.com>,
vo***********@gmail.com says...
> You can have it, no problem,
>
> if you could just send me the Metaphone algorithm in return, thanks:
OK, here it is. I have seen places on the Internet where the advantages
and disadvantages of various Soundex-like algorithms are discussed. It
just depends on the types of names that need to be encoded.

Option Compare Database
Option Explicit

'Metaphone algorithm translated from C to Delphi by Tom White <w...@intellex.com>
'Translated to Visual Basic by Dave White 9/10/01
'
'v1.1 fixes a few bugs
'
' Checks length of string before removing trailing S (>1)
' PH used to translate to H, now translates to F
'
'Original C version by Michael Kuhn <rhlab!mk...@uunet.uu.net>
'
'
Function InStrC(ByVal SearchIn As String, _
ByVal SoughtCharacters As String) As Integer
'--- Returns the position of the first character in SearchIn that is contained
'--- in the string SoughtCharacters. Returns 0 if none found.
Dim i As Integer

On Error Resume Next
SoughtCharacters = UCase(SoughtCharacters)
SearchIn = UCase(SearchIn)
For i = 1 To Len(SearchIn)
If InStr(SoughtCharacters, Mid(SearchIn, i, 1)) > 0 Then
InStrC = i: Exit Function
End If
Next i
InStrC = 0
End Function

Function Metaphone(ByVal A As Variant) As String
Dim b, c, d, e As String
Dim inp, outp As String
Dim vowels, frontv, varson, dbl As String
Dim excppair, nxtltr As String
Dim T, ii, jj, lng, lastchr As Integer
Dim curltr, prevltr, nextltr, nextltr2, nextltr3 As String
Dim vowelafter, vowelbefore, frontvafter, silent, hard As Integer
Dim alphachr As String

On Error Resume Next
If IsNull(A) Then A = ""
A = CStr(A)
inp = UCase(A)
vowels = "AEIOU"
frontv = "EIY"
varson = "CSPTG"
dbl = "." 'Lets us allow certain letters to be doubled
excppair = "AGKPW"
nxtltr = "ENNNR"
alphachr = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

'--Remove non-alpha characters
outp = ""
For T = 1 To Len(inp)
If InStr(alphachr, Mid(inp, T, 1)) > 0 Then outp = outp + Mid(inp, T, 1)
Next T

inp = outp: outp = ""

If Len(inp) = 0 Then Metaphone = "": Exit Function

'--Check rules at beginning of word
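If Len(inp) > 1 Then
b = Mid(inp, 1, 1)
c = Mid(inp, 2, 1)
' excppair and nxtltr are read position by position: AE, GN, KN, PN, WR
ii = InStr(excppair, b)
If ii > 0 Then
If Mid(nxtltr, ii, 1) = c Then
inp = Mid(inp, 2, Len(inp) - 1) ' drop the silent first letter
End If
End If
End If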

If Mid(inp, 1, 1) = "X" Then Mid(inp, 1, 1) = "S"

If Mid(inp, 1, 2) = "WH" Then inp = "W" + Mid(inp, 3)

If Right(inp, 1) = "S" And Len(inp) > 1 Then inp = Left(inp, Len(inp) - 1)

ii = 0
Do
ii = ii + 1
'--Main Loop!
silent = False
hard = False
curltr = Mid(inp, ii, 1)
vowelbefore = False
prevltr = " "
If ii > 1 Then
prevltr = Mid(inp, ii - 1, 1)
If InStrC(prevltr, vowels) > 0 Then vowelbefore = True
End If

If ((ii = 1) And (InStrC(curltr, vowels) > 0)) Then
outp = outp + curltr
GoTo ContinueMainLoop
End If

vowelafter = False
frontvafter = False
nextltr = " "
If ii < Len(inp) Then
nextltr = Mid(inp, ii + 1, 1)
If InStrC(nextltr, vowels) > 0 Then vowelafter = True
If InStrC(nextltr, frontv) > 0 Then frontvafter = True
End If

'--Skip double letters EXCEPT ones in variable double
If InStrC(curltr, dbl) = 0 Then
If curltr = nextltr Then GoTo ContinueMainLoop
End If

nextltr2 = " "
If Len(inp) - ii > 1 Then
nextltr2 = Mid(inp, ii + 2, 1)
End If

nextltr3 = " "
If (Len(inp) - ii) > 2 Then
nextltr3 = Mid(inp, ii + 3, 1)
End If

Select Case curltr
Case "B":
silent = False
If (ii = Len(inp)) And (prevltr = "M") Then silent = True
If Not (silent) Then outp = outp + curltr
Case "C":
If Not ((ii > 2) And (prevltr = "S") And frontvafter) Then
If ((ii > 1) And (nextltr = "I") And (nextltr2 = "A")) Then
outp = outp + "X"
Else
If frontvafter Then
outp = outp + "S"
Else
If ((ii > 2) And (prevltr = "S") And (nextltr = "H")) Then
outp = outp + "K"
Else
If nextltr = "H" Then
If ((ii = 1) And (InStrC(nextltr2, vowels) = 0)) Then
outp = outp + "K"
Else
outp = outp + "X"
End If
Else
If prevltr = "C" Then
outp = outp + "C"
Else
outp = outp + "K"
End If
End If
End If
End If
End If
End If
Case "D":
If ((nextltr = "G") And (InStrC(nextltr2, frontv) > 0)) Then
outp = outp + "J"
Else
outp = outp + "T"
End If
Case "G":
silent = False
If ((ii < Len(inp)) And (nextltr = "H") And _
(InStrC(nextltr2, vowels) = 0)) Then
silent = True
End If
If ((ii = Len(inp) - 4) And (nextltr = "N") And _
(nextltr2 = "E") And (nextltr3 = "D")) Then
silent = True
ElseIf ((ii = Len(inp) - 2) And (nextltr = "N")) Then
silent = True
End If
If (prevltr = "D") And frontvafter Then silent = True
If prevltr = "G" Then
hard = True
End If

If Not (silent) Then
If frontvafter And (Not (hard)) Then
outp = outp + "J"
Else
outp = outp + "K"
End If
End If

Case "H":
silent = False
If InStrC(prevltr, varson) > 0 Then silent = True
If vowelbefore And (Not (vowelafter)) Then silent = True
If Not silent Then outp = outp + curltr

Case "F", "J", "L", "M", "N", "R": outp = outp + curltr

Case "K": If prevltr <> "C" Then outp = outp + curltr

Case "P": If nextltr = "H" Then outp = outp + "F" Else outp = outp + "P"

Case "Q": outp = outp + "K"

Case "S":
If ((ii > 2) And (nextltr = "I") And _
((nextltr2 = "O") Or nextltr2 = "A")) Then
outp = outp + "X"
End If
If (nextltr = "H") Then
outp = outp + "X"
Else
outp = outp + "S"
End If

Case "T":
If ((ii > 0) And (nextltr = "I") And _
((nextltr2 = "O") Or (nextltr2 = "A"))) Then
outp = outp + "X"
End If
If nextltr = "H" Then
If ((ii > 1) Or (InStrC(nextltr2, vowels) > 0)) Then
outp = outp + "0"
Else
outp = outp + "T"
End If
ElseIf Not ((ii < Len(inp) - 3) And _
(nextltr = "C") And (nextltr2 = "H")) Then
outp = outp + "T"
End If

Case "V": outp = outp + "F"

Case "W", "Y"
If (ii < Len(inp) - 1) And vowelafter Then outp = outp + curltr

Case "X": outp = outp + "KS"

Case "Z": outp = outp + "S"

End Select

ContinueMainLoop:
Loop Until (ii > Len(inp))

Metaphone = outp

End Function
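The start-of-word rules in this listing are compact and easy to misread: excppair ("AGKPW") and nxtltr ("ENNNR") are meant to be read position by position as the pairs AE, GN, KN, PN and WR, whose first letter is silent. A lookup such as InStr(nxtltr, c) would always find the first N in "ENNNR", so it would miss KN and PN. Below is a minimal Python sketch of the intended preprocessing (the silent pairs, initial X to S, WH to W, and the v1.1 trailing-S rule); the function name metaphone_prepare is hypothetical, and it covers only these steps, not the main loop.

```python
import re

# Pairs whose first letter is silent at the start of a word (excppair/nxtltr read pairwise).
SILENT_FIRST = {"AE", "GN", "KN", "PN", "WR"}


def metaphone_prepare(word: str) -> str:
    """Apply the steps that run before Metaphone's main loop."""
    text = re.sub(r"[^A-Z]", "", word.upper())  # keep only the letters A-Z
    if len(text) > 1 and text[:2] in SILENT_FIRST:
        text = text[1:]  # KNIGHT -> NIGHT
    if text.startswith("X"):
        text = "S" + text[1:]  # initial X sounds like S
    if text.startswith("WH"):
        text = "W" + text[2:]  # WHITE -> WITE
    if len(text) > 1 and text.endswith("S"):
        text = text[:-1]  # v1.1: drop a trailing S only if something is left
    return text


for w in ("Knight", "Pneumonia", "Wright", "Xavier", "White", "Jones"):
    print(w, metaphone_prepare(w))
```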

Aug 1 '06 #29
vo***********@gmail.com wrote in
news:11*********************@b28g2000cwb.googlegroups.com:
> point taken but what interests me is, is it any better than
> Soundex or haven't I
> been paying attention?

Well, it depends on your purposes. I would find a traditional
Soundex that also encodes the first letter to be completely useless,
as Soundex itself already casts the net too wide.

I find Soundex2 much more useful in finding duplicates.

The point is you want as few false positives as possible, and
encoding the first letter by the same rules as the others will lead
to the possibility of more false positives.

On the other hand, an algorithm that does something like replace
initial PH with F would make some sense to me.

So, it all depends on the specifics of your particular algorithm,
the exact implementation and the data you're evaluating. Don't
underplay the last of those -- a relatively minor proportion of data
from a language with different pronunciation rules can completely
negate the usefulness of an algorithm that works just fine with a
more homogeneous data set.
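David's PH-to-F idea can be layered on top of an unchanged Soundex: rewrite the start of the name, then encode as usual. A minimal Python sketch, assuming a compact traditional Soundex and a single rewrite rule; INITIAL_REWRITES and soundex_with_initial_rewrites are hypothetical names, and any further rules for your own data would be your choice, not part of Soundex.

```python
_CODES = {ch: d for letters, d in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                   ("L", "4"), ("MN", "5"), ("R", "6")) for ch in letters}

INITIAL_REWRITES = (("PH", "F"),)  # assumption: only the rule David mentions


def soundex(name: str) -> str:
    """Traditional four-character Soundex (first letter kept, H and W transparent)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    previous = _CODES.get(letters[0], "")
    digits = []
    for letter in letters[1:]:
        if letter in "HW":
            continue
        code = _CODES.get(letter, "")
        if code and code != previous:
            digits.append(code)
        previous = code
    return (letters[0] + "".join(digits) + "000")[:4]


def soundex_with_initial_rewrites(name: str) -> str:
    """Rewrite a known spelling at the start of the name, then apply plain Soundex."""
    upper = name.upper()
    for old, new in INITIAL_REWRITES:
        if upper.startswith(old):
            upper = new + upper[len(old):]
            break
    return soundex(upper)


print(soundex("Phillips"), soundex("Fillips"))  # first letters differ, so the codes differ
print(soundex_with_initial_rewrites("Phillips"), soundex_with_initial_rewrites("Fillips"))
```

The net widens only where you add a rule, rather than everywhere, as happens when the first letter is encoded like the others.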

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Aug 1 '06 #30
I found a name in a church record written 'Pedoch' and was meant to be
'Bettag'. This was an immigrant German church and the priest who wrote
that was Swiss German. Daitch-Mokotoff gives them both the number
735000 for me. Bingo. It seems to me that in a sound system, locking
in the first letter is wrong. Soundex was perhaps the best that could be
thought up at the time, but other alternatives have been developed. But
still, it depends on the situation. Certainly Soundex is much better
known than the others.
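Mike's Pedoch/Bettag example shows what locking the first letter costs: traditional Soundex gives P320 and B320, so the two spellings never meet. Below is a heavily simplified Python sketch of Daitch-Mokotoff-style coding that reproduces his 735000. It takes the longest matching letter group, uses its start-of-name code at position 0, its before-vowel code when a vowel (assumed to be A, E, I, O, U) follows, otherwise its other code, skips -1 ("no code"), codes adjacent repeats once, and pads or cuts to six digits. The rows for P, T, G, E and O are copied from the table in the listing; the rows for A, B, CH and D are not in this excerpt and are my assumption based on the published D-M chart. Real D-M also branches on ambiguous groups such as CH (5 or 4), which this sketch ignores; DM_ROWS and dm_code are hypothetical names.

```python
# (at start of name, before a vowel, any other position); -1 means "no code".
DM_ROWS = {
    "P": (7, 7, 7), "T": (3, 3, 3), "G": (5, 5, 5),     # from the listing
    "E": (0, -1, -1), "O": (0, -1, -1),                 # from the listing
    "A": (0, -1, -1), "B": (7, 7, 7), "D": (3, 3, 3),   # assumed, not in this excerpt
    "CH": (5, 5, 5),                                    # assumed; real D-M also allows 4
}
VOWELS = "AEIOU"  # assumption


def dm_code(name: str) -> str:
    """Simplified Daitch-Mokotoff-style code: tiny table, no branching."""
    text = name.upper()
    pos, last, digits = 0, None, ""
    while pos < len(text):
        # Longest letter group in the table starting here (the listing tries up to 7 letters).
        for length in range(min(7, len(text) - pos), 0, -1):
            group = text[pos:pos + length]
            if group in DM_ROWS:
                break
        else:
            raise KeyError(f"no table entry at {text[pos:]!r}")
        start, before_vowel, other = DM_ROWS[group]
        nxt = text[pos + length:pos + length + 1]  # letter after the group, or ""
        if pos == 0:
            code = start
        elif nxt and nxt in VOWELS:
            code = before_vowel
        else:
            code = other
        if code != -1 and code != last:
            digits += str(code)
        last = code  # a group with no code resets this, so only adjacent repeats collapse
        pos += length
    return (digits + "000000")[:6]


print(dm_code("Pedoch"), dm_code("Bettag"))
```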

Mike Gramelspacher
Aug 1 '06 #31
Mike Gramelspacher <gr******@psci.net> wrote in
news:MP************************@news.psci.net:

[]
> I found a name in a church record written 'Pedoch' and was meant
> to be 'Bettag'. This was an immigrant German church and the
> priest who wrote that was Swiss German. Daitch-Mokotoff gives
> them both the number 735000 for me. Bingo. It seems to me that
> in a sound system, locking in the first letter is wrong. Soundex
> was perhaps the best that could be thought up at the time, but
> other alternatives have been developed. But still, it depends on
> the situation. Certainly Soundex is much better known than the
> others.

Well, as I said, it depends on the data you're working with and what
you're trying to do. The apps I use Soundex/Soundex2 in would not
benefit from replacing the first letter.

There is a great deal of sense to transcribing Bettag as Pedoch, if
you know how German is pronounced. If you're evaluating German
names, you should account for those facts. If you're not, that
equivalence will not necessarily help you.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
Aug 2 '06 #32
I can agree with all that you say. And of course, the biggest advantage
of Soundex is that it can be computed with pencil and paper by almost
anyone after minimal training. Probably an overwhelming consideration at
the time Soundex came into use. I guess that is my parting statement.

Mike Gramelspacher
Aug 2 '06 #33
