Using Soundex (OT?)

Hi,

I'm curious about Soundex. All I know is that it's a way of doing
spelling-error-tolerant word matching. What I want to know is whether the
Soundex algorithm was made exclusively for the English language, or whether
it can be used for any arbitrary language with satisfactory performance (by
'satisfactory performance' I mean that it can detect at least 80% of
spelling errors). What about PHP Soundex support?

TIA
Jul 17 '05 #1
On 05 Feb 2005 19:09:04 GMT, Ricky Romaya <so*******@somewhere.com> wrote:
> I'm curious about Soundex. All I know is that it's a way of doing
> spelling-error-tolerant word matching. What I want to know is whether the
> Soundex algorithm was made exclusively for the English language, or whether
> it can be used for any arbitrary language with satisfactory performance (by
> 'satisfactory performance' I mean that it can detect at least 80% of
> spelling errors). What about PHP Soundex support?


Soundex is for English words, based on English pronunciation rules. See:
http://en.wikipedia.org/wiki/Soundex

There's also a reference there to Metaphone, which is supposedly better, but
also English-based.

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #2
Andy Hassall wrote:
> Soundex is for English words, based on English pronunciation rules.
> See: http://en.wikipedia.org/wiki/Soundex


You *can* of course cook up your own Soundex functions with values
based on other languages; the algorithm is very simple. For some
languages it might be rather easy, but possibly not worth the effort:
though the original algorithm is for English, it will work "quite
well" for many other languages too.

It's worth noting that Soundex (and similar functions) only works on
individual words, and that it isn't really meant for detecting spelling
errors. The best use for Soundex is when you're searching for names,
addresses or the like and don't know how something is actually written,
but know what it sounds like. You can store the Soundex values in the
database alongside the other data, and when you do a search, you first
look for the exact string the user entered. If this doesn't return
enough results, you compute the Soundex value of the user input and try
with that. This way you get results that "sound the same", so they're
probably close to what you were really looking for. I think a similar
approach is used by the search engine at www.php.net (I can't be
certain, but it seems like it; see
http://fi.php.net/manual-lookup.php?pattern=sundeks for example :)
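
The lookup order described above (exact match first, Soundex only as a
fallback) can be sketched roughly as follows. This is a minimal
illustration in Python rather than PHP, which has soundex() built in.
The NameIndex class, its search method, the min_results parameter and
the sample data are all made up for the example, and the soundex
function is a plain American Soundex written for this sketch; it is not
php.net's actual search code.

```python
from collections import defaultdict

# Classic American Soundex letter groups (assumed here, not taken from PHP).
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Return the 4-character Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    last = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W get no digit and do not break a run
        digit = _CODES.get(c, "")  # vowels and Y have no digit
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        last = digit
    return code.ljust(4, "0")


class NameIndex:
    """Hypothetical in-memory stand-in for a table with a stored Soundex column."""

    def __init__(self, names):
        self.exact = defaultdict(list)    # lower-cased name -> stored names
        self.by_code = defaultdict(list)  # Soundex code -> stored names
        for name in names:
            self.exact[name.lower()].append(name)
            self.by_code[soundex(name)].append(name)

    def search(self, query, min_results=1):
        hits = list(self.exact.get(query.lower(), []))
        if len(hits) < min_results:
            # Not enough exact matches: fall back to "sounds the same".
            for name in self.by_code.get(soundex(query), []):
                if name not in hits:
                    hits.append(name)
        return hits


index = NameIndex(["Soundex", "Metaphone", "Levenshtein"])
print(index.search("metaphone"))  # exact match, no fallback needed
print(index.search("sundeks"))    # no exact match, found via the Soundex code
```

In a real database the same idea would be a column holding the
precomputed code, so the fallback is a single equality lookup.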

--
Markku Uttula

Jul 17 '05 #3
Markku Uttula wrote:
> http://fi.php.net/manual-lookup.php?pattern=sundeks for example :)


I hate to comment on my own postings, but I need to add that the
php.net manual page for soundex is well worth reading. It also has
links to some other functions (metaphone and levenshtein) that might
prove useful.

--
Markku Uttula

Jul 17 '05 #4
"Ricky Romaya" <so*******@somewhere.com> wrote in message
news:Xn********************************@66.250.146.159...
> Hi,
>
> I'm curious about Soundex. All I know is that it's a way of doing
> spelling-error-tolerant word matching. What I want to know is whether the
> Soundex algorithm was made exclusively for the English language, or whether
> it can be used for any arbitrary language with satisfactory performance (by
> 'satisfactory performance' I mean that it can detect at least 80% of
> spelling errors). What about PHP Soundex support?
>
> TIA


Soundex is really only good for surnames. You can't use it for general text
search since it'd yield too many irrelevant results. It was designed for
grouping similar surnames and not for handling typos. Names that are
spelled very differently could end up with the same value. For example,
Sznyder, Schneider, and Snyder are all given S536, while Smith, Smit, and
Schmidt get S530.

Soundex can handle surnames of foreign origin. For example, the variants of
my own--Leong, Leung, Liang, Long--all have the same soundex value.
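
For anyone who wants to check those values, here is a minimal sketch of
classic American Soundex. It is written in Python purely for
illustration (PHP has soundex() built in), and it is a reimplementation
from the published rules, not PHP's source, so treat it as an
approximation rather than a reference.

```python
# Letter groups of classic American Soundex.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Return the 4-character Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                      # the first letter is kept as-is
    last = _CODES.get(letters[0], "")      # but its digit still suppresses repeats
    for c in letters[1:]:
        if c in "HW":
            continue                       # H and W are skipped without breaking a run
        digit = _CODES.get(c, "")          # vowels and Y have no digit
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        last = digit                       # a vowel resets the run of equal digits
    return code.ljust(4, "0")              # pad short codes with zeros


for group in (["Sznyder", "Schneider", "Snyder"],
              ["Smith", "Smit", "Schmidt"],
              ["Leong", "Leung", "Liang", "Long"]):
    print({name: soundex(name) for name in group})
# The three groups come out as S536, S530 and L520 respectively.
```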
Jul 17 '05 #5
Chung Leong wrote:
> Soundex is really only good for surnames. You can't use it for general text
> search since it'd yield too many irrelevant results. It was designed for
> grouping similar surnames and not for handling typos. Names that are
> spelled very differently could end up with the same value. For example,
> Sznyder, Schneider, and Snyder are all given S536, while Smith, Smit, and
> Schmidt get S530.
>
> Soundex can handle surnames of foreign origin. For example, the variants of
> my own--Leong, Leung, Liang, Long--all have the same soundex value.


I found that a combination of the metaphone and levenshtein functions works
better for first names -- I'm using it to suggest alternatives in a
dictionary here:

<http://www.japanesetranslator.co.uk/your-name-in-japanese/>

It's supposed to be a dictionary of English names, but a lot of them are
actually of foreign origin (like most "English" names, I guess).

If I remember correctly, the Soundex function was a bit too clumsy: it
returned hundreds of alternatives for some unrecognized spellings, and none
for others.

Instead I use the metaphone function to search for possible alternatives,
and then sort them based on their Levenshtein distance from the search term.
It works pretty well.
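
That two-step approach (phonetic filter, then sort by edit distance) can
be sketched like this. Python has no built-in metaphone, so Soundex is
swapped in as the phonetic key here, even though it is the coarser of
the two; in PHP you would call metaphone() and levenshtein() directly.
The suggest function, its phonetic_key parameter and the name list are
hypothetical and only there to make the sketch runnable.

```python
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Plain American Soundex, used here only as a stand-in for metaphone."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code, last = letters[0], _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue
        digit = _CODES.get(c, "")
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        last = digit
    return code.ljust(4, "0")


def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(query, names, phonetic_key=soundex):
    """Names with the same phonetic key as query, closest spelling first."""
    wanted = phonetic_key(query)
    candidates = [n for n in names if phonetic_key(n) == wanted]
    return sorted(candidates,
                  key=lambda n: (levenshtein(query.lower(), n.lower()), n))


names = ["Catherine", "Katherine", "Kathryn", "Katrina", "Kevin"]
print(suggest("Kathrin", names))  # ['Kathryn', 'Katherine', 'Katrina']
```

Note that Catherine is not suggested: Soundex keeps the first letter
as-is, so C365 never matches K365. That is one reason a different
phonetic key may behave better for first names.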

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/
Jul 17 '05 #6
"Markku Uttula" <ma***********@disconova.com> wrote in news:rVfNd.1805
$U*******@reader1.news.jippii.net:
> Markku Uttula wrote:
>> http://fi.php.net/manual-lookup.php?pattern=sundeks for example :)
>
> I hate to comment on my own postings, but I need to add that the
> php.net manual page for soundex is well worth reading. It also has
> links to some other functions (metaphone and levenshtein) that might
> prove useful.

Well, could someone suggest a way to mimic Google's 'suggested keyword'
functionality that works across different languages? I've done some
reading about soundex, metaphone, and levenshtein, which IMHO are
designed exclusively for English.

Also, I've read about aspell & pspell in the PHP manual. Sadly, they
don't work on the win32 platform (not to mention that they're an
additional module, which I don't have the authority to install). Is
there any way to simulate them in pure PHP?

TIA
Jul 17 '05 #7
