Determining Syllables

Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'ˈpɑːləm(ə)nt' (apologies if that doesn't come out
right!). Parsing that is, I believe, straightforward. However, I can't
find a program to convert English words into their IPA equivalents, so
I'm currently stuck with using a dictionary: not so bad, until a word isn't
in the dictionary I'm using!
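
pemo's remark that parsing an IPA transcription is straightforward holds if every syllable is assumed to have exactly one vowel nucleus. The following C sketch works under that assumption; the function names, the vowel subset (aimed at English transcriptions) and the handling rules are illustrative choices, not an established tool. It decodes UTF-8, treats a run of adjacent vowel symbols (a diphthong such as aɪ) as one nucleus, lets the length mark ː continue a run, lets stress marks and all other symbols end one, and counts a consonant carrying the syllabic diacritic U+0329 as a nucleus of its own. Hiatus (two vowels in separate syllables with no mark between them) is miscounted.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence (1-3 bytes, enough for IPA symbols) at s.
   Stores the code point in *cp and returns the number of bytes used.
   Malformed input yields U+FFFD and advances one byte. */
static size_t utf8_next(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
        && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12)
            | ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Vowel symbols common in English IPA transcriptions (a subset). */
static int is_ipa_vowel(unsigned long cp)
{
    static const unsigned long vowels[] = {
        'a', 'e', 'i', 'o', 'u',
        0x00E6 /* æ */, 0x0250 /* ɐ */, 0x0251 /* ɑ */, 0x0252 /* ɒ */,
        0x0254 /* ɔ */, 0x0259 /* ə */, 0x025A /* ɚ */, 0x025B /* ɛ */,
        0x025C /* ɜ */, 0x025D /* ɝ */, 0x026A /* ɪ */, 0x028A /* ʊ */,
        0x028C /* ʌ */
    };
    size_t i;

    for (i = 0; i < sizeof vowels / sizeof vowels[0]; i++)
        if (vowels[i] == cp)
            return 1;
    return 0;
}

/* Count vowel nuclei in a UTF-8 IPA string. Optional sounds written in
   parentheses are counted like any others. */
int count_ipa_syllables(const char *ipa)
{
    const unsigned char *s = (const unsigned char *)ipa;
    unsigned long cp;
    int count = 0, in_nucleus = 0;

    while (*s != '\0') {
        s += utf8_next(s, &cp);
        if (is_ipa_vowel(cp)) {
            if (!in_nucleus)
                count++;              /* first vowel of a new nucleus */
            in_nucleus = 1;
        } else if (cp == 0x02D0 || cp == 0x02D1) {
            /* length mark (ː or ˑ): stay inside the current nucleus */
        } else if (cp == 0x0329) {
            count++;                  /* syllabic consonant, e.g. n̩ */
            in_nucleus = 0;
        } else {
            in_nucleus = 0;           /* consonant, stress mark, '.', etc. */
        }
    }
    return count;
}
```

Under these rules `count_ipa_syllables("ˈpɑːləm(ə)nt")` returns 3, with the optional schwa in parentheses counted, and `count_ipa_syllables("baɪˈɒlədʒi")` returns 4 because the stress mark separates aɪ from ɒ.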

Another approach might be to modify a good hyphenation algorithm, as I'm
led to believe that these usually insert a hyphen at a syllable boundary.
However, how they determine that point, and whether it's even true,
I just don't know.

I've also had a look at the Flesch readability stuff - but it's probably not
going to be accurate enough for what I need it for.



Nov 15 '05 #1
pemo wrote:
> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word [...]

This is OT. It has nothing to do with C...
--
one's freedom stops where other's begin

Giannis Papadopoulos
http://dop.users.uth.gr/
University of Thessaly
Computer & Communications Engineering dept.
Nov 15 '05 #2
In article <dc**********@news.ox.ac.uk>,
pemo <pe*********@comlab.ox.ac.uk> wrote:
> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word - esp. if that word isn't already 'known'
> by such an algorithm?


I don't know about comp.programming, but this topic certainly
does not belong in comp.lang.c.

Perhaps the information on the following page may be of some help:

http://www.tex.ac.uk/cgi-bin/texfaq2html?label=hyphen

Read that, then do a web search for "hyphenation Frank Liang".

--
Rouben Rostamian
Nov 15 '05 #3

[followups set to c.p, since this is not a C question]
On Thu, 4 Aug 2005, pemo wrote:

> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word [...]
>
> [...] However, I can't
> find a program to convert English words into their IPA equivalents,

Not surprising, since that would be equivalent to counting the syllables
in English words, and that's not an algorithmic problem.

English doesn't follow strictly algorithmic rules, because it's not
strictly phonetic. I could come along tomorrow and make up a word, like
"Worcestershire," and make up a pronunciation for it, like "wooster," and
no computer program in the world would be able to figure that out from
the spelling. Heck, most /humans/ don't know how every English word is
pronounced, and we have had many, many man-years to study the problem!
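
Arthur's point can be made concrete with a spelling-only heuristic of the kind many syllable counters rely on. The rules in this C sketch (count runs of vowel letters with y treated as a vowel, subtract one for a silent final e except in consonant + "le" endings, never return less than 1) are an illustrative assumption, not the rule set of any program mentioned in this thread.

```c
#include <ctype.h>
#include <string.h>

/* True for a, e, i, o, u and y in either case. */
static int is_vowel_letter(char c)
{
    return c != '\0' && strchr("aeiouy", tolower((unsigned char)c)) != NULL;
}

/* Naive spelling-only estimate: one syllable per run of vowel letters,
   minus one for a silent final 'e' (but not in consonant + "le" endings
   such as "table"), never less than 1. */
int naive_syllables(const char *word)
{
    size_t len = strlen(word), i;
    int count = 0, prev_vowel = 0;

    for (i = 0; i < len; i++) {
        int v = is_vowel_letter(word[i]);
        if (v && !prev_vowel)
            count++;                  /* start of a new vowel run */
        prev_vowel = v;
    }
    if (len >= 2 && tolower((unsigned char)word[len - 1]) == 'e'
        && !is_vowel_letter(word[len - 2])) {
        int consonant_le = tolower((unsigned char)word[len - 2]) == 'l'
                           && len >= 3 && !is_vowel_letter(word[len - 3]);
        if (!consonant_le)
            count--;                  /* treat the final 'e' as silent */
    }
    return count > 0 ? count : 1;
}
```

It gives 2 for "birthday" and "table" and 1 for "make", but 4 for "Worcestershire", 3 for "Worcester" (against Arthur's two-syllable "wooster"), and 1 for "poem", which has two. These are exactly the kind of failures Arthur describes.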

> [...] Another approach might be to modify a good hyphenating algorithm, as I'm
> led to believe that these usually insert a hyphen at a syllable boundary.
> However, how they do that (determine the point), and whether it's even true,
> I just don't know.

Yes, a good hyphenation algorithm can be /very/ good. The basic idea of
good hyphenation is to collect sets of English words that all have a
hyphenation point in the same general context, and then remember the
context. For example, if you see a word ending in -ible, you can hyphenate
it before the -ible, unless it ends in -cible or -gible, in which case you
can't. You can generally hyphenate before -str, or after hy-. And so on.
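
Liang's scheme, as used by TeX, stores such contexts as patterns that interleave letters with digits. Every pattern that matches somewhere in the word (with '.' marking the word's edges) contributes its digits to the gaps between letters, the highest digit at each gap wins, and an odd value allows a hyphen while an even value forbids one. The following C sketch applies that rule. The five patterns are hypothetical encodings of Arthur's examples, not entries from TeX's real English pattern file, and the edge limits of 2 and 3 letters are assumed to mirror plain TeX's \lefthyphenmin and \righthyphenmin.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORD  64
#define LEFT_MIN  2   /* assumed to mirror plain TeX's \lefthyphenmin  */
#define RIGHT_MIN 3   /* assumed to mirror plain TeX's \righthyphenmin */

/* Hypothetical patterns in Liang's notation (letters with digits in the
   gaps; '.' would mark a word edge). NOT TeX's real English patterns. */
static const char *const patterns[] = {
    "1ible", "c2ible", "g2ible", "hy1", "1str"
};

/* Raise score[k] (the gap before w[k]) wherever pat matches in w. */
static void apply_pattern(const char *pat, const char *w, size_t wlen,
                          int *score)
{
    char letters[16];
    int digits[17] = {0};   /* digits[j]: value of the gap before letters[j] */
    size_t m = 0, i, j;

    for (; *pat != '\0' && m < sizeof letters; pat++) {
        if (isdigit((unsigned char)*pat))
            digits[m] = *pat - '0';
        else
            letters[m++] = *pat;
    }
    for (i = 0; i + m <= wlen; i++) {
        for (j = 0; j < m && w[i + j] == letters[j]; j++)
            ;
        if (j < m)
            continue;                 /* no match at this position */
        for (j = 0; j <= m; j++)
            if (digits[j] > score[i + j])
                score[i + j] = digits[j];
    }
}

/* Copy word into out, inserting '-' at each permitted break.
   Returns 0 on success, -1 if the word is too long or out is too small. */
int hyphenate(const char *word, char *out, size_t out_size)
{
    char w[MAX_WORD + 3];             /* ".word." plus terminator */
    int score[MAX_WORD + 3] = {0};
    size_t n = strlen(word), i, p, o = 0;

    if (n > MAX_WORD || out_size < 2 * n + 1)
        return -1;
    w[0] = '.';
    for (i = 0; i < n; i++)
        w[i + 1] = (char)tolower((unsigned char)word[i]);
    w[n + 1] = '.';
    w[n + 2] = '\0';

    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++)
        apply_pattern(patterns[p], w, n + 2, score);

    /* The gap before word[i] is score[i + 1]; odd means "may break". */
    for (i = 0; i < n; i++) {
        if (i >= LEFT_MIN && n - i >= RIGHT_MIN && score[i + 1] % 2 == 1)
            out[o++] = '-';
        out[o++] = word[i];
    }
    out[o] = '\0';
    return 0;
}
```

With these patterns `hyphenate("possible", buf, sizeof buf)` yields "poss-ible", "forcible" and "eligible" come back unbroken because c2ible and g2ible outrank 1ible, and "hyphen" becomes "hy-phen". Real pattern files contain thousands of entries generated from a hyphenated word list.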
The basic research for hyphenation patterns in English has already been
done several times, e.g. by Frank Liang for TeX, but I don't know anywhere
you could get patterns for syllable counting. Still, I'd start by
downloading the TeX hyphenation patterns, and using them to find every
single hyphenation point in your word. Then it would probably be a good
idea to discard any segments that don't contain any vowels (but I'm sure
there are exceptions, and not just "nth" and "ssh").
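
Arthur's counting step, applied to the output of a hyphenator, might look like this minimal C sketch (the function name is hypothetical). It counts the hyphen-separated segments that contain a vowel letter, with y treated as a vowel so that segments like "hy" survive. Because TeX suppresses breaks near the ends of a word (so "about" gets no break after its first letter) and patterns are not guaranteed to find every syllable boundary, the result tends to undercount.

```c
#include <ctype.h>
#include <string.h>

/* Count the '-'-separated segments of a hyphenated word that contain at
   least one vowel letter (y included); vowel-less segments are skipped. */
int count_voweled_segments(const char *hyphenated)
{
    int count = 0, seg_has_vowel = 0;
    const char *s;

    for (s = hyphenated; ; s++) {
        if (*s == '-' || *s == '\0') {
            count += seg_has_vowel;   /* close the current segment */
            seg_has_vowel = 0;
            if (*s == '\0')
                break;
        } else if (strchr("aeiouy", tolower((unsigned char)*s)) != NULL) {
            seg_has_vowel = 1;
        }
    }
    return count;
}
```

For example, `count_voweled_segments("hy-phen-a-tion")` returns 4, "poss-ible" gives 2, and "nth" gives 0.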

> I've also had a look at the Flesch readability stuff - but it's probably not
> going to be accurate enough for what I need it for.


Really? One of the inputs to the Flesch readability formula /is/ the
number of syllables in the text. So if you can find a program that claims
to accurately compute Flesch scores, go with it! (I doubt such programs
exist, though. A Google search turned up Flesh,
http://jack.gravco.com/flesh.html, but it thinks "birthday" has one
syllable, so I didn't bother investigating any further.)
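
For reference, the usual statement of the Flesch Reading Ease formula is 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). A direct C transcription shows why syllable accuracy matters: the syllable count enters with a large coefficient. The function name is illustrative, and returning NAN for undefined inputs is a design choice of this sketch.

```c
#include <math.h>

/* Flesch Reading Ease from raw counts; higher scores mean easier text.
   Returns NAN when words or sentences is not positive. */
double flesch_reading_ease(long words, long sentences, long syllables)
{
    if (words <= 0 || sentences <= 0)
        return NAN;
    return 206.835
           - 1.015 * ((double)words / (double)sentences)
           - 84.6 * ((double)syllables / (double)words);
}
```

For 100 words in 5 sentences with 150 syllables the score is 59.635, and each syllable miscounted in that sample moves it by 0.846 (84.6/100). A counter that gets "birthday" wrong therefore biases the score of any text full of such words.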

Actually, given the application to Flesch readability computations, I
might be interested in the syllable-counting problem. If you get anything
working, would you let me know? And I'll post here if I find anything
clever --- but don't hold your breath.

-Arthur
Nov 15 '05 #4
In article <dc**********@news.ox.ac.uk>,
pemo <pe*********@comlab.ox.ac.uk> wrote:
> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word - esp. if that word isn't already 'known'
> by such an algorithm?


It cannot be done on a stand-alone basis. The same character string
might be multiple words with different pronunciations and
different syllable boundaries, so one would have to be able to
deduce which of the words was meant by examining surrounding context.

I played around informally with syllabification a few decades ago,
and eventually realized that in English (or Canadian English anyhow)
the proper syllabification depended upon the stress points ("accents"),
and was also tied in with whether particular vowels were long or short.
You have to "look ahead": syllables can change depending upon the
suffixes one adds... and if one then adds further suffixes,
they can change again.
--
"[...] it's all part of one's right to be publicly stupid." -- Dave Smey
Nov 15 '05 #5

"pemo" <pe*********@comlab.ox.ac.uk> wrote
> Does anyone know of an algorithm that can accurately determine the number
> of syllables in a given English word - esp. if that word isn't already
> 'known' by such an algorithm?

It's a machine learning problem. Try Hidden Markov Models or neural
networks. However, the language you choose to implement such an algorithm in
will be the least of your problems, so comp.lang.c isn't very relevant.

Look at the NETtalk program. That used a neural network to convert text to
speech (phonemes) and could easily be modified to count syllables per word,
I would imagine.
Nov 15 '05 #6
On Thu, 4 Aug 2005 17:51:38 +0100, pemo
<pe*********@comlab.ox.ac.uk> wrote:
> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word [...]
>
> One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
> 'parliament' becomes 'ˈpɑːləm(ə)nt' (apologies if that doesn't come out too
> right!) - parsing that is, I believe, straight forward.

Well, you've got a problem right there -- some people say par'-luh-munt
and others say par'-li-a-ment. Lots of proper names (places as well
as people) have that effect -- is Worcester three syllables or two?
Aylesbury (ails'-bri or ails'-buh-ri)? Chol-mon-de-ly or Chum-ley?
Tall-i-a-fe-ro or Tol-i-ver? Michael pronounced mikh'-ah-el or mI'-kel?
Is Catherine kath-uh-rin or kath-rin? Con-sid-er-ing or con-sid-ring?
Equiv-a-lent or equ-v-lent? Al-go-rithm or Al-go-rith-um?
Dic-shun-ar-y or dik-shun-ry?
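
Chris's examples suggest that a lexicon should store a range rather than a single count. A minimal C sketch, assuming a small hand-built table: the entries and their minimum and maximum counts are taken from the variants listed above, while the struct, the function name, and the lowercase-only lookup are illustrative choices.

```c
#include <stddef.h>
#include <string.h>

/* A spelling plus the fewest and most syllables among its pronunciations. */
struct syllable_entry {
    const char *word;   /* lowercase spelling */
    int min_syl, max_syl;
};

static const struct syllable_entry lexicon[] = {
    { "algorithm",    3, 4 },  /* al-go-rithm / al-go-rith-um     */
    { "aylesbury",    2, 3 },  /* ails-bri / ails-buh-ri          */
    { "catherine",    2, 3 },  /* kath-rin / kath-uh-rin          */
    { "cholmondeley", 2, 4 },  /* chum-ley / chol-mon-de-ly       */
    { "considering",  3, 4 },  /* con-sid-ring / con-sid-er-ing   */
    { "dictionary",   3, 4 },  /* dik-shun-ry / dic-shun-ar-y     */
    { "michael",      2, 3 },  /* mI-kel / mikh-ah-el             */
    { "parliament",   3, 4 },  /* par-luh-munt / par-li-a-ment    */
    { "taliaferro",   3, 5 },  /* tol-i-ver / tall-i-a-fe-ro      */
    { "worcester",    2, 3 }   /* "three syllables or two"        */
};

/* Look up a lowercase word. On success store its range and return 1;
   return 0 for unknown words, which the caller must handle another way. */
int syllable_range(const char *word, int *min_syl, int *max_syl)
{
    size_t i;

    for (i = 0; i < sizeof lexicon / sizeof lexicon[0]; i++) {
        if (strcmp(lexicon[i].word, word) == 0) {
            *min_syl = lexicon[i].min_syl;
            *max_syl = lexicon[i].max_syl;
            return 1;
        }
    }
    return 0;
}
```

For example, `syllable_range("cholmondeley", &lo, &hi)` gives 2 to 4, and an unknown word such as "birthday" returns 0, which is exactly the case pemo runs into when a word is missing from the dictionary.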

> However, I can't find a program to convert English words into their
> IPA equivalents, and, so I'm currently stuck with using a dictionary -
> not so bad, until a word isn't in the dictionary I'm using!


Since in a lot of cases there is no fixed rendering of English words into
phonetic form (even the OED often gives several different pronunciations),
it's not surprising that you can't find a program that does it well.

And if you want words which aren't 'known' all bets are off, since that
includes technical and foreign words 'imported' into the language...

(Followups to comp.programming)

Chris C
Nov 15 '05 #7
