Determining Syllables

Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'pɑːləm(ə)nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward. However, I can't
find a program to convert English words into their IPA equivalents, and, so
I'm currently stuck with using a dictionary - not so bad, until a word isn't
in the dictionary I'm using!
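
A minimal sketch of the dictionary approach, assuming the dictionary stores pronunciations in an ARPAbet-like notation (the style used by the CMU Pronouncing Dictionary), where every vowel phoneme ends in a stress digit. The entries and function names below are hand-written for illustration, not taken from any real dictionary file. Counting syllables then reduces to counting vowel phonemes, and an unknown word shows up as `None`:

```python
from typing import Optional

# Hypothetical, hand-written entries for illustration only.
# Assumption: vowel phonemes carry a trailing stress digit (0, 1 or 2).
PRONUNCIATIONS = {
    "parliament": "P AA1 R L AH0 M AH0 N T",
    "birthday": "B ER1 TH D EY2",
    "algorithm": "AE1 L G ER0 IH2 DH AH0 M",
}


def count_syllables_from_phonemes(phonemes: str) -> int:
    """Count vowel phonemes, i.e. the ones that end in a stress digit."""
    return sum(1 for p in phonemes.split() if p[-1].isdigit())


def lookup_syllables(word: str) -> Optional[int]:
    """Return the syllable count, or None if the word is not in the dictionary."""
    phonemes = PRONUNCIATIONS.get(word.lower())
    if phonemes is None:
        return None
    return count_syllables_from_phonemes(phonemes)


if __name__ == "__main__":
    for w in ("parliament", "birthday", "wug"):
        print(w, lookup_syllables(w))
```

The `None` case is exactly the gap described above: some fallback is still needed for words the dictionary does not contain.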

Another approach might be to modify a good hyphenating algorithm, as I'm
led to believe that these usually insert a hyphen at a syllable boundary.
However, how they do that (determine the point), and whether it's even true,
I just don't know.

I've also had a look at the Flesch readability stuff - but it's probably not
going to be accurate enough for what I need it for.



Nov 15 '05 #1
pemo wrote:
Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'pɑːləm(ə)nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward. However, I can't
find a program to convert English words into their IPA equivalents, and, so
I'm currently stuck with using a dictionary - not so bad, until a word isn't
in the dictionary I'm using!

Another approach might be to modify a good hyphenating algorithm, as I'm
led to believe that these usually insert a hyphen at a syllable boundary.
However, how they do that (determine the point), and whether it's even true,
I just don't know.

I've also had a look at the Flesch readability stuff - but it's probably not
going to be accurate enough for what I need it for.


This is OT. It has nothing to do with C...
--
one's freedom stops where other's begin

Giannis Papadopoulos
http://dop.users.uth.gr/
University of Thessaly
Computer & Communications Engineering dept.
Nov 15 '05 #2
In article <dc**********@news.ox.ac.uk>,
pemo <pe*********@comlab.ox.ac.uk> wrote:
Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?


I don't know about comp.programming, but this topic certainly
does not belong to comp.lang.c

Perhaps the information on the following page may be of some help:

http://www.tex.ac.uk/cgi-bin/texfaq2html?label=hyphen

Read that then do a web search on "hyphenation Frank Liang".

--
Rouben Rostamian
Nov 15 '05 #3

[followups set to c.p, since this is not a C question]
On Thu, 4 Aug 2005, pemo wrote:

Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'pɑːləm(ə)nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward. However, I can't
find a program to convert English words into their IPA equivalents,

Not surprising, since that would be equivalent to counting the syllables
in English words, and that's not an algorithmic problem.

English doesn't follow strictly algorithmic rules, because it's not
strictly phonetic. I could come along tomorrow and make up a word, like
"Worcestershire," and make up a pronunciation for it, like "wooster," and
any computer program in the world wouldn't be able to figure that out from
the spelling. Heck, most /humans/ don't know how every English word is
pronounced, and we have many, many man-years to study the problem!

[...] Another approach might be to modify a good hyphenating algorithm; as I'm
led to believe that these usually insert a hyphen at a syllable boundary.
However, how they do that (determine the point), and whether it's even true,
I just don't know.

Yes, a good hyphenation algorithm can be /very/ good. The basic rule of
good hyphenation is to come up with sets of English words that all have a
hyphenation point in the same general context, and then remember the
context. For example, if you see a word ending in -ible, you can hyphenate
it there, unless it ends in c-ible or g-ible, in which case you can't. You
can generally hyphenate before -str, or after hy-. And so on.
The basic research for hyphenation patterns in English has already been
done several times, e.g. by Frank Liang for TeX, but I don't know anywhere
you could get patterns for syllable counting. Still, I'd start by
downloading the TeX hyphenation patterns, and using them to find every
single hyphenation point in your word. Then it would probably be a good
idea to discard any segments that don't contain any vowels (but I'm sure
there are exceptions, and not just "nth" and "ssh").
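
A rough sketch of the pattern idea described above, in the style of Liang's algorithm: each pattern carries digits between letters, the highest digit at each gap wins, and an odd value allows a break. The four-odd patterns below are made up from the rules mentioned in this post (-ible, c-ible/g-ible, -str, hy-) plus one extra; they are NOT the real TeX patterns, and the function names are hypothetical. Real pattern files are far larger and deliberately miss some break points, so the count is only an estimate.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'c2ible.' into its letters and per-gap scores."""
    letters = re.sub(r"\d", "", pattern)
    scores = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            scores[pos] = int(ch)  # digit sits in the gap before letter `pos`
        else:
            pos += 1
    return letters, scores


# Toy patterns, invented for illustration ('.' marks a word edge).
TOY_PATTERNS = dict(
    parse_pattern(p) for p in ["1ible.", "c2ible.", "g2ible.", "1str", ".hy1", "h1d"]
)


def hyphenate(word, patterns=TOY_PATTERNS):
    """Return the word split at every gap whose winning score is odd."""
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            pat_scores = patterns.get(padded[start:end])
            if pat_scores:
                for offset, value in enumerate(pat_scores):
                    scores[start + offset] = max(scores[start + offset], value)
    pieces, current = [], ""
    for i, ch in enumerate(word):
        # The gap before word[i] is the gap before padded[i + 1].
        if i > 0 and scores[i + 1] % 2 == 1:
            pieces.append(current)
            current = ""
        current += ch
    pieces.append(current)
    return pieces


def count_vowel_segments(word, patterns=TOY_PATTERNS):
    """Count segments containing a vowel (treating 'y' as one), as suggested above."""
    return sum(1 for piece in hyphenate(word, patterns) if re.search(r"[aeiouy]", piece.lower()))


if __name__ == "__main__":
    print(hyphenate("birthday"))   # ['birth', 'day']
    print(hyphenate("forcible"))   # ['forcible']  (the 2 in 'c2ible.' blocks the break)
    print(count_vowel_segments("monstrous"))
```

With patterns this sparse many syllable boundaries go unseen, which is why the full TeX pattern set (or a set trained specifically on syllable boundaries) would be the real starting point.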

I've also had a look at the Flesch readability stuff - but it's probably not
going to be accurate enough for what I need it for.


Really? One of the inputs to the Flesch readability formula /is/ the
number of syllables in the text. So if you can find a program that claims
to accurately compute Flesch scores, go with it! (I doubt such programs
exist, though. A Google search turned up Flesh,
http://jack.gravco.com/flesh.html, but it thinks "birthday" has one
syllable, so I didn't bother investigating any further.)
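
For reference, a small sketch of how the syllable count feeds into the Flesch Reading Ease score, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The syllable counter here is a deliberately crude stand-in (it counts runs of vowel letters, with 'y' treated as a vowel) and the sentence and word splitting are simplistic assumptions; any better counter can be dropped in.

```python
import re


def naive_syllables(word):
    """Crude estimate: number of runs of vowel letters, at least 1.

    Assumption for illustration only; it miscounts silent 'e', among other things.
    """
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text, count_syllables=naive_syllables):
    """Compute the Flesch Reading Ease score of a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    print(naive_syllables("birthday"))
    print(round(flesch_reading_ease("The cat sat."), 2))
```

Because the syllable term is multiplied by 84.6, an average error of a tenth of a syllable per word shifts the score by more than eight points, so the quality of the counter matters a great deal.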

Actually, given the application to Flesch readability computations, I
might be interested in the syllable-counting problem. If you get anything
working, would you let me know? And I'll post here if I find anything
clever --- but don't hold your breath.

-Arthur
Nov 15 '05 #4
In article <dc**********@news.ox.ac.uk>,
pemo <pe*********@comlab.ox.ac.uk> wrote:
Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?


It cannot be done on a stand-alone basis. The same character string
might be multiple words with different pronunciations and
different syllable boundaries, so one would have to be able to
deduce which of the words was meant by examining surrounding context.
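
One way to live with this, sketched below under stated assumptions: instead of a single number, return the range of plausible counts for a spelling. The table and the names `VARIANTS` and `syllable_range` are hypothetical; the three entries are spellings that are commonly pronounced with either one or two syllables depending on the sense (e.g. "he learned it" vs. "a learned scholar").

```python
# Hypothetical table of spellings whose syllable count depends on the sense.
VARIANTS = {
    "learned": {1, 2},
    "blessed": {1, 2},
    "aged": {1, 2},
}


def syllable_range(word, fallback):
    """Return (min, max) plausible syllable counts for a spelling.

    `fallback` is any single-valued counter used for words not in the table.
    """
    counts = VARIANTS.get(word.lower())
    if counts is None:
        n = fallback(word)
        return (n, n)
    return (min(counts), max(counts))


if __name__ == "__main__":
    # A trivial stand-in fallback that always answers 1, for demonstration only.
    print(syllable_range("learned", lambda w: 1))
    print(syllable_range("cat", lambda w: 1))
```

Picking a single value out of the range is where the surrounding context would have to come in.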

I played around informally with syllabification a few decades ago,
and eventually realized that in English (or Canadian English anyhow)
the proper syllabification depended upon the stress points ("accents"),
and was also tied in with whether particular vowels were long or short.
You have to "look ahead": syllables can change depending upon the
suffixes one adds... and if one then adds further suffixes,
they can change again.
--
"[...] it's all part of one's right to be publicly stupid." -- Dave Smey
Nov 15 '05 #5

"pemo" <pe*********@comlab.ox.ac.uk> wrote
Does anyone know of an algorithm that can accurately determine the number
of syllables in a given English word - esp. if that word isn't already
'known' by such an algorithm?

FYI, there are two approaches I'm currently considering.

It's a machine learning problem. Try Hidden Markov Models or neural
networks. However, the language you choose to implement such an algorithm in
will be the least of your problems, so comp.lang.c isn't very relevant.

Look at the NETtalk program. That used a neural network to convert text to
speech (phonemes) and could easily be modified to count syllables per word,
I would imagine.
Nov 15 '05 #6
On Thu, 4 Aug 2005 17:51:38 +0100, pemo
<pe*********@comlab.ox.ac.uk> wrote:
Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'pɑːləm(ə)nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward.

Well, you've got a problem right there -- some people say par'-luh-munt
and others say par'-li-a-ment. Lots of proper names (places as well
as people) have that effect -- is Worcester three syllables or two?
Aylesbury (ails'-bri or ails'-buh-ri)? Chol-mon-de-ly or Chum-ley?
Tall-i-a-fe-ro or Tol-i-ver? Michael pronounced mikh'-ah-el or mI'-kel?
Is Catherine kath-uh-rin or kath-rin? Con-sid-er-ing or con-sid-ring?
Equiv-a-lent or equ-v-lent? Al-go-rithm or Al-go-rith-um?
Dic-shun-ar-y or dik-shun-ry?

However, I can't find a program to convert English words into their
IPA equivalents, and, so I'm currently stuck with using a dictionary -
not so bad, until a word isn't in the dictionary I'm using!


Since there is no fixed rendering of English words into phonetic form,
in a lot of cases (even the OED often describes several different
pronunciations) it's not surprising that you can't find one which works
well.

And if you want words which aren't 'known' all bets are off, since that
includes technical and foreign words 'imported' into the language...

(Followups to comp.programming)

Chris C
Nov 15 '05 #7
