Determining Syllables

Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'plm()nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward. However, I can't
find a program to convert English words into their IPA equivalents, and so
I'm currently stuck with using a dictionary - not so bad, until a word isn't
in the dictionary I'm using!

Another approach might be to modify a good hyphenating algorithm, as I'm
led to believe that these usually insert a hyphen at a syllable boundary.
However, how they do that (determine the point), and whether it's even true,
I just don't know.

I've also had a look at the Flesch readability stuff - but it's probably not
going to be accurate enough for what I need it for.



Nov 15 '05 #1
pemo wrote:
Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'plm()nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward. However, I can't
find a program to convert English words into their IPA equivalents, and so
I'm currently stuck with using a dictionary - not so bad, until a word isn't
in the dictionary I'm using!

Another approach might be to modify a good hyphenating algorithm, as I'm
led to believe that these usually insert a hyphen at a syllable boundary.
However, how they do that (determine the point), and whether it's even true,
I just don't know.

I've also had a look at the Flesch readability stuff - but it's probably not
going to be accurate enough for what I need it for.


This is OT. It has nothing to do with C...
--
one's freedom stops where other's begin

Giannis Papadopoulos
http://dop.users.uth.gr/
University of Thessaly
Computer & Communications Engineering dept.
Nov 15 '05 #2
In article <dc**********@news.ox.ac.uk>,
pemo <pe*********@comlab.ox.ac.uk> wrote:
Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?


I don't know about comp.programming, but this topic certainly
does not belong in comp.lang.c.

Perhaps the information on the following page may be of some help:

http://www.tex.ac.uk/cgi-bin/texfaq2html?label=hyphen

Read that, then do a web search on "hyphenation Frank Liang".

--
Rouben Rostamian
Nov 15 '05 #3

[followups set to c.p, since this is not a C question]
On Thu, 4 Aug 2005, pemo wrote:

Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'plm()nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward. However, I can't
find a program to convert English words into their IPA equivalents,

Not surprising, since that would be equivalent to counting the syllables
in English words, and that's not an algorithmic problem.
English doesn't follow strictly algorithmic rules, because it's not
strictly phonetic. I could come along tomorrow and make up a word, like
"Worcestershire," and make up a pronunciation for it, like "wooster," and
no computer program in the world would be able to figure that out from
the spelling. Heck, most /humans/ don't know how every English word is
pronounced, and we have had many, many man-years to study the problem!

[...] Another approach might be to modify a good hyphenating algorithm, as I'm
led to believe that these usually insert a hyphen at a syllable boundary.
However, how they do that (determine the point), and whether it's even true,
I just don't know.

Yes, a good hyphenation algorithm can be /very/ good. The basic rule of
good hyphenation is to come up with sets of English words that all have a
hyphenation point in the same general context, and then remember the
context. For example, if you see a word ending in -ible, you can hyphenate
it there, unless it ends in c-ible or g-ible, in which case you can't. You
can generally hyphenate before -str, or after hy-. And so on.

The basic research for hyphenation patterns in English has already been
done several times, e.g. by Frank Liang for TeX, but I don't know anywhere
you could get patterns for syllable counting. Still, I'd start by
downloading the TeX hyphenation patterns, and using them to find every
single hyphenation point in your word. Then it would probably be a good
idea to discard any segments that don't contain any vowels (but I'm sure
there are exceptions, and not just "nth" and "ssh").
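
The last step (count the segments between hyphenation points, discarding any segment with no vowel) might look roughly like the sketch below. It assumes the hyphenation points have already been found by some other means, such as TeX's patterns, and are passed in as a hyphenated string. The function name and the decision to treat 'y' as a vowel are assumptions made for illustration; neither comes from TeX.

```python
# Assumption: 'y' counts as a vowel, so segments like "hy" are kept.
VOWELS = set("aeiouy")


def count_syllables_from_hyphenation(hyphenated):
    """Count the vowel-bearing segments of an already-hyphenated word.

    `hyphenated` is a word with '-' at every hyphenation point,
    e.g. "birth-day". Finding those points is not done here.
    """
    segments = [s for s in hyphenated.lower().split("-") if s]
    # Discard segments that contain no vowel letter at all.
    return sum(1 for s in segments if any(ch in VOWELS for ch in s))


if __name__ == "__main__":
    for word in ("birth-day", "hy-phen-ation", "nth"):
        print(word, count_syllables_from_hyphenation(word))
```

Two caveats show up immediately. TeX hyphenates "hyphenation" as hy-phen-ation, which gives 3 segments for a four-syllable word, so hyphenation points are at best a subset of syllable boundaries. And "nth" comes out as 0, which is exactly the kind of exception mentioned above.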

I've also had a look at the Flesch readability stuff - but it's probably not
going to be accurate enough for what I need it for.


Really? One of the inputs to the Flesch readability formula /is/ the
number of syllables in the text. So if you can find a program that claims
to accurately compute Flesch scores, go with it! (I doubt such programs
exist, though. A Google search turned up Flesh,
http://jack.gravco.com/flesh.html, but it thinks "birthday" has one
syllable, so I didn't bother investigating any further.)
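
For reference, the standard Flesch Reading Ease score is 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), so the syllable count feeds straight into the result. A minimal sketch follows; the function and parameter names are made up for illustration, and it assumes the three totals have already been counted by some other means.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease from pre-computed totals.

    Counting words, sentences and syllables is assumed to happen
    elsewhere; this only applies the formula.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # 100 words in 5 sentences, counted once with 150 syllables and
    # once with a 10% overcount, to show how sensitive the score is.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_reading_ease(100, 5, 165))
```

With those numbers, a 10% error in the syllable count moves the score by about 12.7 points, which is why a counter that gets "birthday" wrong is a bad sign.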

Actually, given the application to Flesch readability computations, I
might be interested in the syllable-counting problem. If you get anything
working, would you let me know? And I'll post here if I find anything
clever --- but don't hold your breath.

-Arthur
Nov 15 '05 #4
In article <dc**********@news.ox.ac.uk>,
pemo <pe*********@comlab.ox.ac.uk> wrote:
Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?


It cannot be done on a stand-alone basis. The same character string
might be multiple words with different pronunciations and
different syllable boundaries, so one would have to be able to
deduce which of the words was meant by examining surrounding context.
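
To make the ambiguity concrete, here is a small sketch in which a lookup returns a set of possible counts instead of a single number. The handful of words is hand-picked for illustration, and the table and function names are hypothetical; picking the right count would still need the surrounding context.

```python
# Hand-picked examples: the same spelling read with one or two syllables.
SYLLABLE_COUNTS = {
    "learned": {1, 2},  # "she learned it" vs. "a learn-ed scholar"
    "aged": {1, 2},     # "aged ten" vs. "an a-ged relative"
    "blessed": {1, 2},  # "he blessed them" vs. "bless-ed are the meek"
    "table": {2},       # unambiguous, for comparison
}


def possible_syllable_counts(word):
    """Return every syllable count listed for `word` (empty set if unknown)."""
    return SYLLABLE_COUNTS.get(word.lower(), set())


def is_ambiguous(word):
    """True if the spelling alone does not determine the syllable count."""
    return len(possible_syllable_counts(word)) > 1


if __name__ == "__main__":
    for word in ("learned", "table", "zyzzyva"):
        print(word, sorted(possible_syllable_counts(word)), is_ambiguous(word))
```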

I played around informally with syllabification a few decades ago,
and eventually realized that in English (or Canadian English anyhow)
the proper syllabification depended upon the stress points ("accents"),
and was also tied in with whether particular vowels were long or short.
You have to "look ahead": syllables can change depending upon the
suffixes one adds... and if one then adds further suffixes,
they can change again.
--
"[...] it's all part of one's right to be publicly stupid." -- Dave Smey
Nov 15 '05 #5

"pemo" <pe*********@comlab.ox.ac.uk> wrote
Does anyone know of an algorithm that can accurately determine the number
of syllables in a given English word - esp. if that word isn't already
'known' by such an algorithm?

FYI, there are two approaches I'm currently considering.

It's a machine learning problem. Try Hidden Markov Models or neural
networks. However, the language you choose to implement such an algorithm in
will be the least of your problems, so comp.lang.c isn't very relevant.

Look at the NETtalk program. That used a neural network to convert text to
speech (phonemes) and could easily be modified to count syllables per word,
I would imagine.
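
Whatever produces the phonemes (a dictionary, NETtalk, or some other model), the counting step itself is small: one syllable per vowel phoneme. The sketch below assumes ARPAbet-style notation as used by the CMU Pronouncing Dictionary, where every vowel phoneme ends in a stress digit (0, 1 or 2). The transcriptions are written by hand for illustration and the function name is made up; the text-to-phoneme step is not shown.

```python
def count_syllables(phonemes):
    """Count syllables in a list of ARPAbet-style phonemes.

    Assumption: vowels carry a trailing stress digit ("AA1", "AH0"),
    consonants do not ("P", "TH"), so counting digits counts vowels.
    """
    return sum(1 for p in phonemes if p[-1:].isdigit())


if __name__ == "__main__":
    # Hand-written transcriptions, not output from any real converter.
    examples = {
        "birthday": ["B", "ER1", "TH", "D", "EY2"],
        "parliament": ["P", "AA1", "R", "L", "AH0", "M", "AH0", "N", "T"],
    }
    for word, phonemes in examples.items():
        print(word, count_syllables(phonemes))
```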
Nov 15 '05 #6
On Thu, 4 Aug 2005 17:51:38 +0100, pemo
<pe*********@comlab.ox.ac.uk> wrote:
Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'plm()nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward.

Well, you've got a problem right there -- some people say par'-luh-munt
and others say par'-li-a-ment. Lots of proper names (places as well
as people) have that effect -- is Worcester three syllables or two?
Aylesbury (ails'-bri or ails'-buh-ri)? Chol-mon-de-ly or Chum-ley?
Tall-i-a-fe-ro or Tol-i-ver? Michael pronounced mikh'-ah-el or mI'-kel?
Is Catherine kath-uh-rin or kath-rin? Con-sid-er-ing or con-sid-ring?
Equiv-a-lent or equ-v-lent? Al-go-rithm or Al-go-rith-um?
Dic-shun-ar-y or dik-shun-ry?

However, I can't find a program to convert English words into their
IPA equivalents, and so I'm currently stuck with using a dictionary -
not so bad, until a word isn't in the dictionary I'm using!


Since in a lot of cases there is no fixed rendering of English words into
phonetic form (even the OED often describes several different
pronunciations), it's not surprising that you can't find one which works
well.

And if you want words which aren't 'known' all bets are off, since that
includes technical and foreign words 'imported' into the language...

(Followups to comp.programming)

Chris C
Nov 15 '05 #7
