Bytes | Software Development & Data Engineering Community

compare two voices

I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text into the microphone. The software
records it, compares it to a wave file pre-recorded by the teacher, and
gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?

Jul 19 '05 #1
On Sat, 30 Apr 2005 20:00:57 -0700, Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want the
following function in it:

The student reads a piece of text into the microphone. The software records
it, compares it to a wave file pre-recorded by the teacher, and gives
out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.


Do you have any idea what it takes to compare two voices in a
*meaningful* fashion? This is a serious question. I can't guarantee
there is no app to help with this, but if it does exist, it either costs a
lot of money, or will be almost impossible to use for what you want
(boiling two voice samples down to a speaker-independent single similarity
number... the mind boggles at the possible number of ways of defining that).
Quite possibly both.

If you *do* know something about the math, which, by the way, is graduate
level+, then you'd do better to go look at the open source voice
recognition systems and ask on those mailing lists.

No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good PhD
thesis. I have no idea if it has already been done and you will likely get
much better help from such a community where people might know that. I am
aware of the CMU Sphinx project, which should get you started Googling.
Good luck; it's a great idea, but if somebody somewhere hasn't already
done it, it's an extremely tough one.

(Theoretically, it's probably not a horrid problem, but my intuition leads
me to believe that turning it into a *useful product*, that corresponds to
what humans would say is "similar", will probably be a practical
nightmare. Plus it'll be highly language dependent; a similarity algorithm
for Chinese probably won't work very well for English and vice versa. All
this, and you *could* just play the two sounds back to the human and let
their brain try to understand it... ;-) )

Waiting for the message pointing to the Sourceforge project that
implemented this three years ago...
Jul 19 '05 #2
Jeremy Bowers wrote:
No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good
PhD thesis.


No, my goal has nothing to do with voice recognition. Sorry that I
haven't described my question clearly. We are not teaching English, so
voice recognition isn't helpful here.

I just want to compare two sound WAVE files, not recognize what the
students or the teacher are really saying. For example, if the teacher
recorded his "standard" pronunciation of "god", then a student saying
"good" will get a higher score than a student saying "evil" ---- because
"good" sounds more like "god".

Yes, this is not a Python problem, but I am a fan of Python and am using
Python to develop the other parts of the application (UI, sound playback
and recording, grammar training, etc), so I ask here for an available
Python module, and of course, for any kind suggestions unrelated to
Python itself (like yours) too.

I myself have tried using Python's standard audioop module, using the
findfactor and rms functions. I tried to use the value returned from
rms(add(a, mul(b, -findfactor(a, b)))) as the score, but the result is
not good. So I want to know if there is a human-voice-optimized
algorithm/library out there.

Jul 19 '05 #3
> Jeremy Bowers wrote:
No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good
PhD thesis.

Qiangning Hong wrote:
No, my goal has nothing to do with voice recognition. Sorry that I
haven't described my question clearly. We are not teaching English, so
voice recognition isn't helpful here.

To repeat what Jeremy wrote - what you are asking *is* related to
voice recognition. You want to recognize that two different voices,
with different pitches, pauses, etc., said the same thing.

There is a lot of data in speech. That's why sound files are bigger
than text files. Some of it gets interpreted as emotional nuances
or as an accent, while other parts are simply ignored.
I just want to compare two sound WAVE files, not recognize what the
students or the teacher are really saying. For example, if the teacher
recorded his "standard" pronunciation of "god", then a student saying
"good" will get a higher score than a student saying "evil" ---- because
"good" sounds more like "god".


Try this: record the word twice and overlay them. They will be
different. And that's with the same speaker. Now try it with your
voice compared with another's. You can hear just how different they
are. One will be longer, another deeper, or with the "o" sound
originating in a different part of the mouth.

At the level you are working at, the computer doesn't know which of
the data can be ignored. It doesn't know how to find the start
of the word (as when a student says "ummm, good"). It doesn't know
how to stretch the timings, nor how to adjust for pitch between, say,
a man's and a woman's voice.
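The timing-stretch problem mentioned above is what dynamic time warping (DTW), a standard technique in speech processing, is designed for; it is not a complete solution by itself. A minimal sketch over two 1-D feature sequences (a real system would warp frames of spectral features, not raw numbers):

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two sequences of numbers.
    Allows one sequence to be locally stretched or compressed to match
    the other, so a drawn-out "goood" can still align with "good"."""
    n, m = len(s), len(t)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning s[:i] with t[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch t
                                 cost[i][j - 1],      # stretch s
                                 cost[i - 1][j - 1])  # match step
    return cost[n][m]
```

Note that a locally repeated value costs nothing extra, which is exactly the tolerance to speaking speed that a plain sample-by-sample comparison lacks.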

My ex-girlfriend gave me a computer program for learning Swedish.
It included a program to do a simpler version of what you are
asking. It only compared phonemes, so I could practice the vowels.
Even then its comparison seemed more like a random value than
anything meaningful.

Again, as Jeremy said, you want something harder than what
speech recognition programs do. They at least are trained
to understand a given speaker, which helps improve the quality
of the recognition. You don't want that -- that's the
opposite of what you're trying to do. Speaker-independent
voice recognition is harder than speaker-dependent.

You can implement a solution along the lines you were thinking of,
but as you found, it doesn't work. A workable solution will
require good speech recognition capability and is still very
much in the research stage (as far as I know; it's not my
field).

If your target language is a major one then there may be some
commercial language recognition software you can use. You
could have your reference speaker train the software on the
vocabulary list and have your students try to have the software
recognize the correct word.

If your word list is too short or the recognizer is not tuned well
enough, then saying something like "thud" will also be
recognized as being close enough to "good".

Why don't you just have the students hear both the
teacher's voice and the student's just-recorded voice, one
right after the other? That gives feedback. Why does
the computer need to judge the correctness?

Andrew
da***@dalkescientific.com

Jul 19 '05 #4
Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text into the microphone. The software
records it, compares it to a wave file pre-recorded by the teacher, and
gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?

I have worked on a commercial product that attempts to do this, and I can
confirm that it is very difficult to create a meaningful score.

Kent
Jul 19 '05 #5
[Qiangning Hong]
I just want to compare two sound WAVE file, not what the students or
the teacher really saying. For example, if the teacher recorded his
"standard" pronouncation of "god", then the student saying "good" will
get a higher score than the student saying "evil" ---- because "good"
sounds more like "god".
If I had this problem and were working alone, I would likely create one
audiogram (I mean a spectral frequency analysis over time, i.e. a
spectrogram) for each voice sample. Normally, this is presented with
frequency on the vertical axis, time on the horizontal axis, and gray
value for frequency amplitude. There are a few tools available for doing
this, yet integrating them into another application may require some work.

Now, because of voice pitch differences and elocution speed, the
audiograms would look somewhat alike, yet scaled differently in both
directions. The problem you now have is to recognise that an image is
"similar" to part of another, so here I would likely do some research
on various transforms (like Hough's and others of the same kind) that
might ease normalisation prior to comparison. Image classification
techniques (they do this a lot in satellite imagery) could be used for
recognizing similar textures in audiograms, and so provide clues for
matching images. A few image classification programs have previously
been announced here; I have not looked at them yet, but who knows, they
may be helpful.

Then, if the above work is done correctly and meaningfully, you now want
to compute correlations between normalised audiograms. The more
correlated they are, the more alike the original pronunciations likely
were.
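A rough sketch of the spectrogram-plus-correlation idea, assuming NumPy and skipping the normalisation step entirely (the frame and hop sizes here are arbitrary choices):

```python
import numpy as np

def spectrogram(samples, frame=256, hop=128):
    """Magnitude spectrogram of a 1-D NumPy array of samples: |FFT| of
    overlapping Hann-windowed frames (frequency on one axis, time on
    the other, as described above)."""
    window = np.hanning(frame)
    frames = [samples[i:i + frame] * window
              for i in range(0, len(samples) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def spectrogram_correlation(a, b):
    """Pearson correlation between two flattened equal-shape
    spectrograms -- a crude stand-in for the normalise-then-correlate
    step, valid only for equal-length recordings."""
    x, y = spectrogram(a).ravel(), spectrogram(b).ravel()
    x = x - x.mean()
    y = y - y.mean()
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```

Without the normalisation for pitch and speed that the post calls for, this only works for recordings of the same length, but it shows where the correlation number would come from.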

Now, if I had this problem and could call friends, I would surely phone
one or two of them who work at companies offering voice recognition
devices or services. They will likely be reluctant to share advanced
algorithms, as these give them an industrial advantage over competitors.
I tried to use the value returned from rms(add(a, mul(b, -findfactor(a,
b)))) as the score, but the result is not good.


Oh, absolutely no chance that such a simple thing would ever work. :-)

--
François Pinard http://pinard.progiciels-bpi.ca
Jul 19 '05 #6

"Qiangning Hong" <ho****@gmail.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...
I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text into the microphone. The software
records it, compares it to a wave file pre-recorded by the teacher, and
gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?


How about another approach?

All modern speech recognition systems employ a phonetic alphabet. It's how
you describe to the speech recognition engine exactly how the word sounds.

For each sentence read, you create a small recognition context that includes
the sentence itself, AND subtle variations of the sentence phonetically.

For example (using English):

You want them to say correctly: "The weather is good today".

You create a context with the following phrases, which include the original
sentence and then alternative sentences that dither (vary) the original
sentence phonetically. Sample context:

(*) The weather is good today
Da wedder is god tuday
The weether is good towday

Etc.

Then submit the context to the speech recognition engine and ask the user to
say the sentences. If the original sentence (*) comes back as the speech
recognition engine's best choice, then they said it right. If one of the
other choices comes back, then they made a mistake.

You could even "grade" their performance by tagging the variations by
closeness to the original, for example:

(*) The weather is good today (100)
Da wedder is god tuday (80)
Ta wegger es gid towday (50)

In the example above, the original sentence gets a 100, the second choice
which is close gets an 80, and the last option which is pretty bad gets 50.
With a little effort you could automatically create the "dithered" phonetic
variations and auto-calculate the score or closeness to original too.
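The closeness auto-calculation could be sketched with the standard library's difflib, using string similarity as a crude stand-in for a real phonetic distance (the phrases are the sample context above):

```python
import difflib

def grade(recognized, reference):
    """Score 0-100 by string similarity between the phrase the engine
    returned and the reference sentence. A real system would compare
    phoneme sequences, not raw spellings."""
    ratio = difflib.SequenceMatcher(None, recognized.lower(),
                                    reference.lower()).ratio()
    return round(100 * ratio)

context = [
    "The weather is good today",   # (*) the correct sentence
    "Da wedder is god tuday",
    "Ta wegger es gid towday",
]
reference = context[0]
# map each phrase the recognizer might return to its grade
scores = {phrase: grade(phrase, reference) for phrase in context}
```

Whichever context phrase the recognition engine reports back, looking it up in `scores` gives the student's mark; only the original sentence can score the full 100.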

Thanks,
Robert
http://www.robodance.com
Robosapien Dance Machine - SourceForge project

Jul 19 '05 #7

Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text into the microphone. The software
records it, compares it to a wave file pre-recorded by the teacher, and
gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?


As others have noted, this is a difficult problem.
This library was developed to study speech and should be worth a look:
http://www.speech.kth.se/snack/
"""
Using Snack you can create powerful multi-platform audio applications
with just a few lines of code. Snack has commands for basic sound
handling, such as playback, recording, file and socket I/O. Snack also
provides primitives for sound visualization, e.g. waveforms and
spectrograms. It was developed mainly to handle digital recordings of
speech, but is just as useful for general audio. Snack has also
successfully been applied to other one-dimensional signals.
"""
Be sure to check out the examples.

Also worth a look:
http://www.speech.kth.se/projects/speech_projects.html
http://www.speech.kth.se/cost250/

On Windows you might also have luck with the Microsoft Speech SDK (it
is huge). Combined with Python scripting you can go far.

hth,
M.E.Farmer

Jul 19 '05 #8
