Bytes | Software Development & Data Engineering Community

compare two voices

I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text into the microphone. The software
records it, compares it to a wave file pre-recorded by the teacher, and
gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?

Jul 19 '05 #1
On Sat, 30 Apr 2005 20:00:57 -0700, Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want the
following function in it:

The student reads a piece of text into the microphone. The software records
it, compares it to a wave file pre-recorded by the teacher, and gives
out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.


Do you have any idea what it takes to compare two voices in a
*meaningful* fashion? This is a serious question. I can't guarantee
there is no app to help with this, but if it does exist, it either costs a
lot of money, or will be almost impossible to use for what you want
(boiling two voice samples down to a speaker-independent single similarity
number... the mind boggles at the possible number of ways of defining that).
Quite possibly both.

If you *do* know something about the math, which, by the way, is graduate
level+, then you'd do better to go look at the open source voice
recognition systems and ask on those mailing lists.

No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good PhD
thesis. I have no idea if it has already been done and you will likely get
much better help from such a community where people might know that. I am
aware of the CMU Sphinx project, which should get you started Googling.
Good luck; it's a great idea, but if somebody somewhere hasn't already
done it, it's an extremely tough one.

(Theoretically, it's probably not a horrid problem, but my intuition leads
me to believe that turning it into a *useful product*, that corresponds to
what humans would say is "similar", will probably be a practical
nightmare. Plus it'll be highly language dependent; a similarity algorithm
for Chinese probably won't work very well for English and vice versa. All
this, and you *could* just play the two sounds back to the human and let
their brain try to understand it... ;-) )

Waiting for the message pointing to the Sourceforge project that
implemented this three years ago...
Jul 19 '05 #2
Jeremy Bowers wrote:
No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good
PhD thesis.


No, my goal has nothing to do with voice recognition. Sorry that I
haven't described my question clearly. We are not teaching English, so
voice recognition isn't helpful here.

I just want to compare two sound WAVE files, not recognize what the
students or the teacher are really saying. For example, if the teacher
recorded his "standard" pronunciation of "god", then a student saying
"good" will get a higher score than a student saying "evil" ---- because
"good" sounds more like "god".

Yes, this is not a Python problem, but I am a fan of Python and am using
Python to develop the other parts of the application (UI, sound playback
and recording, grammar training, etc), so I ask here for an available
Python module, and of course, for any kind suggestions unrelated to
Python itself (like yours) too.

I myself have tried using Python's standard audioop module, using the
findfactor and rms functions. I tried to use the value returned from
rms(add(a, mul(b, -findfactor(a, b)))) as the score, but the result is
not good. So I want to know if there is a human-voice-optimized
algorithm/library out there.

Jul 19 '05 #3
> Jeremy Bowers wrote:
No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good
PhD thesis.

Qiangning Hong wrote:
No, my goal has nothing to do with voice recognition. Sorry that I
haven't described my question clearly. We are not teaching English, so
voice recognition isn't helpful here.

To repeat what Jeremy wrote - what you are asking *is* related to
voice recognition. You want to recognize that two different voices,
with different pitches, pauses, etc., said the same thing.

There is a lot of data in speech. That's why sound files are bigger
than text files. Some of it gets interpreted as emotional nuances
or as an accent, while other parts are simply ignored.
I just want to compare two sound WAVE files, not recognize what the
students or the teacher are really saying. For example, if the teacher
recorded his "standard" pronunciation of "god", then a student saying
"good" will get a higher score than a student saying "evil" ---- because
"good" sounds more like "god".


Try this: record the word twice and overlay them. They will be
different. And that's with the same speaker. Now try it with your
voice compared with another's. You can hear just how different they
are. One will be longer, another deeper, or with the "o" sound
originating in a different part of the mouth.

At the level you are working at, the computer doesn't know which of
the data can be ignored. It doesn't know how to find the start
of the word (as when a student says "ummm, good"). It doesn't know
how to stretch the timings, nor how to adjust for pitch between, say,
a man's and a woman's voice.
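The timing-stretch problem mentioned above is what dynamic time warping (DTW), a standard technique in speech processing, is designed for; it is not a complete solution by itself. A minimal sketch over two 1-D feature sequences (a real system would warp frames of spectral features, not raw numbers):

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two sequences of numbers.
    Allows one sequence to be locally stretched or compressed to match
    the other, so a drawn-out "goood" can still align with "good"."""
    n, m = len(s), len(t)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning s[:i] with t[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch t
                                 cost[i][j - 1],      # stretch s
                                 cost[i - 1][j - 1])  # match step
    return cost[n][m]
```

Note that a locally repeated value costs nothing extra, which is exactly the tolerance to speaking speed that a plain sample-by-sample comparison lacks.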

My ex-girlfriend gave me a computer program for learning Swedish.
It included a program to do a simpler version of what you are
asking. It only compared phonemes, so I could practice the vowels.
Even then its comparison seemed more like a random value than
anything meaningful.

Again, as Jeremy said, you want something harder than what
speech recognition programs do. They at least are trained
to understand a given speaker, which helps improve the quality
of the recognition. You don't want that -- that's the
opposite of what you're trying to do. Speaker-independent
voice recognition is harder than speaker-dependent.

You can implement a solution along the lines you were thinking of,
but as you found, it doesn't work. A workable solution will
require good speech recognition capability and is still very
much in the research stage (as far as I know; it's not my
field).

If your target language is a major one then there may be some
commercial language recognition software you can use. You
could have your reference speaker train the software on the
vocabulary list and have your students try to have the software
recognize the correct word.

If your word list is too short or the recognizer is not tuned well
enough, then saying something like "thud" will also be
recognized as being close enough to "good".

Why don't you just have the students hear both the
teacher's voice and the student's just-recorded voice, one
right after the other? That gives feedback. Why does
the computer need to judge the correctness?

Andrew
da***@dalkescientific.com

Jul 19 '05 #4
Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text into the microphone. The software
records it, compares it to a wave file pre-recorded by the teacher, and
gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?

I have worked on a commercial product that attempts to do this, and I can
confirm that it is very difficult to create a meaningful score.

Kent
Jul 19 '05 #5
[Qiangning Hong]
I just want to compare two sound WAVE file, not what the students or
the teacher really saying. For example, if the teacher recorded his
"standard" pronouncation of "god", then the student saying "good" will
get a higher score than the student saying "evil" ---- because "good"
sounds more like "god".
If I had this problem and were working alone, I would likely create one
audiogram (I mean a spectral frequency analysis over time, i.e. a
spectrogram) for each voice sample. Normally, this is presented with
frequency on the vertical axis, time on the horizontal axis, and gray
value for frequency amplitude. There are a few tools available for doing
this, yet integrating them into another application may require some work.

Now, because of voice pitch differences and elocution speed, the
audiograms would look somewhat alike, yet scaled differently in both
directions. The problem you now have is to recognise that an image is
"similar" to part of another, so here I would likely do some research
on various transforms (like Hough's and others of the same kind) that
might ease normalisation prior to comparison. Image classification
techniques (they do this a lot in satellite imagery) could be used for
recognizing similar textures in audiograms, and so provide clues for
matching images. A few image classification programs have previously
been announced here; I have not looked at them yet, but who knows, they
may be helpful.

Then, if the above work is done correctly and meaningfully, you now want
to compute correlations between normalised audiograms. The more
correlated they are, the more alike the original pronunciations likely
were.
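A rough sketch of the spectrogram-plus-correlation idea, assuming NumPy and skipping the normalisation step entirely (the frame and hop sizes here are arbitrary choices):

```python
import numpy as np

def spectrogram(samples, frame=256, hop=128):
    """Magnitude spectrogram of a 1-D NumPy array of samples: |FFT| of
    overlapping Hann-windowed frames (frequency on one axis, time on
    the other, as described above)."""
    window = np.hanning(frame)
    frames = [samples[i:i + frame] * window
              for i in range(0, len(samples) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def spectrogram_correlation(a, b):
    """Pearson correlation between two flattened equal-shape
    spectrograms -- a crude stand-in for the normalise-then-correlate
    step, valid only for equal-length recordings."""
    x, y = spectrogram(a).ravel(), spectrogram(b).ravel()
    x = x - x.mean()
    y = y - y.mean()
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```

Without the normalisation for pitch and speed that the post calls for, this only works for recordings of the same length, but it shows where the correlation number would come from.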

Now, if I had this problem and could call friends, I would surely phone
one or two of them who work at companies offering voice recognition
devices or services. They will likely be reluctant to share advanced
algorithms, as these give them an industrial advantage over competitors.
I tried to use the value returned from rms(add(a, mul(b, -findfactor(a,
b)))) as the score, but the result is not good.


Oh, absolutely no chance that such a simple thing would ever work. :-)

--
François Pinard http://pinard.progiciels-bpi.ca
Jul 19 '05 #6

"Qiangning Hong" <ho****@gmail.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...
I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text into the microphone. The software
records it, compares it to a wave file pre-recorded by the teacher, and
gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?


How about another approach?

All modern speech recognition systems employ a phonetic alphabet. It's how
you describe to the speech recognition engine exactly how the word sounds.

For each sentence read, you create a small recognition context that includes
the sentence itself, AND subtle variations of the sentence phonetically.

For example (using English):

You want them to say correctly: "The weather is good today".

You create a context with the following phrases, which include the original
sentence and then alternative sentences that dither (vary) the original
sentence phonetically. Sample context:

(*) The weather is good today
Da wedder is god tuday
The weether is good towday

Etc.

Then submit the context to the speech recognition engine and ask the user to
say the sentences. If the original sentence (*) comes back as the speech
recognition engine's best choice, then they said it right. If one of the
other choices comes back, then they made a mistake.

You could even "grade" their performance by tagging the variations by
closeness to the original, for example:

(*) The weather is good today (100)
Da wedder is god tuday (80)
Ta wegger es gid towday (50)

In the example above, the original sentence gets a 100, the second choice
which is close gets an 80, and the last option which is pretty bad gets 50.
With a little effort you could automatically create the "dithered" phonetic
variations and auto-calculate the score or closeness to original too.
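The closeness auto-calculation could be sketched with the standard library's difflib, using string similarity as a crude stand-in for a real phonetic distance (the phrases are the sample context above):

```python
import difflib

def grade(recognized, reference):
    """Score 0-100 by string similarity between the phrase the engine
    returned and the reference sentence. A real system would compare
    phoneme sequences, not raw spellings."""
    ratio = difflib.SequenceMatcher(None, recognized.lower(),
                                    reference.lower()).ratio()
    return round(100 * ratio)

context = [
    "The weather is good today",   # (*) the correct sentence
    "Da wedder is god tuday",
    "Ta wegger es gid towday",
]
reference = context[0]
# map each phrase the recognizer might return to its grade
scores = {phrase: grade(phrase, reference) for phrase in context}
```

Whichever context phrase the recognition engine reports back, looking it up in `scores` gives the student's mark; only the original sentence can score the full 100.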

Thanks,
Robert
http://www.robodance.com
Robosapien Dance Machine - SourceForge project

Jul 19 '05 #7

Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text into the microphone. The software
records it, compares it to a wave file pre-recorded by the teacher, and
gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?


As others have noted, this is a difficult problem.
This library was developed to study speech and should be worth a look:
http://www.speech.kth.se/snack/
"""
Using Snack you can create powerful multi-platform audio applications
with just a few lines of code. Snack has commands for basic sound
handling, such as playback, recording, file and socket I/O. Snack also
provides primitives for sound visualization, e.g. waveforms and
spectrograms. It was developed mainly to handle digital recordings of
speech, but is just as useful for general audio. Snack has also
successfully been applied to other one-dimensional signals.
"""
Be sure to check out the examples.

Also worth a look:
http://www.speech.kth.se/projects/speech_projects.html
http://www.speech.kth.se/cost250/

On Windows you might also have luck with the Microsoft Speech SDK (it
is huge). Combined with Python scripting you can go far.

hth,
M.E.Farmer

Jul 19 '05 #8
