compare two voices

I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text to the microphone. The software
records it and compares it to the wave-file pre-recorded by the
teacher, and gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?

Jul 19 '05 #1
On Sat, 30 Apr 2005 20:00:57 -0700, Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want the
following function in it:

The student reads a piece of text to the microphone. The software records
it and compares it to the wave-file pre-recorded by the teacher, and gives
out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.


Do you have any idea what it takes to compare two voices in a
*meaningful* fashion? This is a serious question. I can't guarantee
there is no app to help with this, but if it does exist, it either costs a
lot of money, or will be almost impossible to use for what you want
(boiling two voice samples down to a speaker-independent single similarity
number... the mind boggles at the possible number of ways of defining that).
Quite possibly both.

If you *do* know something about the math, which, by the way, is graduate
level+, then you'd do better to go look at the open source voice
recognition systems and ask on those mailing lists.

No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good PhD
thesis. I have no idea whether it has already been done; you will likely get
much better help from such a community, where people might know. I am
aware of the CMU Sphinx project, which should get you started Googling.
Good luck; it's a great idea, but if somebody somewhere hasn't already
done it, it's an extremely tough one.

(Theoretically, it's probably not a horrid problem, but my intuition leads
me to believe that turning it into a *useful product*, one that corresponds to
what humans would say is "similar", will probably be a practical
nightmare. Plus it'll be highly language dependent; a similarity algorithm
for Chinese probably won't work very well for English and vice versa. All
this, and you *could* just play the two sounds back to the human and let
their brain try to understand it... ;-) )

Waiting for the message pointing to the Sourceforge project that
implemented this three years ago...
Jul 19 '05 #2
Jeremy Bowers wrote:
No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good
PhD thesis.


No, my goal is not related to voice recognition. Sorry that I
haven't described my question clearly. We are not teaching English, so
voice recognition isn't helpful here.

I just want to compare two sound WAVE files, not recognize what the
students or the teacher are really saying. For example, if the teacher
recorded his "standard" pronunciation of "god", then a student saying
"good" will get a higher score than a student saying "evil" ---- because
"good" sounds more like "god".

Yes, this is not a Python problem, but I am a fan of Python and am using
Python to develop the other parts of the application (UI, sound playback
and recording, grammar training, etc.), so I am asking here for an available
Python module, and of course, for any kind suggestions unrelated to Python
itself (like yours) too.

I myself have tried using Python's standard audioop module, with the
findfactor and rms functions. I tried using the value returned by
rms(add(a, mul(b, -findfactor(a, b)))) as the score, but the result is
not good. So I want to know if there is a human-voice-optimized
algorithm/library out there.
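
For reference, a minimal runnable sketch of that attempt (assuming both
recordings are mono, 16-bit WAV files at the same sample rate; the filenames
are just placeholders) would look something like this:

import wave
import audioop  # standard-library module (deprecated in recent Python versions)

def read_frames(path):
    # Return the raw sample bytes of a mono, 16-bit WAV file.
    with wave.open(path, 'rb') as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        return w.readframes(w.getnframes())

a = read_frames('teacher.wav')   # placeholder filenames
b = read_frames('student.wav')

# Truncate to the shorter buffer so both fragments have the same length.
n = min(len(a), len(b))
a, b = a[:n], b[:n]

factor = audioop.findfactor(a, b)                     # best scale of b onto a
residual = audioop.add(a, audioop.mul(b, 2, -factor), 2)
score = audioop.rms(residual, 2)                      # lower means "more similar"
print(score)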

Jul 19 '05 #3
> Jeremy Bowers wrote:
No matter how you slice it, this is not a Python problem, this is an
intense voice recognition algorithm problem that would make a good
PhD thesis.

Qiangning Hong wrote:
No, my goal is not related to voice recognition. Sorry that I
haven't described my question clearly. We are not teaching English, so
voice recognition isn't helpful here.
To repeat what Jeremy wrote - what you are asking *is* related
to voice recognition. You want to recognize that two different voices,
with different pitches, pauses, etc., said the same thing.

There is a lot of data in speech. That's why sound files are bigger
than text files. Some of it gets interpreted as emotional nuances,
or as an accent, while other parts are simply ignored.
I just want to compare two sound WAVE files, not recognize what the
students or the teacher are really saying. For example, if the teacher
recorded his "standard" pronunciation of "god", then a student saying
"good" will get a higher score than a student saying "evil" ---- because
"good" sounds more like "god".


Try this: record the word twice and overlay them. They will be
different. And that's with the same speaker. Now try it with your
voice compared with another's. You can hear just how different they
are. One will be longer, another deeper, or with the "o" sound
originating in a different part of the mouth.

At the level you are working at, the computer doesn't know which of
the data can be ignored. It doesn't know how to find the start
of the word (as when a student says "ummm, good"). It doesn't know
how to stretch the timings, nor how to adjust for pitch between, say,
a man's and a woman's voice.

My ex-girlfriend gave me a computer program for learning Swedish.
It included a program to do a simpler version of what you are
asking. It only compared phonemes, so I could practice the vowels.
Even then its comparison seemed more like a random value than a
meaningful one.

Again, as Jeremy said, you want something harder than what
speech recognition programs do. They at least are trained
to understand a given speaker, which helps improve the quality
of the recognition. You don't want that -- that's the
opposite of what you're trying to do. Speaker-independent
voice recognition is harder than speaker-dependent.

You can implement a solution along the lines you were thinking of,
but as you found, it doesn't work well. A workable solution will
require good speech recognition capability and is still very
much in the research stage (as far as I know; it's not my
field).

If your target language is a major one then there may be some
commercial language recognition software you can use. You
could have your reference speaker train the software on the
vocabulary list and have your students try to have the software
recognize the correct word.

If your word list is too short or the recognizer is not tuned well
enough, then saying something like "thud" will also be
recognized as being close enough to "good".

Why don't you just have the students hear both the
teacher's voice and the student's just-recorded voice, one
right after the other? That gives feedback. Why does
the computer need to judge the correctness?

Andrew
da***@dalkescientific.com

Jul 19 '05 #4
Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text to the microphone. The software
records it and compares it to the wave-file pre-recorded by the
teacher, and gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?

I have worked on a commercial product that attempts to do this, and I can
confirm that it is very difficult to create a meaningful score.

Kent
Jul 19 '05 #5
[Qiangning Hong]
I just want to compare two sound WAVE files, not recognize what the
students or the teacher are really saying. For example, if the teacher
recorded his "standard" pronunciation of "god", then a student saying
"good" will get a higher score than a student saying "evil" ---- because
"good" sounds more like "god".
If I had this problem and was alone, I would likely create one audiogram
(I mean a spectral frequency analysis over time) for each voice sample.
Normally, this is presented with frequency on the vertical axis, time on
the horizontal axis, and gray value for frequency amplitude. There are
a few tools available for doing this, yet integrating them in another
application may require some work.

Now, because of voice pitch differences and elocution speed, the
audiograms would look somewhat alike, yet scaled differently in both
directions. The problem you now have is to recognise that an image is
"similar" to part of another, so here I would likely do some research
on various transforms (like Hough's and any other of the same kind) that
might ease normalisation prior to comparison. Image classification
techniques (they do this a lot in satellite imagery) could help recognise
similar textures in audiograms, and so give clues for matching images. A
few image classification programs have previously been announced here;
I have not looked at them yet, but who knows, they may be helpful.

Then, if the above work is done correctly and meaningfully, you now want
to compute correlations between normalised audiograms. The more correlated
they are, the more similar the original pronunciations likely were.
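
A rough sketch of that correlation idea, assuming NumPy and SciPy are at hand
(neither is named above; they are simply one convenient way to compute a
frequency-over-time picture, and the filenames are placeholders):

import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def normalised_audiogram(path, n_time=100, n_freq=64):
    # Spectrogram resampled onto a fixed grid, log-scaled, zero mean / unit variance.
    rate, samples = wavfile.read(path)
    if samples.ndim > 1:                      # mix stereo down to mono
        samples = samples.mean(axis=1)
    _freqs, _times, spec = spectrogram(samples, fs=rate)
    t_idx = np.linspace(0, spec.shape[1] - 1, n_time).astype(int)
    f_idx = np.linspace(0, spec.shape[0] - 1, n_freq).astype(int)
    grid = np.log1p(spec[np.ix_(f_idx, t_idx)])
    return (grid - grid.mean()) / (grid.std() + 1e-9)

teacher = normalised_audiogram('teacher.wav')
student = normalised_audiogram('student.wav')

# Normalised cross-correlation: values near 1.0 mean very similar audiograms.
similarity = float((teacher * student).mean())
print(similarity)

The crude resampling here is only a stand-in for the real normalisation
problem described above; proper time alignment is exactly the hard part.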

Now, if I had this problem and could call friends, I would surely phone
one or two of them who work at companies offering voice recognition
devices or services. They would likely be reluctant to share advanced
algorithms, as these give them an industrial advantage over competitors.
I tried using the value returned by rms(add(a, mul(b, -findfactor(a,
b)))) as the score, but the result is not good.


Oh, absolutely no chance that such a simple thing would ever work. :-)

--
François Pinard http://pinard.progiciels-bpi.ca
Jul 19 '05 #6

"Qiangning Hong" <ho****@gmail.c om> wrote in message
news:11******** *************@f 14g2000cwb.goog legroups.com...
I want to make an app to help students study a foreign language. I want
the following function in it:

The student reads a piece of text to the microphone. The software
records it and compares it to the wave-file pre-recorded by the
teacher, and gives out a score to indicate the similarity between them.

This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?


How about another approach?

All modern speech recognition systems employ a phonetic alphabet. It's how
you describe to the speech recognition engine exactly how the word sounds.

For each sentence read, you create a small recognition context that includes
the sentence itself, AND subtle phonetic variations of that sentence.

For example (using English):

You want them to say correctly: "The weather is good today".

You create a context with the following phrases, which include the original
sentence and then alternative sentences that dither (vary) the original
sentence phonetically. Sample context:

(*) The weather is good today
Da wedder is god tuday
The weether is good towday

Etc.

Then submit the context to the speech recognition engine and ask the user to
say the sentence. If the original sentence (*) comes back as the speech
recognition engine's best choice, then they said it right. If one of the
other choices comes back, then they made a mistake.

You could even "grade" their performance by tagging the variations by
closeness to the original, for example:

(*) The weather is good today (100)
Da wedder is god tuday (80)
Ta wegger es gid towday (50)

In the example above, the original sentence gets a 100, the second choice
which is close gets an 80, and the last option which is pretty bad gets 50.
With a little effort you could automatically create the "dithered" phonetic
variations and auto-calculate the score or closeness to the original too.
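
A tiny sketch of the scoring side of this idea in Python (the speech engine
itself is not shown; best_hypothesis stands in for whatever string your
recognizer returns as its top match against the context, and the phrases and
weights are just the examples above):

# Each entry: (phrase offered to the recognition engine, score awarded if it wins).
context = [
    ("The weather is good today", 100),   # the correct sentence
    ("Da wedder is god tuday", 80),       # close phonetic variation
    ("Ta wegger es gid towday", 50),      # poor phonetic variation
]

def score_hypothesis(best_hypothesis, context):
    # Return the score attached to whichever phrase the engine picked.
    for phrase, score in context:
        if best_hypothesis.strip().lower() == phrase.lower():
            return score
    return 0   # the engine picked something outside the context

# Example: pretend the engine's best match was the second variation.
print(score_hypothesis("Da wedder is god tuday", context))   # -> 80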

Thanks,
Robert
http://www.robodance.com
Robosapien Dance Machine - SourceForge project

Jul 19 '05 #7

Qiangning Hong wrote:
I want to make an app to help students study a foreign language. I want the following function in it:

The student reads a piece of text to the microphone. The software
records it and compares it to the wave-file pre-recorded by the
teacher, and gives out a score to indicate the similarity between them.
This function will help the students pronounce properly, I think.

Is there an existing library (C or Python) to do this? Or can someone
guide me to a ready-to-implement algorithm?


As others have noted, this is a difficult problem.
This library was developed to study speech and should be worth a look:
http://www.speech.kth.se/snack/
"""
Using Snack you can create powerful multi-platform audio applications
with just a few lines of code. Snack has commands for basic sound
handling, such as playback, recording, file and socket I/O. Snack also
provides primitives for sound visualization, e.g. waveforms and
spectrograms. It was developed mainly to handle digital recordings of
speech, but is just as useful for general audio. Snack has also
successfully been applied to other one-dimensional signals.
"""
Be sure to check out the examples.
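
For example, a minimal tkSnack script in the style of the Snack documentation
(its Python binding is Python 2-era, hence the Tkinter import; the filename
is a placeholder) looks roughly like this:

from Tkinter import Tk       # Snack's Python binding targets Python 2 / Tkinter
import tkSnack

root = Tk()
root.withdraw()              # Snack needs a Tk instance, but no visible window
tkSnack.initializeSnack(root)

snd = tkSnack.Sound()
snd.read('student.wav')      # load a recording
snd.play(blocking=1)         # synchronous playback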

Might be worth a look:
http://www.speech.kth.se/projects/speech_projects.html
http://www.speech.kth.se/cost250/

You might also have luck on Windows using the Microsoft Speech SDK (it
is huge). Combined with Python scripting you can go far.

hth,
M.E.Farmer

Jul 19 '05 #8
