Speech!

Hi there!

I'm working on a simple script, and I was wondering if there is some way to use Microsoft's built-in speech SDKs (text to voice, and text FROM voice). I'd like to capture raw_input()-style data from my microphone whenever the program recognizes that something came through it. I'd also like the program to use the default text-to-voice synthesizer I have installed, currently Microsoft Mary.

Is this possible? If so, how might I go about doing it? Are there any simple modules with functions such as "say()" or "listen()"? Thanks a lot!
Jul 2 '08 #1
Hmm, okay. I figured out text to voice. Is there some way to do voice to text, though? I already have the Microsoft speech API, SAPI, but I don't know how to make that work with Python...
Jul 3 '08 #2
heiro
56
posting again with code tags
Jul 3 '08 #3
heiro
56
from win32com.client import constants
import win32com.client
import pythoncom


class SpeechRecognition:
    def __init__(self, wordsToAdd):
        """ Initialize the speech recognition with the passed in list of words """
        # For text-to-speech
        self.speaker = win32com.client.Dispatch("SAPI.SpVoice")
        # For speech recognition - first create a listener
        self.listener = win32com.client.Dispatch("SAPI.SpSharedRecognizer")
        # Then a recognition context
        self.context = self.listener.CreateRecoContext()
        # which has an associated grammar
        self.grammar = self.context.CreateGrammar()
        # Do not allow free word recognition - only command and control
        # recognizing the words in the grammar only
        self.grammar.DictationSetState(0)
        # Create a new rule for the grammar, that is top level (so it begins
        # a recognition) and dynamic (i.e. we can change it at runtime).
        # If these constants raise AttributeError, generate makepy support
        # for the Microsoft Speech Object Library first.
        self.wordsRule = self.grammar.Rules.Add("wordsRule",
                        constants.SRATopLevel + constants.SRADynamic, 0)
        # Clear the rule (not necessary first time, but if we're changing it
        # dynamically then it's useful)
        self.wordsRule.Clear()
        # And go through the list of words, adding each to the rule
        for word in wordsToAdd:
            self.wordsRule.InitialState.AddWordTransition(None, word)
        # Commit the new rule, then set the wordsRule to be active
        self.grammar.Rules.Commit()
        self.grammar.CmdSetRuleState("wordsRule", 1)
        # Commit the changes to the grammar
        self.grammar.Rules.Commit()
        # And add an event handler that's called back when recognition occurs
        self.eventHandler = ContextEvents(self.context)
        # Announce we've started
        self.say("Started successfully")

    def say(self, phrase):
        self.speaker.Speak(phrase)


class ContextEvents(win32com.client.getevents("SAPI.SpSharedRecoContext")):
    def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
        # Wrap the raw result so its properties can be accessed
        newResult = win32com.client.Dispatch(Result)
        print("You said: " + newResult.PhraseInfo.GetText())


if __name__ == '__main__':
    wordsToAdd = ["One", "Two", "Three", "Four"]
    speechReco = SpeechRecognition(wordsToAdd)
    # Keep dispatching COM events so OnRecognition gets called
    while True:
        pythoncom.PumpWaitingMessages()

For text to speech:
import sys
import win32com.client

speaker = win32com.client.Dispatch("SAPI.SpVoice")
while True:
    try:
        s = raw_input('Type word or phrase: ')
    except EOFError:
        # End of input (for example Ctrl-Z then Enter on Windows): quit
        sys.exit()
    speaker.Speak(s)

Jul 3 '08 #4
Thanks for your response.

However, that is not exactly what I want. For starters, I already got the text-to-voice working. As for voice-to-text, I already tried that example, but it doesn't work how I want it to. As far as my beginner eyes can see, there's no way to replicate raw_input() with it, so that if I said, "This is a test, hello world", it would enter exactly that.

Do you understand what I'm saying? If you can show me how to use that code example, that would be great, but I just want a raw_input() style function that enters data via voice-to-text.

Thanks!
Jul 3 '08 #5
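
One way to get the raw_input()-style behaviour asked for above is to let the recognition callback drop each phrase into a queue, and have a blocking listen() pump COM messages until something arrives. The sketch below is only that, a sketch: VoiceInput, on_text, listen and make_sapi_voice_input are hypothetical names that do not appear in the posted code, and the SAPI wiring assumes that DictationSetState(1) switches the grammar to free dictation (the posted example passes 0 to turn dictation off). The wiring needs Windows with pywin32 and has not been tested here.

```python
import time

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2, as used in this thread


class VoiceInput(object):
    """Hypothetical blocking wrapper around a callback-based recognizer."""

    def __init__(self, pump, poll_interval=0.05):
        # pump: zero-argument callable that dispatches pending events,
        # for example pythoncom.PumpWaitingMessages
        self._pump = pump
        self._poll_interval = poll_interval
        self._heard = queue.Queue()

    def on_text(self, text):
        """Call this from the recognition event handler."""
        self._heard.put(text)

    def listen(self, timeout=None):
        """Block until a phrase arrives; return it, or None on timeout."""
        deadline = None if timeout is None else time.time() + timeout
        while True:
            self._pump()
            try:
                return self._heard.get_nowait()
            except queue.Empty:
                pass
            if deadline is not None and time.time() >= deadline:
                return None
            time.sleep(self._poll_interval)


def make_sapi_voice_input():
    """Wire VoiceInput to SAPI dictation (assumption-laden, Windows only)."""
    import pythoncom
    import win32com.client

    voice_input = VoiceInput(pythoncom.PumpWaitingMessages)

    class DictationEvents(win32com.client.getevents("SAPI.SpSharedRecoContext")):
        def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
            result = win32com.client.Dispatch(Result)
            voice_input.on_text(result.PhraseInfo.GetText())

    listener = win32com.client.Dispatch("SAPI.SpSharedRecognizer")
    context = listener.CreateRecoContext()
    grammar = context.CreateGrammar()
    # Assumption: 1 activates free dictation instead of a fixed word list
    grammar.DictationSetState(1)
    # Keep references so the COM objects and handler are not released
    voice_input._sapi = (listener, context, grammar, DictationEvents(context))
    return voice_input
```

If the wiring works on your machine, something like text = make_sapi_voice_input().listen() would stand in for text = raw_input(), and the result could be passed straight to speaker.Speak(text). Keeping the queue logic separate from the SAPI calls means listen() can be tried with any callback source, not just SAPI.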
