A Vision: TTS+Ingenuity = Singing Sam!

Hello all

After doing extensive research on the web, I've decided to undertake a rather ambitious project. First I'll discuss my vision, then start posing questions on how to implement it.

I want Microsoft Sam to sing me "Late Goodbye" by Poets of the Fall. I'll set it to the MIDI track of the song that I composed and it'll be a complete cover of the original song (lol). No, seriously. I want to write an application with a fairly easy end-user interface (I've written so many interfaces before that I'm not worried about this part) which allows the user to "map out" lyrics and customize them in the following ways:
1. Change the pitch at which each syllable is sung. For this one, I will display a bar graph with the Y-axis being pitch converted to musical notes and the X-axis being the syllable number, and the user can just click at a point to set the corresponding bar to the appropriate level. So if the melody of the song dictates that the word "headlights" should be sung with a C on the syllable "head" and a D# on the syllable "lights", it would take two clicks to set this and I'd be on my way to the next word.
2. Insert musical rests, which are relative to the tempo. This should be fairly simple for me once I figure out how the underlying classes work.
3. Change the octave whenever needed. I'll probably implement octave per word, since I've never known a song that changed octave in the middle of a word.
4. Stretch the word over x milliseconds. (I'd translate it in my program to beats relative to tempo, of course, and the tempo in beats per minute would be programmed, on the lowest level, in milliseconds as well.)
5. This feature is not necessary, especially since implementing it would probably require a large amount of code, if it is even possible. In written music, certain symbols indicate that a note should be played (or sung or syncopated) very firmly, softly, fully, or discreetly, among other things. Getting hard consonants enunciated would be one way to implement this for my singer, but again I don't even know if this is possible.
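
The arithmetic behind items 1 and 4 does not depend on the TTS engine and can be sketched up front. The following is a minimal sketch in Python, assuming equal temperament with A4 = 440 Hz and the common convention that middle C is C4 (MIDI note 60). The function names `note_to_midi`, `midi_to_hz`, and `beats_to_ms` are hypothetical helpers for illustration, not part of any speech API.

```python
# Semitone offsets of each note name within an octave (sharps only, for brevity).
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def note_to_midi(name, octave):
    """Convert a note name and octave to a MIDI note number (assumes C4 = 60)."""
    return 12 * (octave + 1) + NOTE_OFFSETS[name]


def midi_to_hz(midi_note):
    """Convert a MIDI note number to a frequency in Hz (assumes A4 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def beats_to_ms(beats, bpm):
    """Convert a length in beats to milliseconds at the given tempo."""
    return beats * 60000.0 / bpm


# "head" on C and "lights" on D#, both in octave 4, one beat each at 120 BPM.
head = note_to_midi("C", 4)
lights = note_to_midi("D#", 4)
interval_in_semitones = lights - head
note_length_ms = beats_to_ms(1, 120)
```

Working in semitone intervals rather than absolute Hertz keeps the melody independent of the voice's natural pitch, which is likely to matter if the engine only allows pitch changes relative to a default.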

Sounds great, right? Well, I haven't even gotten the first line of experimental code to do anything beyond having Sam read text in his normal, monotone voice. I've looked at the Lexicon class but am unsure how to manipulate it. My questions for all of you will come in the form, "How do I manipulate X low-level functionality?" So, without further ado, here they are:

A. How do I manipulate the overall speed at which a word is pronounced? (I'm looking for units like consonants per second or whatever... this would allow me to calculate the required number of consonants per second, based on the tempo and the note length, and "snap" the word into the optimized speed to get the word pronounced at the proper "pace" throughout the duration of the note.)
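
Whatever unit the engine turns out to expose, the target pace described in question A can be computed from the tempo and the note length alone. This is a rough sketch in Python; `note_duration_seconds`, `required_syllable_rate`, and `speed_ratio` are hypothetical names, and the "natural" duration of a word is assumed to be something measured beforehand (for example by timing the engine at its default speed).

```python
def note_duration_seconds(beats, bpm):
    """Length of a note in seconds at the given tempo."""
    return beats * 60.0 / bpm


def required_syllable_rate(syllables, beats, bpm):
    """Syllables per second needed to fit a word into a note of this length."""
    return syllables / note_duration_seconds(beats, bpm)


def speed_ratio(natural_seconds, target_seconds):
    """How much faster (>1) or slower (<1) than normal the word must be spoken."""
    return natural_seconds / target_seconds


# A two-syllable word sung over one beat at 120 BPM.
target = note_duration_seconds(1, 120)
rate = required_syllable_rate(2, 1, 120)
# If the word normally takes 0.6 s (an assumed measurement), it must speed up.
ratio = speed_ratio(0.6, target)
```

The ratio is the quantity to "snap" to: it would then be translated into whatever rate setting the engine offers, rounding to the nearest step the engine supports.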

B. How do I manipulate the inflection of individual syllables in a word (or characters even), and can I set this on-the-fly for different instances of the same word?

C. What are the units that the TTS engine uses for all this, i.e., does it work in Hertz for tone and syllables per second for speed, or if not, what does it use?
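
One possible answer to explore for question C: SAPI 5 accepts XML markup in the spoken text, including `<pitch absmiddle="..."/>` and `<silence msec="..."/>` tags, and the pitch value is a relative step rather than a frequency in Hertz. The sketch below, in Python, only builds the markup string. It assumes that pitch values run from -10 to +10 and that each step is 1/24 of an octave (two steps per semitone); both figures are recollections of the SAPI 5 XML TTS documentation and should be verified there. The helper names are hypothetical.

```python
from xml.sax.saxutils import escape

# Assumption to verify: one absmiddle step = 1/24 octave, range -10..+10.
STEPS_PER_SEMITONE = 2
MIN_STEP, MAX_STEP = -10, 10


def semitones_to_absmiddle(semitones):
    """Map a semitone offset from the voice's default pitch to a clamped step."""
    step = round(semitones * STEPS_PER_SEMITONE)
    return max(MIN_STEP, min(MAX_STEP, step))


def sung_syllable(text, semitones):
    """Wrap a syllable in a pitch tag, escaping XML special characters."""
    step = semitones_to_absmiddle(semitones)
    return '<pitch absmiddle="%d">%s</pitch>' % (step, escape(text))


def rest(milliseconds):
    """A musical rest expressed as a silence tag."""
    return '<silence msec="%d"/>' % round(milliseconds)


# "head" at the default pitch, "lights" three semitones higher, then a rest.
markup = sung_syllable("head", 0) + sung_syllable("lights", 3) + rest(250)
```

If those assumptions hold, the tag covers only about five semitones either side of the default, so octave changes would need some other mechanism. Whether the engine still pronounces a word naturally when its syllables are wrapped in separate tags is also something to test.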

D. What classes and functional groups (related functions/properties) do I specifically need to get started in this direction, if I can even approach this program at all?

Any development on this project would be open-source. If I get even the first requirement of my project (modifying tone/inflection on a per-syllable basis) working in a form, I will create a SourceForge project for it to track my progress and attempt to get some more support to flesh out the features.

I may even release an mp3 on my website with me and Sam singing a duet of Sting's "Desert Rose" - myself sitting in for Anoushka Shankar, of course. OH NO, I'm not releasing an Arabic version. Not unless this becomes REALLY popular and worthwhile.

Or, perhaps I am asking too much of Dr. Sam and his maker(s)...

I've written .NET and 6.0 programs in C++, J# and Visual Basic. I could implement this in any of these, depending on whether the required functionality is available in the language in question. Any feedback is better than no feedback.

Sean
Jul 21 '05 #1
Hi,

I am wondering if you had any success in making Sam sing?
If you have (and hopefully in VB), would you mind sharing your findings?

It's a challenge to get TTS to sing, but you might want to check out a company called Flex Voice. They have a pretty good TTS engine that runs under SAPI 5, and you can control the rate, pitch, etc.

I've got a copy if you want to evaluate it.

I look forward to your reply.

Don
Jul 3 '06 #2
