Hello all,
After doing extensive research on the web, I've decided to undertake a rather ambitious project. First I'll describe my vision, then ask some questions about how to implement it.
I want Microsoft Sam to sing me "Late Goodbye" by Poets of the Fall. I'll set it to the MIDI track of the song that I composed, and it'll be a complete cover of the original song (lol). No, seriously. I want to write an application with a fairly easy end-user interface (I've written so many interfaces before that I'm not worried about this part) which lets the user "map out" lyrics and customize them in the following ways:
1. Change the pitch at which each syllable is sung. For this one, I will display a bar graph with the Y-axis being pitch converted to musical notes and the X-axis being the syllable number, and the user can just click at a point to set the corresponding bar to the appropriate level. So if the melody of the song dictates that the word "headlights" should be sung with a C on the syllable "head" and a D# on the syllable "lights", it would take two clicks to set this and I'd be on my way to the next word.
2. Insert musical rests, which are relative to the tempo. This should be fairly simple for me once I figure out how the underlying classes work.
3. Change the octave whenever needed. (I'll probably implement octave per word, since I've never known a song which changed octave in the middle of a word.)
4. Stretch the word over x milliseconds. (I'd express this in my program as beats relative to the tempo, of course, and the tempo in beats per minute would itself be converted to milliseconds at the lowest level.)
5. This feature is not essential, especially since implementing it would probably require a large amount of code, if it is possible at all. In written music, certain symbols indicate that a note should be played (or sung, or syncopated) very firmly, softly, fully, or discreetly, among other things. Getting hard consonants to be enunciated would be one way to implement this for my singer, but again, I don't even know if this is possible.
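To make items 1 through 4 concrete, here is a rough sketch of the arithmetic I have in mind. It's in Python only because that is quick to prototype; it would port directly to C++ or VB. It assumes equal temperament with A4 = 440 Hz and MIDI-style note numbers (C4 = 60), and every function name is made up for illustration. It says nothing about how, or whether, the TTS engine can actually be driven with these values.

```python
# Sketch only: note and tempo arithmetic, independent of any TTS engine.
# Assumptions: equal temperament, A4 = 440 Hz, MIDI numbering (C4 = 60).

NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_to_midi(name, octave):
    """Convert a note name plus octave to a MIDI note number."""
    return 12 * (octave + 1) + NOTE_OFFSETS[name]


def midi_to_hz(midi):
    """Convert a MIDI note number to a frequency in Hertz."""
    return 440.0 * 2 ** ((midi - 69) / 12)


def beats_to_ms(beats, bpm):
    """Convert a length in beats to milliseconds at the given tempo."""
    return beats * 60000.0 / bpm


# "headlights": C on "head", D# on "lights", both in octave 4, at 120 BPM.
word = [("head", "C", 1.0), ("lights", "D#", 1.0)]
for syllable, note, beats in word:
    midi = note_to_midi(note, 4)
    print(syllable, midi, round(midi_to_hz(midi), 2), beats_to_ms(beats, 120))
```

A rest (item 2) would just be an entry with a duration and no note, and an octave change (item 3) is a different `octave` argument for that word.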
Sounds great, right? Well, I haven't even gotten the first line of experimental code to do anything beyond having Sam read text in his normal, monotone voice. I've looked at the Lexicon class but am unsure how to manipulate it. My questions for all of you will come in the form, "How do I manipulate X low-level functionality?" So, without further ado, here they are:
A. How do I manipulate the overall speed at which a word is pronounced? (I'm looking for units like consonants per second or whatever... this would allow me to calculate the required number of consonants per second, based on the tempo and the note length, and "snap" the word into the optimized speed to get the word pronounced at the proper "pace" throughout the duration of the note.)
B. How do I manipulate the inflection of individual syllables in a word (or characters even), and can I set this on-the-fly for different instances of the same word?
C. What units does the TTS engine use for all this? That is, does it work in Hertz for tone and syllables per second for speed, and if not, what does it use?
D. What classes and functional groups (related functions/properties) do I specifically need to get started in this direction, if I can even approach this program at all?
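To make question A concrete: if the engine exposed some speed unit, the number I'd need to feed it is just the count of units in the word divided by the duration of the note. Here is a sketch of that calculation, using letters as a crude stand-in for whatever unit the engine really uses (which is exactly what I'm asking in C). The function names are hypothetical.

```python
# Sketch only: how fast a word must be spoken to fill a note exactly.
# "Units" are a placeholder (letters here); the engine's real unit is unknown.

def note_duration_s(beats, bpm):
    """Length of a note in seconds, given its length in beats and the tempo."""
    return beats * 60.0 / bpm


def required_rate(unit_count, beats, bpm):
    """Units per second needed to fit unit_count units into the note."""
    if beats <= 0 or bpm <= 0:
        raise ValueError("beats and bpm must both be positive")
    return unit_count / note_duration_s(beats, bpm)


# "headlights" has 10 letters and is held for 2 beats at 120 BPM.
print(required_rate(len("headlights"), 2, 120))
```

Whatever the engine's native speed setting turns out to be, the "snap" step would then be a matter of mapping this rate onto that setting.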
Any development on this project would be open-source. If I get even the first requirement of my project (modifying tone/inflection on a per-syllable basis) working in a form, I will create a SourceForge project for it to track my progress and try to attract more support to flesh out the features.
I may even release an MP3 on my website with me and Sam singing a duet of Sting's "Desert Rose" - myself sitting in for Cheb Mami, of course. OH NO, I'm not releasing an Arabic version. Not unless this becomes REALLY popular and worthwhile.
Or, perhaps I am asking too much of Dr. Sam and his maker(s)...
I've written .NET and Visual Studio 6.0 programs in C++, J# and Visual Basic. I could implement this in any of these, depending on which language exposes the required functionality. Any feedback is better than no feedback.
Sean