Speech synthesis is usually about converting text to speech. This synth starts with gesture instead. I was curious about the expressive possibilities that would open up, and I also wanted to build an electronic instrument that would make for an interesting live performance.
Expression before Language
Rather than typing or otherwise encoding text, you control speech with this synth by imitating the shape the throat makes when speaking. When you lay your hand onto the controller, the tips of your fingers correspond to your mouth and your forearm corresponds to the back of your throat. The more you press down, the more you open that part of the throat. Currently the synth controls only vowel sounds, but it does so continuously: you don't choose between one vowel or another, you glide between them through different gestures. This allows for more variation in inflection and emphasis.
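One way to think about that gliding is as interpolation in vocal-tract-shape space rather than switching between discrete vowel targets. Here's a minimal sketch of the idea; the area values and the mapping from a single pressure reading to a blend amount are made up for illustration and are not taken from the actual controller:

```python
import numpy as np

# Hypothetical area functions (glottis to lips) for two vowel targets.
# Real values would come from published vocal tract measurements.
AREA_AH = np.array([2.6, 1.8, 0.6, 0.4, 1.0, 3.2, 5.0])   # /a/-like shape
AREA_EE = np.array([3.0, 4.0, 3.5, 1.2, 0.5, 0.4, 0.9])   # /i/-like shape

def blend_tract_shape(control):
    """Blend smoothly between two vowel shapes.

    `control` in [0, 1] might be derived from finger pressure:
    0 gives the first vowel, 1 the second, and values in between
    give intermediate shapes rather than a hard switch.
    """
    c = float(np.clip(control, 0.0, 1.0))
    return (1.0 - c) * AREA_AH + c * AREA_EE

# Half pressure gives a tract shape midway between the two vowels.
mid_shape = blend_tract_shape(0.5)
```

A real controller would blend among more than two targets, but the principle is the same: the gesture picks a point in a continuous space of tract shapes.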
The controller uses a PIC microcontroller that communicates with Max/MSP over a serial connection. The Max patch that I wrote is based on research by Brad Story on vocal tract shapes and voice quality (see the references below). The patch acts as a filter, taking sound input and altering it to sound like it's being passed through a simulated vocal tract whose shape is determined by the hardware input.
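The Max patch itself isn't shown here, but the core filtering idea, shaping a source sound through a tube model defined by an area function, can be sketched as a Kelly-Lochbaum-style ladder filter. This is an illustrative reconstruction in Python, not the actual patch; the area values and the boundary reflection coefficients are invented for the example:

```python
import numpy as np

def reflection_coeffs(areas):
    """Pressure reflection coefficient at each junction between tube sections."""
    a = np.asarray(areas, dtype=float)
    return (a[:-1] - a[1:]) / (a[:-1] + a[1:])

def tract_filter(x, areas, lip_refl=-0.85, glottal_refl=0.75):
    """Filter a source signal through a Kelly-Lochbaum ladder.

    `areas` is a vocal tract area function, glottis to lips; the
    boundary reflection values are illustrative guesses, not measured.
    """
    k = reflection_coeffs(areas)
    n = len(areas)
    f = np.zeros(n)          # forward (glottis-to-lips) wave per section
    b = np.zeros(n)          # backward wave per section
    y = np.zeros(len(x))
    for t, s in enumerate(x):
        f_new = np.empty(n)
        b_new = np.empty(n)
        f_new[0] = s + glottal_refl * b[0]            # glottal boundary
        for i in range(n - 1):                        # junction scattering
            f_new[i + 1] = (1 + k[i]) * f[i] - k[i] * b[i + 1]
            b_new[i] = k[i] * f[i] + (1 - k[i]) * b[i + 1]
        b_new[n - 1] = lip_refl * f[n - 1]            # lip boundary
        y[t] = (1 + lip_refl) * f[n - 1]              # radiated sound
        f, b = f_new, b_new
    return y

# Filtering a noisy source through a made-up area function:
rng = np.random.default_rng(0)
voiced = tract_filter(rng.standard_normal(2000), [2.6, 0.7, 0.4, 0.6, 2.6, 5.0])
```

Changing the area function over time, as the hardware input does, moves the filter's resonances, which is what makes the output drift between vowel colors.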
Here's a quick demo video of me playing around with it a bit.
- Story, Brad H., and Ingo R. Titze. "A preliminary study of voice quality transformation based on modifications to the neutral vocal tract area function." Journal of Phonetics 30 (2002): 485-509.
- Story, Brad H., Ingo R. Titze, and Eric A. Hoffman. "The relationship of vocal tract shape to three voice qualities." Journal of the Acoustical Society of America 109.4 (April 2001): 1651-1667.