Abstract
In building personalized synthetic voices for people with speech disorders, the output should capture the individual's vocal identity. This paper reports a listener judgment experiment on the similarity of Hidden Markov Model (HMM) based synthetic voices, built with varying amounts of adaptation data, to two non-impaired target speakers. We conclude that around 100 sentences of adaptation data are needed to build a voice that retains the characteristics of the target speaker, although using more data further improves the voice. Experiments using Multi-Layer Perceptrons (MLPs) are conducted to determine which acoustic features contribute to the similarity judgments. Results show that mel-cepstral distortion and fraction of voicing agreement contribute most to replicating the similarity judgments, but the combination of all features is required for accurate prediction. Ongoing work applies these findings to voice building for people with impaired speech.
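To make the acoustic comparison concrete, the following is a minimal Python sketch, not code from the paper, of the two features the results highlight: mel-cepstral distortion and fraction of voicing agreement, fed to an MLP regressor as a stand-in for the paper's perceptrons. The function names, the assumption that frames are already time-aligned, and the placeholder training data are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def mel_cepstral_distortion(ref_mcep, syn_mcep):
    """Mean frame-wise mel-cepstral distortion in dB.

    Both inputs are (frames, coeffs) arrays assumed to be already
    time-aligned (e.g. by DTW); the 0th (energy) coefficient is
    excluded, as is conventional."""
    diff = ref_mcep[:, 1:] - syn_mcep[:, 1:]
    const = (10.0 / np.log(10.0)) * np.sqrt(2.0)  # standard MCD scaling
    return float(np.mean(const * np.sqrt(np.sum(diff ** 2, axis=1))))

def voicing_agreement(ref_f0, syn_f0):
    """Fraction of aligned frames on which the two signals agree
    about voicing (f0 > 0 taken as voiced, f0 == 0 as unvoiced)."""
    return float(np.mean((ref_f0 > 0) == (syn_f0 > 0)))

# Hypothetical setup: one feature vector per synthetic/target utterance
# pair, regressed against the mean listener similarity score.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # placeholder [MCD, voicing agreement] rows
y = rng.normal(size=200)       # placeholder listener similarity scores
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X, y)
predicted = mlp.predict(X[:5])
```

In practice the feature matrix would include all of the acoustic measures compared in the paper, since the abstract notes that the combination of features, rather than any single one, is required for accurate prediction of the listener judgments.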
Original language | English |
---|---|
Title of host publication | 10th Annual Conference of the International Speech Communication Association (INTERSPEECH 2009), Brighton, UK |
Publication status | Published - 2009 |