Speech Technology Laboratory, Panasonic Technologies, Inc., Santa Barbara, CA 93105
This paper investigates improving the intelligibility of a formant synthesizer by using a library of sampled consonants extracted from natural waveforms. Four hundred and seventy-one monosyllabic words, containing consonants in all possible vowel environments, were recorded from a male speaker of American English. The current focus is on the voiceless consonants. A comparison test was conducted between our current synthesizer [K. Matsui et al., Proc. ICASSP 2, 769–772 (1991)] and the sampled consonant system. Eight naive listeners participated in a simple intelligibility test. The stimulus list consisted of 110 tokens, 60% of which were nonsense words. The results showed that the sampled consonant system scored significantly higher (by 20%) in overall intelligibility. In terms of consonant classes, initial stops showed the most improvement, with a 26% increase. Weak fricatives did not show a dramatic difference between the two systems. In further experiments, these were improved using additional factors such as modification of the spectral shape, manipulation of transitional cues into vowels, and durational adjustments. The relative importance of each factor will be discussed.