D. D. Paschall
Peter F. Assmann
Human Devel. and Commun. Sci., Univ. of Texas at Dallas, Box 830688, GR4.1, Richardson, TX 75083
This study compared the abilities of two computational models to predict listeners' identification responses to vowels presented in quiet and with pink-noise maskers at signal-to-noise ratios (SNRs) of -6, -3, 0, +3, and +6 dB. One model used a representation derived from cepstral analysis. The other used a representation derived from a model of peripheral auditory analysis that (i) filtered the stimulus with a bank of bandpass filters; (ii) compressed the filtered waveforms using a simulation of hair cell transduction; (iii) computed the short-term autocorrelation function of the compressed, filtered waveform in each channel; and (iv) summed the autocorrelation functions across filter channels to generate a pooled autocorrelation function (PACF). Classification was performed by discriminant function analysis, with noise-free tokens used as input to the classifier during the training phase. Performance of the cepstrum-based model exceeded that of human listeners in quiet, but deteriorated rapidly with the addition of background noise. In contrast, the PACF-based model performed at levels similar to listeners in quiet, and mirrored their gradual decline in accuracy with decreasing SNR.
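The four-stage PACF front end described in steps (i)-(iv) can be sketched as follows. This is an illustrative outline only: the filterbank design (second-order Butterworth bands), the power-law compression exponent standing in for hair cell transduction, and the frame length are assumptions for the sketch, not the authors' actual parameters.

```python
import numpy as np
from scipy.signal import butter, lfilter

def pacf(signal, fs, center_freqs, frame, compress=0.3):
    """Pooled autocorrelation function (PACF) of one analysis frame.

    (i)   bandpass filterbank (half-octave Butterworth bands, assumed)
    (ii)  half-wave rectification + power-law compression as a stand-in
          for hair cell transduction
    (iii) short-term autocorrelation in each channel
    (iv)  sum of autocorrelation functions across channels
    """
    x = np.asarray(signal[:frame], dtype=float)
    pooled = np.zeros(len(x))
    for cf in center_freqs:
        lo, hi = cf / 2**0.25, cf * 2**0.25        # half-octave band edges
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype='band')
        y = lfilter(b, a, x)                        # (i) bandpass filter
        y = np.maximum(y, 0.0) ** compress          # (ii) rectify + compress
        acf = np.correlate(y, y, mode='full')[len(y) - 1:]  # (iii) lags 0..N-1
        pooled += acf                               # (iv) pool across channels
    return pooled

# For a periodic input (e.g., a click train at 100 Hz), the PACF shows a
# peak at the lag corresponding to the fundamental period (fs / f0 samples),
# which is the kind of noise-robust periodicity cue the model exploits.
fs = 8000
x = np.zeros(800)
x[::80] = 1.0                                      # 100-Hz click train
p = pacf(x, fs, center_freqs=[200, 400, 800, 1600], frame=800)
```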