Dept. of Elec. and Comput. Eng., Univ. Waterloo, Waterloo, ON N2L 3G1, Canada
This paper studies temporal aspects of the auditory representation of several classes of speech sounds. A modeling approach is taken in which a composite auditory model generates parallel sets of post-stimulus time histograms (PSTHs) along the spatial dimension, followed by a processing stage that constructs from the PSTHs the interval statistics in a form called the inter-peak interval histogram (IPIH). Given PSTH data recorded from the auditory nerve in response to identical speech stimuli, the goodness of fit between the computer-simulated IPIHs and the neural IPIHs serves as a criterion for investigating the functional significance of the cochlear model's parameters. In particular, it was found that no satisfactory neural match can be obtained when the parameters controlling the bandwidths and the nonlinear damping elements of the basilar-membrane filters are set inappropriately. Using the model parameters "optimally" tuned according to the neural-matching criterion, sequences of IPIHs were generated from the auditory model's response to natural utterances (from the TIMIT corpus) encompassing all major manner classes of American English: vowel, diphthong, semivowel, nasal, fricative, stop, and aspiration. Analysis of the results allows the major acoustic properties of the utterances to be clearly identified from the simulated IPIHs. Further, when the IPIHs are constructed from the auditory model's response to noisy versions of the utterances, the original acoustic cues are observed to be affected little by the noise level.
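As an illustration of the PSTH-to-IPIH stage described above, the following is a minimal sketch and not the paper's implementation. It assumes that a peak is a strict local maximum of the PSTH above a threshold and that an inter-peak interval is the distance, in PSTH bins, between successive peaks. The function names, the `threshold` parameter, and the sample data are all hypothetical.

```python
from collections import Counter


def find_peaks(psth, threshold=0.0):
    """Return indices of strict local maxima of `psth` that exceed `threshold`.

    Assumption: a 'peak' is a bin whose count is strictly greater than both
    of its neighbours. The abstract does not specify the peak-picking rule.
    """
    return [
        i
        for i in range(1, len(psth) - 1)
        if psth[i] > threshold and psth[i] > psth[i - 1] and psth[i] > psth[i + 1]
    ]


def ipih(psth, threshold=0.0):
    """Build an inter-peak interval histogram for one channel.

    Returns a Counter mapping interval length (in PSTH bins) to the number
    of times that interval occurs between successive peaks.
    """
    peaks = find_peaks(psth, threshold)
    intervals = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
    return Counter(intervals)


def ipih_across_channels(psths, threshold=0.0):
    """Apply `ipih` to each channel (one PSTH per place along the cochlea)."""
    return [ipih(psth, threshold) for psth in psths]


if __name__ == "__main__":
    # Hypothetical PSTH for a single channel: spike counts per time bin.
    example_psth = [0, 5, 0, 0, 0, 6, 0, 0, 0, 4, 0, 0, 7, 0]
    print(dict(ipih(example_psth)))
```

To express intervals in milliseconds rather than bins, multiply each histogram key by the PSTH bin width. Applying `ipih_across_channels` to the parallel PSTHs gives one interval histogram per channel, which can then be compared between simulated and neural data.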