
looking for help extracting information about the temporal properties of some stimuli

Hello all,

I am currently seeking advice and/or suggestions for possible acoustic analyses that could predict subjects' ability to correctly decide whether the amplitude envelope of an original signal, when modulating a white-noise carrier, was originally speech or not. In a pilot study (the results of which were presented at the 147th meeting of the Acoustical Society of America, New York, New York, 2004 [2aPP14]), subjects were presented with the amplitude envelopes of 30 two-second segments of conversational speech (French, German, Hebrew, Hindi, Japanese, and Russian) and 30 two-second segments of non-speech events (e.g., animal calls, ambient forest sounds, thunder, etc.). Their task was simply to identify each signal as speech or not, and to rate their confidence in that decision.

The signal-processing algorithm was adapted from previous research that investigated the minimal spectral resolution required to understand speech (Dorman et al., 1997). Each original sound was filtered into one frequency band (300-5500 Hz) using a 6th-order Butterworth filter. The amplitude envelope was obtained by first half-wave rectifying the signal and then low-pass filtering it with a 2nd-order Butterworth filter with a 50 Hz cutoff. Next, the envelope was used to modulate white noise, and the result was filtered using the same band-pass settings as had been used to obtain the frequency band from the original signal. Finally, the signal was low-pass filtered at 5000 Hz with a 6th-order elliptic filter. (All processed stimuli were then listened to by the PI, faculty members, and fellow students to ensure that no spectral information remained within any of the stimuli that could provide clues to a stimulus's origin.)
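For anyone wishing to replicate the processing chain, the steps above can be sketched roughly as follows in Python with SciPy. The filter orders and cutoffs follow the description; the sample rate, the elliptic filter's passband ripple and stopband attenuation, and the test signal are my assumptions, not values from the study.

```python
import numpy as np
from scipy import signal

fs = 22050  # Hz; assumed sample rate (not stated in the post)
rng = np.random.default_rng(0)

# Stand-in 2-second input; the real stimuli were recorded sounds.
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))

# 1. Band-pass 300-5500 Hz with a 6th-order Butterworth filter.
sos_bp = signal.butter(6, [300, 5500], btype="bandpass", fs=fs, output="sos")
band = signal.sosfilt(sos_bp, x)

# 2. Half-wave rectify, then low-pass at 50 Hz (2nd-order Butterworth)
#    to extract the amplitude envelope.
rect = np.maximum(band, 0.0)
sos_env = signal.butter(2, 50, btype="lowpass", fs=fs, output="sos")
env = signal.sosfilt(sos_env, rect)

# 3. Excite the envelope with a white-noise carrier.
modulated = env * rng.standard_normal(len(env))

# 4. Re-apply the same band-pass filter to the modulated noise.
modulated = signal.sosfilt(sos_bp, modulated)

# 5. Final low-pass at 5000 Hz, 6th-order elliptic
#    (1 dB ripple / 40 dB attenuation are assumed values).
sos_el = signal.ellip(6, 1, 40, 5000, btype="lowpass", fs=fs, output="sos")
y = signal.sosfilt(sos_el, modulated)
```

The second-order-sections (`sos`) form is used here only for numerical stability of the higher-order IIR filters; a transfer-function implementation of the same filters would match the original description equally well.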
It was hypothesized that, because sound and biphasic mandibular oscillation (MacNeilage, 1998) are universal to speech production, the amplitude-modulation pattern of vocal communication should provide listeners with sufficient information to identify a specific sound as speech. Listeners correctly identified these stimuli 84% of the time, with an overall d-prime of 1.99, and the Spearman rank-order correlation between the median confidence rating per stimulus and the percentage of time each stimulus was judged to be speech was .95. This suggests that subjects were able to follow directions and stay on task, and that the hypothesis is, in general, correct.

We now hope to correlate acoustic parameters of these signals with the behavioral data they elicited, so that in future studies using this paradigm (including my proposed dissertation) behavioral performance can be predicted before the stimuli are presented to novel listeners. Any input, help, advice, and/or suggestions would be much appreciated!
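For readers less familiar with the sensitivity measure cited above, d-prime is the difference of the z-transformed hit and false-alarm rates. A minimal sketch follows; the rates in the usage line are hypothetical, since the post reports only the overall 84% correct and d' = 1.99, not the separate hit and false-alarm rates.

```python
from scipy.stats import norm

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Signal-detection sensitivity: d' = z(H) - z(F)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical rates for illustration only:
d = d_prime(0.88, 0.20)
```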

Robert J. Lehnhoff Jr.
Speech and Hearing Sciences Department
City University of New York,
New York, New York 10016