Subject: Re: speaker phones and listening in reverberation From: Bradley Wood Libbey <gt1556a(at)PRISM.GATECH.EDU> Date: Thu, 26 Oct 2000 14:59:22 -0400
James and list,

My dissertation research started with the question of speaker phones and how to improve the quality of speech on such telecommunication devices. I suspect three reasons for the decrease in intelligibility: reverberation, the bandwidth of phone lines, and single-channel as opposed to binaural presentation; see Nabelek (82) and Eisenberg (00). I suspect the tube analogy is fairly accurate. So why aren't there better speaker phones? Most telephone lines are band-limited and single channel; this is simply the status quo. As for the reverberation, signal processing techniques have only recently become capable of removing echoes when the original signal is known (the chips are very expensive), but not when the speech is unknown.

In order to remove reverberation I first looked at traditional signal processing techniques for removing echoes. Microphone arrays are costly, and cepstral processing, Bogert (63), looks promising but fails with a larger number of echoes. I also looked into some other techniques in the literature that had limited results. (I have references if interested.)

What I finally came to was based on the idea I have been hearing in this thread of e-mails: humans don't notice echoes when listening binaurally in a reverberant room. Perhaps binaural neurological processes are responsible for dereverberation. Could these processes be modeled? Or are we simply capable of picking up enough information to understand the speech and ignore the reverb? To investigate this possibility I moved away from speech quality and studied intelligibility, in a way similar to how you described using your Walkman headphones, except that I used good microphones and tried to eliminate the frequency response of the measurement equipment. I looked at reverberation only: no additive noise, no competing speakers, and full bandwidth (60 Hz-22 kHz).
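For readers unfamiliar with the cepstral processing of Bogert (63) mentioned above, here is a minimal sketch of the core idea, not code from any of the cited work: a single echo puts a periodic ripple in the log spectrum, which shows up as a peak in the cepstrum at a quefrency equal to the echo delay. The function name, test signal, and 1 ms search floor are my own illustrative choices.

```python
import numpy as np

def echo_delay_cepstrum(x, fs):
    """Estimate the delay (in samples) of a single echo via the real cepstrum."""
    spectrum = np.fft.rfft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_mag)
    # Search away from quefrency 0, where the spectral envelope dominates.
    lo = int(0.001 * fs)                          # ignore delays below 1 ms
    hi = len(cepstrum) // 2
    return lo + int(np.argmax(cepstrum[lo:hi]))

# Synthetic check: one second of noise plus a single echo 100 samples later.
rng = np.random.default_rng(0)
fs = 16000
s = rng.standard_normal(fs)
delay = 100
x = s.copy()
x[delay:] += 0.6 * s[:-delay]
print(echo_delay_cepstrum(x, fs))  # peak at (or within a sample of) the 100-sample delay
```

This works cleanly for one strong echo, which is exactly why, as noted above, it degrades as the number of overlapping echoes grows: real reverberation smears the cepstral peaks into a dense, unresolvable tail.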
Work has already been done in this area, Nabelek (82), but to gather more knowledge I have done some studies that compare monaural and binaural listening to real and simulated reverberation, considering interaural time differences, interaural level differences, and spectral weighting. What I've found so far agrees with Nabelek's findings: the binaural speech intelligibility advantage in reverberation, without competing noise sources, is relatively small, less than a 5% difference in intelligibility for normal-hearing listeners. (I plan on presenting my results at the next ASA conference and am in the process of writing them up for publication; sorry they aren't done yet. I also intend to do some quality testing in the future.) At this point, for no competing noise sources, I have NOT shown that binaural listening makes great improvements in intelligibility. Furthermore, my experimental conditions have not shown that the pinna and interaural level differences have an effect on intelligibility. (See Bronkhorst (00) for some disagreement with my last statement.)

Now back to the speaker phones: why do they sound so bad, and why are they less intelligible? I suspect that the decrease has a lot to do with reverberation (direct-to-reverberant ratios and all that), a little to do with binaural listening (when only one sound source exists), and a lot to do with the bandwidth. The quality will be affected differently, Eisenberg (98). If speaker phones were full bandwidth, then in the absence of competing sound sources I don't see that a binaural phone would offer great improvements in intelligibility over a single-channel phone; it would be little more than a summation of an appropriately delayed version of each signal. (See Koenig (50) for disagreement on this point.) However, the argument might not hold for reduced bandwidth. Is there any research that directly links single-echo suppression to reverberation suppression?
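Since the "direct-to-reverberant ratio" comes up above, a short sketch may help readers outside acoustics. Given a room impulse response, one compares the energy of the direct sound (the main peak plus the first few milliseconds) against the energy of everything arriving later. This is my own illustration; the 2.5 ms window is a conventional but arbitrary choice, not a value from this message.

```python
import numpy as np

def direct_to_reverberant_ratio_db(h, fs, direct_window_ms=2.5):
    """Direct-to-reverberant ratio (dB) of impulse response h sampled at fs."""
    peak = int(np.argmax(np.abs(h)))
    edge = peak + int(direct_window_ms * 1e-3 * fs)  # end of the "direct" window
    direct_energy = np.sum(h[:edge + 1] ** 2)
    reverb_energy = np.sum(h[edge + 1:] ** 2)
    return 10.0 * np.log10(direct_energy / reverb_energy)

# Toy impulse response: a unit direct spike, then an exponentially
# decaying reverberant tail beginning 10 ms later.
fs = 16000
h = np.zeros(fs // 2)
h[0] = 1.0
tail_start = int(0.010 * fs)
t = np.arange(len(h) - tail_start)
h[tail_start:] = 0.2 * np.exp(-t / (0.050 * fs))
print(round(direct_to_reverberant_ratio_db(h, fs), 1))  # about -12 dB for this toy tail
```

Moving a speaker-phone microphone farther from the talker lowers this ratio, which is one way to quantify the intelligibility penalty being discussed.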
Nabelek (89) and Bronkhorst (00) have both tested and suggested that the reverberation acts as a masker, an area ripe for research. I graduate in a year; anyone have a post-doc position? :)

Brad Libbey
Graduate Student, Georgia Institute of Technology

REFERENCES

Bogert, Bruce P., M. J. R. Healy, and John W. Tukey. (1963). "The Quefrency Analysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking." Proceedings of the Symposium on Time Series Analysis. Ed. Murray Rosenblatt. New York, NY: John Wiley and Sons.

Bronkhorst, Adelbert W. (2000). "The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions." Acustica, 86: 117-128.

Eisenberg, Laurie S., D. D. Dirks, S. Takayanagi, and A. S. Martinez. (1998). "Subjective judgments of clarity and intelligibility for filtered stimuli with equivalent speech intelligibility index predictions." J. Speech, Language, Hearing Res., 41: 327-339.

Eisenberg, Laurie S., Robert V. Shannon, Amy Schaefer Martinez, John Wygonski, and Arthur Boothroyd. (2000). "Speech recognition with reduced spectral cues as a function of age." J. Acoust. Soc. Am., 107: 2704-2710.

Koenig, W. (1950). "Subjective effects in binaural hearing." J. Acoust. Soc. Am., 22: 61-62.

Nabelek, Anna K., Tomasz R. Letowski, and Frances Tucker. (1989). "Reverberant overlap- and self-masking in consonant identification." J. Acoust. Soc. Am., 86: 1259-1265.

Nabelek, Anna K., and Pauline K. Robinson. (1982). "Monaural and Binaural Speech Perception in Reverberation for Listeners of Various Ages." J. Acoust. Soc. Am., 71: 1242-1248.