Recordings in enclosed spaces. (Brad Libbey)

Subject: Recordings in enclosed spaces.
From: Brad Libbey <gt1556a(at)PRISM.GATECH.EDU>
Date: Mon, 3 Sep 2001 17:55:20 -0400

Dear List,

Regarding the question of recordings made from loudspeakers in enclosed spaces, I would like to answer some of the questions that were posed, in the hope of stimulating more discussion. Critics are always welcome.

First, some data. At the Atlanta ASA conference I presented speech intelligibility results from a highly reverberant small room (T60 = 1.4 s) with low levels of noise and no competing sources. Binaural intelligibility for a person listening directly to the speaker in the room was 72%; two-microphone recordings of a loudspeaker in the room were 55% intelligible. To investigate this difference, a similar test using the same equipment and settings was performed with anechoic words. In this second test, anechoic speech intelligibility when listening directly to the loudspeaker was 96%, and the recordings of the loudspeaker showed 94% of the words intelligible. This difference was insignificant (p = .11); the difference for the same setup in the reverberant room, on the other hand, was significant (p = .0001). The contrast between these two tests is partially attributable to the overall intelligibility level, but it nonetheless leads me to believe that speaker problems may interact with the reverberation to cause an intelligibility loss.

Someone asked whether the subjects could tell that they were listening to recordings. Several subjects, after completing the experiments in the anechoic chamber, commented that the recordings and speaker presentations sounded the same. The recordings made in reverberation did not draw the same comments, and when I listen I can tell the difference between reverberation and recorded reverberation.

Possible causes

1. Speakers

As seems to be the consensus of the list, the loudspeaker system is the first to blame for reduced intelligibility or quality.
To compensate for loudspeaker limitations, I used two speakers without an enclosure, mounted face to face and wired in phase, thus creating a volume source similar to a monopole. Informal anechoic tests show that this condition is met at lower frequencies but not at higher frequencies. Of course a human voice is not a monopole, but I selected this type of source because I was also testing simulations that used monopole sources. The tests showed insignificant differences between the simulations and listening to the speaker in the room, leading to the conclusion that the monopole approximation was acceptable.

My speaker calibration attempts to compensate for frequency characteristics by determining an inverse digital time-domain filter. For hardy readers, the method is outlined at the end of this letter. I believe credit goes to an older edition of Oppenheim and Schafer’s Digital Signal Processing, but I don’t have my notes with me at present, so please excuse any mistakes and omission of details.

2. Receivers

The other aspect that has received considerable attention is the importance of interaural cues. These cues are undoubtedly important for multiple sources of speech and noise. However, I have not yet found any research showing that localization of multiple echoes is a viable means of removing reverberation for complex signals. (To my knowledge the precedence effect has not been extended to include complex sounds and multiple echoes.) In addition to the work presented above, I used two omnidirectional receivers as ears in a reverberation simulation, thus including only interaural time differences in the binaural presentation. The speech in these simulations was 69% intelligible, compared to 72% intelligibility for the person using their own ears (and all cues) in the actual room. This difference was not significant.
This leads me to believe that, for this situation (high S/N, one talker, source directly in front, T60 = 1.4 s), interaural level differences and outer-ear spectral weighting have a limited effect on intelligibility. I again stress that this is probably not the case with noise and multiple talkers.

Closing comments

Even with this extensive explanation, I am still left with the question: why, in reverberant settings, do recordings of loudspeakers reduce intelligibility? My current focus is on the inexpensive equipment that my budget has permitted me to buy. I am aware of some clipping of the louder speech phonemes. This had little effect on the anechoic tests, hence my hypothesis that clipping interacts with reverberation. Noise is known to interact with reverberation and cause a greater decrease in intelligibility than the sum of the individual effects alone. I estimate the S/N levels in all of my tests to be >= 11 dB (considering bands 32-16000 Hz) and >= 22 dB (considering bands 125-4000 Hz). I don’t want to admit my “hi-fi” isn’t good enough, and it certainly isn’t as interesting as binaural theories, but until my next set of tests proves otherwise it must remain in question.

Sincerely,
Brad Libbey

Determination of a Time Domain Inverse Digital Filter

If the digital impulse response of the speaker system is g(n), then it would be desirable to have an inverse filter h(n) such that g(n) * h(n) = delta(n - n0), where * denotes convolution and n0 is a delay in samples. In this way any signal can be convolved with this inverse filter to remove the effects of the speaker. For example, if one wants to play a speech signal s(n), this signal should first be convolved with the inverse filter h(n) and subsequently played through the loudspeaker system g(n):

    ( s(n) * h(n) ) * g(n) = s(n) * ( h(n) * g(n) ) = s(n) * delta(n - n0) = s(n - n0)

The output signal s(n - n0) should be a delayed version of the input. Of course the hard part is determining h(n). To do this I first determined the anechoic impulse response of the speaker system, g(n).
Then, formulating the convolution problem as a matrix equation, let gg(n) be the convolution matrix for g(n), such that gg(n) h(n) = delta(n - n0), with h(n) and delta(n - n0) written as column vectors. From this expression a least-squares solution can be used to determine the inverse filter, h(n) = ((gg(n)’ gg(n))^-1) gg(n)’ delta(n - n0), where ’ denotes the transpose. The optimal delay, n0, and the optimal filter length depend on the original impulse response.

There are of course limitations to this technique, of which the effect of noise may be the greatest. Several data sets were averaged to limit the effect of noise, but no elaborate analysis of the effects of noise was done.
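
The least-squares inverse filter described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the experiments: the two-tap impulse responses, the 16-tap filter length, and the function names (convolve, solve, inverse_filter, max_error) are all assumptions made for the example. It builds the convolution matrix for g(n), solves the normal equations (G'G) h = G' d with plain Gaussian elimination, and checks how close g(n) convolved with h(n) comes to a delayed unit impulse.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def inverse_filter(g, length, n0):
    """Least-squares filter h with `length` taps such that g * h ~= delta(n - n0)."""
    rows = len(g) + length - 1
    # Convolution matrix: G[i][j] = g[i - j], so that (G h)[i] = (g * h)[i].
    G = [[g[i - j] if 0 <= i - j < len(g) else 0.0 for j in range(length)]
         for i in range(rows)]
    # Normal equations (G' G) h = G' d, where d is a unit impulse at sample n0.
    GtG = [[sum(G[i][a] * G[i][b] for i in range(rows)) for b in range(length)]
           for a in range(length)]
    Gtd = [G[n0][a] for a in range(length)]  # G' d simply picks out row n0 of G
    return solve(GtG, Gtd)


def max_error(g, h, n0):
    """Largest deviation of g * h from a unit impulse at sample n0."""
    y = convolve(g, h)
    return max(abs(v - (1.0 if i == n0 else 0.0)) for i, v in enumerate(y))


# Hypothetical response: direct sound followed by a half-amplitude echo.
g_min = [1.0, 0.5]
h_min = inverse_filter(g_min, length=16, n0=0)
err_min = max_error(g_min, h_min, 0)

# The time-reversed response cannot be inverted well without a delay.
g_rev = [0.5, 1.0]
err_rev_nodelay = max_error(g_rev, inverse_filter(g_rev, 16, 0), 0)
err_rev_delayed = max_error(g_rev, inverse_filter(g_rev, 16, 16), 16)

print(err_min, err_rev_nodelay, err_rev_delayed)
```

The second example is there to show why the delay n0 matters: for the time-reversed response, a zero-delay inverse leaves a large residual, while allowing a delay lets the same number of taps equalize it closely. For measured impulse responses of realistic length, a library routine such as numpy.linalg.lstsq applied directly to the convolution matrix would likely be a better choice than forming the normal equations by hand, since it is faster and numerically more robust.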

This message came from the mail archive
http://www.auditory.org/postings/2001/
maintained by:

Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University