
Recordings in enclosed spaces.



Dear List,

Regarding the question of recordings made from loudspeakers in enclosed
spaces, I would like to answer some of the questions that were posed and, I
hope, stimulate some more discussion.  Criticism is always welcome.

First, some data.
At the Atlanta ASA conference I presented speech intelligibility results from a
highly reverberant small room, T60 = 1.4 s, with low levels of noise and no
competing sources.  The binaural intelligibility for a person listening
directly to the speaker in the room was 72%; two-microphone recordings of a
loudspeaker in the room were 55% intelligible.  To investigate this difference,
a similar test, using the same equipment and settings, was performed with
anechoic words.  In this second test, anechoic speech intelligibility when
listening directly to the loudspeaker was 96%, and the recordings of the
loudspeaker showed 94% of the words intelligible.  This difference was not
significant (p=.11), whereas the corresponding difference in the reverberant
room was significant (p=.0001).  The discrepancy between these two tests is
partially attributable to the overall intelligibility level, but it nonetheless
leads me to believe that speaker problems may interact with the
reverberation to cause an intelligibility loss.

Someone asked whether the subjects could tell they were listening to
recordings.  Several subjects, after completing the experiments in the
anechoic chamber, commented that the recordings and speaker presentations
sounded the same.  The recordings made in reverberation did not draw the same
comments, and when I listen I can tell the difference between reverberation
and recorded reverberation.


Possible causes
1. Speakers
As seems to be the consensus of the list, the loudspeaker system is the first
to blame for reduced intelligibility or quality.  To compensate for loudspeaker
limitations, I used two speakers without an enclosure, mounted face to face and
wired in phase, thus creating a volume source similar to a monopole.  Informal
anechoic tests show that this condition is met at lower frequencies but not at
higher frequencies.  Of course a human voice is not a monopole, but I selected
this type of source because I was also testing simulations that used monopole
sources.  The tests showed insignificant differences between the simulations
and listening to the speaker in the room, leading to the conclusion that
the monopole approximation was acceptable.

My speaker calibration attempts to compensate for the frequency characteristics
by determining an inverse digital time-domain filter.  For hardy readers, the
method is outlined at the end of this letter.  I believe credit goes to an
older edition of Oppenheim and Schafer’s Digital Signal Processing, but I don’t
have my notes with me at present, so please excuse any mistakes and omission of
details.

2. Receivers
The other aspect that has received considerable attention is the importance of
interaural cues.  These cues are undoubtedly important for multiple sources
of speech and noise.  However, I have not yet found any research showing that
localization of multiple echoes is a viable means of removing reverberation for
complex signals.  (To my knowledge the precedence effect has not been extended
to include complex sounds and multiple echoes.)

In addition to the work presented above, I used two omnidirectional receivers
as ears in a reverberation simulation, thus including only interaural time
differences in the binaural presentation.  The speech in these simulations was
69% intelligible, compared to 72% intelligibility for the person using their
own ears (and all cues) in the actual room.  This difference was not
significant, which leads me to believe that for this situation (high S/N, one
talker, source directly in front, T60 = 1.4 s) interaural level differences and
outer ear spectral weighting have a limited effect on intelligibility.  I again
stress that this is probably not the case with noise and multiple talkers.


Closing comments,
Even with this extensive explanation, I am still left with the question of
why recordings of loudspeakers reduce intelligibility in reverberant
settings.  My current focus is on the inexpensive equipment that my
budget has permitted me to buy.  I am aware of some clipping of the louder
speech phonemes.  This had little effect on the anechoic tests, hence my
hypothesis that clipping interacts with reverberation.  Noise is known to
interact with reverberation and cause a greater decrease in intelligibility
than the sum of the individual effects alone.  I estimate the S/N levels in
all of my tests to be >= 11 dB (considering bands 32-16000 Hz) and >= 22 dB
(considering bands 125-4000 Hz).  I don’t want to admit my “hi-fi” isn’t good
enough, and it certainly isn’t as interesting as binaural theories, but until
my next set of tests proves otherwise it must remain in question.

Sincerely,
Brad Libbey


Determination of a Time Domain Inverse Digital Filter
If the digital impulse response of the speaker system is g(n), then it would be
desirable to have an inverse filter h(n) such that g(n) * h(n) = delta(n-n0),
where * denotes convolution and n0 is a delay.  In this way any signal can be
convolved with the inverse filter to remove the effects of the speaker.  For
example, if one wants to play a speech signal s(n), this signal should first be
convolved with the inverse filter, h(n), and subsequently played through the
loudspeaker system, g(n):
( s(n) * h(n) ) * g(n) = s(n) * ( h(n) * g(n) ) = s(n) * delta(n-n0) = s(n-n0)
The output signal, s(n-n0), should be a delayed version of the input.  Of course
the hard part is determining h(n).  To do this I first determined the anechoic
impulse response of the speaker system, g(n).  The convolution problem is then
formulated as a matrix equation: let gg be the convolution matrix for g(n), and
treat h and delta(n-n0) as column vectors, such that
              gg h = delta(n-n0).
From this expression a least squares solution can be used to determine the
inverse filter,
              h = ((gg'gg)^-1) gg' delta(n-n0),
where gg' is the transpose of gg.
The optimal delay, n0, and optimal filter length depend on the original impulse
response.  There are of course limitations to this technique, of which the
effect of noise may be the greatest.  Several data sets were averaged to limit
the effect of noise, but no elaborate analysis of the effects of noise was
done.
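
For anyone who wants to experiment, the least squares procedure described
above can be sketched in a few lines of Python with NumPy.  This is only an
illustration under stated assumptions, not the code used for the experiments:
the impulse response g below is a made-up three-tap example, the filter length
and delay are arbitrary choices, and the function names are hypothetical.  It
also calls NumPy's least squares solver rather than forming the normal
equations explicitly, which should give the same solution with better
numerical behavior.

```python
import numpy as np


def convolution_matrix(g, filter_len):
    """Build G so that G @ h equals np.convolve(g, h) for len(h) == filter_len."""
    g = np.asarray(g, dtype=float)
    rows = len(g) + filter_len - 1
    G = np.zeros((rows, filter_len))
    for col in range(filter_len):
        # Each column holds a copy of g shifted down by one more sample.
        G[col:col + len(g), col] = g
    return G


def inverse_filter(g, filter_len, delay):
    """Least squares FIR filter h such that g convolved with h approximates
    a unit impulse delayed by `delay` samples."""
    G = convolution_matrix(g, filter_len)
    target = np.zeros(G.shape[0])
    target[delay] = 1.0  # delayed unit impulse, delta(n - n0)
    h, *_ = np.linalg.lstsq(G, target, rcond=None)
    return h


# Hypothetical impulse response (an assumption, not measured data).
g = np.array([1.0, 0.5, 0.25])
n0 = 4                                  # assumed delay in samples
h = inverse_filter(g, filter_len=32, delay=n0)

# Passing the inverse filter through the "speaker" should give
# approximately a delayed impulse.
equalized = np.convolve(g, h)
ideal = np.zeros_like(equalized)
ideal[n0] = 1.0
residual = np.max(np.abs(equalized - ideal))
```

With a measured impulse response, one would presumably repeat the solve for
several values of the delay and filter length and keep the combination with
the smallest residual, since both depend on the response being inverted.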


