masked speech responses (RJ ZATORRE)

Subject: masked speech responses
Date:    Tue, 8 Aug 1995 15:54:35 EDT

Dear Auditory List members,

Many of you wrote to me regarding my question about masking a speaker's own speech. I will try to summarize the main points of the responses, but first I should explain more about what we were trying to do, as that might clarify certain issues; and anyway many of you were wondering why in the world we would want to do such a strange thing in the first place.

My colleagues and I are interested in the brain mechanisms underlying speech, among other things. The technique we're using is positron emission tomography (PET), which measures changes in local cerebral blood flow (CBF) in normal volunteers while they perform a given task. In this particular experiment, we wondered whether we could find evidence for interactions between speech output mechanisms and cortical regions devoted to auditory analysis. The hypothesis behind this is too complex to explain in detail here, but based on certain other observations we had reason to believe that there might be some physiological feedback mechanisms, related to corollary discharge, for example.

Anyway, our aim was to scan subjects while they produced speech at different rates, and to look for any changes in CBF in auditory cortex that might correlate with the rate of output. But of course if they're speaking more quickly they'll get more auditory input per unit time, so a change in CBF in the auditory cortex would be a trivial observation. So, we reasoned, if we mask the subject's speech with noise, such that the actual sound reaching the cochlea is constant, any variation in CBF would have to be a consequence of internally driven feedback mechanisms. Hence our desire to mask speech.

To make a long story short, we think the experiment worked. In order to mask speech effectively without introducing a huge masking signal, subjects were trained to speak in a whisper, with no phonation (which they seemed to learn easily).
We then adjusted the noise until they told us they could no longer hear themselves. People were surprisingly consistent in setting the noise to about 60 dB SPL, as measured directly from the foam insert earphone using a specially adapted acoustic coupler.

We have not yet finished analyzing the data, but one result we are pleased with is that there is indeed a region in the left temporal cortex whose blood flow covaries with rate of speech output. This region probably contains neurons that are specialized for analysis of acoustic features relevant to speech, so the fact that its hemodynamics are systematically related to rate of speech output could be evidence for a feedback network of the sort we were hypothesizing.

Now for the answers to my original query:

Several people pointed out the fairly obvious fact that feedback from one's own voice consists of both air and bone conduction, so any masking would have to affect both components. On a related point, according to several responses, using insert earphones makes bone conduction all that much more efficient, particularly in the low frequencies. Using whispered speech seems to overcome this problem, though, since there is very little low-frequency energy.

Ed Burns reminded me that the tendency to speak loudly when your voice is masked is called the Lombard effect. This we took care of via our training procedure, we think. We did not notice any increase in the intensity of the subjects' voices over the course of the study, once they had been trained.

A number of people suggested that speech output would be disrupted in other ways during masking, including speech errors and problems in intonation (at least if they were singing; Ward and Burns did some work on this 20 years ago). This is not much of an issue in my particular study, since the output was simply two meaningless speech syllables ("ba" and "lu"), so there was not much room for speech errors to show up.
Plus, subjects had to practice doing this a lot before we stuck them inside the scanner. Under more naturalistic conditions, though, it is likely that speech errors would be observed.

One of the most useful comments came from B. Repp: "You could do a small control experiment in which you present the subject with DELAYED auditory feedback of his/her voice, using the same level of masking noise. If there is no evidence of interference, then you have objective evidence that the speech was inaudible to the speaker." Sounded like a clever idea to me.

Also useful was the suggestion by A. Houtsma that noise with a spectral shape similar to speech would be more effective than just white noise.

That's about it for the comments, except that many people seemed to be interested in the same problem for a variety of different reasons. So I hope this has been useful to at least some of you. I appreciated the responses and thank all those who took the time. If anybody has any further ideas, feel free to communicate them to me.

Best wishes,
Robert Zatorre
Montreal Neurological Institute

This message came from the mail archive
maintained by:
Dan Ellis
Electrical Engineering Dept., Columbia University