Re: own voice versus recorded voice perception
Thanks much for the replies and the interesting discussion. I received several thoughtful and helpful responses off list and thought it might be useful to summarize what I've gleaned to this point.
Own voice versus recorded voice.
I. Why does one's voice sound different to them when they hear it played back from a recording?
As Kent Walker succinctly stated, because "it is different." Several factors documented in the literature appear to affect our perception of our own voice compared to a recorded version.
1. Transmission medium: One well documented factor that can affect the perception of our own voice compared to a recorded version is the fact that we hear our own voice through a combination of bone and air conduction.
a. Sound transmission via bone conduction appears to be better at low frequencies. In addition, due to the radiation characteristics of our mouth, air conducted transmission of our own voice to our ears is attenuated in the high frequencies. The combined effects of these air and bone conducted signals would be dominated by low frequency information, at least compared to a recording of our voice obtained from a microphone located directly in front of the talker.
i. The relative weights of bone versus air conducted sound on own voice perception may well vary across individuals, depending, in part, on individual variations in bone conduction transmission characteristics.
b. A recording of our own speech, if obtained from a microphone in front of the talker, will have more high frequency emphasis relative to the speech recorded at ear level and will be missing the contribution of the low frequency weighted bone conducted sound.
2. Physiologic factors that may affect own voice perception
a. Activation of the acoustic reflex during vocalization will further alter the spectral shape of the air conducted stimuli.
b. Own voice vocalizations may suppress cortical activity in auditory areas and thus potentially affect own voice perception.
3. Recording and playback system
a. Although a given to this group, some of the reports related to unnatural sounding vocal recordings in the general population are likely due to poor quality recording and playback equipment (e.g., home video camera recordings).
b. As suggested above, the location of the recording microphone will affect the spectrum of the air conducted vocalization and thus potentially the perceived quality of the speech. A microphone located directly in front of the talker will pick up substantially more high frequency energy in the vocalization than a microphone located at the entrance of the talker's ear canal.
c. Some have suggested that the use of a sound field microphone reduces the unnatural perception of a recorded voice, although that would suggest limited impact of the bone conducted transmission process.
4. Internal voice versus perceived voice: Although my brief foray into this area did not find any literature on the topic, some have suggested (I'm paraphrasing here, perhaps incorrectly) that we hear an internal voice apart from that generated by our vocal productions (e.g., what we hear when reading silently as opposed to reading aloud) and that internal voice may be more perceptually dominant.
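To make the spectral argument in item 1 concrete, here is a rough toy sketch in Python. It is not drawn from any of the references and every number in it is an assumption chosen only for illustration: the bone path and the air path to the talker's own ear are both modeled as simple first-order low-pass filters (with made-up cutoff frequencies), the front microphone is treated as a flat reference, and the two own-voice paths are combined as a power sum that ignores phase. The `bone_weight` parameter is a hypothetical knob standing in for the individual variation mentioned in 1.a.i.

```python
import math

# All constants below are illustrative assumptions, not measured values.
BONE_CUTOFF_HZ = 1000.0     # assumed: bone path favors low frequencies
AIR_EAR_CUTOFF_HZ = 4000.0  # assumed: air path to the talker's ear loses highs


def lowpass_gain(freq_hz, cutoff_hz):
    """Magnitude response of a first-order low-pass filter at freq_hz."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)


def own_voice_gain(freq_hz, bone_weight=1.0):
    """Toy own-voice gain: power sum of the air and bone paths (phase ignored)."""
    air = lowpass_gain(freq_hz, AIR_EAR_CUTOFF_HZ)
    bone = bone_weight * lowpass_gain(freq_hz, BONE_CUTOFF_HZ)
    return math.sqrt(air ** 2 + bone ** 2)


def front_mic_gain(freq_hz):
    """Toy recording path: a flat reference for a microphone in front of the talker."""
    return 1.0


def own_minus_recorded_db(freq_hz, bone_weight=1.0):
    """Level of the toy own-voice path relative to the toy recording, in dB."""
    ratio = own_voice_gain(freq_hz, bone_weight) / front_mic_gain(freq_hz)
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    for f in (125, 500, 2000, 8000):
        diff = own_minus_recorded_db(f)
        print(f"{f:>5} Hz: own voice minus recording = {diff:+.1f} dB")
```

Under these assumptions the own-voice path comes out relatively stronger at low frequencies and weaker at high frequencies than the recording, which is the direction of the difference described above. The size of the effect depends entirely on the assumed cutoffs and on `bone_weight`, so the printed values should not be read as estimates of real transmission characteristics.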
II. Why do people normally dislike the sound of their recorded voice?
While not a universal finding, it is not uncommon for listeners to dislike their recorded voice. Some report the recorded sound as weak or thin. Not liking the recorded voice could be due to the simple fact that it is, as described above, different from our own vocal productions and that we are not used to hearing it. The reports of a weak or thin voice would be consistent with less low and more high frequency emphasis in the recorded stimulus compared to actual vocalizations due to recording microphone location and loss of contribution of bone conducted speech.
Citations and links to relevant references
Georg von Bekesy did a lot of work on bone conduction in the late 1950s/60s.
It included a paper "On the hearing of one's own voice by bone conduction", from JASA if I remember correctly (I have a copy in a box in my loft, but can't get to it at the moment to confirm the details). Others have sporadically re-visited the topic since then. There's a very brief but informative description of the phenomenon here:
I just read something about this in Scientific American a few months ago (not exactly a published journal but may help): http://www.sciam.com/article.cfm?id=why-does-my-voice-sound-different
Here is a link to a wonderful review on bone conduction by Paula Henry and Tomasz Letowski
Von Bekesy's paper on hearing one's own voice is on pages 181-203 of his book "Experiments in Hearing" (1960). Original reference: "The structure of the middle ear and the hearing of one's own voice by bone conduction", JASA 21 (1949) 217-232.
It is also the case that auditory areas are suppressed in activity during speech production (shown using PET by Wise et al., 1999, Lancet, and using MEG by Houde et al., 2002). This suppression has been shown in primates to start before the actual vocalization (Eliades and Wang 2005 I think!). The reasons for this suppression have been linked to monitoring own voice, or to suppression of any self-generated sensations (e.g. touch). However, whatever the reason for the suppression, it might impact the perceptual processes which occur in these areas, and thus our voice might sound different.
S. R. Appel and J. G. Beerends, "On the quality of hearing ones own voice," J. Audio Eng. Soc., vol. 50, pp. 237-248 (2002 April).
Sook Young Won at CCRMA (Stanford) has been studying the relationship between bone-conducted and air-conducted voice perception from a singer's perspective. Her publications are listed on her website: http://ccrma.stanford.edu/~sywon/.
...the introduction section of Tolerable Hearing Aid delays I, Ear & Hear 1999. There are some references to some of the early work in the 1950s as well.
Ben's question appears to address a conundrum that has faced the audio engineer since the era of Edison and Berliner. I am tempted to spew-on but instead will refer you to these gentlemen:
-"Stereo is an attempt to create the illusion of reality through the willing suspension of disbelief" Richard Heyser
BEN'S SECOND QUESTION: "And why do people normally dislike the sound of their recorded voice?"
Answer: Is this really true? I know plenty of folks that love the sound of their own voice, and plenty of folks that love the sound of their recorded voice. If this is true, perhaps this is because the voice is disembodied?
With regard to this affective response "dislike", it seems possible in this situation that the "naturalness" of the recorded voice might be negatively correlated (i.e. I would rather hear my own voice played back through my mobile phone than on a 5.1 surround sound recording). While tastes do vary, there is research that suggests the above "haute technologie" is not always the mitigator we hold it up to be:
I have often had success creating natural voice-overs in the studio using one high-quality microphone which is routed to the center channel (one high-quality loudspeaker). This is the industry standard practice.
From: AUDITORY - Research in Auditory Perception [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Hornsby, Benjamin Wade Young
Sent: Thursday, April 09, 2009 7:44 PM
Subject: own voice versus recorded voice perception
This is a bit of an odd request but I've been asked to comment on the question "Why does one's voice sound different to them when they hear it played back from a recording? And why do people normally dislike the sound of their recorded voice?"
My own thoughts are that this has to do with the fact that we hear our own voice via a combination of air and bone conducted sound while the recorded voice would be via air conduction alone. I imagine there are some differences in the transmission characteristics of sound to the cochlea from the vocal folds via air versus the body that would also affect our perception of the sound of our voice.
That said, I did a quick search and didn't find any published research (plenty of speculation similar to mine) discussing this topic and was hoping someone might point me to some relevant references. Any help is greatly appreciated.