Eckard Blumschein: Perception of irregular short pulses

To: Multiple recipients of list AUDITORY <AUDITORY%MCGILL1.bitnet@xxxxxxxxxxxxxx>
Subject: Eckard Blumschein: Perception of irregular short pulses
From: Dan Ellis <dpwe@xxxxxxxxxxxxxxxxx>
Date: Fri, 3 Oct 1997 12:53:18 PDT
Reply-to: Dan Ellis <dpwe@xxxxxxxxxxxxxxxxx>
Sender: Research in auditory perception <AUDITORY%MCGILL1.bitnet@xxxxxxxxxxxxxx>
Dear List -

Eckard Blumschein of Magdeburg sent me the enclosed message for your
consideration.  It describes some intriguing problems of investigating
the sound of arc-welding (!)

  DAn.

------- Forwarded Message
Date: Thu, 02 Oct 1997 17:40:49 +0200
From: Eckard Blumschein <blumschein@E-Technik.Uni-Magdeburg.DE>

Dear list members,

Several types of noise are defined only by their spectral mixture and are
named after colors. May I ask for advice concerning the perception of a
sound that is instead temporally structured, consisting of very short sound
pulses (clicks) in an irregular sequence?

This noise pattern seems to constitute the major component of the sound from
a kind of arc welding I am dealing with. In the time domain, the pattern
mainly consists of larger positive sound pulses alternating with rather
negligible negative ones, repeating typically 30 to 150 times per second.
Sound pressure may reach 100 dB or more at the peak of the positive pulses,
and it returns to zero about 25 microseconds after their rise.
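
To make the described pattern concrete, here is a minimal Python sketch that
synthesizes such an irregular click train. The function names, the 400 kHz
sample rate, the rectangular pulse shape, and the uniform jitter model are
assumptions for illustration only; just the range of 30 to 150 pulses per
second and the pulse length of roughly 25 microseconds come from the
description above.

```python
import random


def click_onsets(mean_rate_hz, duration_s, jitter=0.5, seed=0):
    """Return click onset times in seconds with randomly jittered spacing.

    jitter=0 gives a strictly periodic train; jitter=1 lets each interval
    vary between 0 and 2 mean periods (hypothetical jitter model).
    """
    rng = random.Random(seed)
    period = 1.0 / mean_rate_hz
    t, onsets = 0.0, []
    while True:
        t += period * (1.0 + jitter * rng.uniform(-1.0, 1.0))
        if t >= duration_s:
            return onsets
        onsets.append(t)


def render(onsets, duration_s, sample_rate_hz, width_s=25e-6):
    """Render positive rectangular pulses of width_s at the given onsets."""
    n = int(duration_s * sample_rate_hz)
    signal = [0.0] * n
    width = max(1, round(width_s * sample_rate_hz))
    for t in onsets:
        start = int(t * sample_rate_hz)
        for i in range(start, min(start + width, n)):
            signal[i] = 1.0
    return signal


# One second of clicks at a mean rate of 100 per second, sampled at an
# assumed 400 kHz so that a 25 microsecond pulse spans 10 samples.
onsets = click_onsets(mean_rate_hz=100, duration_s=1.0, jitter=0.8, seed=1)
signal = render(onsets, duration_s=1.0, sample_rate_hz=400_000)
```

Raising `jitter` moves the train from the forced, regular pulsation toward
the chaotic end of the range; the negative pulses are left out because they
are described as negligible.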

Depending on the stability of the underlying welding process, the randomness
affecting pulse amplitudes and the time intervals between pulses may vary
from almost chaotic scattering, reminiscent of fireworks, the crackle of
burning wood, or pattering rain, down to a much lesser degree of
irregularity. However, the impression of a fuzzy and pitchless sound does
not vanish as long as the welding process is free running. Welders say they
prefer it to sound like bacon sizzling in a pan. Forced pulsation would
sound quite different and seriously annoying.

Since welding by ear is common practice and surpasses automatic control by
far, any features, even subconsciously or vaguely perceived ones, are
considered crucial: at least single events, frequency, regularity, and
hardness. Unfortunately, as a rule, neither single events nor the dominant
frequency can be clearly distinguished by ear.

When we recorded that sound, we chose a record time of at least one second,
a high sample rate, and a microphone that is linear up to 100 kHz, in order
to ensure both a representative pattern and sufficient resolution. When the
frequency limit of the microphone was six times lower, the measured peak
sound pressure reached only a fifth of that value. Perhaps the real sound
pulses were even more narrowly spiked.

We had no problem obtaining the repetition frequency by counting the number
of outstanding positive spikes in the time domain from these records. But an
FFT of the extremely narrowly spiked signal failed to extract any frequency,
because the FFT was limited to 16 k samples and most of the spikes were lost
during re-sampling. We only succeeded by considerably widening the sound
pulses by dithering the signal before the FFT.
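
The following Python sketch illustrates both points: counting spikes in the
time domain, and how narrow spikes can disappear when a record is reduced to
fewer samples. It assumes the re-sampling kept only every n-th sample, which
the description above does not state; the peak-preserving variant is shown
only for contrast and is not the dithering method mentioned. All names and
numbers are illustrative.

```python
def count_spikes(signal, threshold):
    """Count upward crossings of `threshold`, one per spike."""
    count = 0
    was_above = False
    for x in signal:
        is_above = x >= threshold
        if is_above and not was_above:
            count += 1
        was_above = is_above
    return count


def decimate_naive(signal, factor):
    """Keep every `factor`-th sample; narrow spikes in between vanish."""
    return signal[::factor]


def decimate_peak(signal, factor):
    """Keep the maximum of each block of `factor` samples."""
    return [max(signal[i:i + factor]) for i in range(0, len(signal), factor)]


# Toy record: three spikes, each 3 samples wide, in 1000 samples.
signal = [0.0] * 1000
for start in (101, 305, 500):
    for i in range(start, start + 3):
        signal[i] = 1.0

full = count_spikes(signal, 0.5)                       # 3 spikes found
naive = count_spikes(decimate_naive(signal, 10), 0.5)  # only the one at 500
peak = count_spikes(decimate_peak(signal, 10), 0.5)    # 3 spikes again
```

Dividing the spike count by the record time in seconds gives the repetition
frequency directly, without any FFT.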

If we reduced the distance between microphone and sound source much below
half a meter, we had trouble with saturation of our microphone due to
exceptionally high pulse amplitudes. For that reason, one could fear damage
to the welders' ears, too, although this noise is not felt to be loud or
unpleasant anyway. In turn, randomization seems to be an effective means of
mitigating annoyance due to tonality.

I also guess that the human ear is most safely protected by the inertia of
the mechanically transmitting elements in the auditory pathway. Gain
adjustment at the ear drum and/or basilar membrane certainly cannot react
fast enough. The basilar membrane should respond to each sharp click much as
a cheap microphone does, i.e. the traveling wave on it is somewhat flattened
and widened compared with the original signal. Nevertheless, the
performance of the ear is astonishing. Could we learn from nature how to
cope with narrowly spiked signals by intelligent coding?

I recently came across a most convincing theory by Jont Allen explaining the
dynamic range of hearing. He says the stiffness of the basilar membrane is
slowly adapted to the intensity of excitation via changes in the length of
the outer hair cells. My question is: how fast does this active feedback
ease and rebuild the tension of the BM?

Our recording system adapted itself to the maximum sound pressure. As a
consequence, no peaks were cut off. If, however, a simple sound card was
used to record the narrowly spiked signal, then almost all spikes were
inevitably and rigorously clipped to the same height, because the sound card
adapted very slowly to the mean value of the sound signal.

We compared both sounds by ear against the original one: the unclipped
signal, audibilized and sufficiently amplified, and the signal mutilated by
a sound card. Surprisingly, there were only minute differences for my
inexperienced and already somewhat aged ears. The typical fuzzy impression
remained unchanged.

On the other hand, we found hardly any similarity between the original sound
and two other ones. First, we tried the same signal after all small
amplitudes were suppressed. The second dissimilar sound was an audibilized
synthetic signal consisting only of those narrow pulses we considered
essential because they clearly dominate the linearly plotted sound record.

This disappointing observation can be explained by non-linear amplitude
coding in the auditory system in connection with the narrowness of the
pulses. There was much hidden acoustic energy in the amplitudes that are
some hundred times smaller but fill the breaks between pulses, which are
some thousand times longer.
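
A rough estimate of that hidden energy can be sketched as follows. It
assumes energy is proportional to amplitude squared times duration, and the
exact ratios tried below are illustrative readings of "some hundred" and
"some thousand", not measured values; the function name is hypothetical.

```python
def background_to_pulse_energy(amplitude_ratio, duration_ratio):
    """Energy of the quiet stretch relative to one pulse.

    amplitude_ratio: pulse amplitude / background amplitude (e.g. 100)
    duration_ratio:  break duration / pulse duration (e.g. 1000)
    Energy is taken as amplitude squared times duration (an assumption).
    """
    return duration_ratio / amplitude_ratio ** 2


# Try a few illustrative ratios; none of these are measured values.
for amplitude_ratio, duration_ratio in [(100, 1000), (50, 1000), (30, 1000)]:
    ratio = background_to_pulse_energy(amplitude_ratio, duration_ratio)
    print(amplitude_ratio, duration_ratio, ratio)
```

With ratios of exactly 100 and 1000 the breaks carry a tenth of the energy of
a pulse. The amplitude ratio enters squared, so the result is very sensitive
to it: a ratio near 30 instead of 100 already puts the two on a par.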

We performed another experiment with arc welding sound as follows: The
derivative of electric power was audibilized after excluding all those
misleading time intervals during which electric power does not contribute
to the electric arc. Then the real sound was compared with the audibilized
signal. In principle, the similarity was quite good, as was to be expected.

There was only one very striking difference. Sometimes re-ignition of the
arc happens not, as usual, within a small volume but over a larger length.
In that rare case a strikingly loud crack is always separately audible in
the real signal, but not at all in the audibilized derivative of power. Its
amplitude is about ten times higher than that of ordinary clicks during the
normal process, but its subjective prominence seems to be even more
pronounced. Before emission of the exceptionally loud crack, there was
always an exceptionally long period of silence lasting tens of milliseconds.

Certainly, human outer hair cells cannot respond in time to an incoming
needle of sound. Corollary: A sudden sound pulse might hit the basilar
membrane before it has adapted by easing its tension. The loud crack has
another, physically plausible reason, too. It is worth noting here merely
because it might give deeper insight into how single bangs, cracks, and
clicks are perceived.

It might be a moot point whether or not Corti's organ is really able to
perceive the narrow sound spike itself immediately and directly. As
mentioned above, common means of sound recording fail to correctly
reproduce an exceptionally high amplitude. All the clipped amplitudes
should sound equal. Nevertheless, some obviously audible characteristic
features indicate the occurrence of a loud bang even in a clipped record.

The explanation of that perceptual phenomenon can easily be demonstrated.
To do so, the sound is stretched by replaying it much more slowly than it
was recorded. Given suitably corrected amplification, the originally sharp
needle now sounds like slowly declining, rumbling thunder, lasting a long
time after the lightning stroke has disappeared.
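
One simple way to replay a record more slowly is to store the unchanged
samples with a lower nominal sample rate, so that any player stretches them
in time. The sketch below does this with Python's standard `wave` module. The
function name, the 400 kHz recording rate, and the slowdown factor of 50 are
assumptions for illustration, not the settings of the experiment described.

```python
import io
import struct
import wave


def write_slowed_wav(samples, recorded_rate_hz, slowdown, target):
    """Write mono 16-bit samples so that players replay them `slowdown`
    times more slowly than they were recorded.

    samples: floats in [-1.0, 1.0]
    target:  file name or writable binary file object
    """
    replay_rate = max(1, round(recorded_rate_hz / slowdown))
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(replay_rate)
        wav.writeframes(frames)
    return replay_rate


# A 25 microsecond click at an assumed 400 kHz recording rate spans 10
# samples; the whole record is 4000 samples, i.e. 10 ms of real time.
click = [0.0] * 100 + [1.0] * 10 + [0.0] * 3890
buffer = io.BytesIO()
rate = write_slowed_wav(click, 400_000, 50, buffer)  # nominal rate 8 kHz
```

At a slowdown of 50 the 10 ms record lasts half a second, and every frequency
in it is lowered by the same factor, which is why the amplification has to
be corrected to keep the result audible.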

One can remove from the record the few microseconds of real time that the
actual sound pulse itself lasts. It seems to me that even a missing
excitation will be audible, or may be imaginable, from the remaining
'resounding' (maybe this word is incorrect). I have the chutzpah to speak
of a missing fundamental excitation in order to underline my naive idea of
pattern reconstruction in the brain. Well, I know from a paper by
Riquimaroux that the term missing fundamental is only another name for
temporal pitch.

I do not exclude the possibility that reflections might play a role,
including those within the auditory pathway. No matter what and where the
cause of the phenomenon is, this resounding, perceived after the original
sound pressure that caused it has already disappeared, is exactly the type
of noise I am mainly interested in. It presumably has dynamic features that
are unknown thus far. If it has a color, then it is certainly a changing one
with a distinctive temporal asymmetry. Maybe these dynamic features provide
a new dimension, a key to better understanding of mysterious or just pretty
complex properties like the metallic sound asked about by Jana Schiffels.

I expect similar effects when investigating negative sound pulses
thoroughly. I have repeatedly read that there is a one-way rectifying effect
of the basilar membrane, i.e. negative sound pulses should not be perceived,
at least not in the same manner as positive ones. The latter is convincing
to me because I am able to recognize whether I am hearing plosives or
implosives, a beat or a kiss for instance.

Has anybody performed experiments with complex noise which was differently
modified and separately fed to the left and right ear? I read a paper on
psychoacoustic roughness by Daniel and Weber (ACUSTICA 1997), who report
that psychoacoustic roughness disappears when two normally conflicting
tones are presented separately to the two ears. This gives evidence that
psychoacoustic roughness depends on the overlap of excitation patterns on
the basilar membrane. However, this quantity is based on the superposition
of harmonic tones. So I doubt whether it is compatible with the kind of
roughness to be attributed to arc welding noise.

Traditional spectrograms show mountains, attracting attention to their
peaks. They seem to me rather unfit for revealing any valuable insight into
the processes from which the bangs, cracks, or clicks originate. Their noise
has a wideband spectrum. Each single click is represented by a vertical line
appearing like a razor blade in a 3D spectrogram. These may cross horizontal
lines of constant frequency indicating an additional periodic excitation. If
the sound pulses were made rectangular by cutting off their peaks, then the
razor blades in the spectrogram may show a periodic waveform corresponding
to the width of the remaining stumps. Such effects of improper signal
processing can be impressive, but they do not provide valuable information.

Applications of the wavelet transform or of another symmetrical spectral
representation of sound have been published for purposes such as
cardiology, knock detection in cars, monitoring of gears or of wear in
machine tools, etc. However, I have read only very few discussions of how
features can be extracted from admittedly beautiful wavelet transform
plots. Only a better discernibility of singularities from the background
was claimed.

You might ask why I do not resort to one of the numerous models of the
cochlea. I have to admit, I am reluctant to swallow not only temporal
symmetry, which obviously contradicts reality, but also the notion of
incredibly finely, mechanically tuned filter banks. Why not scrutinize the
possibility of another model, too?

Wang and Shamma (IEEE Trans. on Speech and Audio Processing Vol. 3, No. 5,
Sept. 1995) suggested a logarithmic frequency axis in the primary auditory
cortex. I am used to thinking in terms of electrical engineering. As a
layman in acoustics, I do not believe in mathematical skill on the part of
natural systems, which are thought to have developed stepwise from a quite
simple starting point. So I naively guess that the cochlea might capture
the time span that traveling waves require to rise to a certain level, or to
spread over a certain distance along the basilar membrane, rather than the
location of maximal vibration.

Why not replace the frequency-vs.-time representation by a corresponding
reciprocal-of-frequency-vs.-time plane? Would we lose any information as
compared to the common practice of ignoring phase angle? No; on the
contrary, we could gain asymmetry if we were ready to accept negative
values, too. Also, a smooth handshake between short-time spectral and
long-time temporal coding would be conceivable more naturally.

I read in Len Trejo's Lectures that there are 40 to 60 hairs per cell, and
that one hair cell contacts 3 to 15 neural fibers. Would it make sense to
look at how far apart the places are where the fibers end? Can pairs or
groups of them possibly pick up, electrically, differences of pressure or
velocity along the tonotopic axis of the basilar membrane?

As I further read in a paper from Boystown, after recording a tuning curve
from a tapped nerve fiber, it is possible to mark it chemically so that the
path of the fiber can be traced to the place within the cochlea where it
synapses with an inner hair cell. So I trust in the reliability of
frequency-to-place maps. I also noticed successful studies on the
functional development of the ear around the time of birth. May I hope for
a persuasive description of intra-cochlear coding coming soon?

There is a remaining problem touching arc welding sound, too. Acoustic
perception can only partially be covered by spectral perception or by
perception of small time differences, respectively. Memory in the brain has
to support, and finally to take over, the job of the cochlea once frequency
is lower than a certain value. Is it correct to estimate less than 20 Hz or
a time span larger than 50 ms, respectively? Is it useful to quantify this
value exactly?

I know, Zwicker gave three limits (Grenzdauern) based on psychoacoustic
experiments: 200 ms, 20 ms, and 2 ms. The smallest value corresponds to 500
Hz and denotes the limit of temporal resolution in monaural or diotic
hearing. The medium one, 20 ms, relates to 8 Hz frequency of modulation
being not yet derogatory to gain with increasing frequency.
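
The frequency and time values quoted in the last two paragraphs can be
checked with a few lines of Python. Period and frequency are plain
reciprocals, which reproduces the pairs 20 Hz / 50 ms and 2 ms / 500 Hz. The
8 Hz quoted for 20 ms is not a plain reciprocal (that would be 50 Hz); it
matches the corner frequency 1/(2*pi*tau) of a first-order low-pass with
tau = 20 ms, but reading it that way is an assumption made here, not
something stated above. The function names are illustrative.

```python
import math


def period_ms(frequency_hz):
    """Period in milliseconds of a given frequency in hertz."""
    return 1000.0 / frequency_hz


def frequency_hz(period_in_ms):
    """Frequency in hertz of a given period in milliseconds."""
    return 1000.0 / period_in_ms


def lowpass_corner_hz(time_constant_ms):
    """Corner frequency of a first-order low-pass with this time constant
    (an assumed interpretation of the 20 ms / 8 Hz pair)."""
    return 1000.0 / (2.0 * math.pi * time_constant_ms)


print(period_ms(20.0))          # 20 Hz as a time span in ms
print(frequency_hz(2.0))        # 2 ms as a frequency in Hz
print(frequency_hz(200.0))      # 200 ms as a frequency in Hz
print(lowpass_corner_hz(20.0))  # 20 ms read as a time constant, in Hz
```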

Chris Plack gave another view in his Psychoacoustic Lecture Notes. Based
mainly on measurements of neural firing rates, he suggested to 'use a
temporal mechanism at low frequencies, and a place mechanism above 5 kHz
where phase locking is not available.' He wrote: 'At frequencies below 5
kHz, neurons will tend to fire only at one point in the sinusoidal cycle of
the incoming waveform.'

Probably, this handshake between spectral and temporal coding gives rise to
two consequences: On the one hand, a deficiency in the overlap might be
imaginable. On the other hand, the double coding that would otherwise have
to be assumed might be a source of exceptional hearing skill which can be
enhanced by training. Emotion should also play a role in that region. Can
anybody tell me whether, and if so how, to understand the delta frequency
band (1 to 3.5 Hz), the theta frequency band (3.5 to 7.5 Hz), etc. in that
context?

I personally sometimes suffer from an unidentified, occasionally appearing
traffic sound at very low frequency. My office is near a crossroads. I do
not know what is to blame for the highly annoying feeling I experience from
an all-penetrating vibration. Is it a resonance in the exhaust pipe of large
diesel-driven vehicles, or is it just my over-sensitivity?

I remember the assessment an expert welder gave of a new welding machine
which was distinguished by a favorably more regular process, but
unfortunately also by one repeating at low frequency. He said: 'The light
flicker is unacceptable to me.' I have never heard of any regulation
concerning audible noise that corresponds to the flicker limits ensuring
electric voltage quality.

Finally, as concerns the noise from arc welding I described above, I would
like to try a tentative conjecture based on a simple consideration. The
temporal patterns my brain can imagine and assess are presumably, as a
rule, the same as those perceived and stored in my brain. If there is
evidence for perception of mixed patterns containing both spectral and
temporal features, then spectral and temporal perception will very likely
overlap each other. Examples of such patterns could be dynamic color and
dynamic asymmetry.

I propose to shift the mentioned limits of perception concerning frequency
or time, respectively, relative to these patterns by changing the speed of
replay. A quantitative demonstration of the pattern inconsistency under
frequency shift predicted here should provide a simple method for testing
my conjecture.
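
As a small illustration of the proposed shift, the sketch below scales a
repetition rate by a replay speed and reports whether it moves to the other
side of a perceptual limit. The 20 Hz default is the tentative estimate
asked about earlier, not an established value, and the function names are
hypothetical.

```python
def replayed(frequency_hz, speed):
    """Frequency heard when a record is replayed at `speed` times real time."""
    return frequency_hz * speed


def crosses_limit(frequency_hz, speed, limit_hz=20.0):
    """True if replay moves the frequency to the other side of `limit_hz`."""
    before = frequency_hz >= limit_hz
    after = replayed(frequency_hz, speed) >= limit_hz
    return before != after


# Click rates at both ends of the 30 to 150 per second range.
for rate in (30.0, 150.0):
    for speed in (0.5, 0.1):
        print(rate, speed, replayed(rate, speed), crosses_limit(rate, speed))
```

At half speed only the slow end of the range drops below the assumed limit,
while at a tenth of the speed the whole range does, so comparing the sound
at such speeds is one way to look for the predicted change in the pattern.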

I apologize if someone feels upset because I failed to correctly and briefly
write down my very limited knowledge, observations and immature ideas.
Nevertheless, having frankly addressed a lot of new and coherent questions
from a practical point of view, I hope for correcting and advisory response
from the auditory community.

Sincerely,
Eckard Blumschein, 1997, Oct. 1

Otto of Guericke University Magdeburg
Dept. of Electrical Engineering
IELE
PSF 4120
D-39016 Magdeburg

Phone:  + 49 391 67 12403                       Fax: + 49 391 67 12408

blumschein@et.uni-magdeburg.de


------- End of Forwarded Message