Eckard Blumschein: Perception of irregular short pulses (Dan Ellis)

Subject: Eckard Blumschein: Perception of irregular short pulses
From:    Dan Ellis  <dpwe(at)ICSI.BERKELEY.EDU>
Date:    Fri, 3 Oct 1997 12:53:18 PDT

Dear List -

Eckard Blumschein of Magdeburg sent me the enclosed message for your consideration. It describes some intriguing problems of investigating the sound of arc-welding (!)

  DAn.

------- Forwarded Message

Date: Thu, 02 Oct 1997 17:40:49 +0200
From: Eckard Blumschein <blumschein(at)E-Technik.Uni-Magdeburg.DE>

Dear list members,

Several types of noise are defined only by their spectral mixture, and they are named like colors. May I ask for advice concerning the perception of a sound that is instead temporally structured, consisting of very short sound pulses (clicks) in irregular sequence? This noise pattern seems to constitute the major component of the sound from a kind of arc welding I am dealing with. In the time domain, the pattern mainly consists of larger positive sound pulses and rather negligible negative ones, alternating and repeating typically 30 to 150 times each second. Sound pressure may reach 100 dB or more at the peak of the positive pulses, and it returns to zero about 25 microseconds after their rise.

Depending on the stability of the underlying welding process, the randomness affecting the amplitudes of the pulses and the time intervals between them may vary from almost chaotic scattering, reminiscent of fireworks, the crackle of burning wood, or the patter of rain, down to a much lesser extent of irregularity. However, the impression of a fuzzy and pitchless sound does not vanish as long as the welding process is free running. Welders say they prefer it to sound like bacon sizzling in a pan. Forced pulsation would sound quite different and seriously annoying. Since welding by ear is common practice and surpasses automatic control by far, any feature, even one perceived only subconsciously or vaguely, is considered crucial, at least single events, frequency, regularity, and hardness. Unfortunately, as a rule, neither single events nor the dominant frequency can be clearly distinguished by ear.

When we recorded that sound, we chose a record time of at least one second, a high sample rate, and a microphone that is linear up to 100 kHz, in order to ensure both a representative pattern and sufficient resolution. When the frequency limit of the microphone was six times lower, the measured peak of sound pressure reached only a fifth of that value. Perhaps the real sound pulses were even more narrowly spiked. We had no problem obtaining the repetition frequency by counting the number of outstanding positive spikes in the time domain of these records. But an FFT of the extremely narrowly spiked signal failed to extract any frequency, because the FFT was limited to 16 k samples and most of the spikes were lost during re-sampling. We only succeeded by considerably widening the sound pulses through dithering the signal before the FFT.

If we reduced the distance between microphone and sound source to much below half a meter, we had trouble with saturation of our microphone due to exceptionally high pulse amplitudes. For that reason, one could fear damage to the welders' ears, too, although this noise is not felt to be loud or unpleasant at all. In turn, randomization seems to be an effective means of mitigating the annoyance due to tonality. I also guess that the human ear is protected most safely by the inertia of the mechanically transmitting elements in the auditory pathway. Gain adjustment at the ear drum and/or basilar membrane certainly cannot react fast enough. The basilar membrane should respond to each sharp click much as a cheap microphone does, i.e. the traveling wave on it is somewhat flattened and widened as compared with the original signal. Nevertheless, the performance of the ear is astonishing. Could we learn from nature how to cope with narrowly spiked signals by intelligent coding?

I recently came across a most convincing theory by Jont Allen explaining the dynamic range of hearing. He says that the stiffness of the basilar membrane is slowly adapted to the intensity of excitation by changing the length of the outer hair cells.
My question is: how fast does this active feedback ease and rebuild the tension of the BM?

Our recording system adapted itself to the maximum of sound pressure. As a consequence, no peaks were cut off. If, however, a simple sound card was used for recording the narrowly spiked signal, then inevitably almost all spikes were rigorously clipped to the same level, because the sound card adapted very slowly to the mean value of the sound signal. We compared both sounds by ear against the original one: the unclipped signal, which was made audible with sufficient amplification, and the signal mutilated by a sound card. Surprisingly, there were only minute differences for my inexperienced and already somewhat aged ears. The typical fuzzy impression remained unchanged.

On the other hand, we found barely any similarity between the original sound and two others. First, we tried the same signal after all small amplitudes were suppressed. The second dissimilar sound was a synthetic signal, made audible, consisting only of those narrow pulses we had considered essential because they clearly dominate the linearly plotted sound record. This disappointing observation can be explained by non-linear amplitude coding in the auditory system in connection with the narrowness of the pulses. There was much hidden acoustic energy within the amplitudes, some hundred times smaller, that fill the breaks between pulses, which are some thousand times longer.

We performed another experiment with arc welding sound as follows. The derivative of electric power was made audible after excluding all those misleading time intervals during which electric power does not contribute to the electric arc. Then the real sound was compared with this signal. In principle, the similarity was pretty good, as was to be expected. There was only one very striking difference. Sometimes re-ignition of the arc happens not, as usual, within a small volume but over a larger length.
In that rare case, a strikingly loud crack is always separately audible in the real signal but not at all in the audible derivative of power. Its amplitude is about ten times higher than that of ordinary clicks during the normal process, but subjectively it seems to stand out even much more. Before emission of the exceptionally loud crack, there was always an exceptionally long period of silence lasting tens of milliseconds. Certainly, human outer hair cells cannot respond in time to an incoming needle of sound. Corollary: a sudden sound pulse might hit the basilar membrane before it has adapted by easing its tension. The loud crack has another, physically plausible reason, too. It is worth noting here merely because it might give deeper insight into how single bangs, cracks, and clicks are perceived.

It may be a moot point whether or not Corti's organ is really able to perceive the narrow sound spike itself immediately and directly. As mentioned above, common means of sound recording fail to reproduce an exceptionally high amplitude correctly. All the clipped amplitudes should sound equal. Nevertheless, some obviously audible characteristic features indicate the occurrence of a loud bang even in a clipped record. The explanation of that perceptual phenomenon can easily be demonstrated. To this end, the sound is stretched by replaying it much more slowly than it was recorded. Given suitably corrected amplification, the originally sharp needle now sounds like slowly declining, rumbling thunder, lasting a long time after the lightning stroke has disappeared. One can remove from the record the few microseconds of real time that the actual sound pulse itself lasts. It seems to me that even a missing excitation will be audible, or may be imaginable, from the remaining 'resounding' (maybe this word is incorrect). I have the chutzpah to speak of a missing fundamental excitation in order to underline my naive idea of pattern reconstruction in the brain.
Well, I know from a paper by Riquimaroux that the term missing fundamental is only another name for temporal pitch. I do not exclude the possibility that reflections might play a role, including those within the auditory pathway. No matter what and where the cause of the phenomenon is, this resounding, perceived after the original sound pressure that caused it has already disappeared, is exactly the type of noise I am mainly interested in. It presumably has dynamic features that are unknown thus far. If it has a color, then it is certainly a changing one with a distinctive temporal asymmetry. Maybe these dynamic features provide a new dimension, a key to better understanding of mysterious or just pretty complex properties like the metallic sound asked about by Jana Schiffels.

I expect similar effects when investigating negative sound pulses thoroughly. I have repeatedly read that there is a one-way rectifying effect of the basilar membrane, i.e. negative sound pulses should not be perceived, at least not in the same manner as positive ones. The latter is convincing to me because I am able to recognize whether I am hearing plosives or implosives, a beat or a kiss for instance.

Has anybody performed experiments with complex noise which was differently modified and separately fed to the left and right ear? I read a paper on psychoacoustic roughness by Daniel and Weber (ACUSTICA 1997), who report that psychoacoustic roughness disappears when two normally conflicting tones are presented separately to the two ears. This is evidence that psychoacoustic roughness depends on the overlap of excitation patterns on the basilar membrane. However, this quantity is based on the superposition of harmonic tones. So I doubt whether it is compatible with the kind of roughness to be attributed to arc welding noise.

Traditional spectrograms show mountains that draw attention to their peaks. They seem to me rather unfit for revealing any valuable insight into the processes from which the bangs, cracks, or clicks originate.
Their noise has a wideband spectrum. Each single click is represented by a vertical line appearing like a razor blade in a 3D spectrogram. These lines may cross horizontal lines of constant frequency indicating an additional periodic excitation. If the sound pulses were made rectangular by cutting off their peaks, then the razor blades in the spectrogram may show a periodic waveform corresponding to the width of the remaining stumps. Such effects of improper signal processing can be impressive, but they do not provide valuable information.

Applications of the wavelet transform or of other symmetrical spectral representations of sound have been published for purposes such as cardiology, knock detection in cars, and monitoring of gears or of wear in machine tools. However, I have read only very few discussions of how features can be extracted from admittedly beautiful wavelet transform plots. Only a better discernibility of singularities from the background was claimed.

You might ask why I do not resort to one of the numerous models of the cochlea. I have to admit that I am reluctant to swallow not only temporal symmetry, which obviously contradicts reality, but also the notion of incredibly finely tuned mechanical filter banks. Why not scrutinize the possibility of another model, too? Wang and Shamma (IEEE Trans. on Speech and Audio Processing Vol. 3, No. 5, Sept. 1995) suggested a logarithmic frequency axis in the primary auditory cortex. I am used to thinking in terms of electrical engineering and, as a layman in acoustics who does not believe in the mathematical skill of natural systems thought to have developed stepwise from a quite simple starting point, I just naively guess that the cochlea might capture the time span which traveling waves require to rise to a certain level or to spread along a certain distance on the basilar membrane, rather than the location of maximal vibration. Why not replace the frequency-vs.-time representation by a corresponding reciprocal-of-frequency-vs.-time plane?
Would we lose any information as compared to the common practice of ignoring phase angle? No; on the contrary, we could gain asymmetry if we were ready to accept negative values, too. Also, a smooth handshake between short-time spectral and long-time temporal coding would be conceivable more naturally.

I read in Len Trejo's Lectures that there are 40 to 60 hairs per cell, and that one hair cell contacts 3 to 15 neural fibers. Would it make sense to look at how far apart the places are where the fibers end? Can pairs or groups of them possibly pick up, electrically, differences of pressure or velocity along the tonotopic axis of the basilar membrane? As I further read in a paper from Boystown, after recording a tuning curve from a tapped nerve fiber, it is possible to mark it chemically so that the path of the fiber can be traced to the place within the cochlea where it synapses with an inner hair cell. So I trust in the reliability of frequency-to-place maps. I also noticed successful studies on the functional development of the ear around the time of birth. May I hope for a persuasive description of intra-cochlear coding coming soon?

There is a remaining problem touching arc welding sound, too. Acoustic perception can only partially be covered by spectral perception or by perception of small time differences, respectively. Memory in the brain has to support, and finally take over, the job of the cochlea once frequency is lower than a certain value. Is it correct to estimate this as less than 20 Hz, or a time span larger than 50 ms, respectively? Is it useful to quantify this value exactly? I know Zwicker gave three limits (Grenzdauern) based on psychoacoustic experiments: 200 ms, 20 ms, and 2 ms. The smallest value corresponds to 500 Hz and denotes the limit of temporal resolution in monaural or diotic hearing. The medium one, 20 ms, relates to a modulation frequency of 8 Hz, which is not yet detrimental to gain with increasing frequency. Chris Plack gave another view in his Psychoacoustic Lecture Notes.
Based mainly on measurements of neural firing rates, he suggested that we 'use a temporal mechanism at low frequencies, and a place mechanism above 5 kHz where phase locking is not available.' He wrote: 'At frequencies below 5 kHz, neurons will tend to fire only at one point in the sinusoidal cycle of the incoming waveform.' Probably this handshake between spectral and temporal coding gives rise to two consequences. On the one hand, a deficiency in overlap might be imaginable. On the other hand, the double coding that might be assumed instead could be a source of exceptional hearing skill which can be enhanced by training. Emotion should also play a role in that region. Can anybody tell me whether, and if so how, the delta frequency band (1 to 3.5 Hz), the theta frequency band (3.5 to 7.5 Hz), etc. are to be understood in that context?

I personally suffer sometimes from an unidentified, occasionally appearing traffic sound at very low frequency. My office is near a crossroads. I do not know what is to blame for the highly annoying feeling I experience from an all-penetrating vibration. Is it a resonance in the exhaust pipe of large diesel-driven vehicles, or is it just my over-sensitivity? I remember the assessment an expert welder gave of a new welding machine which was distinguished by a favorably more regular process, but unfortunately also by one repeating at low frequency. He said: the light flicker is unacceptable to me. I have never heard of any regulation concerning audible noise that corresponds to the flicker limits ensuring electric voltage quality.

Finally, concerning the noise from arc welding I described above, I would like to try a tentative conjecture based on a simple consideration. The temporal patterns my brain can imagine and assess are presumably, as a rule, the same as those perceived and stored in my brain.
If there is evidence for perception of mixed patterns containing both spectral and temporal features, then spectral and temporal perception will very likely overlap each other. Examples of such patterns could be dynamic color and dynamic asymmetry. I propose to shift the mentioned limits of perception concerning frequency or time, respectively, relative to these patterns by changing the speed of replay. Quantitative proof of the pattern inconsistency under frequency shift predicted here should provide a simple method for testing my conjecture.

I apologize if someone feels upset because I failed to correctly and briefly write down my very limited knowledge, observations and immature ideas. Nevertheless, having frankly addressed a lot of new and coherent questions from a practical point of view, I hope for correcting and advisory responses from the auditory community.

Sincerely,

Eckard Blumschein, 1997, Oct. 1
Otto of Guericke University Magdeburg
Dept. of Electrical Engineering IELE
PSF 4120
D-39016 Magdeburg
Phone: + 49 391 67 12403
Fax: + 49 391 67 12408
blumschein(at)

------- End of Forwarded Message
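
As a rough illustration of the measurement problem described in the forwarded message (counting spikes in the time domain works, while cutting the record down to about 16 k samples loses most of them), here is a minimal Python sketch. Everything in it is an assumption made for illustration only: the 1 MHz sample rate, the Hann-shaped 25 microsecond pulses, the exponentially distributed gaps, the threshold, and the function names. It does not reproduce the original recording setup or the re-sampling method actually used.

```python
import numpy as np


def make_pulse_train(fs=1_000_000, duration=1.0, mean_rate=80.0,
                     pulse_width=25e-6, min_gap=1e-3, seed=0):
    """Synthetic irregular train of narrow positive pulses (illustrative only)."""
    rng = np.random.default_rng(seed)
    signal = np.zeros(int(fs * duration))
    # Irregular gaps: a fixed minimum plus an exponential part,
    # so the mean gap is 1 / mean_rate and pulses never overlap.
    gaps = min_gap + rng.exponential(1.0 / mean_rate - min_gap,
                                     size=int(3 * mean_rate * duration) + 10)
    onsets = np.cumsum(gaps)
    onsets = onsets[onsets < duration - pulse_width]
    width = int(round(pulse_width * fs))
    shape = np.hanning(width + 2)[1:-1]  # smooth single-peaked pulse, peak 1.0
    for t in onsets:
        start = int(t * fs)
        # Random amplitude between 0.5 and 1.0 (assumed range).
        signal[start:start + width] = rng.uniform(0.5, 1.0) * shape
    return signal, len(onsets)


def count_spikes(signal, threshold=0.25):
    """Count upward crossings of the threshold (one per isolated pulse)."""
    above = signal > threshold
    return int(np.count_nonzero(above[1:] & ~above[:-1]))


fs, duration = 1_000_000, 1.0
signal, n_pulses = make_pulse_train(fs=fs, duration=duration)

# Counting in the full-rate record recovers the repetition rate.
full_count = count_spikes(signal)
print("pulses generated:", n_pulses)
print("counted at full rate:", full_count, "->", full_count / duration, "per second")

# Naive reduction to roughly 16 k samples: keep every step-th sample.
step = len(signal) // 16384
decimated = signal[::step]
print("samples after reduction:", len(decimated))
print("counted after reduction:", count_spikes(decimated))
```

With these assumed numbers the stride between kept samples is longer than a whole pulse, so most pulses fall between the kept samples and vanish. A re-sampler with a proper low-pass filter would smear the pulses instead of dropping them, at the cost of much lower peak values, which is in the same spirit as the widening of the pulses mentioned in the message.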

This message came from the mail archive
maintained by:
DAn Ellis
Electrical Engineering Dept., Columbia University