
*To*: AUDITORY@xxxxxxxxxxxxxxx
*Subject*: Re: Blind Source Separation by Sparse Decomposition
*From*: "John K. Bates" <jkbates@xxxxxxxxxxxx>
*Date*: Tue, 21 Sep 1999 10:14:12 -0400
*Reply-to*: "John K. Bates" <jkbates@xxxxxxxxxxxx>
*Sender*: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Dear List,

Following is a note regarding Al Bregman's 9/6/99 post on "blind source separation by sparse decomposition." To refresh memories on the subject I have included his note.

I am intrigued by the implications of his questions on multiple sources and clipped waveforms. They suggest that he might be looking at a "granular" approach to CASA. Since I've been working with these aspects of auditory perception for many years, I feel that I can address some of his questions.

With respect to the separation of signals, one must consider the operational tradeoffs of the biological ear for its intended purpose as a sensor for biological survival. Conventional methods do not do this. They treat the ear as a communications channel that is inherently ignorant of the meaning contained in the data it carries. In contrast, the ear as a sensor searches for meaning rather than, say, low distortion. This search for meaning seems consistent with Al's remark about "humans trading off perfection in a narrow set of circumstances for flexibility."

Thus, the solution to sorting multiple sources is to be found not so much in a mathematical formula as in an engineering design approach that considers tradeoffs of constraints and objectives. Such an approach could reach the putative goal of CASA: a signal processing method that can do what the ear does. I have found that granular time-domain waveform analysis can get these results efficiently, while shunning the method of biophysical modeling.

Dealing with waveform clipping is the key to high-resolution granular signal processing. I first looked at this in connection with the intelligibility of infinitely clipped speech (Licklider & Pollack [1]). Typically, in voiced speech the upper formants form ripples that ride upon the wave of a strong first formant. This is especially noticeable in the /ee/ waveform. These ripples contain zeros of the waveform in the complex time domain (Voelcker [2]) that are destroyed by clipping.
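As a rough illustration of the ripple picture above (not the granular method itself), the following sketch builds a synthetic two-tone waveform and clips it infinitely. The sample rate, the two frequencies, the 0.05 ripple amplitude, and the helper `count_sign_changes` are all assumptions chosen for the illustration; real voiced speech is far more complicated.

```python
import numpy as np

def count_sign_changes(x):
    """Count sign changes (real zero crossings) in a sampled waveform."""
    s = np.sign(x)
    s = s[s != 0]  # drop exact zeros so no crossing is counted twice
    return int(np.count_nonzero(s[1:] != s[:-1]))

fs = 48_000                     # sample rate in Hz (assumed)
t = np.arange(fs) / fs          # one second of time samples
f_low, f_high = 100.0, 1000.0   # stand-ins for a strong first formant and an upper formant

carrier = np.sin(2 * np.pi * f_low * t)
ripple = 0.05 * np.sin(2 * np.pi * f_high * t)   # weak ripple riding on the carrier
mixture = carrier + ripple

clipped = np.sign(mixture)      # infinite clipping keeps only the polarity

n_ripple = count_sign_changes(ripple)    # crossings the ripple has on its own
n_clipped = count_sign_changes(clipped)  # crossings that survive in the clipped mixture

print("zero crossings of the ripple alone:    ", n_ripple)
print("zero crossings of the clipped mixture: ", n_clipped)
```

With a ripple this weak, the clipped mixture changes sign at roughly the rate of the low-frequency component only, so the ripple leaves no real zeros of its own in the clipped waveform.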
Now, if destroying ripples removes upper formants, how is it that clipped speech can be intelligible? The answer is that in the region of the waveform near the zero axis a few higher frequency zeros can remain. In fact, enough zeros remain to retain better than 65 percent intelligibility [1]. In addition, if the waveform is differentiated before it is clipped, most of the complex zeros are converted to real zeros, giving nearly perfect intelligibility. Thus, despite severe waveshape distortion, the meaning is preserved.

More generally, Voelcker has shown that almost all information is found in the real and complex zeros of a waveform. Thus, information from overlapping signal sources is contained in the mix of real and complex zeros that are defined by the clipped waveform. This can be appreciated by listening to a multi-signal waveform that has been differentiated and clipped.

The real problem here is to devise an algorithm that deconstructs the clipped waveform and sorts out its mixed zeros into their respective sources. As an example, my paper on the Haas effect presented at Mohonk97 described a method for doing this using direction of arrival to separate sources from their reverberations. In various experiments what I have found is that a granular algorithm using real and complex zeros can replicate many crucial psychoacoustic and speech processing functions.

To summarize: Voelcker has shown that signal additivity is not necessarily destroyed by clipping. However, the mathematics of the separation problem seem to point toward something like fractal theory. Meanwhile, heuristic methods such as the one I have been using can lead to practical applications.

References:

[1] J.C.R. Licklider and I. Pollack, "Effects of differentiation, integration, and infinite clipping upon the intelligibility of speech," J. Acoust. Soc. Am., Vol. 20, pp. 42-51, January 1948

[2] H.B. Voelcker, "Toward a unified theory of modulation," Part I, "Phase-envelope relationships," Proc. IEEE, Vol. 54, pp. 340-353, March 1966, and Part II, "Zero manipulation," pp. 735-755, May 1966

-John Bates
Time/Space Systems
79 Sarles Lane
Pleasantville, NY 10570
914-747-3143
jkbates@ieee.org

----------------------------------------------------------------

At 02:08 PM 9/6/99 -0400, you wrote:
>Dear Michael,
>
>Thanks for your response about the number of receivers versus the number
>of sources. It makes the human ability to (imperfectly) deal with many
>sources with only 2 ears even more intriguing. Somehow humans are trading
>off perfection in a narrow set of circumstances for flexibility. I
>suspect _heuristic_ approaches to CASA (computational auditory scene
>analysis) would work more like people do.
>
>Here is why I asked about the clipping problem. I'm no physicist so I
>can't give you an exact physical formulation of the problem. However, it
>seems to me that clipping destroys the linear additivity of the frequency
>components in the signal. Here is a simple example: mix a low amplitude
>high frequency component with a high amplitude, low frequency one. In the
>waveform, the high frequency seems to be riding on top of the low
>frequency at all points in the signal. Now clip the signal. Now the high
>frequency signal is missing in the segments that exceed the clipping
>threshold. It could have changed in frequency (and then back again) for
>all we know.
>
>I wanted to know whether, by destroying the additivity of the signals,
>clipping ruled out any mathematical methods for separation that are based
>on this additivity. I'm also not sure what echoes and reverberation would
>do to such mathematical methods.
>
>- Al
>-----------------------------------------------
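The clipping example in the quoted note, and the differentiate-then-clip remedy from the reply, can be tried numerically. What follows is only a minimal sketch on a synthetic two-tone mixture: the sample rate, the frequencies, the 0.2 amplitude, the 0.5 clipping threshold, and the helper `count_sign_changes` are all assumptions for illustration, and a first difference stands in for differentiation. It does not attempt any source separation.

```python
import numpy as np

def count_sign_changes(x):
    """Count sign changes (real zero crossings) in a sampled waveform."""
    s = np.sign(x)
    s = s[s != 0]  # drop exact zeros so no crossing is counted twice
    return int(np.count_nonzero(s[1:] != s[:-1]))

fs = 48_000                 # sample rate in Hz (assumed)
t = np.arange(fs) / fs      # one second of time samples

# High-amplitude low-frequency tone plus a low-amplitude high-frequency tone.
mixture = np.sin(2 * np.pi * 100.0 * t) + 0.2 * np.sin(2 * np.pi * 1000.0 * t)

# Threshold clipping: samples beyond the threshold are pinned to it,
# so the high-frequency component is not visible in those segments.
threshold = 0.5
hard_clipped = np.clip(mixture, -threshold, threshold)
flat_fraction = float(np.mean(np.abs(hard_clipped) == threshold))

# Infinite clipping applied directly to the mixture.
n_direct = count_sign_changes(np.sign(mixture))

# Differentiate (first difference) before infinite clipping.
derivative = np.diff(mixture) * fs
n_differentiated = count_sign_changes(np.sign(derivative))

print("fraction of samples pinned at the threshold:", round(flat_fraction, 3))
print("zero crossings, clipped directly:           ", n_direct)
print("zero crossings, differentiated then clipped:", n_differentiated)
```

In this toy case differentiation weights each component by its frequency, so the 1000 Hz tone dominates the derivative and its zero crossings appear as real sign changes in the clipped result, whereas direct clipping retains only the few that fall near the zero axis.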
