Re: An Auditory Illusion (Al Bregman)


Subject: Re: An Auditory Illusion
From:    Al Bregman  <bregman(at)HEBB.PSYCH.MCGILL.CA>
Date:    Thu, 22 May 1997 16:37:39 -0400

Here is a message sent to me to be forwarded to the list. - Al

--------------------------------------------------------------
Date: Mon, 19 May 1997 15:03:21 +0000
From: Peter Cariani <peter(at)epl.meei.harvard.edu>
To: AUDITORY(at)MCGILL1.BITNET
Subject: Re: An Auditory Illusion

Christian Kaernbach wrote:
> Dear Al,
>
> > But consider the following two cases:
> > 1. One sees a red ball on a blue table.
> > 2. One sees a blue ball on a red table.
> > According to a "node" theory, in both cases the nodes representing
> > red, blue, ball, on and table are activated. What then is the
> > difference? The two cases require a different arrangement of the
> > same ideas, but node-based theories cannot express this.
>
> The (now) "classical" solution to this problem is synchrony: the
> cell assembly for RED and the cell assembly for BALL fire in
> synchrony if they are to express that it is the ball that is red.
> This idea dates back to the 1970s (Christoph von der Malsburg, "A
> correlation theory of brain function"; he actually cites someone from
> the last century who had the same idea), but it was confirmed only
> recently (1991 or so, by Gray, Konig, Engel and Singer in Nature:
> binding by motion).

I'm sympathetic to the notion that synchrony could play a role in the binding of local features, but I think there is a great deal of uncertainty (and/or skepticism) in the vision community about how robust the effects are (whether they are seen in unanesthetized preparations, for example). I wouldn't go so far as to say that the hypothesis is "confirmed". It would also seem that by changing the timing of presentation of different visual objects, one should be able to get them to fuse in different ways (I don't know of any perspicuous demonstrations of this kind -- does anyone know of any?).

One of the main classical arguments against "scanning models" of perception (models based on phase relative to alpha waves or on absolute synchrony, e.g. those of Pitts & McCulloch and of Walter in the 1950s) was that cortical discharge patterns can be driven by many different kinds of stimuli (clicks, flashes) that don't necessarily interfere with perceptual integration (see the discussion of McCulloch's "Why the Mind is in the Head" in Jeffress, Cerebral Mechanisms in Behavior (the Hixon Symposium), 1951).

On the other hand, Chistovich's experiments with trains of alternating single-formant vowels (JASA 77(3):785-809, 1985) seem to indicate that spectral components need to occur within a common 10-15 ms window for spectral integration to take place (i.e., for the one-formant vowels to fuse into a two-formant percept). But this can't be the whole story, because we can also (better) separate two-formant vowels with different F0's, such that multiple auditory objects don't necessarily fuse even though they overlap in time.

Somehow one needs a neural mechanism for binding together the multiple independent perceptual attributes of a stimulus. The strategy of using "feature detectors" (nodes, "place-coded" representational systems) runs into problems for objects with multiple attributes: either one needs tuned neural arrays that encompass all possible combinations (ensembles of "combination detectors"), or one needs some other means of encoding the conjunction of the different kinds of neural information. Synchrony (common time of activation) is one way of doing this, but then one needs a separate time slot for each independent object.
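
As a minimal sketch of the bookkeeping this implies (illustrative Python, not any published model; the feature names and slot numbers are assumptions), tag each active feature node with the time slot in which it fires, and read out features sharing a slot as one object:

    # Toy illustration of binding-by-synchrony with discrete time slots.
    # Feature nodes that fire in the same slot are read out as bound into
    # one object. All names here are illustrative.
    from collections import defaultdict

    def bind_by_synchrony(firings):
        """firings: list of (feature, time_slot) pairs."""
        objects = defaultdict(set)
        for feature, slot in firings:
            objects[slot].add(feature)
        return [sorted(members) for members in objects.values()]

    # "Red ball on a blue table": RED and BALL share slot 0, BLUE and TABLE slot 1.
    scene1 = [("red", 0), ("ball", 0), ("blue", 1), ("table", 1)]
    # "Blue ball on a red table": the same four nodes, different slot assignment.
    scene2 = [("blue", 0), ("ball", 0), ("red", 1), ("table", 1)]

    print(bind_by_synchrony(scene1))  # [['ball', 'red'], ['blue', 'table']]
    print(bind_by_synchrony(scene2))  # [['ball', 'blue'], ['red', 'table']]

The same four nodes are active in both scenes; only the temporal tags differ, which is precisely what a pure node-activation code cannot express. Note also that each distinct object consumes one slot, so the number of simultaneously representable objects is limited by the number of available slots.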

Lisman and others who work on "phase codes" in the hippocampus have proposed a division of the hippocampal theta wave along these lines (with 7 +/- 2 slots).

An alternative (or complement) to binding-through-synchrony is binding through common time (phase) structure. When we have a harmonic complex with a mistuned component, or two harmonic complexes (double vowels) with different F0's, the time (phase) relations within each object (complex vs. mistuned component; vowel 1 vs. vowel 2) are constant from one fundamental period to the next, while the time (phase) relations across objects are constantly changing. I think that any mechanism that groups by common time pattern from period to period should be able to segregate multiple objects this way. (Patterson's strobed temporal integration model, JASA 98(4):1890-4, 1995, is a step in the right direction, but I'm not sure how well the triggering algorithm would handle multiple objects with different F0's.)
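
As a toy illustration of this grouping cue (a sketch under assumed parameters; the autocorrelation here is a crude stand-in for period-to-period comparison, in the spirit of double-vowel periodicity models rather than Patterson's actual triggering algorithm), mix two harmonic complexes with different F0's and look for the lags at which the mixture's time structure repeats:

    # Toy demo: two harmonic complexes ("vowels") with different F0's are
    # mixed; the normalized autocorrelation of the mixture shows a peak at
    # each object's fundamental period. Parameters are arbitrary.
    import numpy as np

    SR = 16000                            # sample rate (Hz)
    t = np.arange(int(0.1 * SR)) / SR     # 100 ms of signal

    def harmonic_complex(f0, n_harmonics=8):
        return sum(np.sin(2 * np.pi * f0 * k * t)
                   for k in range(1, n_harmonics + 1))

    mix = harmonic_complex(100.0) + harmonic_complex(125.0)

    # Autocorrelation over lags up to 15 ms, normalized by total energy.
    lags = np.arange(1, int(0.015 * SR))
    ac = np.array([np.dot(mix[:-l], mix[l:]) for l in lags]) / np.dot(mix, mix)

    # Local maxima above threshold: expect ~8.0 ms (125 Hz) and ~10.0 ms (100 Hz).
    peaks_ms = [1000 * l / SR for i, l in enumerate(lags[1:-1], 1)
                if ac[i - 1] < ac[i] > ac[i + 1] and ac[i] > 0.3]
    print(peaks_ms)  # -> [8.0, 10.0]

Each peak betrays the fundamental period of one object; a mechanism that groups energy recurring at a common period could then assign components to the 100 Hz and 125 Hz objects separately.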

If one thinks about the visual analogue of two overhead transparencies moved independently, a similar situation obtains: similar spatial phase relations group together, and it is easy to separate the two images; when the two images move together, they fuse and it is hard to separate them. As in audition, visual neurons "phase-lock" to sinusoidal gratings and respond with precise latencies to contrast transients (lines, edges). (It is interesting that the responses of simple cells in V1 to drifting gratings look much like the responses of units in auditory cortex to AM tones with comparable modulation rates; Shamma, Network: Computation in Neural Systems 7 (1996): 439-476.) Each image-object presumably produces a common spatio-temporal correlation structure that encodes all of its edges and shapes (though as far as I know, there are no models of spatial form perception that use phase-locking to form spatio-temporal correlation patterns). There is also a small literature on temporal response patterns generated by light of different wavelengths (e.g. Kozak & Reitboeck, Vision Res. 14:1890-1894, 1974), so color-related time patterns could be mixed in with spatial form and texture information in the same channels (see also the work of Optican, Richmond, McClurkin et al. on the multiplexing of these kinds of information).

The point is that the problem of segmentation and binding is generated by the assumption that we are trying to assemble the outputs of "nodes" or local "feature detectors", and that these outputs themselves have little internal structure. Once we admit the possibility that time plays a role -- that there is functionally significant temporal (or spatial) microstructure in our inputs, whether through synchrony or through common time pattern -- much more flexible modes of association become possible. Instead of operating on scalar signals (one variable per node), we can operate on signals of higher dimensionality, and many things that traditional, explicitly coded logic systems do quite easily (but that are difficult to accomplish with standard connectionist nets) suddenly become possible. This is the equivalent of stringing together symbols that denote different properties, as in symbolic logic. The red ball on the blue table is different from the blue ball on the red table because the representation of each object carries form and color information multiplexed together. This obviates the need for an ensemble of detectors for all of the combinations, or for the objects to be segregated and coordinated in time.
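
A toy sketch of such a multiplexed code (purely illustrative; the interval values are arbitrary and the scheme is far cruder than any neural proposal): assign each attribute a characteristic inter-spike interval, let each object's spike train interleave the intervals of its attributes, and recover the conjunction from the intervals present in a single train:

    # Toy sketch of a multiplexed temporal code. Each attribute gets a
    # characteristic inter-spike interval (values arbitrary); an object's
    # spike train interleaves the intervals of all its attributes.
    INTERVAL_MS = {"red": 3, "blue": 5, "ball": 7, "table": 11}

    def encode(attributes, cycles=4):
        """One spike train carrying a conjunction of attributes."""
        spikes, t = [], 0
        for _ in range(cycles):
            for a in attributes:
                t += INTERVAL_MS[a]
                spikes.append(t)
        return spikes

    def decode(spikes):
        """Recover attributes from the intervals present in one train."""
        intervals = {b - a for a, b in zip(spikes, spikes[1:])}
        return {a for a, iv in INTERVAL_MS.items() if iv in intervals}

    red_ball = encode(["red", "ball"])      # one object's channel ...
    blue_table = encode(["blue", "table"])  # ... and the other's

    print(decode(red_ball))    # -> {'red', 'ball'} (set order immaterial)
    print(decode(blue_table))  # -> {'blue', 'table'}

Swapping the colors turns the interval sets {3, 7} and {5, 11} into {5, 7} and {3, 11}, so "blue ball on red table" is a different pair of signals even though the same four attribute labels occur overall, and no dedicated "red ball" combination detector is ever required.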

Peter Cariani
