Re: perceptual segregation of sound (Erick Gallun)


Subject: Re: perceptual segregation of sound
From:    Erick Gallun  <gallun@xxxxxxxx>
Date:    Wed, 3 May 2006 00:11:19 -0400

Sorry to just now get around to responding to this one. I have spent a fair bit of time thinking about these sorts of issues recently and wasn't sure how to put it all down. Perhaps the other responses have already said most of what I've written below, but I did want to make sure I responded.

The initial question was:

> are we really capable of perceptually segregating multiple sources
> concurrently, or are we just focusing our attention on one source, and
> then shifting it very quickly to another source?

Broadbent (1956) suggested that we switch very rapidly between two auditory sources, giving the appearance of simultaneous processing. His estimate of the switching rate was 6 Hz (roughly 167 ms per switch), which agrees pretty well with a number that has been suggested for visual switching (Miller and Bonnel, 1994, say a switch takes 150 ms). I think these numbers may be right for the upper limit on how fast we can switch, but they don't address the issue of whether we must switch.

Treisman (1969) hypothesized that much of the literature on selective listening (e.g., Cherry, 1953; Treisman, 1964; see also Moray, 1969; Wood and Cowan, 1995) could be accounted for by the idea that when listeners had to analyze two speech streams for the same features, there was conflict and only one stream could be analyzed. When the analyses could be carried out by independent feature analyzers, however, there was no conflict and analysis proceeded in parallel. This is a pretty deep point for me. It also relates to Dan Levitin's nicely put point about the automatic nature of sensory processing.

Lavie (2005; Tsal and Lavie, 1994) has suggested the "perceptual load" theory of selective attention, in which spare processing resources are automatically allocated to non-target items. She uses this idea to explain why some situations make it look like selective attention acts to filter out distractors at the level of simple features (high-load cases), whereas others make it look like all stimuli are automatically processed to the level of semantic recognition (low-load cases). This is relevant to the divided-attention question because it suggests that whether or not we can listen to two things at once will depend both on what sort of analysis is required and on the complexity of the stimuli. In situations where only a few simple stimuli are present, the load is low and automatic recognition may make performing two tasks fairly easy. When the load is high, performance may depend fairly substantially on what the listener is trying to do.

This is also relevant to the Cusack and Carlyon (1999) work on stream segregation, in which repeating tone patterns are heard as a single stream when the listener first begins to respond to them and only gradually become two streams. In that work, the listener is sometimes asked to perform a distracting task, and stream segregation seems to wait until attention is focused on the streaming stimulus. If segregation were automatic (they argue), then the number of streams heard would depend only on the time elapsed since the start of the stream. Sussman (2005) showed ERP results that suggest that stream segregation is automatic, since her listeners were not attending to the stimulus at all. Since there was no distracting task, however, this may have been a low-load case. I think the jury is still out on this one (and I may have missed some crucial piece of evidence, naturally), but it seems to me that it falls in line with a general load explanation.
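In case it helps to see the load idea laid out concretely, here is a minimal toy sketch (in Python, with made-up numbers and names of my own; it reflects only my reading of the capacity-spillover logic, not any model from Lavie's papers): a fixed pool of perceptual capacity, a target task that consumes some of it, and whatever is left over spilling automatically onto the non-target items.

    # Toy illustration of the "perceptual load" idea sketched above
    # (my own simplification, not a published model).

    CAPACITY = 1.0  # total perceptual capacity, in arbitrary units (an assumption)

    def distractor_processing(target_load):
        """Fraction of full processing a distractor receives (0 to 1)."""
        spare = max(0.0, CAPACITY - target_load)  # capacity the target leaves unused
        return min(1.0, spare)                    # spare capacity goes to the distractor

    for load in (0.2, 0.6, 1.0):
        level = distractor_processing(load)
        print(f"target load {load:.1f} -> distractor processed to level {level:.1f}")
    # Low target load leaves the distractor nearly fully processed (the
    # "late selection" pattern); high load leaves it almost unprocessed
    # (the "early selection" pattern).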
Finally, to get back to the original question, I think that whether or not we are following two streams simultaneously depends heavily on what one means by "following" a stream. Information extraction can take many forms, and I'm certain that a good many of them proceed automatically. In addition, some situations allow the listener to store one of the inputs in memory (the sensory buffer described by Cowan, 1988; 1995) and recall it a second or two later for further processing. If that sensory trace is not overwritten by new input (it seems to be quite fragile, as is the visual buffer described by researchers studying the "attentional blink"), then the listener can perform two analyses in series even if the two inputs occur simultaneously. If this "retrieve and analyze" step can occur automatically (and it should), then there is another mechanism that could result in "simultaneous" processing.

So, to summarize, I think that the auditory system is probably built to retain and process as much information as possible. This means that spare processing resources are quite likely to be automatically allocated to any stimuli that are not actively being analyzed. That is a very good strategy for the only sensory system that can detect and identify stimuli from 360 degrees across large distances with pretty good spatial accuracy (smell is a very poor localization mechanism, I believe, despite its omnidirectionality). In addition, it seems that a sensory buffer (very short-term sensory memory of 1-4 seconds) is imperative for dealing with complex, multi-source situations like the two lions in the example. However, do you really care what they are saying? Detecting both and localizing them accurately counts as a good survival strategy. With a sensory buffer and some automatic processing added to separate feature analyzers for speech and for lion roaring, you could probably know where both lions were located, how loudly they were roaring, whether they were getting louder or softer, and that your friend just yelled "LION!" Add in a little more short-term memory and you could even tell whether or not the lions were louder and less reverberant (closer) than they were the last time they roared. Sounds to me like rapidly shifting from one to the other is a pretty poor substitute, especially since our remarkable ability to "fill in" the world around us in the absence of new information would make the whole complex process just seem like continuous processing of all the input.

Erick Gallun
Postdoctoral Fellow
Hearing Research Center
Boston University

P.S. - If this all seems like a lot of "ifs," "maybes," and "probablies," I apologize. The work that needs to be done on these issues is just beginning to roll (in my opinion), especially my own. Stay tuned, though, because what I wrote above pretty much summarizes the research project I'm currently engaged in! To that end, I'm completely open to dissenting opinions :) If I'm missing something, I want to be the first one to hear about it.

