Subject: Re: perceptual segregation of sound
From:    Prof Roger K Moore  <r.k.moore@xxxxxxxx>
Date:    Thu, 4 May 2006 18:09:25 +0100

Apologies if this is a naïve contribution (not being a psychologist), but I am struck by the thought that, from a strictly computational perspective, there are two quite different analogues of 'attention'.

The first is the notion of the 'weight' that can be assigned to any aspect of perceptual experience (be it sensory input or any intermediate or high-level representation). In other words, a process that actively manages the 'visibility' of information to some other (interpretive?) process can be invoked either to allocate restricted computational/energy resources, or to modulate the effect of such information by assigning relative 'importance' with respect to competing information.

The second is precisely the amount of computational/energy resource that is allocated to an interpretive or generative (planning) process. In this case, the search over possible scenarios (hypotheses/plans) can be resource-managed to trade accuracy against speed and energy cost.

It seems to me that these two aspects are separable in that they can be managed independently, i.e. a particular sensory input could be given high weight (salience?) but then allocated a restricted search resource, or vice versa. Questions of divided attention thus point to multiple instantiations of such management processes that nevertheless need to cooperate in order to ensure that the net result is a single stable percept (or behaviour) that is of most value to the organism in a given environmental context.

Now as to the original issue - "are we really capable of perceptually segregating multiple sources concurrently, or are we just focusing our attention on one source, and then shifting it very quickly to another source?" Given the hypothesis that multiple interpretations need to be coordinated, what is the difference? It's like asking whether a central computer server uses a parallel processor or a serial processor - what matters is that the overall solution is coherent and that information is not compromised, not the 'data sampling' mechanism involved.

Having said that, there is indeed a fundamental question as to whether we believe that an organism's perception of the world is based on analysis of sensory input (bottom-up) or on a process of checking whether it fits with our expectations (top-down). I understand that there is evidence for the latter in theories about the saccadic movements of the eyes, and this is not a million miles from Martin Cooke's 'glimpsing' model of speech perception. The extra step needed is to invoke mechanisms that operate on (and incorporate) information that is _not_ expected, and hence to allocate attention based on a judgement of its ecological salience (either by adjusting weights, re-allocating computational resources, or just by turning your head!).
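
To make the separability point concrete, here is a minimal sketch (purely illustrative, in Python; the sources, weights, budgets and numbers are all invented for the example, not a model anyone in the thread has proposed). A salience weight per source decides how much its evidence counts against competing evidence, while a search budget decides how much computation the interpretive process may spend, trading accuracy against speed and energy.

  import random

  def interpret(sources, budget):
      # sources: list of (observation, weight) pairs; the weight is the
      #          'salience' knob (the first analogue of attention).
      # budget:  number of candidate hypotheses the interpretive process is
      #          allowed to evaluate (the second analogue) -- a larger budget
      #          buys accuracy at the cost of time/energy.
      def cost(hypothesis):
          # Each source pulls the percept towards its own observation in
          # proportion to the salience weight it has been given.
          return sum(w * (hypothesis - obs) ** 2 for obs, w in sources)

      candidates = (random.uniform(0.0, 10.0) for _ in range(budget))
      return min(candidates, key=cost)

  # The two knobs are set independently: the same weighting can be
  # searched cheaply or searched thoroughly.
  evidence = [(3.0, 0.9), (7.0, 0.1)]       # source A highly salient, B down-weighted
  coarse = interpret(evidence, budget=5)    # fast, cheap, approximate
  fine = interpret(evidence, budget=5000)   # slow, costly, near the weighted optimum
  print(coarse, fine)

With a generous budget the returned percept settles near the weighted optimum of the competing evidence; with a small budget it is faster but noisier. The point is simply that the two knobs can be turned independently.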
Regards
Roger
________________________________________________________________
Prof ROGER K MOORE BA(Hons) MSc PhD FIOA MIEE
Chair of Spoken Language Processing
Speech and Hearing Research Group (SPandH)
Department of Computer Science, University of Sheffield,
Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK
e-mail: r.k.moore@xxxxxxxx
web:    http://www.dcs.shef.ac.uk/~roger/
tel:    +44 (0) 11422 21807
fax:    +44 (0) 11422 21810
mobile: +44 (0) 7910 073631
________________________________________________________________

> -----Original Message-----
> From: AUDITORY Research in Auditory Perception
> [mailto:AUDITORY@xxxxxxxx] On Behalf Of Mark Every
> Sent: 04 May 2006 16:23
> To: AUDITORY@xxxxxxxx
> Subject: Re: [AUDITORY] perceptual segregation of sound
>
> Dear List,
>
> Many thanks to all contributors for their enlightening replies to my
> initial question; this has been a very interesting discussion.
>
> > are we really capable of perceptually segregating multiple sources
> > concurrently, or are we just focusing our attention on one source, and
> > then shifting it very quickly to another source?
>
> I would like to summarise and reply to some comments raised, though
> mainly by conjecture on my part.
>
> Firstly, examples have been given (e.g. listening to music) where, upon
> repeated exposure to a sound and use of top-down processes, additional
> information is extracted from perceptual streams that were not initially
> the focus of attention. To make a loose analogy, repeated listening must
> be like learning a foreign language: we start off learning the most
> useful words/sounds and, over time, as these become part of our
> vocabulary, we redirect our attention to more subtle structures and
> relationships between words/sounds. However, it is evident that we can
> form multiple perceptual streams even from completely unfamiliar sounds,
> so let's try to isolate perceptual stream segregation from any
> additional complications inherent in repeated listening. Brian puts it
> thus: "top-down processes are useful for resolving ambiguous cases".
>
> John's remark that "survival requires an animal's sensory
> organs to produce a timely response to environmental information"
> implies that we should maximise the "potential evolutionary benefit" of
> perceived information in the shortest possible time. I would imagine
> that, to this end, using all processing resources is better than using
> only some of them, so it would make sense to use spare resources to
> analyse any content that is not the main focus of attention (the basis
> of the "perceptual load" theory of selective attention that Erick
> mentioned). Erick's comments about the sensory buffer are also
> interesting in light of the above-mentioned topic of repeated listening,
> since we can repeatedly process a sound in short-term memory even if it
> was physically heard only once. However, he mentions a limit of around
> 4 s for the sensory buffer. So how is the situation Kevin described
> possible: "In my classes, I have had students who can 'go back' to a
> sound (or sounds) they heard and extract components that they did not
> 'hear' when the sound was presented. In one case a student re-listened
> to a piece he had heard a couple of weeks previously."?
> Is 4 s a typical limit for people with average memory, excluding those
> with photographic/eidetic memory?
>
> This was relatively clear in my mind before, but now I'm confused: what
> is attention? If attention can be present even at a cochlear level,
> would we define it by its functionality rather than by its level in a
> perceptual hierarchy of processes?
>
> Finally, to add to the arguments for preattentive structuring of sensory
> evidence into multiple streams, some experiments are described in
> (Bregman A.S., Auditory Scene Analysis, 1990, Chapter 2 - Relation to
> Other Psychological Mechanisms: Attention) where perception of
> non-focused streams in speech mixtures can extend even to recognition of
> words or associated meanings of words. However, as Erick pointed out,
> according to the "perceptual load" theory, perception of non-focused
> streams will not necessarily always reach such an advanced state given
> different sounds and perception tasks, due to limited cognitive
> resources.
>
> Once again, thanks to all contributors.
>
> Mark
>
> --
> Mark Every <m.every@xxxxxxxx>
> CVSSP, SEPS, University of Surrey

