
Re: perceptual segregation of sound

Apologies if this is a naïve contribution (I am not a psychologist), but I
am struck by the thought that, from a strictly computational perspective,
there are two quite different analogues of 'attention' ...

The first is the notion of the 'weight' that can be assigned to any aspect
of perceptual experience (be it sensory input or any intermediate or
high-level representation).  In other words, a process that actively manages
the 'visibility' of information to some other (interpretive?) process can be
invoked in order to either allocate restricted computational/energy
resources, or to modulate the effect of such information by assigning
relative 'importance' with respect to competing information.
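To make this first sense concrete, here is a minimal Python sketch. It assumes that salience is available as one number per input, which is my assumption: nothing above says how salience would be computed. A softmax turns the scores into relative weights, and a threshold then hides low-weight items from the downstream process entirely. The function names and the `temperature` parameter are hypothetical and exist only for illustration.

```python
import math

def salience_weights(scores, temperature=1.0):
    """Softmax: turn raw salience scores (non-empty list) into weights summing to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate(weights, threshold):
    """Manage 'visibility': zero out items whose weight is below threshold,
    so the downstream (interpretive) process never sees them."""
    return [w if w >= threshold else 0.0 for w in weights]

def weighted_evidence(evidence, weights):
    """Combine competing evidence according to its relative 'importance'."""
    return sum(e * w for e, w in zip(evidence, weights))

# Hypothetical example: three inputs with different salience.
scores = [2.0, 0.5, -1.0]
evidence = [1.0, 0.0, 5.0]
w = salience_weights(scores)
visible = gate(w, 0.1)
print(weighted_evidence(evidence, w), weighted_evidence(evidence, visible))
```

Gating saves downstream work (resource allocation). Re-weighting changes how much each item counts (modulation). These are the two uses distinguished above.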

The second is precisely the amount of computational/energy resource that is
allocated to an interpretive or generative (planning) process.  In this
case, the search over possible scenarios (hypotheses/plans) can be resource
managed to trade accuracy against speed and energy cost.
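This second sense can be sketched as an "anytime" search under a fixed budget. The search evaluates at most `budget` hypotheses and keeps the best one found so far. This is a generic illustration, not a model of any particular perceptual process. `budgeted_search`, the candidate set and the scoring function `fit` are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

H = TypeVar("H")

def budgeted_search(hypotheses: Iterable[H], score: Callable[[H], float],
                    budget: int) -> Tuple[Optional[H], float, int]:
    """Evaluate at most `budget` hypotheses, in order, and return
    (best hypothesis, its score, number of evaluations used).

    A larger budget costs more evaluations (time/energy). Because it scans a
    longer prefix of the same sequence, the best score can only improve."""
    best, best_score, used = None, float("-inf"), 0
    for h in hypotheses:
        if used >= budget:
            break
        s = score(h)
        used += 1
        if s > best_score:
            best, best_score = h, s
    return best, best_score, used

# Hypothetical task: pick the candidate value (e.g. a pitch in Hz)
# closest to an observed one.
target = 237.0

def fit(h):
    return -abs(h - target)

coarse = budgeted_search(range(100, 400, 10), fit, budget=5)   # cheap, inaccurate
fine = budgeted_search(range(100, 400, 10), fit, budget=30)    # costly, accurate
print(coarse, fine)
```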

It seems to me that these two aspects are separable in that they can be
managed independently, i.e. particular sensory input could be given high
weight (salience?) but then allocated a restricted search resource, or vice versa.

Questions of divided attention thus point to multiple instantiations of such
management processes that, nevertheless, need to cooperate in order to
ensure that the net result is a single stable percept (or behaviour) that is
of most value to the organism in a given environmental context.

Now as to the original issue - "are we really capable of perceptually
segregating multiple sources concurrently, or are we just focusing our
attention on one source, and then shifting it very quickly to another
source?".  Given the hypothesis that multiple interpretations need to be
coordinated, what is the difference?  It's like asking whether a
central computer server uses a parallel processor or a serial processor -
what is important is that the overall solution is coherent and that
information is not compromised, not the 'data sampling' mechanism involved.
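The serial-versus-parallel point can be illustrated with a toy example. This is a sketch that assumes each "source" is just a list of numbers summarised by its mean. `parallel` and `serial` are hypothetical names. The first processes every stream independently. The second switches between streams one sample at a time but keeps each stream's running state between switches.

```python
from itertools import zip_longest

def parallel(streams):
    """One processor per stream: summarise each (non-empty) stream independently."""
    return [sum(s) / len(s) for s in streams]

def serial(streams):
    """A single processor that switches between streams on every time slice.

    Each stream's running total and count are kept between switches,
    so no sample is lost."""
    totals = [0.0] * len(streams)
    counts = [0] * len(streams)
    for frame in zip_longest(*streams):  # one time slice: one sample per stream
        for i, x in enumerate(frame):
            if x is not None:  # None means this stream has already ended
                totals[i] += x
                counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

# Hypothetical samples from two concurrent sources of different lengths.
voice = [2, 4, 6, 8]
music = [10, 30]
print(parallel([voice, music]), serial([voice, music]))
```

Both give the same per-source result. They would diverge only if `serial` discarded a stream's samples while attending elsewhere, that is, if information were compromised.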

Having said that, there is indeed a fundamental question as to whether we
believe that an organism's perception of the world is based on analysis of
sensory input (bottom-up) or on a process of checking whether it fits with
our expectations (top-down).  I understand that there is evidence for the
latter in theories about the saccadic movements of the eyes, and this is not
a million miles from Martin Cooke's 'glimpsing' model of speech perception.
The extra step needed is to invoke mechanisms that operate on (and
incorporate) information that is _not_ expected, and hence to allocate
attention based on a judgement of its ecological salience (either by
adjusting weights, re-allocating computational resources, or just by turning
your head!).
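Here is a minimal sketch of such an expectation-checking loop. It assumes that each input channel's expectation is simply an exponential moving average of what that channel has delivered so far. That is a stand-in for a real top-down model, and it is not Cooke's glimpsing model. Channels whose prediction error exceeds a threshold are flagged as candidates for re-allocating attention. `expectation_check`, `alpha` and `threshold` are hypothetical.

```python
def expectation_check(frames, alpha=0.5, threshold=1.0):
    """Top-down loop: predict each channel, then flag channels whose input
    violates the prediction.

    frames: list of time steps, each a list with one float per channel.
    Returns, for every time step after the first, the indices of channels
    whose prediction error exceeded `threshold`."""
    expected = list(frames[0])  # simplification: start from the first observation
    surprises = []
    for frame in frames[1:]:
        flagged = [i for i, (x, e) in enumerate(zip(frame, expected))
                   if abs(x - e) > threshold]
        surprises.append(flagged)
        # Move each expectation part of the way towards what was observed.
        expected = [e + alpha * (x - e) for x, e in zip(frame, expected)]
    return surprises

# Hypothetical input: channel 1 jumps unexpectedly at the third time step.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 3.0], [0.1, 3.0]]
print(expectation_check(frames))
```

The unexpected channel stays flagged until the expectation catches up with it. Only inputs that do not fit the prediction compete for attention.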





Chair of Spoken Language Processing
Speech and Hearing Research Group (SPandH)
Department of Computer Science, University of Sheffield,
Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK

e-mail: r.k.moore@xxxxxxxxxxxxxx
web:    http://www.dcs.shef.ac.uk/~roger/
tel:    +44 (0) 11422 21807
fax:    +44 (0) 11422 21810
mobile: +44 (0) 7910 073631

> -----Original Message-----
> From: AUDITORY Research in Auditory Perception
> [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Mark Every
> Sent: 04 May 2006 16:23
> To: AUDITORY@xxxxxxxxxxxxxxx
> Subject: Re: [AUDITORY] perceptual segregation of sound
> Dear List,
> Many thanks to all contributors for their enlightening replies to my
> initial question, this has been a very interesting discussion.
> > are we really capable of perceptually segregating multiple sources
> > concurrently, or are we just focusing our attention on one source, and
> > then shifting it very quickly to another source?
> I would like to summarise and reply to some comments raised, though
> mainly by conjecture on my part.
> Firstly, examples have been given (e.g. listening to music) where, upon
> repeated exposure to a sound and use of top-down processes, additional
> information is extracted from perceptual streams that were not initially
> the focus of attention. To make a loose analogy, repeated listening must
> be like learning a foreign language; we start off learning the most
> useful words/sounds and, over time, as these become part of our
> vocabulary, we redirect our attention to more subtle structures and
> relationships between words/sounds. However, it is evident that we can
> form multiple perceptual streams even from completely unfamiliar sounds,
> so let's try to isolate perceptual stream segregation from any
> additional complications inherent in repeated listening. Brian puts it
> thus: "top down processes are useful for resolving ambiguous cases".
> John's remark that "survival requires an animal's sensory
> organs to produce a timely response to environmental information"
> implies that we should maximise the "potential evolutionary benefit" of
> perceived information in the shortest possible time. I would imagine
> that to this end, using all processing resources is better than using
> only some of them, so it would make sense to use spare resources to
> analyse any content that is not the main focus of attention (the basis
> of the "perceptual load" theory of selective attention that Erick
> mentioned). Erick's comments about the sensory buffer are also
> interesting in light of the above-mentioned topic of repeated listening,
> since we can repeatedly process a sound in short-term memory even if it
> was physically heard only once. However, he mentions a limit of around 4s for
> the sensory buffer. So, how is the situation Kevin described possible:
> 	"In my classes, I have had students who can "go back" to a sound (or
> sounds) they heard and extract components that they did not 'hear' when
> the sound was presented. (In one case a student re-listened to a piece he
> had heard a couple of weeks previously.)"?
> Is 4s a typical limit for people with average memory, excluding those
> with photographic/eidetic memory?
> This was relatively clear in my mind before, but now I'm confused: what
> is attention? If attention can be present even at a cochlear level,
> should we define it by its functionality rather than by its "level in a
> perceptual hierarchy of processes"?
> Finally, to add to the arguments for preattentive structuring of sensory
> evidence into multiple streams, some experiments are described in
> (Bregman, A.S., Auditory Scene Analysis, 1990, Chapter 2, "Relation to
> Other Psychological Mechanisms: Attention") where perception of
> nonfocused streams in speech mixtures can extend even to recognition of
> words or associated meanings of words. However, as Erick pointed out,
> according to the "perceptual load" theory, perception of nonfocused
> streams will not necessarily always reach such an advanced state given
> different sounds and perception tasks, due to limited cognitive
> resources.
> Once again, thanks to all contributors.
> Mark
> --
> Mark Every <m.every@xxxxxxxxxxxx>
> CVSSP, SEPS, University of Surrey