
                        Auditory User Interface

Preface:

In this unfortunately long message I explain what I believe to be some
important ideas that will make a significant positive difference in
the working lives of blind people.  Basically it's about a
sophisticated Auditory User Interface, very much like the GUI's
currently being used by the sighted.  I don't know if others are
working on ideas like these, so I'm sending this message to all the
people I can think of who may be able to help me do one of two
things: either to make sure that others have had these ideas before
and that they are being implemented in a way that will reach the
blind computing community, or to get the ideas worked on by competent
people, in such a way as to make them available.  If you're not
interested, please delete this message without reading further, and
accept my apology for filling up your mailbox.

However, if you're competent in signal processing and/or graphical
user interface programming (e.g. Object-Oriented Widget programming),
please read this message.  If you believe you can do something with
the ideas, please get in touch with me.  If you know someone or some
population (or email list) of people that might be interested in
developing an AUI like this, please pass this on to them, and let me
know you did.  It is important that good AUI's are developed for the
blind, so please help me by doing what you can to spread this message
around to people that could have something to do with implementing it.
By the way, in sending out this message I am placing its contents in
the public domain, and anyone can do with them what they want.  I just
want blind people to have an AUI available to them.


Background:

Computers have given a new lease on productive life to many blind
people, but the user interface that computers present to the blind is
to my knowledge very limited as far as output and feedback are
concerned: text-to-speech and (rarely) printed braille are the
only outputs I know of.

While the Mac, Microsoft Windows, and to a far lesser extent,
Unix-based windowing environments such as X-Windows, have brought the
benefits of Graphical User Interfaces to the general population, these
benefits have been lost on the large population of blind computer
users.  Nonetheless these people are often highly dependent on their
computers for their work.

The problem with text-to-speech is that it is basically a slow,
one-dimensional, serial output form.  One word at a time is spoken;
only one item at a time can be presented by the computer to its
user.  All good user interfaces, and graphical ones in particular,
overcome seriality by simultaneous presentation of lots of
information. This is why a screen editor is better than a line editor,
and why statically presented icons in a Desktop GUI are better than
mere tacit availability of remembered names of programs or files in a
command-line interface.  I believe that much more productive computer
work is done using GUI's than using a serial interface.

Since, by definition, the blind cannot use a GUI, what else can be used
as a substitute?

I spoke for a few minutes with a blind man the other day.  He said he
was into "sonification" of computer environments.  While I told him
that the auditory system is much better than the visual system at
certain things, he responded that he was more interested in the things
that vision can do that hearing cannot.  This was a challenge to me,
and I thought for a little while about how to make it clear that the
blind need not be as handicapped as it may seem.  As someone on the
hearing-seminar@apple.com list or some other similar list said once,
the auditory system is better able to simultaneouly monitor multiple
sources of information in parallel, while vision is more serial,
focussing on one item at a time.  This is the basic insight I would
like to see developed into much more sophisticated Auditory User
Interfaces.  It seems to me that they could be built in a way very
similar to Graphical User Interfaces.

I need to present a little background before I get to the meat of the
idea.

There are people who make tape-recordings of nature and the like using
microphones set inside the acoustically-correct ears of an
acoustically-correct dummy head, to record exactly what it would sound
like to be there listening yourself, and apparently these recordings
are uncannily natural, more so than plain stereo.  Hearing can distinguish
not just left-to-right balance (stereo), but to a lesser extent
distance, height, and front-versus-back.  That is, there is an
"auditory space" with some correspondence to physical space in that
human perception can distinguish sounds originating in different
places (even simultaneously, as in the cocktail party effect).  One
can also sense the size and perhaps shape of a room to some extent.

I believe there must be algorithms to map a source sound and a
location (relative to the head) to the appropriate pair of sounds to
be played into the ears from stereo headphones so as to create the
effect that that sound emanated from that location, as if it were
recorded, for example, using the dummy head in a free-field acoustical
environment.  This ought to be a mechanical DSP task, though I'm
probably missing a lot of details.
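
For concreteness, here is a minimal sketch (in Python, using only
numpy) of placing a mono sound at an azimuth using simple interaural
time and level differences.  This is a crude stand-in for the
dummy-head (HRTF) processing described above; the head-radius
constant, the Woodworth-style delay formula, and the constant-power
panning are illustrative simplifications, not a faithful binaural
model.

    import numpy as np

    SAMPLE_RATE = 44100          # samples per second
    HEAD_RADIUS = 0.0875         # metres, rough average head radius (assumption)
    SPEED_OF_SOUND = 343.0       # metres per second

    def spatialize(mono, azimuth_deg):
        """Return (left, right) signals for a mono source at the given azimuth.

        azimuth_deg: 0 = straight ahead, +90 = hard right, -90 = hard left.
        """
        az = np.radians(azimuth_deg)

        # Interaural time difference (Woodworth-style approximation):
        # delay the ear farther from the source.
        itd_seconds = (HEAD_RADIUS / SPEED_OF_SOUND) * (abs(az) + np.sin(abs(az)))
        delay = int(round(itd_seconds * SAMPLE_RATE))

        # Interaural level difference: simple constant-power panning.
        pan = (az / (np.pi / 2) + 1.0) / 2.0      # 0 = hard left, 1 = hard right
        left = np.cos(pan * np.pi / 2) * mono
        right = np.sin(pan * np.pi / 2) * mono

        pad = np.zeros(delay)
        if azimuth_deg >= 0:    # source on the right: the left ear hears it later
            left = np.concatenate([pad, left])
            right = np.concatenate([right, pad])
        else:                   # source on the left: the right ear hears it later
            right = np.concatenate([pad, right])
            left = np.concatenate([left, pad])
        return left, right

    # Example: one second of a 440 Hz tone placed 45 degrees to the right.
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
    tone = 0.2 * np.sin(2 * np.pi * 440 * t)
    left, right = spatialize(tone, 45)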


Auditory User Interface

With that background, here's the idea.  For each icon, button, or
simple or composite widget or graphical object displayed in a GUI,
locate a repeating sound at some location in auditory space
(preferably corresponding in some sense to the location of the widget
in the GUI's visual space).  Sum the left-ear- and
right-ear-mapped outputs to form composite left-ear and
right-ear signals, and pipe them to stereo headphones.  The effect
should be a cocktail-party-like simultaneous presentation of lots of
sources at once.  Just as I can tell very quickly when my old car is
making a new sound, an AUI user could monitor a lot of sound-sources at
once.
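
Building on the spatialize() sketch above, here is one way the
per-widget outputs might be looped and summed into a single stereo
scene; the widget list and the normalization step are illustrative
assumptions.

    import numpy as np

    def mix_scene(widgets, duration_seconds, sample_rate=44100):
        """widgets: list of (mono_sound, azimuth_deg) pairs.  Returns (left, right)."""
        n = int(duration_seconds * sample_rate)
        left_mix = np.zeros(n)
        right_mix = np.zeros(n)
        for mono, azimuth in widgets:
            # Loop each widget's sound so it repeats for the whole duration.
            looped = np.tile(mono, int(np.ceil(n / len(mono))))[:n]
            left, right = spatialize(looped, azimuth)   # from the sketch above
            left_mix += left[:n]
            right_mix += right[:n]
        # Normalize so that summing many sources does not clip the headphones.
        peak = max(np.max(np.abs(left_mix)), np.max(np.abs(right_mix)), 1e-9)
        return left_mix / peak, right_mix / peak

    # Example: three auditory icons spread across the scene.
    scene_l, scene_r = mix_scene([(tone, -60), (tone, 0), (tone, 60)], 2.0)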

One could play around a lot with hooking up each widget or icon to
different kinds of sounds.  The general rules are 1) to minimize the
obnoxiousness of having to listen to the stuff all day long, 2) to
maximize the distinctness of the sounds, and 3) to maximize
ease-of-learning.  Candidates include nature sounds (birds and
gorillas come to mind as relatively easily distinguishable), musical
instruments (perhaps relatively pleasant-sounding), or human voices
repeating a relevant word or phrase like "open file" or "save file"
(relatively easily learned).  While repeated phrases are easiest to
learn, they also might be more obnoxious to live with.  If non-speech
is used, there should be a one-sound-at-a-time mode for learning what
the sounds mean.
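
As a sketch of how the widget-to-sound assignment and the learning
mode might look, here is a hypothetical table and walkthrough routine;
the sound files, widget names, and the play/announce callbacks are all
placeholders, not part of any existing system.

    # Hypothetical widget-to-sound table.
    WIDGET_SOUNDS = {
        "open file":  "sounds/robin.wav",      # nature sound: distinct, fairly pleasant
        "save file":  "sounds/marimba.wav",    # musical instrument
        "trash":      "sounds/gorilla.wav",
        "mail inbox": "speech/new_mail.wav",   # recorded phrase: easiest to learn
    }

    def learning_mode(play, announce):
        """Step through the scene one source at a time so the user can learn
        which sound means which widget.  `play` and `announce` are callbacks
        assumed to be supplied by the host system."""
        for widget_name, sound_file in WIDGET_SOUNDS.items():
            announce(widget_name)   # e.g. text-to-speech of the widget's name
            play(sound_file)        # then the non-speech sound it will use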

To optimally implement such a system, experiments would have to be
done to determine the granularity of auditory space, on the basis of
which one could prevent different auditory icons from being located
indistinguishably close together (though the types of the paired
sounds undoubtedly interact with their distinguishability).
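
Pending those experiments, a placement routine could simply refuse to
put two icons closer together than some measured threshold; in this
sketch the 15-degree figure is a placeholder for whatever the
experiments would actually yield.

    MIN_SEPARATION_DEG = 15.0   # placeholder for the measured granularity

    def place_icons(requested_azimuths, min_sep=MIN_SEPARATION_DEG):
        """Nudge requested azimuths apart until no two are closer than min_sep."""
        placed = []
        for az in sorted(requested_azimuths):
            if placed and az - placed[-1] < min_sep:
                az = placed[-1] + min_sep
            placed.append(az)
        return placed

    # Example: three icons requested too close together on the right.
    print(place_icons([30.0, 35.0, 40.0]))   # -> [30.0, 45.0, 60.0]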

Next, a mouse could be represented auditorily with a bee buzz or
something (a bee can fly around in space, after all, which is why it
seems appropriate to me), which can be moved around, at least in two
dimensions, in the auditory space as it moves around on the mouse pad.
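
One plausible mapping from the mouse pad to auditory space is to let
left/right on the pad control azimuth and near/far control elevation;
the pad dimensions and angle ranges below are illustrative
assumptions.

    PAD_WIDTH, PAD_HEIGHT = 200.0, 200.0     # pad coordinates in millimetres

    def mouse_to_auditory(x_mm, y_mm):
        """Return (azimuth_deg, elevation_deg) for a mouse-pad position."""
        azimuth = (x_mm / PAD_WIDTH - 0.5) * 180.0      # -90 (left) .. +90 (right)
        elevation = (y_mm / PAD_HEIGHT - 0.5) * 60.0    # -30 (low)  .. +30 (high)
        return azimuth, elevation

    # The "bee buzz" source would be re-spatialized at this position on
    # every mouse-movement event.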

The sources should be fairly low volume, so as to form an auditory
background or murmur, from which the attentive listener can pick out
what is of interest.

Highlighting can be represented by increasing the volume or pitch or
repetition rate, etc., of the sound associated with a widget.  So when
the mouse's auditory-space location gets close to the widget's
auditory-space location (tracked by buzzing the mouse towards the
widget's sound-source), the widget can be highlighted in this way, and
a click can produce some appropriate effect on it.
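
A sketch of that highlighting rule, using volume as the cue (pitch or
repetition rate would work the same way); the angular threshold and
gain values are made up for illustration.

    import math

    HIGHLIGHT_RADIUS_DEG = 10.0   # placeholder "closeness" threshold

    def widget_gain(widget_az, widget_el, mouse_az, mouse_el,
                    base_gain=0.3, highlight_gain=1.0):
        """Boost a widget's gain when the mouse's auditory position is nearby."""
        distance = math.hypot(widget_az - mouse_az, widget_el - mouse_el)
        return highlight_gain if distance < HIGHLIGHT_RADIUS_DEG else base_gain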

Association of simple widgets as part of a complex widget (e.g.,
buttons in a button box) could be signaled by putting their collective
output through a shared filter, or by adding some room-reflectance or
low-volume pink noise, to show that they're associated.
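
As a rough illustration, a group could share a quiet noise bed mixed
in alongside its members' sounds.  True pink noise or a
room-reflectance model would be better; low-pass-filtered white noise
is used here only as a simple stand-in.

    import numpy as np

    def group_noise_bed(n_samples, level=0.02, smoothing=0.99):
        """A quiet, low-pass-filtered noise bed shared by one widget group."""
        white = np.random.randn(n_samples)
        bed = np.zeros(n_samples)
        for i in range(1, n_samples):
            # one-pole low-pass filter to darken the white noise
            bed[i] = smoothing * bed[i - 1] + (1.0 - smoothing) * white[i]
        return level * bed / (np.max(np.abs(bed)) + 1e-9)

    # The bed would be spatialized at the group's average location and mixed
    # in, so the buttons of a button box share an audible "room tone" that
    # unrelated widgets lack.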

Even sequences of text, or computer programs, can be represented
auditorily in parallel through such an AUI, by doing text-to-speech
for several lines simultaneously and locating each line of the program
at a different point in auditory space (within the constraints of what's
distinguishable).  Highlighting of selected ranges of text could be
done by changing the voice-quality or pitch of the text-to-speech
system's speaker.
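
A sketch of the layout side of that idea: each line gets a position in
auditory space and selected lines get a raised pitch.  The actual
text-to-speech rendering is left to whatever speech engine is
available; only the placement and voice parameters are computed here,
and the specific numbers are placeholders.

    def layout_text(lines, selection=None, min_sep_deg=15.0):
        """Return a list of (line_text, azimuth_deg, pitch_factor) tuples."""
        layout = []
        for i, line in enumerate(lines):
            # Spread lines left-to-right, keeping them a distinguishable
            # distance apart (the 15-degree figure is a placeholder).
            azimuth = -60.0 + i * min_sep_deg
            # Selected lines get a raised pitch, the auditory analogue of
            # highlighting a range of text.
            pitch = 1.3 if selection and selection[0] <= i <= selection[1] else 1.0
            layout.append((line, azimuth, pitch))
        return layout

    # Example: lines 1..2 of a three-line program are selected.
    for entry in layout_text(["x = 1", "y = 2", "print(x + y)"], selection=(1, 2)):
        print(entry)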


Enhancements:

Later, more expensive versions of this kind of system might have a
virtual-reality-like reorientation system, which tracks the user's
head orientation and shifts the sounds' apparent locations so that the
sources don't appear to move; only the head moves within a fixed
space.  This gives the effect that the user can "look at" different
locations, bringing them to front/center/level.  This kind of input function
could substitute even more naturally for the function of a mouse.  It
could also be done cheaply using some appropriate magnet-on-a-hat and
some kind of passive electromagnetic tracking system.  Joystick
applications could also be envisioned.
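
The compensation itself is simple: when the tracker reports a head
yaw, every source's azimuth is shifted the opposite way so the sources
stay put in the virtual room.  The sketch below assumes the tracker
delivers yaw in degrees.

    def compensate_for_head_yaw(source_azimuths_deg, head_yaw_deg):
        """Return source azimuths relative to the listener's current facing."""
        compensated = []
        for az in source_azimuths_deg:
            rel = az - head_yaw_deg
            # wrap into the range (-180, 180]
            rel = (rel + 180.0) % 360.0 - 180.0
            compensated.append(rel)
        return compensated

    # Example: the user turns 30 degrees to the right; a source that was
    # straight ahead now appears 30 degrees to the left.
    print(compensate_for_head_yaw([0.0, 90.0], 30.0))   # -> [-30.0, 60.0]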

In different program contexts (e.g., a word-processor vs a spreadsheet
vs something else), the rhythm or room-reflectance or speed of the
sounds could be changed, creating a different auditory context, in
addition to the differences from simply having a different set of
auditory widgets in the environment.  One could do effects like
virtual-reality flying, too, so that different programs could reside
in different virtual locations, and the user could move around in the
virtual space using a mouse or other input functions and make use of
the functions available there.


Summary:

While the last two paragraphs have gotten rather extreme and
visionary, I believe the basic ideas above are both sound and fairly
easily implementable.  Every widget in an object-oriented GUI could
simply be augmented with an auditory representation (an associated
digitized sound file) and an auditory-space location (consistent with
what's distinguishable in auditory space), and the AUI infrastructure
should take care of generating the composite auditory scene.  This AUI
infrastructure should include 1) the signal-processing to map a
one-channel sound and a location relative to the head into two sounds
that put the source at the correct apparent location in auditory
space, plus 2) addition, to put all the sources in their locations at
once, plus 3) event-mapping that deals with mouse location (moving it
around in auditory space) and highlighting (doing the appropriate
volume-control or other change to the source to signify highlighting)
and other events like mouse-clicks, etc.
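
Tying the pieces together, a widget in an object-oriented toolkit
might carry exactly those two extra attributes plus a little
event-handling state.  The class below is hypothetical and not drawn
from any existing toolkit.

    class AudioWidget:
        """A GUI widget augmented with the auditory attributes described above."""

        def __init__(self, name, sound_file, azimuth_deg, elevation_deg):
            self.name = name
            self.sound_file = sound_file      # digitized sound to loop and spatialize
            self.azimuth = azimuth_deg        # auditory-space location
            self.elevation = elevation_deg
            self.highlighted = False          # toggled by the event-mapping layer

        def on_mouse_near(self):
            self.highlighted = True           # rendered as a volume or pitch boost

        def on_mouse_away(self):
            self.highlighted = False

        def on_click(self):
            # Invoke whatever action the corresponding visual widget would perform.
            print(self.name + ": activated")

    save_button = AudioWidget("save file", "sounds/marimba.wav",
                              azimuth_deg=20, elevation_deg=0)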


Appeal:

Someone who knows more about GUI implementation than I do should be able
to take this and run with it.  I don't know enough about acoustics or
about GUI design or about signal-processing or about Object-Oriented
Programming to be able to do a decent version of this myself.  Also I
work more than full time and don't have time to pursue it.  But as I
said initially, an AUI of this kind would vastly enrich the computer
interface available to the blind, and make their work much more
productive and enjoyable, significantly reducing the limitations on
working life produced by blindness.  I want to see it happen.  I think
most of us want to see it happen, too.  The fact that it has
virtual-reality implications, as well as implications for enhancing
sighted computer-users' interactions with computers should also not be
overlooked.  If you can help these ideas happen by passing them on to
someone you know with the right expertise or interest, please do so,
and let me know about it.  Thanks very much.

Tom Veatch
veatch@andrea.stanford.edu (but I read my email only once a week)
home: (415) 322-6569
315 E. O'Keefe St., #10
East Palo Alto, CA 94303