Re: your mail



> I was asked for some references on books (preferably textbooks or
> conference proceedings) that connect physiology/neurophysiology
> and music perception.  I am working in psychoacoustics but I am
> not familiar with this specific topic.  Someone pointed towards
> Helmholtz' "Lehre von den Tonempfindungen" but I thought there
> should be something more recent.
>
> I would be glad if anyone could help me.
>
> Stefan

Dear Stefan,

Some time ago I put together a mini-FAQ about psychoacoustics. I think
the most interesting part of it is the bibliography at the end.

Here it is:

CUT HERE-----CUT HERE-----CUT HERE-----CUT HERE-----CUT HERE-----CUT HERE-----

           ______________________________________________________
          |                                                      |
          |   HUMAN AUDIO PERCEPTION FREQUENTLY ASKED QUESTIONS  |
          |              version 2.0   June 4 , 1994             |
          |______________________________________________________|


                          I n t r o d u c t i o n
                        ---------------------------

It all started from a recent UseNet posting of mine. From the volume of mail
I received, it seems to be a very interesting subject. I decided to release
an edited version of all the answers I have received so far in the form of a
F.A.Q. (Frequently Asked Questions).

This version is preliminary. It is still *VERY* incomplete. With your help I
will try to make it as complete as possible. Please read on to see what
additional information is needed...

The main topic remains the same :

Given two spectra (STFFTs, Short Time Fast Fourier Transforms, for example),
we try to estimate a psychoacoustic distance between them (i.e. a timbral
metric).  This involves some additional data:

1) Equal loudness curves (Fletcher-Munson).  Originally published in
J.A.S.A. (Journal of the Acoustical Society of America) in 1933.  Please
send me your data/approximations/formulae.  More information is still
needed on this subject.

2) Bark frequency scale (Critical Bands).  I have found some
approximations in the range 0..5 kHz.  Again, more precise information is
needed.

3) "Masking" effects.  Useful introductory information can be found in the
MPEG Audio compression FAQ (available via anonymous FTP at sunsite.unc.edu,
in the IUMA archive).

4) Other psychoacoustic data ?

______________________________________________________________________________


-MANY THANKS to all those kind people who contributed to this text (they
are too many to list).

-My comments are put in square brackets [ ... ].

-A recent version of this text is available via anonymous FTP at:
svr-ftp.eng.cam.ac.uk (maintained by Tony Robinson <ajr@eng.cam.ac.uk>).
Directory: /pub/comp.speech/info , Filename: HumanAudioPerception.
Please note that this FAQ is *NOT* restricted to speech topics.


                          Argiris A. Kranidiotis

                           University Of Athens
                          Informatics Department

                       akra@zeus.di.uoa.ariadne-t.gr


______________________________________________________________________________

                           Equal loudness curves
______________________________________________________________________________


From: Various people
------------------------------------------------------------------------
-Fletcher-Munson curves (the most popular answer).

Peak sensitivity at 3,300 Hz, falling off below 40 Hz and above 10 kHz.

-"An Introduction to the Psychology of Hearing" by Moore, 3rd edition
(the most popular reference).


From: Vincent Pagel <Vincent.Pagel@loria.fr>
------------------------------------------------------------------------

[...]

It's a family of curves [Fletcher Munson curves --AK] a bit like this:


     dB ^|
        ||                            |
        | \                          |
        | |                         |
        |  \                       /
        |   |                     /
        |    \________     ______/
        |             \___/
        |
        |
        |_________________________________________________>  Frequency (Hz)
           400      2500   6000    10000  20000


PERCEPTUALLY all the sounds corresponding to the points on the curve have
the same intensity: this means that the ear has a large range where it is
nearly linear (1000 to 8000 Hz), with its best sensitivity in a small
region (around 3000 Hz if my memory serves).

[ the curve has a minimum at 3,300 Hz -- AK ]

Sensitivity drops dramatically above 10000 Hz and below 500 Hz.

You can draw different equal loudness curves depending on the intensity
you begin with (e.g. if the intensity at 2500 Hz is 50 dB you get one
curve, but if you start at 2500 Hz with 70 dB you get another equal
loudness curve).  Generally, equal loudness curves have nearly the same
shape, and the shape does not depend too much on the starting point.

To my knowledge there is no mathematical formula given to approximate equal
loudness curves, but with the data in the book by Moore it should not be
very difficult to find an approximation.


From: Angelo Campanella <acampane@magnus.acs.ohio-state.edu>
------------------------------------------------------------------------

Obtain the ISO "Zero Phons" standard threshold of human hearing.

-The standard was ISO 389-1975 "Audiometer Standard Reference Zero".
-The US Equivalent is ANSI S3.6 - 1969.

The following numbers apply:

These are dB re 20 micropascals for a sound of pure tone or very narrow
band noise:

--------------------------------------------------------------------------
Audio Frequency (Hz)   125   250   500  1000  2000  3000 4000  6000 8000
=========================================================================
Human (Monaural)
Threshold of Hearing   [...] normal young adult
with undisturbed
hearing.  dB re
20 micropascals.


Binaural hearing is 10 to 15 dB better, since the brain has a magnificent
capability to correlate the simultaneous listening of both ears.


From: walkow@compsci.bristol.ac.uk (Tomasz Walkowiak)
------------------------------------------------------------------------
The equal loudness curve can be approximated by:

E(w)=1.151*SQRT( (w^2+144*10^[...]0^4)) )

From: Robinson et al.: Br.J.A.Phys. 7, 166-181, 1956.

This approximation is for a Nyquist frequency of 5 kHz, so
w = 2*Pi*f/5kHz , for 0<f<5kHz.  Therefore E(w) is defined for 0<w<Pi.
E(w) is linear and is usually applied to the power spectrum.



______________________________________________________________________________

                        Bark scale / Critical Bands
______________________________________________________________________________


From: [...]g@netcom.com (Filiz Basbug)
------------------------------------------------------------------------

From a paper given by David Lubman at Inter-Noise '92 (Toronto), the critical
band rate (z) in Bark can be determined by

z=[13*arctan(0.76*f)+3.5*arctan(f^2/56.25)]

where  f  is in kHz and the angles returned from the arctangent expressions
are  in  radians.   When  z is an integer, f is the dividing line frequency
between two critical bands.

If the frequency corresponding to a particular Bark (z) is desired, use the
following:

f=[(exp(0.219*z)/352)+0.1]*z-0.032*exp[-0.15*(z-5)^2]

where f is in kHz.

Finally, the critical bandwidth (df) can be calculated for a given center
frequency (f) by

df=25+75*[1+1.4*f^2]^0.69

where f is in kHz and df is in Hz.

There  are  no  explicitly stated limits on the variables, but according to
the  table that Mr.  Lubman generated from the formulas, 1<=z<=24 for Bark,
and   20<=f<=15500  for  frequency,  except  50<=f<=13500  for  the  center
frequencies.  (df) ranges from 100 Hz to 3500 Hz.

Also note that these formulas are generally accepted approximations but, as
far  as  I  know,  are  not yet standardized.  I believe they have all been
empirically derived.
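
The Bark formulas in this section can be tried out with a short Python
sketch.  It is only an illustration and not part of Mr. Lubman's paper: the
function names are made up for this example, and the bandwidth expression is
assumed to be the commonly quoted approximation
df = 25 + 75*[1 + 1.4*f^2]^0.69 (f in kHz, df in Hz).

```python
import math


def khz_to_bark(f_khz):
    """Critical band rate z (Bark) for a frequency given in kHz."""
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan(f_khz ** 2 / 56.25)


def bark_to_khz(z):
    """Approximate inverse: frequency in kHz for a critical band rate z."""
    return ((math.exp(0.219 * z) / 352.0) + 0.1) * z \
        - 0.032 * math.exp(-0.15 * (z - 5.0) ** 2)


def critical_bandwidth_hz(f_khz):
    """Critical bandwidth in Hz around a center frequency given in kHz.

    ASSUMPTION: uses the commonly quoted approximation
    25 + 75 * (1 + 1.4 f^2)^0.69.
    """
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


if __name__ == "__main__":
    for f in (0.1, 0.5, 1.0, 4.0, 10.0):
        z = khz_to_bark(f)
        print(f, "kHz ->", round(z, 2), "Bark, back to",
              round(bark_to_khz(z), 3), "kHz, bandwidth",
              round(critical_bandwidth_hz(f), 1), "Hz")
```

The two conversion formulas are independent approximations, so a round trip
from kHz to Bark and back returns the starting frequency only approximately.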

Calculation of the psychoacoustic loudness of steady-state sounds is defined
in ISO 532, ISO Rec. 675, and DIN 45631.

Extension  to  non-steady  sounds  was  defined  by  Zwicker but is not yet
standardized (as of 1992).


______________________________________________________________________________

                              Masking effects
______________________________________________________________________________



From: Vincent Pagel <Vincent.Pagel@loria.fr>
--------------------------------------------------------------------------

[...]

About curves corresponding to the masking effect:

Those curves show the minimal intensity a sound with a given frequency must
have to be perceived, when played simultaneously with a sound having a
constant frequency [...] the masking effect of a 500 Hz frequency: you
play it at, for example, 50 dB, and at the same time you play another
frequency and adjust the level of the second frequency to find the limen
where it is perceived.  For example a sound [...]


______________________________________________________________________________

                   Psychoacoustic norm / Timbral Metric
______________________________________________________________________________



From: Fahey@psyvax.psy.utexas.edu (Richard Fahey)
--------------------------------------------------------------------------

These curves [Fletcher-Munson again... --AK] may be used to normalize
spectra for loudness at different frequencies (changing dB into phon [...]
can be made more psychologically real by changing the frequency scale to
the Bark scale, and using an auditory filter to smear the spectrum.

The distance between two spectra represented in ways similar to this can be
calculated as a Euclidean distance, and compared with psychoacoustic data.
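
A rough Python sketch of this idea follows.  It sums spectral power into
Bark bands and takes the Euclidean distance between the resulting band
levels in dB.  It is not Mr. Fahey's procedure: the function names, the
24-band layout and the small floor value are assumptions made for this
example, the Bark conversion is the Lubman approximation quoted in the
Bark scale section, and the loudness normalization and auditory-filter
smearing steps are left out.

```python
import math


def hz_to_bark(f_hz):
    """Critical band rate (Bark) for a frequency in Hz."""
    f = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f) + 3.5 * math.atan(f * f / 56.25)


def bark_band_levels(freqs_hz, powers, n_bands=24):
    """Sum spectral power into Bark bands and return band levels in dB."""
    bands = [0.0] * n_bands
    for f, p in zip(freqs_hz, powers):
        index = min(int(hz_to_bark(f)), n_bands - 1)
        bands[index] += p
    # Small floor (assumed value) keeps empty bands from giving log10(0).
    return [10.0 * math.log10(b + 1e-12) for b in bands]


def euclidean_distance(x, y):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    freqs = [100.0, 1000.0, 4000.0]
    spectrum_a = [1.0, 1.0, 1.0]   # power per component
    spectrum_b = [1.0, 1.0, 0.1]   # same, with the 4 kHz component 10 dB down
    d = euclidean_distance(bark_band_levels(freqs, spectrum_a),
                           bark_band_levels(freqs, spectrum_b))
    print("distance:", d)
```

Such a distance still has to be compared with psychoacoustic data before it
can be trusted as a timbral metric.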

From: James Beauchamp <beaucham@uxh.cso.uiuc.edu>
--------------------------------------------------------------------------

[...] we are comparing two time-varying spectra which are very similar to
one another.

This  would  be  used  to  measure the efficiency of a particular synthesis
technique.  Our first guess was to use :

                     SUM(k=1 to n) (A2(t,k) - A1(t,k))^2
        e(t) = sqrt( ----------------------------------- )
                                   [...]

where k is the partial number, t is time, and A1(t,k) and A2(t,k) are the
kth partial amplitudes vs. time for signals s1(t) and s2(t).  Then the
average error over time is given by

        e_ave = (1/DUR) SUM(t=0 to DUR) e(t)

The  theory  is that given two syntheses of signal s1, namely s2 and s3, s2
is  a  better  synthesis  of  s1  than  is  s3  if e_ave_2 < e_ave_3.  This
formulation seems to work fairly well, but it really fails when a synthesis
has weak upper partials not found in the original.  The weak upper partials
contribute  very little to the error calculation, but make a big difference
in  the  perceived  result.  Therefore, it would probably be much better to
add  up  the  amplitudes within critical bands than to give all frequencies
equal   weights   as   we   have   been   doing,   and   also   to  use  an
amplitude-to-loudness (in sones) translation.  (Usually, S = K*A^0.6).

The  problem with equalizing the A(k,t) using the Fletcher-Munson curves is
that  one  doesn't really know the absolute level of a given sound prior to
playing  it  back,  except  in a lab testing situation, perhaps.  Thus, the
difference   result  would  vary  with  playback  level,  an  uncomfortable
situation.
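
The error measure and the weak-upper-partial problem can be illustrated
with a small Python sketch.  It is a guess at the general idea, not Mr.
Beauchamp's code: the function names are invented, and normalizing by the
energy of the original spectrum, SUM(k) A1(t,k)^2, is an assumption of
this example.  The amplitude-to-loudness step uses S = K*A^0.6 with K = 1.

```python
import math


def frame_errors(a1, a2):
    """Relative spectral error e(t) for each frame.

    a1, a2: lists of frames; each frame is a list of partial amplitudes.
    ASSUMPTION: the squared difference is normalized by the energy of a1.
    """
    errors = []
    for f1, f2 in zip(a1, a2):
        num = sum((y - x) ** 2 for x, y in zip(f1, f2))
        den = sum(x ** 2 for x in f1)
        if den == 0.0:
            errors.append(0.0 if num == 0.0 else float("inf"))
        else:
            errors.append(math.sqrt(num / den))
    return errors


def average_error(a1, a2):
    """e_ave: mean of e(t) over all frames."""
    e = frame_errors(a1, a2)
    return sum(e) / len(e)


def to_sones(frames, k=1.0):
    """Amplitude-to-loudness translation S = K * A^0.6 per partial."""
    return [[k * a ** 0.6 for a in frame] for frame in frames]


if __name__ == "__main__":
    original = [[1.0, 0.5, 0.0]]
    synthesis = [[1.0, 0.5, 0.01]]   # adds a weak upper partial
    print("amplitude error:", average_error(original, synthesis))
    print("loudness error: ",
          average_error(to_sones(original), to_sones(synthesis)))
```

With these made-up numbers the weak extra partial counts for more after the
loudness translation than it does on raw amplitudes, in line with the
suggestion to use an amplitude-to-loudness translation.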


From: Richard Parncutt <parncutt@sound.music.mcgill.ca>
-------------------------------------------------------------------------

The psychoacoustic distance between two steady state complex sounds (or its
converse, perceived similarity) is [...], and the degree to which the
sounds have pitches in common (where by "pitch" I mean PERCEIVED pitch in
the psychoacoustic sense).

Terhardt (1972) distinguished two kinds of pitch.  Spectral pitches
correspond to individual audible [...] approximately harmonic pattern,
suggesting the presence of an (embedded) harmonic-complex tone.  Most
pitches perceived in everyday and musical sounds are virtual pitches.  The
relative perceptual salience of pitches may be estimated by the algorithm
of Terhardt et al. (1982).

Parncutt (1989) defined the pitch commonality of two complex [...] they
have perceived pitches in common, depending on the number and salience of
coinciding pitches (by comparison to non-coinciding pitches).  Calculated
pitch commonality values correlate well with similarity judgments of pairs
of complex sounds that differ relatively little in loudness and timbre
(Parncutt, 1989, 1[...]). [...] theoretic accounts of the strength of
harmonic relationship between musical tones and chords (Parncutt, 1989).


From: Christopher John Rolfe <rolfe@sfu.ca>
-------------------------------------------------------------------------

Metric [...] Cognitive science, however, points out that perceptual
space may be non-Euclidean.  In other words, there is NO simple metric.




______________________________________________________________________________

                            References / Books
______________________________________________________________________________




Fletcher H., Munson W.A., "Loudness: its definition, measurement, and
calculation", Journal of the Acoustical Society of America, 1933, vol 5, p 9.

Author: Fry R.B.  PhD Dissertation, Duke Unive[...]

Author: [...], Stevens S.S.
Title: Voice Level: Autophonic Scale, Perceived Loudness, and Effects of
Sidetone
Journal: JASA
Volume: 33
Number: 2
Page(s): 160-167
Date: 1961

Author: Peterson G E, McKinney N P
Title: The measurement of speech power
Journal: Phonetica
Volume: 7
Page(s): 65-84
Date: 1961

Author: Schlauch R.S., Wier C.C.
Title: A Method for Relating Loudness-Matching and Intensity-Discrimination
Data
Journal: Journal of Speech and Hearing Research
Volume: 30
Page(s): 13-20
Date: 1987

Author: Small AM, Brandt JF, Cox PG
Title: [...?] function of signal duration
Journal: JASA
Volume: 34
Page(s): 513-514
Date: 1962

Author: Stevens S.S.
Title: Calculation of the Loudness of Complex Noise
Journal: JASA
Volume: 28
Number: 5
Page(s): 807-832
Date: 1956

Handel, S. (1989).  "Listening: an introduction to the perception of
auditory events." MIT, Cambridge, MA

Dooling, R. J. and Hulse, S. H. (ed.) (1989).  The comparative
psychology of audition: Perceiving complex sounds.  Erlbaum, Hillsdale, NJ.

McAdams, S. and Bigand, E. (ed.) (1993).  Thinking in sound: the
cognitive psychology of human audition. Oxford Univ. Press, NY

Sloboda, J. A. (1985).  The musical mind: The cognitive psychology of
music.  Clarendon, Oxford

Proceedings of the IEEE, Vol. 81, No. 10, "Signal Compression Based on
Models of Human Perception".

Grey, J.M. "Multidimensional Perceptual Scaling of Musical Timbres"
Journal of the Acoustical Society of America, 63, 1493-1500.

Repp, B.H. (1984) "Categorical perception: Issues, methods, findings"
In N.J. Lass (ed.) Speech and Language: Advances in Basic
Research and Practice. Vol. 10. 1249-1257.

Moore and Glasberg, JASA 74(3) 1983. "Suggested formulae for calculating
auditory-filter bandwidths and excitation patterns"

Bladon and Lindblom, JASA 69(5) 1981. "Modeling the judgement of vowel
quality differences"

J. R. Pierce, The Science of Musical Sound (Freeman, New York, 1983).

J. G. Roederer, Introduction to the Physics and Psychophysics of Music
(Springer-Verlag, New York, 1975).

S. S. Stevens, "Measurement of Loudness", JASA 27 (1955): 815

S. S. Stevens, "Neural Events and Psychophysical Law", _Science 170_
(1970): 1043

E. Zwicker, G. Flottorp, and S. S. Stevens, "Critical Bandwidth in Loudness
Summation",  JASA 29 (1957): 548

Author: Hynek Hermansky
Institution: Speech Technology Laboratory, Divisio[...]s, Inc.,
3888 State Street, Santa Barbara, CA 93105, USA
Title: Perceptual linear predictive (PLP) analysis of speech
Journal: JASA
Volume: 87
Number: 4
Page(s): 1738-1752
Date: 1990

Gersho et al. (Bark Spectral Distance).
IEEE Journal on Selected Areas in Communications, Sept. (?) 1992


Name:    "An Introduction to the Physiology of Hearing"
Author:  James O. Pickles, Dept. of Physiology, Univ. of Birmingham, England.
Publisher: Academic Press, 1982.
ISBN 0-12-554750-1 (hardback)
ISBN 0-12-554 (paperback).

"An Introduction to the Psychology of Hearing" by B. Moore, 3rd edition.

Terhardt, E. (1972). [... complex tones]. Acustica, 26, 173-199.

Terhardt, E., Stoll, G., & Seewann, M. (1982). Algorithm for
extraction of pitch and pitch salience from complex tonal signals.
Journal of the Acoustical Society of America, 71, 679-688.

[ The following papers are from Richard Parncutt
  (parncutt@sound.music.mcgill.ca) -- AK ]

Bigand, E., Parncutt, R., & Lerdahl, F. (under review). Perception of
musical tension in short chord sequences: The influence of harmonic
function, sensory dissonance, horizontal motion, and musical
training.  Perception and Psychophysics.

Parncutt, R. (1993). Pitch properties of chords of octave-spaced
tones. Contemporary Music Review, 9, 35-50.

Parncutt, R. (1989). Harmony: A Psychoacoustical Approach.
Springer-Verlag, Berlin. (Springer Series in Information Sciences,
Vol. 19. Eds.: T.S. Huang & M.R. Schroeder. ISBN 3-540-51279-9. 218
pages, 22 figs.)

Stoll, G., & Parncutt, R. (1987). Harmonic relationship in similarity
judgments of nonsimultaneous complex tones. Acustica, 63, 111-119.

Terhardt, E., Stoll., G., Schermbach, R., & Parncutt, R. (1986).
Tonhoehenmehrdeutigkeit, Tonverwandschaft und Identifikation von
Sukzessivintervallen (Pitch ambiguity, harmonic relationship, and
melodic interval identification). Acustica, 61, 57-66.


______________________________________________________________________________

--
      ____________________________      __________________________________
     /                           /\    /                                 /\
    /   Argiris A. Kranidiotis _/ /\  /       E-mail (InterNet):       _/ /\
   /  University Of Athens    / \/   /                                / \/
  / Informatics Department    /\    /    akra@di.uoa.ariadne-t.gr     /\
 /___________________________/ /   /_________________________________/ /
 \___________________________\/    \_________________________________\/
  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \     \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
