mp3 and the perceptual coding (resume)

Dear all,

I have collected (i.e., cut and pasted) all the replies I received below. But I'm not going to say who the winner is!

Thanks to everybody,

--------- QUESTION ----------

I would like to convince my students that psychophysics can (seldom)
be useful. For this reason, I want to talk about mp3 and perceptual coding.

Is there a book/chapter/paper that you would particularly recommend
about it? There are many things out there, but so far I haven't much liked any of them. In particular, the characteristics of perceptual coding are always described only roughly.

Any suggestion is more than welcome.

%%%%%%%%%%% REPLIES %%%%%%%%%

Laszlo Toth:

I usually do the following demonstration: take a speech sample, compress
it with an mp3 codec at the highest possible compression rate, then
decompress it and display the spectrogram of the original and the
processed signal. The spectral valleys are wiped out,
while there is minimal perceptual difference. I think this quite
convincingly demonstrates that masking indeed works, and that the industry
can make use of the results of psychophysics.
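
A minimal sketch of the spectrogram comparison, assuming NumPy is available. The mp3 round trip itself is not performed here: `decoded` is only a stand-in (the original plus a little white noise), and the two-tone `original` is a stand-in for the speech sample. The function name `spectrogram_db` and all parameter values are illustrative choices.

```python
import numpy as np

def spectrogram_db(x, n_fft=512, hop=128):
    """Magnitude spectrogram in dB (frames x bins), Hann-windowed frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(mag + 1e-12)   # small offset avoids log(0)

# Stand-in signals (assumption): in the real demo, `original` is the speech
# sample and `decoded` is the same sample after an mp3 encode/decode.
fs = 16000
t = np.arange(fs) / fs
original = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
rng = np.random.default_rng(0)
decoded = original + 0.01 * rng.standard_normal(len(original))

S_orig = spectrogram_db(original)
S_dec = spectrogram_db(decoded)

# Compare the time-averaged level in a spectral valley between the tones.
valley_bin = int(round(1500 * 512 / fs))
print("valley level, original (dB):", S_orig.mean(axis=0)[valley_bin])
print("valley level, decoded  (dB):", S_dec.mean(axis=0)[valley_bin])
```

With real material, the two arrays would be read from the original and the decoded files, and the two spectrograms displayed side by side.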


you can have a look at:


and then 'Technical papers' which not only contains a collection of
papers but also theses covering basic psychoacoustics. I also like the
book "Audio Signal Processing and Coding" by Spanias.

Hope this helps!

The Bosi & Goldberg book is a good resource for beginners:

For a somewhat more concise tutorial, see the 1995 paper by Davis Pan:

Also, the CD published by the AES is a great resource to learn about
the basic audio artifacts:

If you have specific questions/needs, let me know and I'll do my best
to point you in the right direction.

- Jon

Jon Boley

Hello Massimo  :)

A very good review paper is by T. Painter et al.

 :)  stefan

By the way, there are also a couple of Matlab implementations of the two
psychoacoustic models specified in the standard:

Model 1:  http://www.petitcolas.net/fabien/software/mpeg/

Model 2: http://perceptualentropy.com/Model2.zip

I wrote the Model 2 code while working on my Master's.  It doesn't
quite follow the standard (since I had trouble getting it to work),
but it should be good enough for demo purposes.

Ken Pohlmann uses figures from both of these models in his book:

- Jon

I would suggest "Perceptual coding of digital audio" by Painter and
Spanias in Proc. IEEE (you can find it easily with Google).

Buona Primavera a voi!

Mark Kahrs.

Dear Massimo,

I used a book by Zwicker and Fastl a while ago. It did address some perceptual issues very nicely. Also, you should talk to Juergen Herre. I forget which German university he is with but he is extremely knowledgeable on this. The people at the Fraunhofer Institute should also be very helpful.

Dear Massimo,

There is a brief discussion of this in the last chapter of "An introduction to the psychology of hearing". It should be suitable for psychologists, but you may find it to be over-simplified.

Best wishes,

Houtsma, Adrian (2008).  Perceptually based audio coding.  Chapter 42 in
Handbook of Signal Processing in Acoustics, Edited by David Havelock, Sonoko
Kuwano and Michael Vorlander, Springer, New York.

(Search on: adrian houtsma perceptual coding)

Dear Massimo,

I faced the same problem a few years ago while teaching signal processing and psychoacoustics to psychology students.

I read the "psychoacoustic" part of the MP3 description, and found it quite non-psychoacoustic. So I produced a demo that illustrates the effect visually and auditorily. Simply take a notched noise with a pure tone, as is used to measure auditory filters, and change the width of the notch. Encode each sound in MP3 at very low quality, and you'll see what happens to the notch.
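
The stimulus for this demo could be generated along the following lines. This is only a sketch assuming NumPy: the function name `notched_noise_with_tone`, the brick-wall notch, and every parameter value are illustrative assumptions, not taken from the slides. The mp3 encoding step is left to whatever encoder is at hand.

```python
import numpy as np

def notched_noise_with_tone(fs=44100, dur=1.0, f_tone=1000.0,
                            notch_width=400.0, tone_amp=0.1, seed=0):
    """White noise with a spectral notch centred on f_tone, plus a pure tone."""
    n = int(fs * dur)
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Zero every bin inside the notch (brick-wall notch, for simplicity).
    in_notch = np.abs(freqs - f_tone) < notch_width / 2.0
    spectrum[in_notch] = 0.0
    noise = np.fft.irfft(spectrum, n=n)
    noise /= np.max(np.abs(noise))        # normalise the noise to +/-1
    t = np.arange(n) / fs
    return 0.5 * noise + tone_amp * np.sin(2 * np.pi * f_tone * t)

for width in (100.0, 400.0, 1600.0):
    x = notched_noise_with_tone(notch_width=width)
    # Save x as a WAV file here, encode it at a low bit rate, decode it,
    # and compare the spectrum around the notch before and after.
    print(width, x.shape)
```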

I've attached a slide with this demo. I think it illustrates the fact that the MP3 encoder uses the concept of auditory filters, and when you listen to the encoded version, it also illustrates that it doesn't do a very good job... Feel free to re-use these slides if you like.

You can also produce other demos on the same concept with other coders (ogg, etc...).

Hope that helps.

Well, I'm sorry that Engineers rate so poorly, but you might try the following (written, I fear, by this Engineer, perhaps one of the pioneers in the audio perceptual coding field):

Johnston, James D, "Perceptual Audio Coding - A History and Timeline", 41st Asilomar Conference on Signals, Systems and Computers, 2007.

Jayant, N. S., Johnston, J. D. and Safranek, R. J., "Signal compression based on models of human perception," Proc. IEEE, Oct. 1993, pp. 1385-1422.

Brandenburg, K., Herre, J., Johnston, J. D., Mahieux, Y. and Schroeder, E. F., "ASPEC: Adaptive spectral entropy coding of high quality music signals," 90th Convention of the AES, Feb. 1991, Preprint 3011 A-4.

Brandenburg, K., Stoll, G., Johnston, J. D. and et al, "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 mb./s audio," ISO/IEC JTC1/SC29/WG11 MPEG: International Standard ISO 11172-3, 1991.

Johnston, J. D. and Brandenburg, K., "Wideband coding - Perceptual considerations for speech and music" in Advances in Speech Signal Processing, Furui and Sondhi (Ed.), Marcel Dekker, 1991, Preprint 3011 A-4.

Brandenburg, K. and Johnston, J. D., "Second generation perceptual audio coding: The hybrid coder," AES 88th Conv. Preprint, March 1990.

Safranek, R. J., Johnston, J. D. and Rosenholtz, R. E., "A perceptually tuned sub-band image coder," Proc. SPIE Symp. Human Vision & Electronic Imaging: Models, Methods & Applications, Santa Clara, CA, Feb. 1990.

Johnston, J. D., "Digital audio - Future trends in quantization, storage, and compression," AES 7th Intn'l. Conf. Audio in Digital Times, May 1989.

Johnston, J. D., "Perceptual transform coding of wideband stereo signals," ICASSP '89, May 1989, pp. 1993-1996.

Safranek, R. J. and Johnston, J. D., "A perceptually tuned sub-band image coder with image dependant quantization and post quantization data compression," ICASSP '89, May 1989, pp. 1945-1948.

Johnston, J. D., "Transform coding of audio signals using perceptual noise criteria," IEEE Jour. Selected Areas in Commun., vol. 6, no. 2, Feb 1988, pp. 314-323.

Johnston, J. D., "Estimation of perceptual entropy using noise masking criteria," ICASSP '88 Record, 1988, pp. 2524-2527.

Cox, R. V., Bock, D. E., Bauer, K. B., Johnston, J. D. and Snyder, J. H., "The analog voice privacy system," AT&T Tech. Jour., vol. 66, no. 1, Jan-Feb 1987, pp. 119-131.

Or look at


James D. Johnston

I would recommend "Applications of DSP to audio and acoustics".

Best,
Ayce

A related demo is to subtract the spectrum of the mp3-coded sound from that of the original, and do an inverse FFT, to show all the sounds that listeners are not "hearing." Werner Deutsch used to do this with symphonic recordings and joke that this showed that most of the musicians in the symphony were not needed.
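
A minimal sketch of this subtraction, assuming NumPy and assuming the original and decoded signals have already been time-aligned (codecs usually add some delay, so real files need aligning first). The signals here are stand-ins and the name `coding_residual` is hypothetical.

```python
import numpy as np

def coding_residual(original, decoded):
    """Return what the coder removed: inverse FFT of the spectral difference."""
    n = min(len(original), len(decoded))
    diff_spectrum = np.fft.rfft(original[:n]) - np.fft.rfft(decoded[:n])
    return np.fft.irfft(diff_spectrum, n=n)

# Stand-in signals (assumption): pretend the coder dropped a weak 3 kHz
# component that sits next to a strong 440 Hz tone.
fs = 8000
t = np.arange(fs) / fs
original = np.sin(2 * np.pi * 440 * t) + 0.05 * np.sin(2 * np.pi * 3000 * t)
decoded = np.sin(2 * np.pi * 440 * t)

residual = coding_residual(original, decoded)
print("residual RMS:", np.sqrt(np.mean(residual ** 2)))
```

Because the FFT is linear, subtracting the complex spectra gives the same result as subtracting the waveforms sample by sample; the spectral route is mainly useful when the difference is also to be displayed.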

Brian Gygi, Ph.D.

Hi Massimo,
For more specific questions on mp3 (and other lossless (and lossy) audio formats) I recommend the hydrogenaudio.org forums:


Danijel Domazet

Dear Massimo Grassi,

As Matthias said, you can have a look at the book "Audio Signal Processing and Coding" by Spanias. Chapter 5 gives an outline of psychoacoustics and its application to mp3. The application is covered in section 5.7 (EXAMPLE CODEC PERCEPTUAL MODEL: ISO/IEC 11172-3 (MPEG - 1) PSYCHOACOUSTIC MODEL 1).

There are many tutorials about mp3, but one that is very easy to read is Davis Pan's "A tutorial on MPEG/Audio Compression". It also shows graphically the advantage of the mp3 technique.

Please note that MP3 takes advantage of the spectral and temporal masking of the input data. The former is used for reducing the output bit rate. Temporal masking is used for minimizing the error introduced by the filter bank.

For further questions, please do not hesitate to ask me.


Dear Massimo, all,

I would like to add the following websites to the range of recommended
places to look:


They contain, of course, only a consumer level of detail, but they might
be useful nonetheless.

As a student I personally found the presentation of the "13 dB miracle"
the most impressive and straightforward demonstration of psychoacoustics
in audio coding. Of course that would require your students to have an
understanding of the principle of SNR.
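
For students who need the SNR part spelled out, here is a small sketch assuming NumPy. It computes SNR in dB and scales white noise to sit 13 dB below a signal. It covers only the plain white-noise condition; shaping the noise to follow the masked threshold would need a psychoacoustic model and is not attempted. The function names and the sine-tone stand-in for music are assumptions.

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, from the two energy sums."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

def scale_noise_to_snr(signal, noise, target_db):
    """Scale noise so that snr_db(signal, scaled noise) equals target_db."""
    gain = np.sqrt(np.sum(signal ** 2) /
                   (np.sum(noise ** 2) * 10.0 ** (target_db / 10.0)))
    return gain * noise

fs = 16000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t)       # stand-in for music (assumption)
rng = np.random.default_rng(0)
white = scale_noise_to_snr(signal, rng.standard_normal(fs), 13.0)
noisy = signal + white                     # white noise at 13 dB SNR

print("SNR (dB):", round(snr_db(signal, white), 2))
```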

Max Neuendorf