
overall level of the masking threshold

Title: Re: computational complexity of psychoacoustic models

Dear Danijel & Alexander:


Thanks a lot for your helpful opinions, and apologies for my slow reply.


In the meantime, I have more questions for the list:


I was wondering whether it is possible to predict the “overall level of the masking threshold” from some statistics of the input signal (e.g. power, power spectral density, etc.). By “overall level of the masking threshold”, I mean the average of the masking threshold over frequency.

For example, the threshold in quiet has an overall level of ~32 dB.
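As a concrete reading of that definition, the Python sketch below evaluates Terhardt's widely used analytic approximation of the threshold in quiet and takes its mean in dB over a frequency grid. The 20 Hz to 16 kHz range and the log-spaced grid are assumptions made only for illustration; the resulting number depends strongly on both choices (linear-Hz, log-Hz or Bark spacing give different averages), so it is not meant to reproduce the ~32 dB figure above.

```python
import numpy as np


def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def overall_level_db(threshold_db):
    """'Overall level' as defined above: the mean of a threshold curve in dB
    over the sampled frequencies (the result depends on how they are spaced)."""
    return float(np.mean(threshold_db))


# Assumption: 512 log-spaced frequencies between 20 Hz and 16 kHz.
freqs = np.geomspace(20.0, 16000.0, 512)
print(f"overall level of the threshold in quiet: "
      f"{overall_level_db(threshold_in_quiet_db(freqs)):.1f} dB")
```

The same `overall_level_db` can be applied to any masking-threshold curve sampled on the same grid, which makes different signals comparable.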


It would be great to have some kind of relation between the overall level of the masking threshold, the level of the input signal, and the level of the threshold in quiet.

Are there any such formulas in the psychoacoustics literature?


Any suggestions and/or comments in this regard will be highly appreciated.


Thanks and regards,


From: AUDITORY Research in Auditory Perception on behalf of Danijel Domazet
Sent: Thu 2/9/2006 11:55 AM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: computational complexity of psychoacoustic models

Hi Arijit,
Try to avoid MPEG psychoacoustic model 2; I think it is too complex.

There are a few things that are important when designing a SIMPLE psychoacoustic model:

- try to avoid a separate time/frequency transformation (usually an FFT). Use
the result of the one that is already present in the encoder (most likely the
MDCT). It isn't as good, but it spares you an FFT computation. Results are
more than

- don't define separate critical bands like in Psycho 2 (even though those fit
human hearing better); use the scalefactor bands defined in your encoder
instead. It will be much simpler.

- tonality estimation might also be unnecessary. Just assume a constant
masking value for both tonal and non-tonal signals; it will do the job for
most signals (you might lose some quality on strongly tonal samples, but it
might not be too critical).

- if you have to include tonality detection, don't calculate it based on
prediction across frames; lookahead buffers will increase both the delay and
the complexity. MPEG psycho model 2 has some really unnecessary lookaheads.
Use some other method for tonality estimation (the Spectral Flatness Measure,
for example).
- don't overcomplicate the spreading function; a simple triangular function
will do the job.

- detect transients in the TIME domain.

- estimate scalefactors directly from the masking thresholds; don't use the
inner-and-outer loop method that Psycho 2 recommends (the many iterations slow
you down drastically).
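The Spectral Flatness Measure mentioned in the list above is cheap to compute: it is the ratio of the geometric to the arithmetic mean of the power spectrum. A minimal Python sketch follows. Mapping it to a 0..1 tonality index with -60 dB as the "fully tonal" point follows Johnston's perceptual-entropy work and should be treated as a tunable assumption, as should applying it to a whole frame rather than per band.

```python
import numpy as np


def spectral_flatness_db(power_spectrum, eps=1e-12):
    """SFM in dB: 10*log10(geometric mean / arithmetic mean) of the power spectrum.
    Near 0 dB for noise-like spectra, strongly negative for tonal ones."""
    p = np.asarray(power_spectrum, dtype=float) + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(p)))
    arithmetic_mean = np.mean(p)
    return 10.0 * np.log10(geometric_mean / arithmetic_mean)


def tonality_index(power_spectrum, sfm_db_max=-60.0):
    """Map the SFM to 0 (noise-like) .. 1 (tonal); sfm_db_max counts as fully tonal."""
    return float(min(spectral_flatness_db(power_spectrum) / sfm_db_max, 1.0))


# Usage: a white-noise frame versus a pure tone centred on FFT bin 64.
rng = np.random.default_rng(0)
n = 1024
noise_power = np.abs(np.fft.rfft(rng.standard_normal(n))) ** 2
tone_power = np.abs(np.fft.rfft(np.sin(2 * np.pi * 64 * np.arange(n) / n))) ** 2
print(tonality_index(noise_power), tonality_index(tone_power))
```

The noise frame should give an index close to 0 and the pure tone an index of 1. Because it needs only the current frame's spectrum, this adds no lookahead delay.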
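For the advice to detect transients in the time domain, here is one simple energy-based approach, sketched in Python: split the frame into sub-blocks, apply a crude high-pass (a first difference), and flag a transient when a sub-block's energy exceeds the mean of the earlier sub-blocks by a fixed ratio. The detector structure, the 8 sub-blocks and the 10x (+10 dB) ratio are hypothetical choices for illustration, not values taken from any standard.

```python
import numpy as np


def detect_transient(frame, n_subblocks=8, ratio_threshold=10.0, eps=1e-12):
    """Energy-based transient detection in the time domain.

    Returns (True, i) for the first sub-block i whose energy exceeds the mean
    energy of the preceding sub-blocks by more than ratio_threshold
    (10.0 corresponds to +10 dB), otherwise (False, None).
    """
    x = np.asarray(frame, dtype=float)
    x = np.diff(x, prepend=x[0])  # crude high-pass: first difference
    blocks = np.array_split(x, n_subblocks)
    energies = np.array([np.sum(b * b) for b in blocks])
    for i in range(1, n_subblocks):
        past = np.mean(energies[:i]) + eps
        if energies[i] / past > ratio_threshold:
            return True, i
    return False, None


# Usage: low-level stationary noise, with and without a loud onset.
rng = np.random.default_rng(0)
steady = 0.01 * rng.standard_normal(2048)
attack = steady.copy()
attack[1536:] += rng.standard_normal(512)  # onset at the start of sub-block 6
print(detect_transient(steady), detect_transient(attack))
```

The returned sub-block index could also be used to decide where a block switch (e.g. to short windows) is needed.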

What I would do is something like:
- calculate the time/frequency transformation
- calculate the energy across critical bands
- calculate the masking (or use a constant)
- calculate the masking threshold as energy * masking
- apply the spreading function
- apply the threshold in quiet (this gives you the main result of the
psychoacoustic analysis: the masking threshold)
- convert the masking thresholds directly to scalefactors
If your quantized spectrum doesn't fit the bitrate, just increment ALL
scalefactors at the same time and repeat the quantization.
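The steps above can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions: the transform coefficients (e.g. MDCT) are already available from the encoder, `band_edges` stands in for the encoder's scalefactor bands, the constant masking offset (18 dB) and the triangular slopes (10 dB and 25 dB per band) are made-up values, the threshold in quiet is given per band in the same units as the band energies, and the scalefactor conversion assumes a uniform quantizer with step size 2^(sf/4) whose noise power is step^2 / 12. Real codecs such as AAC use a non-uniform quantizer, so the last step is only an approximation.

```python
import numpy as np


def band_energies(coeffs, band_edges):
    """Energy per band; band b covers coefficients band_edges[b]:band_edges[b+1]."""
    c = np.asarray(coeffs, dtype=float)
    return np.array([np.sum(c[lo:hi] ** 2)
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])


def spread_triangular(thr, up_db_per_band=10.0, down_db_per_band=25.0):
    """Triangular spreading (straight slopes in dB) in two O(n) passes: every
    band keeps the maximum of its own value and its neighbours' decayed values."""
    out = np.array(thr, dtype=float)
    up = 10.0 ** (-up_db_per_band / 10.0)      # decay towards higher bands
    down = 10.0 ** (-down_db_per_band / 10.0)  # decay towards lower bands
    for b in range(1, len(out)):
        out[b] = max(out[b], out[b - 1] * up)
    for b in range(len(out) - 2, -1, -1):
        out[b] = max(out[b], out[b + 1] * down)
    return out


def masking_threshold(coeffs, band_edges, quiet_thr, masking_offset_db=18.0):
    energy = band_energies(coeffs, band_edges)
    thr = energy * 10.0 ** (-masking_offset_db / 10.0)  # energy * constant masking
    thr = spread_triangular(thr)
    return np.maximum(thr, quiet_thr)                   # apply threshold in quiet


def thresholds_to_scalefactors(thr, band_edges):
    """Largest integer sf whose quantizer step 2**(sf/4) keeps the assumed
    uniform-quantizer noise (step**2 / 12 per coefficient) below the threshold.
    thr must be > 0, which the threshold in quiet guarantees."""
    width = np.diff(band_edges)
    step = np.sqrt(12.0 * np.asarray(thr, dtype=float) / width)
    return np.floor(4.0 * np.log2(step)).astype(int)


# Usage with made-up data: 64 coefficients in 8 hypothetical scalefactor bands.
rng = np.random.default_rng(1)
coeffs = rng.standard_normal(64) * np.linspace(10.0, 0.1, 64)
band_edges = np.array([0, 4, 8, 12, 16, 24, 32, 48, 64])
quiet_thr = np.full(len(band_edges) - 1, 1e-3)  # hypothetical, energy units
thr = masking_threshold(coeffs, band_edges, quiet_thr)
sf = thresholds_to_scalefactors(thr, band_edges)
print(sf)
```

The two-pass maximum gives the same result as taking, for every band, the largest triangle-shaped (in dB) contribution of any masker, while keeping the cost linear in the number of bands.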
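The rate loop in the last step ("increment ALL scalefactors at the same time") might look like the following Python sketch. The quantizer (uniform, step 2^(sf/4)) and especially the bit-count estimate are hypothetical stand-ins; a real encoder would count the bits produced by its actual entropy coder (e.g. Huffman codebooks).

```python
import numpy as np


def quantize(coeffs, band_edges, scalefactors):
    """Hypothetical uniform quantizer with step 2**(sf/4) in each band."""
    c = np.asarray(coeffs, dtype=float)
    q = np.empty(len(c), dtype=int)
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        step = 2.0 ** (scalefactors[b] / 4.0)
        q[lo:hi] = np.round(c[lo:hi] / step).astype(int)
    return q


def estimate_bits(q):
    """Hypothetical bit count: 1 bit per zero, otherwise about log2 of the
    magnitude plus a sign bit. A real encoder counts its entropy-coder bits."""
    mags = np.abs(q)
    return int(np.sum(np.where(mags > 0, np.ceil(np.log2(mags + 1)) + 1, 1)))


def fit_to_bit_budget(coeffs, band_edges, scalefactors, bit_budget, max_iter=60):
    """Raise ALL scalefactors by one at a time and requantize until the frame
    fits; after max_iter increments the result may still exceed the budget."""
    sf = np.array(scalefactors, dtype=int)
    q = quantize(coeffs, band_edges, sf)
    for _ in range(max_iter):
        if estimate_bits(q) <= bit_budget:
            break
        sf += 1  # coarser quantization in every band at once
        q = quantize(coeffs, band_edges, sf)
    return sf, q


# Usage with made-up data and hypothetical scalefactor bands.
rng = np.random.default_rng(2)
coeffs = 100.0 * rng.standard_normal(64)
band_edges = [0, 4, 8, 12, 16, 24, 32, 48, 64]
sf, q = fit_to_bit_budget(coeffs, band_edges, np.zeros(8, dtype=int), bit_budget=200)
print(sf, estimate_bits(q))
```

Shifting all scalefactors together preserves the relative noise shaping set by the masking thresholds, which is why a single loop suffices here instead of nested inner and outer loops.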

I hope this helped. If you don't understand all of this now, don't worry; you
will once you get more involved with psychoacoustics.

Also, take a look at the psychoacoustic model of the Enhanced aacPlus
general audio codec from 3GPP (TS 26.403).


----- Original Message -----
From: "alexander lerch" <lerch@xxxxxxxxx>
To: <AUDITORY@xxxxxxxxxxxxxxx>
Sent: Wednesday, February 08, 2006 1:49 PM
Subject: Re: [AUDITORY] computational complexity of psychoacoustic models

The choice is, at least for all MPEG codecs, completely up to the
developer. You can decide not to use a psychoacoustic model at all, or
you can decide to use a complex model to gain as much quality as possible.

Commonly used steps are:

- Critical band grouping
- Conversion to dB
- (Analysis of the tonality of possible maskers)
- Calculation of the masking threshold via a masking model
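The first two steps, critical band grouping and conversion to dB, can be sketched in Python as below. The Hz-to-Bark mapping is Zwicker and Terhardt's analytic approximation of the critical-band rate; the Hann-windowed FFT, the 1-Bark-wide bands and the uncalibrated dB reference are assumptions for illustration only.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the critical-band rate in Bark."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def critical_band_levels_db(frame, sample_rate, eps=1e-12):
    """Group FFT power into 1-Bark-wide bands and convert each band to dB
    (relative to an arbitrary reference; no SPL calibration is assumed)."""
    x = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    band_index = np.floor(hz_to_bark(freqs)).astype(int)  # band b covers [b, b+1) Bark
    n_bands = band_index.max() + 1
    band_power = np.bincount(band_index, weights=power, minlength=n_bands)
    return 10.0 * np.log10(band_power + eps)


# Usage: a 1 kHz sine, 1024 samples at 44.1 kHz.
fs = 44100
t = np.arange(1024) / fs
levels = critical_band_levels_db(np.sin(2 * np.pi * 1000.0 * t), fs)
print(np.round(levels, 1))
```

With these settings the spectrum falls into 25 bands, and the 1 kHz tone lands in the band spanning 8 to 9 Bark.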

Have a look at the psychoacoustic model 2 in the informative part of the
MPEG-1 standard.

Kind regards,

> Hi List:
> I’m interested to know the computational complexity (number of additions
> and multiplications) of psychoacoustic models used in audio coding.
> Well, to be more specific, let’s say if I’m targeting to build a “fast”
> psychoacoustic model, which existing model and/or what kind of
> computational complexity should I try to beat?
> Any help/suggestions/references in this direction will be highly
> appreciated.
> Best Regards,
> ~Arijit

dipl. ing.
alexander lerch

d-10965 berlin

fon: +49.30.854 09 15.0
fax: +49.30.854 09 15.5