
Re: computational complexity of psychoacoustic models

Hi Arijit,
Try avoiding MPEG psychoacoustic model 2 - I think it is too complex.

There are a few things that are important when designing SIMPLE psycho

- try to avoid a separate time/freq. transformation (usually an FFT). Use the
result of the one that is already present in the encoder (MDCT most likely).
It isn't as good but spares you an FFT computation. Results are more than

- don't define separate critical bands like in Psycho 2 (those fit human
hearing better); use the ones defined in your encoder as scalefactor bands,
it will be much simpler.

- tonality estimation might also be unnecessary. Just assume constant
masking for tonal and non-tonal signals; it will do the job for most
signals (you might lose some quality for strongly tonal samples but it might
not be too critical).

- if you have to include tonality detection - don't calculate it based on
prediction across frames; lookahead buffers will increase the delay and
the complexity as well. MPEG psycho model 2 has some really unnecessary lookaheads.
Use some other method for tonality estimation (Spectral Flatness Measure for

- don't overcomplicate the spreading function, a simple triangular function
will do the job.

- detect transients in TIME domain.

- estimate scalefactors directly from masking thresholds, don't use the
inner-and-outer loop method like Psycho 2 recommends (many iterations slow
you down drastically).
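
To make the tonality point above a bit more concrete, here is a minimal sketch of a Spectral Flatness Measure computed on one frame, so no lookahead across frames is needed. The function names, the eps guard and the -60 dB normalisation (the value commonly quoted from Johnston's perceptual transform coder) are assumptions for illustration, not something taken from Psycho 2 or from any particular encoder.

```python
import math

def spectral_flatness(power, eps=1e-12):
    """Geometric mean / arithmetic mean of a power spectrum (close to 1 = flat)."""
    n = len(power)
    log_mean = sum(math.log(p + eps) for p in power) / n
    arith_mean = sum(power) / n + eps
    return math.exp(log_mean) / arith_mean

def tonality_from_sfm(power, sfm_db_min=-60.0):
    """Map the SFM in dB to a tonality index in [0, 1] (1 = tonal, 0 = noise-like).

    sfm_db_min is an assumed normalisation constant.
    """
    sfm_db = 10.0 * math.log10(spectral_flatness(power) + 1e-12)
    return max(0.0, min(sfm_db / sfm_db_min, 1.0))
```

A flat spectrum gives an index near 0 and a single strong peak gives an index well above 0.5; the result could then be used to pick between a tonal and a non-tonal masking offset.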

What I would do is something like:
- calculate time/freq transformation
- calculate energy across critical bands
- calculate masking (or use constant)
- calculate masking threshold as energy * masking
- apply spreading function
- apply threshold in quiet (this will give you the main result of the
psycho analysis - the masking threshold)
- convert masking thresholds directly to scalefactors
If your quantized spectrum doesn't fit the bitrate, just increment ALL
scalefactors at the same time and repeat the quantization.
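
The steps above could be sketched roughly as follows, starting from coefficients the encoder already has. Everything named here is an assumption for illustration: the band edges, the constant -18 dB masking offset, the spreading slopes of 10 and 25 dB per band, the uniform-quantizer noise model (step^2 / 12, which ignores the power-law quantizer of real MPEG codecs), the step size 2^(sf/4), and the count_bits callback, which stands in for whatever bit counting your encoder does.

```python
import math

def band_energies(coeffs, edges):
    """Sum of squared coefficients in each scalefactor band."""
    return [sum(c * c for c in coeffs[lo:hi]) for lo, hi in zip(edges, edges[1:])]

def spread_triangular(thr, up_db=10.0, down_db=25.0):
    """Triangular (linear in dB) spreading; keeps the strongest contribution."""
    out = []
    for i in range(len(thr)):
        best = 0.0
        for j, t in enumerate(thr):
            # Masking reaches further towards higher bands than towards lower ones.
            atten_db = (i - j) * up_db if i >= j else (j - i) * down_db
            best = max(best, t * 10.0 ** (-atten_db / 10.0))
        out.append(best)
    return out

def masking_thresholds(coeffs, edges, quiet, masking_db=-18.0):
    """Energy * constant masking, then spreading, then threshold in quiet."""
    offset = 10.0 ** (masking_db / 10.0)
    thr = [e * offset for e in band_energies(coeffs, edges)]
    thr = spread_triangular(thr)
    return [max(t, q) for t, q in zip(thr, quiet)]

def scalefactors(thr, edges):
    """Pick the step whose assumed noise (step^2 / 12 per line) meets the threshold."""
    sfs = []
    for t, lo, hi in zip(thr, edges, edges[1:]):
        step = math.sqrt(12.0 * max(t, 1e-20) / (hi - lo))
        sfs.append(math.floor(4.0 * math.log2(step)))
    return sfs

def quantize(coeffs, edges, sfs):
    """Uniform quantization with step 2^(sf/4) in each band."""
    q = [0] * len(coeffs)
    for sf, lo, hi in zip(sfs, edges, edges[1:]):
        step = 2.0 ** (sf / 4.0)
        for k in range(lo, hi):
            q[k] = round(coeffs[k] / step)
    return q

def fit_to_budget(coeffs, edges, sfs, bit_budget, count_bits, max_iters=256):
    """Increment ALL scalefactors together until the quantized frame fits."""
    sfs = list(sfs)
    for _ in range(max_iters):
        if count_bits(quantize(coeffs, edges, sfs)) <= bit_budget:
            break
        sfs = [s + 1 for s in sfs]
    return sfs
```

The rate loop has a single dimension (one global increment), which is the point of the suggestion: no nested inner and outer iteration.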

I hope this helped. If you don't understand all this now, don't worry - you
will when you get involved with psychoacoustics some more.

Also, take a look at the psychoacoustic model of the Enhanced aacPlus
general audio codec from 3GPP - TS 26.403.


----- Original Message ----- 
From: "alexander lerch" <lerch@xxxxxxxxx>
To: <AUDITORY@xxxxxxxxxxxxxxx>
Sent: Wednesday, February 08, 2006 1:49 PM
Subject: Re: [AUDITORY] computational complexity of psychoacoustic models

The choice is, at least for all MPEG codecs, completely up to the
developer. You can decide not to use a psychoacoustic model at all, or
you can decide to use a complex model to gain as much quality as possible.

Commonly used steps are:

Critical Band grouping
Conversion to dB
(Analysis of tonality of possible maskers)
calculation of masking threshold via masking model
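
As a rough illustration of the first two steps, the sketch below groups a power spectrum into bands one Bark wide and converts each band to dB. The Hz-to-Bark mapping is the Zwicker and Terhardt approximation; the function names, the one-Bark band width and the eps guard are assumptions for the example and are not taken from the MPEG text.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the critical-band rate."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def band_levels_db(power, sample_rate, eps=1e-12):
    """Group power-spectrum bins (DC to Nyquist, at least 2 bins) into
    1-Bark bands and return the level of each band in dB."""
    n = len(power)
    bands = {}
    for k, p in enumerate(power):
        f = k * (sample_rate / 2.0) / (n - 1)  # centre frequency of bin k
        z = int(hz_to_bark(f))                 # index of the band this bin falls in
        bands[z] = bands.get(z, 0.0) + p
    return [10.0 * math.log10(bands[z] + eps) for z in sorted(bands)]
```

For a 44.1 kHz signal this gives about 25 bands; the masking model would then operate on these band levels instead of on individual bins.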

Have a look at the psychoacoustic model 2 in the informative part of the
MPEG-1 standard.

Kind regards,

> Hi List:
> I’m interested to know the computational complexity (number of additions
> and multiplications) of psychoacoustic models used in audio coding.
> Well, to be more specific, let’s say if I’m targeting to build a “fast”
> psychoacoustic model, which existing model and/or what kind of
> computational complexity should I try to beat?
> Any help/suggestions/references in this direction will be highly
> appreciated.
> Best Regards,
> ~Arijit

dipl. ing.
alexander lerch

d-10965 berlin

fon: +49.30.854 09 15.0
fax: +49.30.854 09 15.5