Re: computational complexity of psychoacoustic models (Danijel Domazet)


Subject: Re: computational complexity of psychoacoustic models
From:    Danijel Domazet  <Danijel.Domazet@xxxxxxxx>
Date:    Thu, 9 Feb 2006 11:55:22 +0100

Hi Arijit,

Try avoiding MPEG psychoacoustic model 2 - I think it is too complex. A few things are important when designing a SIMPLE psycho model:

- Try to avoid a separate time/frequency transformation (usually an FFT). Use the output of the one that is already present in the encoder (most likely the MDCT). It isn't as good, but it saves you an FFT computation, and the results are more than acceptable.
- Don't define separate critical bands like in Psycho 2 (they fit human hearing better). Use the scalefactor bands already defined in your encoder; it will be much simpler.
- Tonality estimation might also be unnecessary. Just assume constant masking for tonal and non-tonal signals. It will do the job for most signals (you might lose some quality on strongly tonal samples, but it might not be too critical).
- If you have to include tonality detection, don't calculate it based on prediction across frames; lookahead buffers increase both delay and complexity. MPEG psycho model 2 has some really unnecessary lookaheads. Use some other method for tonality estimation (the Spectral Flatness Measure, for example).
- Don't complicate the spreading function; a simple triangular function will do the job.
- Detect transients in the TIME domain.
- Estimate scalefactors directly from the masking thresholds. Don't use the inner-and-outer loop method that Psycho 2 recommends (many iterations slow you down drastically).

What I would do is something like:

- calculate the time/frequency transformation
- calculate the energy across critical bands
- calculate the masking (or use a constant)
- calculate the masking threshold as energy * masking
- apply the spreading function
- apply the threshold in quiet (this gives you the main result of the psycho analysis - the masking threshold)
- convert the masking thresholds directly to scalefactors

If your quantized spectrum doesn't fit the bitrate, just increment ALL scalefactors at the same time and repeat the quantization.

I hope this helped.
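The steps listed above can be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions, not code from any standard or encoder: the constants `MASK_RATIO` and `SPREAD_DB_PER_BAND`, all function names, and the bit-cost estimate `count_bits` are hypothetical placeholders. The scalefactor conversion assumes a uniform quantizer with step size 2^(sf/4) and noise power step^2/12 per coefficient, which ignores the power-law quantizer a real AAC encoder uses. Spreading takes the maximum over neighbouring bands, which is one simple choice among several.

```python
import math

# Assumed placeholder constants; a real encoder would tune these.
MASK_RATIO = 10 ** (-18 / 10)   # constant masking offset of -18 dB
SPREAD_DB_PER_BAND = 15.0       # slope of the triangular spreading function


def band_energies(spectrum, edges):
    """Sum of squared coefficients in each band [edges[b], edges[b+1])."""
    return [sum(c * c for c in spectrum[lo:hi])
            for lo, hi in zip(edges, edges[1:])]


def masking_thresholds(spectrum, edges, quiet):
    """Energy * masking, triangular spreading, then threshold in quiet."""
    thr = [e * MASK_RATIO for e in band_energies(spectrum, edges)]
    n = len(thr)
    # Triangular in dB: a masker in band j is attenuated by a fixed
    # number of dB per band of distance; keep the strongest contribution.
    spread = [max(thr[j] * 10 ** (-SPREAD_DB_PER_BAND * abs(i - j) / 10)
                  for j in range(n))
              for i in range(n)]
    return [max(t, q) for t, q in zip(spread, quiet)]


def thresholds_to_scalefactors(thr, edges):
    """Pick a step per band so that uniform quantization noise
    (step^2 / 12 per coefficient) roughly equals the band threshold."""
    sfs = []
    for t, lo, hi in zip(thr, edges, edges[1:]):
        step = math.sqrt(12.0 * max(t, 1e-12) / (hi - lo))
        sfs.append(math.floor(4 * math.log2(step)))
    return sfs


def quantize(spectrum, edges, sfs):
    """Uniform quantization with step 2^(sf/4) in each band."""
    q = []
    for sf, lo, hi in zip(sfs, edges, edges[1:]):
        step = 2.0 ** (sf / 4)
        q.extend(round(c / step) for c in spectrum[lo:hi])
    return q


def count_bits(q):
    """Crude stand-in for the codec's entropy coder: zeros are free,
    other values cost their magnitude bits plus a sign bit."""
    return sum(abs(v).bit_length() + 1 for v in q if v)


def encode_frame(spectrum, edges, quiet, bit_budget):
    """Single loop: raise ALL scalefactors together until the frame fits."""
    thr = masking_thresholds(spectrum, edges, quiet)
    sfs = thresholds_to_scalefactors(thr, edges)
    while True:
        q = quantize(spectrum, edges, sfs)
        if count_bits(q) <= bit_budget:
            return sfs, q
        sfs = [sf + 1 for sf in sfs]
```

Here `spectrum` would be the MDCT coefficients of one frame, `edges` the scalefactor band boundaries, and `quiet` the per-band threshold in quiet (in the same energy units). The single loop replaces the nested inner/outer iteration: the masking thresholds set the relative scalefactors once, and only a common offset is adjusted to meet the bit budget.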
If you don't understand all this now, don't worry - you will when you get more involved with psychoacoustics.

Also, take a look at the psychoacoustic model of the Enhanced aacPlus general audio codec from 3GPP - TS 26.403.

Regards,
Daniel

----- Original Message -----
From: "alexander lerch" <lerch@xxxxxxxx>
To: <AUDITORY@xxxxxxxx>
Sent: Wednesday, February 08, 2006 1:49 PM
Subject: Re: [AUDITORY] computational complexity of psychoacoustic models

The choice is, at least for all MPEG codecs, completely up to the developer. You can decide not to use a psychoacoustic model at all, or you can decide to use a complex model to gain as much quality as possible.

Commonly used steps are:

FFT
Critical band grouping
Conversion to dB
(Analysis of tonality of possible maskers)
Calculation of masking threshold via masking model

Have a look at the psychoacoustic model 2 in the informative part of the MPEG-1 standard.

Kind regards,
Alexander

#ARIJIT BISWAS# wrote:
> Hi List:
>
> I’m interested to know the computational complexity (number of additions
> and multiplications) of psychoacoustic models used in audio coding.
>
> Well, to be more specific, let’s say if I’m targeting to build a “fast”
> psychoacoustic model, which existing model and/or what kind of
> computational complexity should I try to beat?
>
> Any help/suggestions/references in this direction will be highly
> appreciated.
>
> Best Regards,
> ~Arijit
>

--
dipl. ing. alexander lerch
zplane.development
:www.zplane.de
katzbachstr.21
d-10965 berlin
fon: +49.30.854 09 15.0
fax: +49.30.854 09 15.5


This message came from the mail archive
http://www.auditory.org/postings/2006/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University