overall level of the masking threshold (#ARIJIT BISWAS# )


Subject: overall level of the masking threshold
From:    #ARIJIT BISWAS#  <arijit17@xxxxxxxx>
Date:    Tue, 21 Feb 2006 21:46:05 +0800

Dear Daniel & Alexander:

Thanks a lot for your opinions, and apologies for my slow reaction. They
were really helpful.

In the meantime, I have more questions for the list:

I was wondering if it is possible to predict the "overall level of the
masking threshold" based on some statistics of the input signal (e.g.
power, power spectral density, etc.). By "overall level of the masking
threshold", I mean the average of the masking threshold over frequency.
For example, the threshold in quiet has an overall level of ~32 dB.

It would be great to have some kind of relation between the overall
level of the masking threshold, the level of the input signal, and the
level of the threshold in quiet. Are there any such formulas somewhere
in the psychoacoustics literature?

Any suggestions and/or comments in this regard will be highly
appreciated.

Thanks and regards,
~Arijit

________________________________

From: AUDITORY Research in Auditory Perception on behalf of Danijel Domazet
Sent: Thu 2/9/2006 11:55 AM
To: AUDITORY@xxxxxxxx
Subject: Re: computational complexity of psychoacoustic models

Hi Arijit,
Try avoiding MPEG psychoacoustic model 2 - I think it is too complex.

There are a few things that are important when designing a SIMPLE psycho
model:

- try to avoid a separate time/freq. transformation (usually FFT). Use
the result of the one that is already present in the encoder (MDCT most
likely). It isn't as good but spares you an FFT computation. Results are
more than acceptable.

- don't define separate critical bands like in Psycho 2 (that better fit
human hearing), use the ones defined in your encoder as scalefactor
bands, it will be much simpler.

- tonality estimation might also be unnecessary. Just assume constant
masking for tonal and non-tonal signals, it will do the job for most
signals (you might lose some quality for strongly tonal samples but it
might not be too critical).

- if you have to include tonality detection - don't calculate it based
on prediction across frames, lookahead buffers will increase the delay
and complexity also. MPEG psycho model 2 has some really unnecessary
lookaheads. Use some other method for tonality estimation (Spectral
Flatness Measure for example).

- don't complicate the spreading function, a simple triangular function
will do the job.

- detect transients in the TIME domain.

- estimate scalefactors directly from masking thresholds, don't use the
inner-and-outer loop method like Psycho 2 recommends (many iterations
slow you down drastically).

What I would do is something like:
- calculate time/freq transformation
- calculate energy across critical bands
- calculate masking (or use constant)
- calculate masking threshold as energy * masking
- apply spreading function
- apply threshold in quiet (this will give you the main result of the
  psycho analysis - the masking threshold)
- convert masking thresholds directly to scalefactors
If your quantized spectrum doesn't fit the bitrate, just increment ALL
scalefactors at the same time and repeat the quantization.

I hope this helped. If you don't understand all this now, don't worry -
you will when you get involved with psychoacoustics some more.

Also, take a look at the psychoacoustic model of the Enhanced aacPlus
general audio codec from 3GPP - TS 26.403.

Regards,
Daniel

----- Original Message -----
From: "alexander lerch" <lerch@xxxxxxxx>
To: <AUDITORY@xxxxxxxx>
Sent: Wednesday, February 08, 2006 1:49 PM
Subject: Re: [AUDITORY] computational complexity of psychoacoustic models

The choice is, at least for all MPEG codecs, completely up to the
developer. You can decide not to use a psychoacoustic model at all, or
you can decide to use a complex model to gain as much quality as
possible.

Commonly used steps are:

FFT
Critical Band grouping
Conversion to dB
(Analysis of tonality of possible maskers)
calculation of masking threshold via masking model

Have a look at the psychoacoustic model 2 in the informative part of the
MPEG-1 standard.

Kind regards,
Alexander

#ARIJIT BISWAS# wrote:
> Hi List:
>
> I'm interested to know the computational complexity (number of additions
> and multiplications) of psychoacoustic models used in audio coding.
>
> Well, to be more specific, let's say if I'm targeting to build a "fast"
> psychoacoustic model, which existing model and/or what kind of
> computational complexity should I try to beat?
>
> Any help/suggestions/references in this direction will be highly
> appreciated.
>
> Best Regards,
>
> ~Arijit
>

--
dipl. ing.
alexander lerch

zplane.development
:www.zplane.de
katzbachstr.21
d-10965 berlin

fon: +49.30.854 09 15.0
fax: +49.30.854 09 15.5
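
The simplified pipeline Daniel describes in this thread (band energies,
a constant masking offset, a triangular spreading function, then the
threshold in quiet) can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not code from any of the
posters or from an MPEG reference encoder: the function names, the
-12 dB masking offset, the per-band slopes, and the choice to combine
spread maskers with a maximum instead of a power sum are all placeholders
picked for readability. The last function computes the "overall level"
in the sense Arijit defines it, i.e. the threshold averaged over
frequency bands (in dB here).

```python
import math


def band_energies(spectrum, band_edges):
    """Sum squared coefficient magnitudes in each band [lo, hi)."""
    return [
        sum(abs(c) ** 2 for c in spectrum[lo:hi])
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


def to_db(power, floor=1e-12):
    """Convert a power value to dB, clamping to avoid log10(0)."""
    return 10.0 * math.log10(max(power, floor))


def masking_threshold_db(energies, quiet_db, offset_db=-12.0,
                         slope_toward_high_db=15.0,
                         slope_toward_low_db=30.0):
    """Per-band masking threshold in dB.

    offset_db and both slopes are assumed, illustrative constants
    (dB and dB per band); real codecs tune them per band layout.
    """
    # "masking threshold = energy * masking" becomes an addition in dB.
    raw = [to_db(e) + offset_db for e in energies]
    n = len(raw)
    spread = []
    for i in range(n):            # target band
        best = -math.inf
        for j in range(n):        # masker band
            # Triangular spreading: linear decay in dB with band distance,
            # shallower toward higher bands than toward lower ones.
            slope = slope_toward_high_db if i >= j else slope_toward_low_db
            best = max(best, raw[j] - slope * abs(i - j))
        spread.append(best)
    # The threshold can never drop below the threshold in quiet.
    return [max(s, q) for s, q in zip(spread, quiet_db)]


def overall_level_db(threshold_db):
    """Average of the per-band threshold over frequency (dB values)."""
    return sum(threshold_db) / len(threshold_db)


if __name__ == "__main__":
    spectrum = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    edges = [0, 2, 4, 6, 8]          # four hypothetical bands
    thr = masking_threshold_db(band_energies(spectrum, edges),
                               quiet_db=[-30.0] * 4)
    print(thr, overall_level_db(thr))
```

Averaging dB values is only one reading of "average over frequency";
averaging in the power domain and converting back to dB would give a
different number, so whichever convention is used should be stated.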


This message came from the mail archive
http://www.auditory.org/postings/2006/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University