
- *To*: AUDITORY@xxxxxxxxxxxxxxx
- *Subject*: Re: STFT vs Power Spectral in Musical recognition system ?
- *From*: Arturo Camacho <acamacho@xxxxxxxxxxxx>
- *Date*: Thu, 31 Aug 2006 18:32:56 -0400
- *Delivery-date*: Thu Aug 31 18:49:03 2006
- *In-reply-to*: <p06230902c114c75cf625@[192.168.1.102]>
- *List-help*: <mailto:LISTSERV@LISTS.MCGILL.CA?body=INFO AUDITORY>
- *List-owner*: <mailto:AUDITORY-request@LISTS.MCGILL.CA>
- *List-subscribe*: <mailto:AUDITORY-subscribe-request@LISTS.MCGILL.CA>
- *List-unsubscribe*: <mailto:AUDITORY-unsubscribe-request@LISTS.MCGILL.CA>
- *Reply-to*: Arturo Camacho <acamacho@xxxxxxxxxxxx>
- *Sender*: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

One problem with square-root compression is that its slope approaches infinity as the magnitude M approaches zero. A more appropriate approach may be to use log(1+KM), where K is a constant to be determined. The response of this function is almost logarithmic for high magnitudes and almost linear for low magnitudes. Of course, determining the optimal value of K for a given input is not trivial.

Arturo

--
__________________________________________________
Arturo Camacho
PhD Candidate
Computer and Information Science and Engineering
University of Florida

E-mail: acamacho@xxxxxxxxxxxx
Web page: www.cise.ufl.edu/~acamacho
__________________________________________________

On Fri, 25 Aug 2006, Richard F. Lyon wrote:

> Edwin,
>
> A power spectral density is only defined for stationary signals, not
> music. The STFT generalizes it to short segments, if you use the
> squared magnitude.
>
> The difference between the absolute value, square, log, etc. are just
> point nonlinearities that do not change the information content, but
> do change the metric structure of the space a bit. Log is too
> compressed, leading to too much emphasis on near-silent segments,
> while the square (the power you ask about) is too expanded, leading
> to too much emphasis on the louder parts. A good compromise is
> around a square root or cube root of magnitude (roughly matching
> perceptual magnitude via Stevens's law), but the magnitude itself is
> also sometimes acceptable, depending on what you're doing.
>
> Dick
>
> At 7:12 AM -0700 8/25/06, Edwin Sianturi wrote:
> >Hello,
> >
> >I am just a master student, doing my internship. Right now, I am
> >building a musical instrument recognition system. I have read
> >several papers on it, and I am just curious:
> >
> >All the papers/journals that I have read use the STFT, a.k.a the
> >|X(t,f)| of a signal x(t), in order to extract several (spectral)
> >features to be used as the input to the recognition system.
> >
> >What are the reasons behind using the |X(t,f)| instead of using the
> >"power spectral" |X(t,f)|^2 ?
> >(technically speaking, a power spectral density is the expectation
> >of |X(f)|^2, i.e. E(|X(f)|^2) )
> >
> >Thanks in advance,
> >
> >Edwin SIANTURI
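
As a rough illustration of the log(1+KM) suggestion at the top of this message, the sketch below compares it with square-root compression and with the slopes of both functions. It is only a sketch under stated assumptions: the value K = 100 and all function names are hypothetical, chosen for illustration, since the message leaves the choice of K open. The slope of sqrt(M) is 1/(2 sqrt(M)), which is unbounded near zero, while the slope of log(1+KM) is K/(1+KM), which tends to the finite value K.

```python
import math

K = 100.0  # hypothetical constant; the message does not say how to choose K


def sqrt_compress(m):
    """Square-root compression of a non-negative magnitude m."""
    return math.sqrt(m)


def log_compress(m, k=K):
    """log(1 + k*m): close to k*m when k*m << 1, close to log(k*m) when k*m >> 1."""
    return math.log1p(k * m)


def sqrt_slope(m):
    """Derivative of sqrt(m) for m > 0; grows without bound as m -> 0."""
    return 0.5 / math.sqrt(m)


def log_slope(m, k=K):
    """Derivative of log(1 + k*m); tends to the finite value k as m -> 0."""
    return k / (1.0 + k * m)


if __name__ == "__main__":
    # Print both compressions and their slopes over a wide range of magnitudes.
    for m in (1e-8, 1e-4, 1.0, 1e4):
        print(
            f"M={m:g}  sqrt={sqrt_compress(m):.4g}  log1p={log_compress(m):.4g}  "
            f"sqrt'={sqrt_slope(m):.4g}  log1p'={log_slope(m):.4g}"
        )
```

With these definitions, a larger K moves the transition between the linear and logarithmic regimes toward smaller magnitudes (it occurs around M = 1/K), which is one way to see why the best K depends on the input level.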

**Follow-Ups**: **Re: STFT vs Power Spectral in Musical recognition system ?** *From:* Richard F. Lyon

**References**: **Re: STFT vs Power Spectral in Musical recognition system ?** *From:* Richard F. Lyon
