
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: STFT vs Power Spectral in Musical recognition system ?
From: "Richard F. Lyon" <DickLyon@xxxxxxx>
Date: Thu, 31 Aug 2006 16:00:31 -0700
Comments: To: Arturo Camacho <acamacho@CISE.UFL.EDU>
Delivery-date: Thu Aug 31 22:00:42 2006
In-reply-to: <Pine.GSO.4.33.0608311831330.13833-100000@sand.cise.ufl.edu>
List-help: <mailto:LISTSERV@LISTS.MCGILL.CA?body=INFO AUDITORY>
List-owner: <mailto:AUDITORY-request@LISTS.MCGILL.CA>
List-subscribe: <mailto:AUDITORY-subscribe-request@LISTS.MCGILL.CA>
List-unsubscribe: <mailto:AUDITORY-unsubscribe-request@LISTS.MCGILL.CA>
References: <Pine.GSO.4.33.0608311831330.13833-100000@sand.cise.ufl.edu>
Reply-to: "Richard F. Lyon" <DickLyon@xxxxxxx>
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Dick,

One problem with square-root compression is that its slope approaches infinity as the magnitude M approaches zero. A more appropriate approach may be to use log(1+KM), where K is a constant to be determined. This function is nearly logarithmic at high magnitudes (KM much greater than 1) and nearly linear at low magnitudes (KM much less than 1), so its slope stays finite at M = 0. Of course, determining the optimal value of K for a given input is not trivial.
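A minimal sketch of the two limiting behaviours described above, in Python. The value K = 10 and the sample magnitudes are arbitrary assumptions chosen for illustration; nothing in this thread specifies them, and the function names are hypothetical.

```python
import math

def log_compress(m, k):
    """Compress a non-negative magnitude m with log(1 + k*m)."""
    return math.log1p(k * m)

def log_slope(m, k):
    """Derivative of log(1 + k*m); tends to the finite value k as m -> 0."""
    return k / (1.0 + k * m)

def sqrt_slope(m):
    """Derivative of sqrt(m) for m > 0; grows without bound as m -> 0."""
    return 0.5 / math.sqrt(m)

K = 10.0  # assumed constant, for illustration only

for m in (1e-6, 1e-3, 1.0, 1e3):
    print(f"M={m:g}  log(1+KM)={log_compress(m, K):.6g}  "
          f"slope_log={log_slope(m, K):.6g}  slope_sqrt={sqrt_slope(m):.6g}")
```

For small M the output of log_compress is close to K*M (linear), and for large M it is close to log(K*M) (logarithmic), while sqrt_slope keeps growing as M shrinks.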

Arturo

--
__________________________________________________

Arturo Camacho
PhD Candidate
Computer and Information Science and Engineering
University of Florida

E-mail: acamacho@xxxxxxxxxxxx
Web page: www.cise.ufl.edu/~acamacho
__________________________________________________

On Fri, 25 Aug 2006, Richard F. Lyon wrote:

Edwin,

A power spectral density is only defined for stationary signals, not music. The STFT generalizes it to short segments, if you use the squared magnitude.

The absolute value, square, log, etc. differ only by point nonlinearities, which do not change the information content but do change the metric structure of the space a bit. Log is too compressed, leading to too much emphasis on near-silent segments, while the square (the power you ask about) is too expanded, leading to too much emphasis on the louder parts. A good compromise is around a square root or cube root of magnitude (roughly matching perceptual magnitude via Stevens's law), but the magnitude itself is also sometimes acceptable, depending on what you're doing.
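One way to see the change in metric structure is to apply each nonlinearity to the same doubling of magnitude at a quiet level and at a loud level, then compare the resulting distances. The sketch below does this in Python; the two magnitude pairs are made-up values for illustration, not measurements, and the helper names are hypothetical.

```python
import math

# Point nonlinearities mentioned above, ordered from most compressed to most expanded.
nonlinearities = {
    "log": math.log,
    "cube root": lambda m: m ** (1.0 / 3.0),
    "square root": math.sqrt,
    "magnitude": lambda m: m,
    "square (power)": lambda m: m * m,
}

# Assumed example magnitudes: the same factor-of-two change, quiet vs. loud.
quiet_pair = (0.001, 0.002)
loud_pair = (1.0, 2.0)

def distance(f, pair):
    """Distance between two magnitudes after applying the nonlinearity f."""
    a, b = pair
    return abs(f(b) - f(a))

def loud_to_quiet_ratio(f):
    """How much more the loud doubling counts than the quiet doubling."""
    return distance(f, loud_pair) / distance(f, quiet_pair)

for name, f in nonlinearities.items():
    print(f"{name:15s} loud/quiet distance ratio = {loud_to_quiet_ratio(f):.4g}")
```

Under the log, the quiet change counts exactly as much as the loud one; under the square, the loud change dominates by a very large factor. The root compressions fall in between.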

Dick

At 7:12 AM -0700 8/25/06, Edwin Sianturi wrote:
>Hello,
>
>I am just a master student, doing my internship. Right now, I am
>building a musical instrument recognition system. I have read
>several papers on it, and I am just curious:
>
>All the papers/journals that I have read use the STFT, a.k.a the
>|X(t,f)| of a signal x(t), in order to extract several (spectral)
>features to be used as the input to the recognition system.
>
>What are the reasons behind using the |X(t,f)| instead of using the
>"power spectral" |X(t,f)|^2 ?
>(technically speaking, a power spectral density is the expectation
>of |X(f)|^2, i.e. E(|X(f)|^2) )
>
>Thanks in advance,
>
>Edwin SIANTURI
