Re: STFT vs Power Spectral in Musical recognition system ?
A power spectral density is defined only for stationary signals, and
music is not stationary. The STFT generalizes it to short segments,
if you use the squared magnitude.
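As a rough sketch of what "generalizing to short segments" means in practice: cut the signal into overlapping windowed frames, take a DFT of each, and keep |X(t,f)|^2 per frame. The Hann window, frame length, hop size, the function names, and the naive DFT are illustrative assumptions, not anything prescribed in this thread; a real system would use an FFT library.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft_onesided(frame):
    """Naive O(N^2) DFT, bins 0..N/2 only; fine for a small illustration."""
    n = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        for f in range(n // 2 + 1)
    ]


def stft(x, frame_len, hop):
    """X[t][f]: one-sided DFT of each Hann-windowed frame, frames every `hop` samples."""
    w = hann(frame_len)
    return [
        dft_onesided([x[start + k] * w[k] for k in range(frame_len)])
        for start in range(0, len(x) - frame_len + 1, hop)
    ]


def spectrogram(x, frame_len, hop):
    """Short-time power |X(t,f)|^2, a per-frame analogue of the power spectrum."""
    return [[abs(c) ** 2 for c in row] for row in stft(x, frame_len, hop)]
```

With a 64-sample frame and a 32-sample hop, a tone that jumps from bin 8 to bin 16 halfway through peaks at bin 8 in the first frame and at bin 16 in the last, whereas a single long-term power spectrum would show both peaks at once with no timing. Replacing abs(c) ** 2 with abs(c) gives the |X(t,f)| used in the papers.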
The differences between the absolute value, square, log, etc. are just
pointwise nonlinearities that do not change the information content,
but do change the metric structure of the space a bit. The log
compresses too much, putting too much emphasis on near-silent
segments, while the square (the power you ask about) expands too
much, putting too much emphasis on the louder parts. A good compromise
is around a square root or cube root of the magnitude (roughly
matching perceived magnitude, per Stevens's power law), but the
magnitude itself is also sometimes acceptable, depending on what
you're doing.
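A small numerical illustration of the "metric structure" point, assuming features are compared by plain differences (the post does not name a metric, so that is an assumption here). It measures how far each compressed value moves when a bin grows by 10%, for a loud bin (magnitude 100) versus a quiet one (magnitude 1). The names COMPRESSIONS, feature_change, loudness_bias and the EPS guard are hypothetical, made up for this sketch.

```python
import math

# Illustrative guard against log(0); the value is an assumption, not a standard.
EPS = 1e-10

# Pointwise nonlinearities applied to a magnitude m = |X(t,f)| >= 0.
# Each is monotonic on m >= 0, so it is invertible and keeps the information;
# what differs is how distances between feature values are scaled.
COMPRESSIONS = {
    "power": lambda m: m ** 2,
    "magnitude": lambda m: m,
    "sqrt": lambda m: math.sqrt(m),
    "cube_root": lambda m: m ** (1.0 / 3.0),
    "log": lambda m: math.log(m + EPS),
}


def feature_change(kind, level, rel_change=0.1):
    """Distance a feature moves when a bin of magnitude `level` grows by `rel_change`."""
    f = COMPRESSIONS[kind]
    return abs(f(level * (1.0 + rel_change)) - f(level))


def loudness_bias(kind, loud=100.0, quiet=1.0):
    """Ratio of feature movement for the same relative change in a loud vs. a quiet bin.

    Values far above 1 mean loud parts dominate distances; values near 1 mean a
    relative change in a near-silent bin counts as much as one in a loud bin.
    """
    return feature_change(kind, loud) / feature_change(kind, quiet)
```

For a power law m**p the ratio is (100/1)**p, so it comes out at about 10000 for the power, 100 for the magnitude, 10 for the square root, about 4.6 for the cube root, and about 1 for the log: the two ends of the range match the over-expansion and over-compression described above.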
At 7:12 AM -0700 8/25/06, Edwin Sianturi wrote:
I am just a master's student doing my internship. Right now, I am
building a musical instrument recognition system. I have read
several papers on it, and I am just curious:
All the papers/journals that I have read use the magnitude of the
STFT, |X(t,f)|, of a signal x(t) to extract several (spectral)
features that serve as input to the recognition system.
What is the reason for using |X(t,f)| instead of the "power
spectrum" |X(t,f)|^2?
(Technically speaking, a power spectral density is the expectation
of |X(f)|^2, i.e. E[|X(f)|^2].)
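To make the expectation in that definition concrete: for a stationary signal, E[|X(f)|^2] can be approximated by averaging |X(f)|^2 over many non-overlapping segments (Bartlett's method, a standard estimator named here only for illustration, not something proposed in this thread). The sketch divides by the segment length N so that unit-variance white noise gives values near 1 in every bin; that normalization, the rectangular window, and the function names are assumptions of this sketch.

```python
import cmath
import math
import random


def periodogram(seg):
    """|X(f)|^2 / N for bins 0..N/2 of one segment (rectangular window, naive DFT)."""
    n = len(seg)
    out = []
    for f in range(n // 2 + 1):
        x_f = sum(seg[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        out.append(abs(x_f) ** 2 / n)
    return out


def bartlett_psd(x, seg_len):
    """Estimate E[|X(f)|^2] / N by averaging periodograms of non-overlapping segments.

    Only meaningful if x is (at least approximately) stationary.
    """
    n_seg = len(x) // seg_len
    if n_seg == 0:
        raise ValueError("signal shorter than one segment")
    acc = [0.0] * (seg_len // 2 + 1)
    for s in range(n_seg):
        p = periodogram(x[s * seg_len:(s + 1) * seg_len])
        acc = [a + v for a, v in zip(acc, p)]
    return [a / n_seg for a in acc]
```

For example, 200 segments of 64 samples of Gaussian white noise drawn with random.Random(0).gauss(0.0, 1.0) give an estimate close to 1 in all 33 one-sided bins. Applied to music, the same average would merge notes that an STFT keeps separate in time.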
Thanks in advance,