
STFT magnitude vs. power spectrum in a musical instrument recognition system?


I am a master's student doing an internship, and I am currently building a musical instrument recognition system. I have read several papers on the topic, and I am curious about one thing:

All the papers and journal articles I have read use the magnitude of the STFT, i.e. |X(t,f)| for a signal x(t), to extract the (spectral) features that are fed to the recognition system.

What are the reasons for using |X(t,f)| rather than the "power spectrum" |X(t,f)|^2?
(Strictly speaking, a power spectral density is the expectation of |X(f)|^2, i.e. E[|X(f)|^2], up to normalization.)
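
To make the two quantities concrete, here is a minimal sketch of what I mean, assuming NumPy. The helper names (stft, spectral_centroid), the frame length, the hop size and the three-partial test tone are all my own illustrative choices, not taken from any of the papers. It computes |X(t,f)| and |X(t,f)|^2 from the same frames and then the same feature (a spectral centroid) from each.

```python
import numpy as np

def stft(x, frame_len=1024, hop=512):
    """Naive STFT (illustrative helper): Hann-windowed frames, one-sided FFT.

    Returns a complex array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def spectral_centroid(weights, freqs):
    """Frequency average per frame, weighted by the given spectrum."""
    return (weights * freqs).sum(axis=1) / weights.sum(axis=1)

# Hypothetical test signal: a 440 Hz tone with two weaker harmonics.
sr = 16000
frame_len = 1024
t = np.arange(sr) / sr
x = (np.sin(2 * np.pi * 440 * t)
     + 0.5 * np.sin(2 * np.pi * 880 * t)
     + 0.25 * np.sin(2 * np.pi * 1320 * t))

X = stft(x, frame_len=frame_len)
mag = np.abs(X)        # |X(t,f)|, the magnitude spectrogram
power = mag ** 2       # |X(t,f)|^2, the "power" spectrogram

freqs = np.fft.rfftfreq(frame_len, d=1 / sr)
centroid_mag = spectral_centroid(mag, freqs)
centroid_pow = spectral_centroid(power, freqs)

print("mean centroid from |X|:  ", centroid_mag.mean())
print("mean centroid from |X|^2:", centroid_pow.mean())
```

The two centroids are not equal: squaring gives the strongest partial more weight, so for this toy signal the power-weighted centroid should come out lower than the magnitude-weighted one. So the choice does change the feature values, which is why I am asking.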

Thanks in advance,

