STFT vs Power Spectral in Musical recognition system ?
- To: AUDITORY@xxxxxxxxxxxxxxx
- Subject: STFT vs Power Spectral in Musical recognition system ?
- From: Edwin Sianturi <sianturiauditory@xxxxxxxxx>
- Date: Fri, 25 Aug 2006 07:12:59 -0700
- Delivery-date: Fri Aug 25 10:38:51 2006
- Reply-to: Edwin Sianturi <sianturiauditory@xxxxxxxxx>
- Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
I am a master's student doing my internship, and I am currently building a musical instrument recognition system. I have read several papers on the topic, and I am curious about one thing:
All the papers and journal articles that I have read use the magnitude of the STFT, i.e. |X(t,f)| of a signal x(t), to extract the (spectral) features that serve as input to the recognition system.
What are the reasons for using |X(t,f)| instead of the "power spectrum" |X(t,f)|^2?
(Strictly speaking, a power spectral density is the expectation of |X(f)|^2, i.e. E[|X(f)|^2].)
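To make the question concrete, here is a minimal sketch (not taken from any of the papers) of how one spectral feature, the spectral centroid, changes depending on whether it is weighted by |X(t,f)| or by |X(t,f)|^2. The test tone, the sample rate, the frame length, and the helper names `stft_frame` and `spectral_centroid` are all assumptions chosen for illustration.

```python
import cmath
import math


def stft_frame(frame):
    """Naive DFT of one Hann-windowed frame; returns bins 0..N/2."""
    n = len(frame)
    # Periodic Hann window (an assumption; any common window would do).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                for i in range(n))
            for k in range(n // 2 + 1)]


def spectral_centroid(weights, sample_rate, n_fft):
    """Weighted mean frequency in Hz; weights are per-bin |X| or |X|^2."""
    total = sum(weights)
    return sum(k * sample_rate / n_fft * w
               for k, w in enumerate(weights)) / total


sample_rate = 8000  # Hz, hypothetical
n_fft = 256         # frame length, hypothetical

# Hypothetical test tone: strong 500 Hz partial plus a weaker 2000 Hz partial.
frame = [math.sin(2 * math.pi * 500 * i / sample_rate)
         + 0.25 * math.sin(2 * math.pi * 2000 * i / sample_rate)
         for i in range(n_fft)]

X = stft_frame(frame)
magnitude = [abs(c) for c in X]        # |X(t,f)| for this frame
power = [m * m for m in magnitude]     # |X(t,f)|^2 for this frame

centroid_mag = spectral_centroid(magnitude, sample_rate, n_fft)
centroid_pow = spectral_centroid(power, sample_rate, n_fft)

print("centroid from |X|   (Hz):", centroid_mag)
print("centroid from |X|^2 (Hz):", centroid_pow)
```

For this tone the power-weighted centroid comes out lower than the magnitude-weighted one, because squaring gives the strong partial even more weight relative to the weak one. So the two choices do not give the same feature values, which is why I am asking which one is preferable.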
Thanks in advance,