STFT vs Power Spectral in Musical recognition system ?

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: STFT vs Power Spectral in Musical recognition system ?
From: Edwin Sianturi <sianturiauditory@xxxxxxxxx>
Date: Fri, 25 Aug 2006 07:12:59 -0700
Delivery-date: Fri Aug 25 10:38:51 2006
Reply-to: Edwin Sianturi <sianturiauditory@xxxxxxxxx>
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Hello,

I am a master's student doing my internship. Right now I am building a musical instrument recognition system. I have read several papers on the topic, and I am curious about one thing:

All the papers and journal articles that I have read use the magnitude of the STFT, i.e. |X(t,f)| of a signal x(t), to extract several (spectral) features that serve as the input to the recognition system.

What are the reasons for using |X(t,f)| instead of the "power spectrum" |X(t,f)|^2?
(Technically speaking, a power spectral density is the expectation of |X(f)|^2, i.e. E(|X(f)|^2).)
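
To make the comparison concrete, here is a minimal sketch (assuming NumPy, a Hann window, and a synthetic two-partial test tone; the helper names, frame length, hop size, and sample rate are illustrative choices, not taken from any of the papers). It computes |X(t,f)| and |X(t,f)|^2 for the same signal, averages the squared magnitude over frames as a rough stand-in for E(|X(f)|^2), and shows how one typical spectral feature, the spectral centroid, changes depending on which of the two it is computed from.

```python
import numpy as np

def stft_magnitude(x, frame_len=1024, hop=512):
    """Return |X(t, f)|: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_centroid(spec, fs):
    """Amplitude-weighted mean frequency of each frame of `spec`."""
    frame_len = 2 * (spec.shape[1] - 1)  # valid for even frame lengths
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return (spec * freqs).sum(axis=1) / spec.sum(axis=1)

# Synthetic test tone (an assumption for illustration): a strong partial
# at 440 Hz plus a weak partial at 880 Hz, one second at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.sin(2 * np.pi * 880 * t)

mag = stft_magnitude(x)          # |X(t, f)|
power = mag ** 2                 # |X(t, f)|^2
psd_estimate = power.mean(axis=0)  # frame average, approximating E(|X(f)|^2)

centroid_mag = spectral_centroid(mag, fs)
centroid_pow = spectral_centroid(power, fs)

print("mean centroid from |X|   :", centroid_mag.mean())
print("mean centroid from |X|^2 :", centroid_pow.mean())
```

Squaring gives the weak 880 Hz partial even less weight relative to the strong one, so the centroid computed from |X(t,f)|^2 sits closer to 440 Hz than the one computed from |X(t,f)|. This only illustrates that the two choices yield different feature values; it does not by itself say which is better for recognition.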

Thanks in advance,

Edwin SIANTURI

Follow-Ups:
- Re: STFT vs Power Spectral in Musical recognition system ?
  - From: Richard F. Lyon
