Sine-wave speech (Tóth László)


Subject: Sine-wave speech
From:    Tóth László <tothl(at)INF.U-SZEGED.HU>
Date:    Sun, 9 Mar 2003 17:54:45 +0100

Dear List,

Listening to the SWS examples Stuart Rosen sent me, I realized how large the differences between SWS algorithms can be.

I use the original concept of Remez et al.: formant estimation based on LPC, with the sinusoid curves constrained to stay continuous. This was, of course, invented with vowels and glides in mind, and it obviously behaves erratically at closures and fricatives. For example, in the case of an /s/, all three formants run up to 4000-6000 Hz, which creates unnatural transition cues.

In contrast, the test examples Dr. Rosen sent me have perfect silence at closures and do not keep the sinusoids continuous. This preserves the onset and offset cues and does not create false transitions. His sentences are clearly more intelligible (at least for me).

The moral for me is that one should take algorithmic differences into account when comparing SWS results.

P.S.: By the way, has anyone ever modified the SWS algorithm so that the components retain harmonicity (that is, they lie at integer multiples of a common fundamental frequency)?

Thanks,

Laszlo Toth
Hungarian Academy of Sciences
Research Group on Artificial Intelligence
e-mail: tothl(at)inf.u-szeged.hu
http://www.inf.u-szeged.hu/~tothl
"Failure only begins when you stop trying"
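To make the two variants discussed above concrete, here is a minimal Python sketch. It is not the code used by Remez et al., by Dr. Rosen, or by the author of this message. It assumes the formant tracks, per-frame energy, and per-frame fundamental are already available (the LPC formant estimation step is not shown), and the function names, the energy threshold, and the default sample rate are all hypothetical choices made for illustration.

```python
import numpy as np


def synthesize_sws(freqs_hz, amps, frame_rate, sample_rate=16000):
    """Sum of sinusoids following per-frame frequency and amplitude tracks.

    freqs_hz, amps: arrays of shape (n_frames, n_components).
    The tracks are assumed to be given; formant estimation is not shown.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    amps = np.asarray(amps, dtype=float)
    n_frames, n_components = freqs_hz.shape
    n_samples = int(round(n_frames / frame_rate * sample_rate))
    t_frames = np.arange(n_frames) / frame_rate
    t = np.arange(n_samples) / sample_rate
    out = np.zeros(n_samples)
    for k in range(n_components):
        # Linear interpolation between frames keeps each track continuous.
        f = np.interp(t, t_frames, freqs_hz[:, k])
        a = np.interp(t, t_frames, amps[:, k])
        # Integrate frequency to get phase, so there are no phase jumps.
        phase = 2.0 * np.pi * np.cumsum(f) / sample_rate
        out += a * np.sin(phase)
    return out


def gate_silence(amps, energy, threshold):
    """Zero all components in frames whose energy is below the threshold.

    The threshold is a hypothetical tuning parameter; this only mimics
    the "perfect silence at closures" behaviour described above.
    """
    gated = np.asarray(amps, dtype=float).copy()
    gated[np.asarray(energy, dtype=float) < threshold, :] = 0.0
    return gated


def snap_to_harmonics(freqs_hz, f0_hz):
    """Move each component to the nearest integer multiple of the frame's f0."""
    freqs = np.asarray(freqs_hz, dtype=float)
    f0 = np.asarray(f0_hz, dtype=float)[:, None]
    harmonic_number = np.maximum(1.0, np.round(freqs / f0))
    return harmonic_number * f0
```

Calling synthesize_sws directly on the raw tracks corresponds to the continuous variant. Passing the amplitudes through gate_silence first gives silent closures, and passing the frequencies through snap_to_harmonics is one simple way to impose the harmonicity asked about in the P.S.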


This message came from the mail archive
http://www.auditory.org/postings/2003/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University