
Sine-wave speech

Dear List,

Listening to the SWS examples Stuart Rosen sent me, I realized how large
the differences between SWS algorithms can be. I use the original concept
of Remez et al.: formant estimation based on LPC, with the sinusoid
tracks constrained to stay continuous. This was, of course,
designed with vowels and glides in mind, and it clearly behaves
erratically at closures and fricatives. For example, in the case of an
/s/, all three formants run up to 4000-6000 Hz, which creates unnatural
transition cues.
In contrast, the test examples Dr. Rosen sent me have perfect silence
at closures and do not enforce continuity of the sinusoids. This
preserves the onset and offset cues and does not create false transitions.
His sentences are clearly more intelligible (at least to me). The moral
for me is that one should take algorithmic differences into account when
comparing SWS results.
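
The difference between the two approaches can be illustrated with a
minimal sketch. It assumes the formant frequency and amplitude tracks
have already been estimated (for example by LPC) and are given per
frame. The function names, the frame rate, and the energy-threshold
rule for detecting closures are all hypothetical choices made for
illustration; this is not the actual code behind either set of examples.

```python
import numpy as np

def synthesize_sws(freqs_hz, amps, frame_rate=100, sample_rate=16000):
    """Sum of sinusoids that follow per-frame frequency/amplitude tracks.

    freqs_hz, amps: arrays of shape (n_frames, n_tracks).
    Tracks are linearly interpolated between frames, so each sinusoid
    stays continuous in frequency and phase.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    amps = np.asarray(amps, dtype=float)
    n_frames, n_tracks = freqs_hz.shape
    hop = sample_rate // frame_rate           # samples per frame
    frame_pos = np.arange(n_frames) * hop     # sample index of each frame
    t = np.arange(n_frames * hop)
    out = np.zeros(t.size)
    for k in range(n_tracks):
        f = np.interp(t, frame_pos, freqs_hz[:, k])
        a = np.interp(t, frame_pos, amps[:, k])
        # Integrate instantaneous frequency to get a continuous phase.
        phase = 2.0 * np.pi * np.cumsum(f) / sample_rate
        out += a * np.sin(phase)
    return out

def gate_closures(amps, frame_energy, threshold):
    """Assumed closure rule: silence every frame whose energy is low."""
    gated = np.asarray(amps, dtype=float).copy()
    gated[np.asarray(frame_energy, dtype=float) < threshold] = 0.0
    return gated
```

Calling synthesize_sws on the raw tracks gives the fully continuous
variant; passing the amplitudes through gate_closures first gives
silence at the frames flagged as closures, so the sinusoids no longer
glide through them audibly.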

P.S.: By the way, has anyone ever modified the SWS algorithm so that the
components retain harmonicity (that is, they lie at integer multiples
of a common fundamental frequency)?
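
One simple way such a modification might look, purely as an
illustrative sketch: move each estimated formant frequency to the
nearest integer multiple of a per-frame fundamental. The function name
and the nearest-harmonic rule are assumptions, and the fundamental
track is assumed to come from a separate pitch estimator.

```python
import numpy as np

def snap_to_harmonics(freqs_hz, f0_hz):
    """Move each track to the nearest harmonic of the frame's f0.

    freqs_hz: array of shape (n_frames, n_tracks).
    f0_hz:    array of shape (n_frames,), assumed strictly positive.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    f0 = np.asarray(f0_hz, dtype=float)[:, None]
    # Harmonic number closest to each frequency, never below the 1st.
    harmonic = np.maximum(1.0, np.round(freqs_hz / f0))
    return harmonic * f0
```

With a 100 Hz fundamental, tracks at 520, 1480 and 2510 Hz would move
to 500, 1500 and 2500 Hz.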

               Laszlo Toth
        Hungarian Academy of Sciences         *
  Research Group on Artificial Intelligence   *   "Failure only begins
     e-mail: tothl@inf.u-szeged.hu            *    when you stop trying"
     http://www.inf.u-szeged.hu/~tothl        *