Replies to inquiry re: modifying speech signal

Thank you to everyone who responded to my inquiry regarding how to modify a speech signal so as to alter the apparent vocal tract length of the speaker. I have compiled most of the responses below. Apologies if I missed your reply.

David

***********************************************

Hi David,

Check out Roy Patterson’s web site for info (specifically the publications related to “size perception”).

http://www.mrc-cbu.cam.ac.uk/~roy/

Cheers,

--Maria Chait

***********************************************

Praat (www.praat.org) is a free speech analysis/synthesis program that can do this, apparently quite easily. I've not used this particular function myself (it's part of the "Change gender" command), but the built-in help manual is quite good.

Hope this helps,

-Alex Francis

***********************************************

Dear David,

It's fairly straightforward acoustics. Longer tubes yield lower resonances (formants). In general, you simply increase or decrease formant center frequencies by some percentage.
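As a rough illustration of the tube acoustics, here is a minimal Python sketch. It assumes an idealized uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / (4L), with c taken as about 350 m/s and a textbook tract length of 17.5 cm. Real vocal tracts are not uniform, and the function name is only illustrative. Because every resonance is proportional to 1/L, lengthening the tube by 20% lowers all formants by the same factor of 1/1.2.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed: roughly 350 m/s in warm, humid air


def tube_formants(length_cm, n_formants=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    short_tract = tube_formants(17.5)  # illustrative adult tract length
    long_tract = tube_formants(21.0)   # the same tract made 20% longer
    for f_short, f_long in zip(short_tract, long_tract):
        print(f"{f_short:7.1f} Hz -> {f_long:7.1f} Hz (ratio {f_long / f_short:.3f})")
```

For the 17.5 cm tube this gives 500, 1500 and 2500 Hz; for the 21 cm tube each value drops to about 0.833 of that (roughly 417, 1250 and 2083 Hz).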

I must note, however, that such simple transformations are not quite right, because talkers can employ other articulatory maneuvers to adjust effective formant frequencies, especially effective F1. For example, F1 for female talkers (who generally have shorter vocal tracts) is not as high as would be predicted from vocal tract length alone, even though F2, F3, etc. are higher. I think women do this by adjusting the source spectrum (laryngeal control) in a way that makes the acoustic spectral peak appear at a lower frequency than the resonant frequency. But that's a longer explanation.

-- Keith Kluender

***********************************************

Hi David,

A longer vocal tract will in general lower the frequencies of the formants, and a shorter one will raise them. A simplified transform to achieve this effect would be to multiply the formant frequencies by the inverse of the factor by which you want to change the vocal tract length, e.g. multiply the formant frequencies by 0.5 if you want to double the vocal tract length, or by 2 if you want to halve it. Typically the spectral shape of the voice signal is attributed to the vocal tract, so how you do this depends on the spectral shape estimation technique you are using.

One way to do this would be to transpose a tone without formant preservation (e.g. time stretching followed by resampling), and use the spectral shape of the transposed tone to shape a spectrally flat source at the original pitch. For example, to lengthen the vocal tract by a factor of two, transpose the sample down one octave, then use the spectral shape of the transposed sample to shape a spectrally flat source at the original pitch.
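The transpose-and-reshape idea might be sketched as follows for a single analysis frame. This rests on assumptions that are not in the reply: the spectral shape is estimated with autocorrelation-method LPC (one of several possible estimators), the "spectrally flat source" is an impulse train at the original F0, transposition is done by plain linear-interpolation resampling, and all function names are hypothetical. A practical system would process overlapping frames and overlap-add the results.

```python
import numpy as np


def transpose_by_resampling(x, factor):
    """Divide every frequency in x by `factor` (2.0 = down one octave), formants included.

    The waveform is read `factor` times more slowly with linear interpolation,
    so the output is also `factor` times longer.
    """
    n_out = int(len(x) * factor)
    read_positions = np.arange(n_out) / factor
    return np.interp(read_positions, np.arange(len(x)), x)


def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns a with a[0] = 1."""
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def all_pole_filter(excitation, a):
    """Synthesis filter 1 / A(z): y[n] = e[n] - sum_{i>=1} a[i] * y[n - i]."""
    p = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for i in range(1, min(p, n) + 1):
            acc -= a[i] * y[n - i]
        y[n] = acc
    return y


def pulse_train(f0, fs, n):
    """Unit impulses every fs / f0 samples: equal-amplitude harmonics of f0."""
    out = np.zeros(n)
    idx = np.round(np.arange(0.0, n, fs / f0)).astype(int)
    out[idx[idx < n]] = 1.0
    return out


def lengthen_tract(frame, fs, f0, length_factor, order=18):
    """Hypothetical helper: re-voice one frame as if the tract were `length_factor` times longer.

    order=18 follows the common fs/1000 + 2 rule of thumb for fs = 16 kHz.
    """
    transposed = transpose_by_resampling(frame, length_factor)   # formants / length_factor
    a = lpc(transposed * np.hamming(len(transposed)), order)      # spectral shape of transposed frame
    source = pulse_train(f0, fs, len(frame))                      # flat source at the ORIGINAL pitch
    return all_pole_filter(source, a), a
```

With length_factor=2.0 this mirrors the octave-down example: the envelope is measured on the transposed frame, but the impulse train keeps the original F0. Plain resampling stands in for a time-stretch-plus-resample transposer here only because the envelope of a single frame does not depend on its duration.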

Alternatively, if you are using a sinusoidal model, you could multiply the frequency of each partial by the inverse of the factor by which you want to change the vocal tract length (0.5 to double the length) and use the resulting spectrum to define the new spectral shape. To get the amplitudes of the transformed partials, interpolate that new spectral shape at the original partial frequencies.
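A minimal sketch of this sinusoidal-model variant follows. It assumes partial amplitudes are given in dB and that the new spectral shape is a straight-line interpolation between the moved partials; both choices, and the function name, are illustrative assumptions rather than part of the reply. np.interp holds the end values outside the range of the moved points.

```python
import numpy as np


def rescale_partial_amplitudes(partial_freqs_hz, partial_amps_db, length_factor):
    """Re-weight partials to mimic a vocal tract `length_factor` times longer.

    Partial frequencies (and so the pitch) stay fixed. The spectral shape is
    rebuilt from the points (f_i / length_factor, A_i), and each partial's new
    amplitude is read off that shape at its original frequency.
    """
    freqs = np.asarray(partial_freqs_hz, dtype=float)
    amps = np.asarray(partial_amps_db, dtype=float)
    moved_freqs = freqs / length_factor  # e.g. 0.5 * f to double the length
    # np.interp needs increasing x; dividing sorted positive freqs by a positive factor keeps the order.
    return np.interp(freqs, moved_freqs, amps)


if __name__ == "__main__":
    freqs = np.arange(1, 41) * 100.0                    # harmonics of a 100 Hz voice
    amps_db = -20.0 * np.abs(freqs - 1000.0) / 1000.0   # toy envelope peaking at 1000 Hz
    new_amps = rescale_partial_amplitudes(freqs, amps_db, 2.0)
    print(freqs[np.argmax(new_amps)])                   # envelope peak now falls on the 500 Hz partial
```

With length_factor=2.0 the toy envelope's peak moves from the 1000 Hz partial to the 500 Hz partial, while the partials themselves stay at multiples of 100 Hz.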

This should give you a first approximation of the effect of varying the vocal tract length, and should work pretty well for relatively small variations.

Sean O'Leary

***********************************************

Hello David,

We use Kawahara-san's vocoder STRAIGHT. It works very well on clean speech recordings. We describe its use in several papers where we scaled speech sounds over a wide range.

Smith, D. R. R., Patterson, R. D., Turner, R., Kawahara, H., and Irino, T. (2005). "The processing and perception of size information in speech sounds," J. Acoust. Soc. Am. 117, 305-318.

Ives, D. T., Smith, D. R. R., and Patterson, R. D. (2005). "Discrimination of speaker size from syllable phrases," J. Acoust. Soc. Am. 118(6), 3816-3822.

Smith, D. R. R., and Patterson, R. D. (2005). "The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex and age," J. Acoust. Soc. Am. 118, 3177-3186.

It also works on some musical instruments, by the way.

van Dinther, R., and Patterson, R. D. (2006). "Perception of acoustic scale and size in musical instrument sounds," J. Acoust. Soc. Am. 120, 2158-2176.

You can also use Praat from Paul Boersma in Amsterdam.

Another useful application is PSOLA from Eindhoven.

Regards,

Roy Patterson

***********************************************

It seems that you need "resynthesis".

If you google it, you will get a couple of hundred hits.

Festival, Synthworks, HLSyn, and Praat are some of the most popular options out there.

Heriberto Avelino

***********************************************

Dear David,

I am not sure that this is what you want to achieve, but Praat (available free at www.praat.org) has this interesting "Change gender" button, which allows you to shift all formants up or down by a given ratio independent of F0.

Holger Mitterer

***********************************************

STRAIGHT is a program that has been used to modify perceived vocal tract length.

See papers by Patterson RD and colleagues and the following pages:

http://www.wakayama-u.ac.jp/~kawahara/index-e.html

http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTtrial/

It's available free to academics.

Christopher Long

***********************************************