I don't know anything about timbre but I do know something about vowels and MDS. For many years we have known that MDS analyses of similarity judgments of American English vowels yield a very nice match to a first-formant/second-formant (F1-F2) space. You may or may not get the third formant coming out, but F1 and F2 soak up almost all the variance. To my mind it is one of the success stories of MDS applications.
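For readers who have not run an MDS analysis, here is a minimal sketch of classical (Torgerson) MDS in plain Python. It is not the procedure of any particular vowel study. The formant values are illustrative placeholders I have supplied, and the distance matrix is computed from them directly, whereas a real study would use listeners' similarity judgments. The point is only that MDS sees pairwise distances and nothing else, and still recovers a two-dimensional layout that reproduces them. The names `VOWEL_FORMANTS_HZ`, `distance_matrix`, and `classical_mds` are my own.

```python
import math

# Illustrative (F1, F2) placeholders in Hz. These are assumptions for the demo,
# not measurements taken from the studies mentioned in this thread.
VOWEL_FORMANTS_HZ = {
    "ee": (270.0, 2290.0), "ih": (390.0, 1990.0), "eh": (530.0, 1840.0),
    "ae": (660.0, 1720.0), "ah": (730.0, 1090.0), "aw": (570.0, 840.0),
    "uh": (440.0, 1020.0), "oo": (300.0, 870.0),
}


def distance_matrix(points):
    """Pairwise Euclidean distances between points of any dimension."""
    return [[math.dist(p, q) for q in points] for p in points]


def classical_mds(dist, dims=2, iters=500):
    """Classical MDS: find coordinates whose distances approximate `dist`."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the largest remaining eigenvector of B.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the matching eigenvalue.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 0.0:
            break
        for i in range(n):
            coords[i][d] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


points = list(VOWEL_FORMANTS_HZ.values())
layout = classical_mds(distance_matrix(points), dims=2)
for name, (x, y) in zip(VOWEL_FORMANTS_HZ, layout):
    print(f"/{name}/  dim1={x:9.1f}  dim2={y:9.1f}")
```

The recovered axes are determined only up to rotation and reflection, so in practice one rotates the MDS solution before comparing it with measured F1 and F2.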
Sorry I can't give the references (I am in Australia at the moment) but ask Rob Fox at Ohio State or search the speech perception literature.
James J. Jenkins
University of South Florida and
Grad Center, City University of New York
From: John Bates <jkbates@xxxxxxxxxxx>
Sent: Tue, 27 Jun 2006 09:44:15 -0400
Subject: Re: MDS-distances
I think Malcolm is on the right track with his idea of an auditory version of three-color vision. I think it can be done. The reason is that I've encountered this sort of perception while doing experiments in classifying the timbre of pulsed sequences. I noticed this effect when listening to groups of all possible combinations of sequences of from 1 to 4 successive impulse waveshapes. The impulse shapes I used were either rectangular or overdamped sines with constant amplitude and with ordered time spacings. I found that equal spacings give a tonal timbre (or pitch) while aperiodic spacings give an atonal timbre. I noticed that I heard atonal timbre in terms of the vowels (/ee/, /ih/, /eh/, ... /oo/). By a series of tests with different combinations of shapes and intervals I could locate the combinations that produced the closest match to vowel centers. I found that the non-matching timbres would generally lie between adjacent centers. For example, if a sample were not an /ah/ it would fall toward either /aa/ or /aw/, or it might move toward the back vowel /oo/ as in an umlaut.
These results led me to suppose that a set of vowels could represent cardinal points in a timbre space that applies to human speech as well as to environmental sounds. (Why not? The vowels of the human vocal cavity can be heard in a wide variety of non-speech sounds.) This suggests that a vowel space could be defined along the lines of the "RGB" space of color TV. For example, consider an "FMB" space with F for front-closed, M for middle-open, and B for back-closed. Couldn't this space include all of the variants of vowel sounds? Couldn't this also provide a more quantitative calibration of timbre than one based on things like spectral brightness and "bite"?
If anyone feels like trying these tests, the key is to listen to the atonal waveshape groups either individually or in a stream, each one separated by more than 25 milliseconds. The 25 ms separation reduces the possibility of mixing the sound of group repetition with group timbre. Note that the group for the vowel /ee/, as the only tonal vowel, must contain at least three equally spaced impulses having a repetition rate that defines the third formant. Note also that whispered speech, or all kinds of environmental sounds including transients, could be simulated by randomizing the separation and/or mixing of timbre and pitch.
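As a starting point for anyone who wants to try it, here is a rough Python sketch of how such pulse groups and a stream of them might be built as sample lists. Everything numeric is an assumption of mine, since the description above gives no sample rate, pulse width, or spacings: a 48 kHz sample rate, 0.1 ms rectangular pulses, and a 30 ms gap (anything above 25 ms would do). The helper names are hypothetical, and the overdamped-sine pulse shape is not covered.

```python
def impulse_group(spacings_ms, fs=48000, pulse_ms=0.1):
    """Samples for one group of rectangular pulses.

    spacings_ms holds the onset-to-onset intervals between successive
    pulses, so a group of k pulses has k - 1 entries (empty for 1 pulse).
    """
    width = max(1, round(pulse_ms * fs / 1000))
    onsets = [0]
    for interval in spacings_ms:
        onsets.append(onsets[-1] + round(interval * fs / 1000))
    samples = [0.0] * (onsets[-1] + width)
    for onset in onsets:
        for k in range(onset, onset + width):
            samples[k] = 1.0  # constant amplitude for every pulse
    return samples


def equally_spaced(spacings_ms, tol=1e-9):
    """True when there are at least two intervals and all are equal.

    Requiring two intervals (three pulses) is my assumption about what
    counts as "equal spacings" here.
    """
    if len(spacings_ms) < 2:
        return False
    return all(abs(s - spacings_ms[0]) <= tol for s in spacings_ms)


def stream(groups, fs=48000, gap_ms=30.0):
    """Concatenate groups, following each with a silent gap of gap_ms."""
    gap = [0.0] * round(gap_ms * fs / 1000)
    out = []
    for group in groups:
        out.extend(group)
        out.extend(gap)
    return out


tonal = impulse_group([1.0, 1.0])          # three equally spaced pulses
atonal = impulse_group([0.7, 1.3, 0.4])    # four pulses, aperiodic spacing
signal = stream([tonal, atonal, tonal])
print(equally_spaced([1.0, 1.0]), equally_spaced([0.7, 1.3, 0.4]), len(signal))
```

The resulting list can be written to a WAV file with the standard `wave` module after scaling to 16-bit integers.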
An example of the dichotomy of timbre and pitch can be shown in the experiment by E. Terhardt and H. Fastl, "Zum Einfluss von Störtönen und Störgeräuschen auf die Tonhöhe von Sinustönen," Acustica, vol. 25, pp. 53-61, 1971. This is a study of phase masking vs. amplitude where a 200 Hz tone masks a 400 Hz tone as its phase is varied in steps from 0 to 360 degrees. I've been testing this experiment and have found that the timbre varies with corresponding waveshape changes although the two pitches are constant. My experiments are based on Manfred Schroeder's description in "Models of hearing," Proceedings of the IEEE, vol. 63, no. 9, September 1975. I'm looking for more information on how the experiment was run, since Schroeder's paper was only a summary. So far I haven't found much about it on the Internet.
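The waveshape side of this can be checked numerically. The sketch below is not a reconstruction of the Terhardt and Fastl procedure. It simply adds a 200 Hz and a 400 Hz sine, steps the phase of the 400 Hz component, and reports the amplitude of each component alongside the waveform peaks. Equal amplitudes for the two tones, the 48 kHz sample rate, and the choice to shift the 400 Hz phase are all my assumptions.

```python
import cmath
import math


def two_tone_period(phase_deg, f0=200.0, fs=48000, amp2=1.0):
    """One period of sin(f0) + amp2 * sin(2*f0 + phase).

    Assumes fs is an integer multiple of f0 so one period is a whole
    number of samples (240 at the defaults).
    """
    n = round(fs / f0)
    phi = math.radians(phase_deg)
    return [math.sin(2 * math.pi * k / n)
            + amp2 * math.sin(4 * math.pi * k / n + phi)
            for k in range(n)]


def harmonic_amplitude(x, harmonic):
    """Amplitude of the given harmonic via a single-bin DFT over one period."""
    n = len(x)
    acc = sum(x[k] * cmath.exp(-2j * math.pi * harmonic * k / n)
              for k in range(n))
    return 2 * abs(acc) / n


print("phase  amp(200Hz)  amp(400Hz)  max     min")
for phase in range(0, 360, 45):
    x = two_tone_period(phase)
    print(f"{phase:5d}  {harmonic_amplitude(x, 1):10.6f}  "
          f"{harmonic_amplitude(x, 2):10.6f}  {max(x):6.3f}  {min(x):6.3f}")
```

The two amplitude columns stay fixed while the peak columns move with phase, which is the sense in which the magnitude spectrum is constant and the waveshape is not.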
>>Failed at what? Malcolm, I think you have missed the point.
>Fair enough. We have different goals. I want a model of timbre perception (for speech and music sounds) that rivals the three-color model of color vision science. Spectral brightness and attack time are not enough of an answer for me.
>I don't think the timbre interpolation work I've seen (the vibrabone?) shows that we understand timbre space yet. As I remember the data, the synthesized instrument was not on a perceptual line directly between the source sounds.