
Re: MDS-distances

Dear List,

I think Malcolm is on the right track with his idea of an auditory version of three-color vision. I think it can be done. The reason is that I've encountered this sort of perception while doing experiments in classifying the timbre of pulsed sequences. I noticed the effect when listening to groups of all possible combinations of sequences of 1 to 4 successive impulse waveshapes. The impulse shapes I used were either rectangular or overdamped sines, with constant amplitude and with ordered time spacings. I found that equal spacings give a tonal timbre (or pitch), while aperiodic spacings give an atonal timbre. I noticed that I heard atonal timbre in terms of the vowels (/ee/, /ih/, /eh/, ... /oo/). Through a series of tests with different combinations of shapes and intervals I could locate the combinations that produced the closest match to vowel centers. I found that the non-matching timbres would generally lie between adjacent centers. For example, if a sample were not an /ah/ it would fall between /ah/ and either /aa/ or /aw/, or it might move toward the back vowel /oo/ as in an umlaut.

These results led me to suppose that a set of vowels could represent cardinal points in a timbre space that applies to human speech as well as to environmental sounds. (Why not? The vowels of the human vocal cavity can be heard in a wide variety of non-speech sounds.) This suggests that a vowel space could perhaps be defined along the lines of the "RGB" space of color TV. For example, consider an "FMB" space, with F for front-closed, M for middle-open, and B for back-closed. Couldn't this space include all of the variants of vowel sounds? Couldn't this also provide a more quantitative calibration of timbre than one based on things like spectral brightness and "bite"?

If anyone feels like trying these tests, the key is to listen to the atonal waveshape groups either individually or in a stream, each one separated by more than 25 milliseconds. The 25 ms separation reduces the possibility of mixing the sound of group repetition with group timbre. Note that the group for the vowel /ee/, as the only tonal vowel, must contain at least three equally spaced impulses having a repetition rate that defines the third formant. Note also that whispered speech, or all kinds of environmental sounds including transients, could be simulated by randomizing the separation and/or mixing of timbre and pitch.
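For anyone who wants a starting point, here is a rough Python sketch of how such test streams might be generated. It is only an illustration under my own assumptions, not the setup actually used: the sample rate, pulse width, damped-sine parameters, the example spacings, and the function names are all hypothetical, and the "more than 25 ms" separation is taken here to mean silence between the end of one group and the start of the next.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed, not specified in the post


def rect_pulse(width_s=0.0002, sr=SAMPLE_RATE):
    """Rectangular impulse of unit amplitude (width is an assumption)."""
    return [1.0] * max(1, round(width_s * sr))


def damped_sine_pulse(freq_hz=2000.0, decay_s=0.0003, dur_s=0.002, sr=SAMPLE_RATE):
    """Heavily damped sine burst, a stand-in for the 'overdamped sine' shape."""
    n = round(dur_s * sr)
    return [math.exp(-(i / sr) / decay_s) * math.sin(2 * math.pi * freq_hz * i / sr)
            for i in range(n)]


def pulse_group(spacings_ms, pulse, sr=SAMPLE_RATE):
    """Place len(spacings_ms) + 1 copies of `pulse`; successive onsets are
    separated by the given spacings (milliseconds)."""
    onsets = [0]
    for gap in spacings_ms:
        onsets.append(onsets[-1] + round(gap * sr / 1000))
    out = [0.0] * (onsets[-1] + len(pulse))
    for onset in onsets:
        for i, v in enumerate(pulse):
            out[onset + i] += v
    return out


def stream(groups, gap_ms=30.0, sr=SAMPLE_RATE):
    """Concatenate groups with `gap_ms` of silence after each one
    (30 ms chosen only because it exceeds 25 ms)."""
    silence = [0.0] * round(gap_ms * sr / 1000)
    out = []
    for g in groups:
        out.extend(g)
        out.extend(silence)
    return out


# Equal spacings (tonal) versus unequal spacings (atonal); values are illustrative.
tonal = pulse_group([1.0, 1.0], rect_pulse())
atonal = pulse_group([0.7, 1.6], rect_pulse())
samples = stream([tonal, atonal])
```

The resulting list of samples can be written out with any audio library for listening; randomizing the spacings per group would be one way to try the whispered-speech idea.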

An example of the dichotomy of timbre and pitch can be seen in the experiment by E. Terhardt and H. Fastl, "Zum Einfluss von Störtönen und Störgeräuschen auf die Tonhöhe von Sinustönen," Acustica, vol. 25, pp. 53-61, 1971. This is a study of phase masking vs. amplitude, where a 200 Hz tone masks a 400 Hz tone as its phase is varied in steps from 0 to 360 degrees. I've been testing this experiment and have found that the timbre varies with the corresponding waveshape changes although the two pitches are constant. My experiments are based on Manfred Schroeder's description in "Models of hearing," Proceedings of the IEEE, vol. 63, no. 9, September 1975. I'm looking for more information on how the experiment was run, since Schroeder's paper was only a summary. So far I haven't found much about it on the Internet.
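To make the waveshape point concrete, here is a small Python sketch that sums a 200 Hz and a 400 Hz sine and steps the phase of the 400 Hz component. It is not the Terhardt and Fastl procedure, whose details I don't have; the equal amplitudes, the 45 degree step, the sample rate, and the function names are assumptions for illustration only. It simply shows that the peak of the waveform changes with phase while the RMS level (and the two component frequencies) stay the same.

```python
import math


def two_tone(phase_deg, f1=200.0, f2=400.0, a2=1.0, sr=48000, dur_s=0.01):
    """Sum of a sine at f1 and a sine at f2 whose starting phase is phase_deg.
    The default duration covers whole periods of both components."""
    phi = math.radians(phase_deg)
    n = round(dur_s * sr)
    return [math.sin(2 * math.pi * f1 * t / sr)
            + a2 * math.sin(2 * math.pi * f2 * t / sr + phi)
            for t in range(n)]


def peak(x):
    """Largest absolute sample value."""
    return max(abs(v) for v in x)


def rms(x):
    """Root-mean-square level."""
    return math.sqrt(sum(v * v for v in x) / len(x))


# Step the phase of the 400 Hz component (45 degree step is an assumption).
for phase in range(0, 361, 45):
    wave = two_tone(phase)
    print(f"{phase:3d} deg  peak={peak(wave):.3f}  rms={rms(wave):.3f}")
```

Listening to the successive phases one after another is the comparison of interest; the printed peak values only confirm that the waveshape is in fact changing.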

John Bates


Failed at what? Malcolm, I think you have missed the point.

Fair enough. We have different goals. I want a model of timbre perception (for speech and music sounds) that rivals the three-color model of color vision science. Spectral brightness and attack time are not enough of an answer for me.

I don't think the timbre interpolation work I've seen (the vibrabone?) shows that we understand timbre space yet. As I remember the data, the synthesized instrument was not on a perceptual line directly between the source sounds.