I don't know of a paper on this topic, but here are some impressions.

It is clear that listeners shift their attention among dimensions depending on the stimulus set. For example, if there are large pitch variations among the stimuli, listeners' ratings are dominated by that dimension, whereas they attend more specifically to timbre dimensions if the pitch variation is removed. On the face of it, it thus seems very plausible that listeners can attend to only a small number of dimensions at a time. It is certainly the case that higher dimensions in MDS solutions become progressively less interpretable, which suggests that they may just be modeling noise.

That being said, it is hard to say whether this reveals an attentional limitation or is a measurement issue with rating scales and the MDS procedure. I find that there is always much more unaccounted variance when I am using real recordings than when I am using a set (usually speech stimuli) that has been synthesized to vary on only a small number of dimensions. For example, a set that has been synthesized to be two-dimensional will fit into a two-dimensional solution far better than a set of natural recordings will. I think that this indicates that listeners are indeed sensitive to higher dimensions in the natural stimuli; the unaccounted variance in such MDS experiments is not just noise. However, the relative contribution of those dimensions becomes small enough to merge with the level of noise in the data, such that they can no longer be modeled very well by MDS. That is, the data are usually sensitive enough to capture only the few most influential dimensions that drove the rating-scale judgements.
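
To make that argument concrete, here is a minimal simulation sketch in Python. It is not an MDS fit and it is not based on any real data. It assumes that dissimilarity ratings are Euclidean distances in a weighted perceptual space plus Gaussian rating noise, and it uses the true first two coordinates as a stand-in for an ideal two-dimensional solution. The function names, the dimension weights, and the noise level are all hypothetical choices made for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unexplained_by_two_dims(weights, n_items=40, noise_sd=0.1, seed=0):
    """Fraction of variance in noisy dissimilarity ratings that a
    two-dimensional distance model leaves unexplained.

    weights: hypothetical perceptual weight of each stimulus dimension.
    """
    rng = random.Random(seed)
    # Hypothetical stimuli: each coordinate is scaled by its dimension weight.
    items = [[w * rng.gauss(0, 1) for w in weights] for _ in range(n_items)]
    ratings, model = [], []
    for a, b in combinations(items, 2):
        # Assumed rating: distance over all dimensions plus rating noise.
        ratings.append(math.dist(a, b) + rng.gauss(0, noise_sd))
        # Stand-in for an ideal 2-D solution: distance over the first two only.
        model.append(math.dist(a[:2], b[:2]))
    r = pearson(ratings, model)
    return 1.0 - r * r


# "Synthesized" set: varies on two dimensions only.
synthetic = unexplained_by_two_dims([1.0, 0.8])
# "Natural" set: the same two strong dimensions plus three weaker ones.
natural = unexplained_by_two_dims([1.0, 0.8, 0.5, 0.4, 0.3])

print(f"unexplained variance, synthesized set: {synthetic:.3f}")
print(f"unexplained variance, natural set:     {natural:.3f}")
```

Under these assumptions the rating noise is identical in both conditions, so any extra unexplained variance in the natural set comes from the weaker dimensions rather than from noise. Shrinking the three weaker weights toward the noise level should make the two conditions harder to tell apart, which is the situation described above.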

Date:    Sun, 19 Oct 2008
From:    Christian Kaernbach
Subject: multidimensional scaling of timbre

Dear list,

I seem to remember that one lesson from multidimensional scaling of
timbres was that the type of dimensions found depends strongly on the
selection of the stimuli. If my memory serves me right, the similarity
data would always yield two- to three-dimensional spaces, regardless of
whether the stimuli were quite diverse (all types of instruments of the
classical orchestra) or from a narrow subgroup (say, all woodwinds). In
other words, people seem to be able to manage two to three dimensions in
their cognitive space representing the entirety of the stimuli of a
certain experiment. Is that correct, and is there a reference referring
to this phenomenon?

