identification test procedure


A colleague of mine is using a particular procedure to test how well
synthetic piano tones resemble "original" piano tones. The "original" tones
are actually analyzed and resynthesized using a phase vocoder method (in
order to eliminate background noise) and the synthetic tones are
resynthesized with a fair amount of data reduction. Equal numbers of original
and synthetic tones, at pitches between E1 and D7, are presented in random
order to the listener, and the listener's task is to choose which tones are
synthetic. My colleague's contention is that if the listener were to simply
guess, the score would be 50%, and that any score less than 50% indicates
poor ability to distinguish, thus proving the efficacy of the data
reduction method. Indeed, the scores ranged from 15% to 45%.

My question is: does this really work? If the listener tries to maximize
his/her score, then he/she might tend to guess, moving the score towards
50%. On the other hand, if the subject is truly honest and he/she cannot
distinguish, he/she would choose no tones and would score 0%. I guess it
would depend on how the subjects are instructed to make their decisions,
but then can you trust them to always follow instructions? Shouldn't the
false positives also be reported?
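
To make the false-positive point concrete, here is a minimal sketch of a
signal-detection style calculation in Python. It is only an illustration, not
the procedure described above. The listener counts are hypothetical, the
function names are made up for this example, and the log-linear correction
(adding 0.5 to each count) is an assumption chosen so that rates of exactly
0 or 1 do not break the z-transform.

```python
from statistics import NormalDist


def rates(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false-alarm rate) with a log-linear correction.

    Adding 0.5 to each count and 1 to each total keeps both rates strictly
    between 0 and 1, so the z-transform in d_prime() is always defined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return hit_rate, fa_rate


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    hit_rate, fa_rate = rates(hits, misses, false_alarms, correct_rejections)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical listeners, each hearing 40 synthetic and 40 original tones.
# Counts are (hits, misses, false alarms, correct rejections), where a "hit"
# is a synthetic tone chosen as synthetic and a "false alarm" is an original
# tone chosen as synthetic.
listeners = {
    "A": (6, 34, 6, 34),    # 15% hits, 15% false alarms
    "B": (18, 22, 18, 22),  # 45% hits, 45% false alarms
    "C": (6, 34, 0, 40),    # 15% hits, no false alarms
}

for name, counts in listeners.items():
    print(name, round(d_prime(*counts), 2))
```

Listeners A and B have different hit rates but equal hit and false-alarm
rates, so both come out at d' = 0 (no discrimination). Listener C has the
same 15% hit rate as A but never picks an original tone, which gives a
positive d'. Under these assumptions, the percentage of synthetic tones
chosen cannot by itself separate ability from willingness to respond.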

Is there a way to properly evaluate the results of this test?

What is the best paradigm to test the efficacy of a data reduction method?

Jim Beauchamp
University of Illinois at Urbana-Champaign

