C. Rose Rabinov
Bruce R. Gerratt
Div. of Head/Neck Surgery, UCLA School of Medicine, CHS 62-132, Los Angeles, CA 90024
VA Med. Ctr., West Los Angeles, Audiol./Speech Pathol. (126), Wilshire and Sawtelle Blvds., Los Angeles, CA 90073
Acoustic analysis is often favored over perceptual evaluation of voice because it is considered objective and thus reliable. Specifically, jitter is frequently used as an index of pathologic voice quality because of its moderate to high correlation with vocal roughness. This study examined the relative reliability of human listeners and automatic algorithms in the evaluation of pathologic voices. Ten experienced listeners rated the roughness of 50 voice samples (ranging from normal to severely disordered) on a 75-mm visual analog scale. Rating reliability within and across listeners was compared to the reliability of jitter measures produced by several automatic voice analysis packages (CSpeech, SoundScope, CSL, and an interactive hand-marking system). Results showed that, overall, listeners agreed as well as or better than "objective" algorithms. Further, listeners disagreed in predictable ways, while automatic algorithms differed in a seemingly random fashion. Finally, listener reliability increased with the severity of pathology, whereas objective methods quickly broke down as severity increased. These findings suggest that listeners and analysis packages differ greatly in their measurement characteristics, and that reliability is not a good reason for preferring acoustic to perceptual measures.
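For readers unfamiliar with the jitter measure at issue, the following is a minimal sketch of "local" jitter, a common cycle-to-cycle perturbation formula (mean absolute difference of consecutive glottal periods, as a percentage of the mean period). The abstract does not specify which formula each package implements, so the function name and formula here are illustrative assumptions, not the exact computation used by CSpeech, SoundScope, or CSL.

```python
def local_jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal cycle periods, expressed as a percentage of the mean period.

    periods: list of successive cycle durations in seconds.
    """
    if len(periods) < 2:
        raise ValueError("need at least two cycle periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_abs_diff / mean_period

# Example: periods near 10 ms (a 100-Hz voice) with small perturbations.
print(round(local_jitter_percent([0.0100, 0.0102, 0.0099, 0.0101]), 2))  # → 2.32
```

Note that any such measure presupposes reliable cycle boundary detection, which is exactly what degrades in severely disordered voices, consistent with the breakdown of the objective methods reported above.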