
ASA 128th Meeting - Austin, Texas - 1994 Nov 28 .. Dec 02

3aSP23. A noninvasive EM based speech feature estimation method for analysis and recognition of vocal fold cancer.

Liliana Gavidia-Ceballos

Robust Speech Processing Lab., Dept. of Biomedical Engineering, Duke Univ., Box 90291, Durham, NC 27708-0291

John H. L. Hansen

Duke Univ., Durham, NC 27708-0291

The focus of this study is to formulate a speech parameter estimation algorithm for analysis and detection of vocal fold cancer that does not require direct estimation of the glottal flow waveform. The proposed method separates speech components under healthy and assumed pathology conditions using a mixed excitation speech model. The separation is carried out with an iterative maximum-likelihood (ML) estimation procedure based on the expectation-maximization (EM) algorithm. Two new features, termed the enhanced spectral pathology component (ESPC) and the mean area peak value (MAPV) index, are estimated and shown to vary consistently between healthy and pathology conditions. For classification, a hidden Markov model recognizer is formulated using MAPV and/or ESPC spectral features. Classifier evaluations using sustained-vowel recordings from healthy speakers and vocal fold cancer patients showed that, while MAPV is a useful feature for vocal fold cancer detection (88.7%), superior performance was achieved using a finer spectral representation of ESPC (92.8%). Because direct glottal flow estimation is not necessary, the difficulty of accurately characterizing vocal fold pathology when glottal closure is incomplete is no longer an issue. The results suggest that the ESPC feature can provide a noninvasive approach for analysis, detection, and characterization of speech production under vocal fold pathology.
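
The abstract does not give the equations of the mixed excitation model, so the following is only a generic illustration of the iterative E-step/M-step pattern that an EM-based ML procedure follows. It is not the authors' algorithm: the two-component 1-D Gaussian mixture, the function names `gaussian_pdf` and `em_two_gaussians`, and the synthetic data are all assumptions chosen to keep the sketch self-contained.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only)."""
    # Crude initialization (an assumption): means at the data extremes,
    # both variances set to the overall sample variance.
    means = [min(data), max(data)]
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each sample came from each component.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])

        # M-step: re-estimate parameters from the responsibility-weighted samples.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # variance floor to avoid a degenerate component
            )
    return weights, means, variances


# Synthetic data standing in for a scalar feature observed under two conditions.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(300)]
weights, means, variances = em_two_gaussians(data)
```

In this toy setting the two recovered components play the role of the "healthy" and "pathology" contributions only by analogy; the method described in the abstract separates speech components under a mixed excitation speech model, whose details are not stated here.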