2pSPb4. Computer imaging features for classifying semivowels in speech spectrograms.

Session: Tuesday Afternoon, May 14

Time: 3:00

Author: Ben Pinkowski
Author: Jack Finnegan-Green
Location: Dept. of Comput. Sci., Western Michigan Univ., Kalamazoo, MI 49008


Speech spectrograms can be analyzed with computer image processing techniques to yield high recognition rates [B. Pinkowski, Pattern Recognition 26, 1593-1602 (1993)]. In particular, Fourier descriptors (FD's) have proven useful for characterizing the boundaries of segmented isolated words containing the English semivowels (/w/, /y/, /l/, /r/). This study examines whether FD's, combined with 17 other general features, are appropriate for classifying spectrogram images. The other features include eigenvalues and eigenvectors, gray-level variance and covariance, run-length and chain encodings, and segment size, shape, and compactness. Principal components (PC's) are used for feature reduction on a speaker-dependent data set of 80 sounds representing 20 words containing semivowels. With eight combined features, consisting of four 32-point FD's and four general features obtained from principal component analysis, a linear discriminant function achieved a 97.5% recognition rate. This rate was higher than that observed for any group of features considered separately. [Work supported by NIH.]
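The abstract does not describe how the features or the classifier were implemented. What follows is a minimal NumPy sketch of the three stages it names: Fourier descriptors of a segmented boundary, principal-component reduction of the general features, and a linear discriminant function over the combined vector. It rests on several assumptions. The boundary is taken as an ordered list of (x, y) contour points traced counterclockwise. "Four 32-point FD's" is read as four normalized harmonic magnitudes from a 32-point DFT of the resampled boundary. The 17 general features are assumed to be precomputed as a matrix. The discriminant is a standard pooled-covariance Gaussian LDA. All function names are hypothetical.

```python
import numpy as np


def fourier_descriptors(boundary, n_points=32, n_keep=4):
    """Hypothetical helper: invariant FD magnitudes of a closed boundary.

    boundary: (N, 2) array of ordered (x, y) contour points, assumed counterclockwise.
    """
    pts = np.asarray(boundary, dtype=float)
    closed = np.vstack([pts, pts[:1]])                     # close the contour
    seg = np.hypot(*np.diff(closed, axis=0).T)             # edge lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arc length
    t = np.linspace(0.0, s[-1], n_points, endpoint=False)  # equal arc-length samples
    z = np.interp(t, s, closed[:, 0]) + 1j * np.interp(t, s, closed[:, 1])
    F = np.fft.fft(z)                                      # boundary as complex signal
    # Skip F[0] (translation), divide by |F[1]| (scale), and keep only
    # magnitudes (rotation and starting point); return harmonics 2..n_keep+1.
    mags = np.abs(F) / np.abs(F[1])
    return mags[2:n_keep + 2]


def pca_fit(X, n_components):
    """Principal components via SVD of the mean-centered feature matrix."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]                         # rows are orthonormal directions


def pca_transform(X, mean, components):
    """Project samples onto the retained principal components."""
    return (X - mean) @ components.T


def combined_features(boundaries, general, pca_mean, pca_components):
    """Concatenate FD magnitudes with PC scores of the general features."""
    fds = np.array([fourier_descriptors(b) for b in boundaries])
    pcs = pca_transform(general, pca_mean, pca_components)
    return np.hstack([fds, pcs])


def lda_fit(X, y):
    """Linear discriminant functions g_k(x) = w_k.x + b_k with pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]      # subtract own class mean
    cov = centered.T @ centered / (len(X) - len(classes))  # pooled within-class covariance
    priors = np.array([np.mean(y == c) for c in classes])
    W = means @ np.linalg.pinv(cov)                        # row k is mu_k^T Sigma^-1
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return classes, W, b


def lda_predict(X, classes, W, b):
    """Assign each sample to the class with the largest discriminant score."""
    return classes[np.argmax(X @ W.T + b, axis=1)]
```

In use, one would fit `pca_fit` on the general-feature matrix with `n_components=4`, build eight-dimensional vectors with `combined_features`, and then train and evaluate with `lda_fit` and `lda_predict`. Using only magnitudes after normalizing by the first harmonic is one common way to make FDs insensitive to position, size, orientation, and starting point. The original study may have normalized differently.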

from ASA 131st Meeting, Indianapolis, May 1996