ITT Aerosp./Commun. Div., 10060 Carroll Canyon Rd., San Diego, CA 92131
Manually marking syllabic nuclei is not difficult for a trained person. For a large amount of speech data, however, it is time consuming, and human error can introduce inconsistencies. The problem becomes worse when the person is not familiar with the language. To provide an automatic procedure that reduces marking time and eliminates human errors, an ad hoc algorithm was developed to find syllabic nuclei using a set of acoustic features. Its initial error rate was 15%. Even so, the total time spent on automatic marking plus manual correction of errors was less than the time needed for hand-marking alone. An improved machine learning approach, a multi-layered perceptron trained by backward error propagation with the same acoustic features as inputs, was then applied to the training speech. The hand-corrected markings were used as the teacher. Testing on the training data showed only 3% to 5% misalignment, which further reduced the time required to verify markings. When training was extended to a database of 43 different languages, the marking errors dropped to 2%, less than the errors in the initial hand-markings. Finally, when the network was used to mark 7 h of a multi-language database, there was no significant increase in marking errors.
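
As an illustration only, the training scheme described above (a multi-layered perceptron trained by backward error propagation, with per-frame acoustic features as inputs and corrected markings as the teacher) might be sketched as follows. The abstract does not specify the feature set, network size, or learning rate, so everything here is an assumption: the two features ("energy" and "periodicity"), the synthetic labels, the four hidden units, and the names TinyMLP, make_frames, and error_rate are all hypothetical stand-ins, not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One-hidden-layer perceptron trained by backward error propagation.

    Hypothetical sketch: layer sizes and learning rate are assumptions.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each hidden unit holds n_in weights plus a bias (last element).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        # The output unit holds n_hidden weights plus a bias.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
             for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2[:-1], h)) + self.w2[-1])
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        # Output delta for squared error with a sigmoid output unit.
        d_out = (y - target) * y * (1.0 - y)
        # Hidden deltas: the output error propagated back through w2.
        d_hid = [d_out * self.w2[j] * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        for j in range(len(h)):
            self.w2[j] -= lr * d_out * h[j]
        self.w2[-1] -= lr * d_out
        for j, row in enumerate(self.w1):
            for i in range(len(x)):
                row[i] -= lr * d_hid[j] * x[i]
            row[-1] -= lr * d_hid[j]
        return 0.5 * (y - target) ** 2


def make_frames(n, seed=1):
    """Synthetic stand-in for hand-corrected markings (not real speech)."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n):
        energy, periodicity = rng.random(), rng.random()
        # Assumed toy rule: a frame is a nucleus when both cues are strong.
        label = 1.0 if energy + periodicity > 1.0 else 0.0
        frames.append(([energy, periodicity], label))
    return frames


def error_rate(net, frames):
    """Fraction of frames whose thresholded output disagrees with the teacher."""
    wrong = sum(1 for x, t in frames if (net.forward(x)[1] > 0.5) != (t > 0.5))
    return wrong / len(frames)


train = make_frames(400)
net = TinyMLP(n_in=2, n_hidden=4)
before = error_rate(net, train)
for epoch in range(200):
    for x, t in train:
        net.train_step(x, t)
after = error_rate(net, train)
print(f"frame error rate before training: {before:.2%}, after: {after:.2%}")
```

The error rate is measured on the training frames, mirroring the abstract's "testing on the training data"; a held-out set would be needed to estimate performance on unseen speech.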