ASA 128th Meeting - Austin, Texas - 1994 Nov 28 .. Dec 02

3pSP5. Processing of continuous speech by a hierarchical neural network.

Wolf Dieter Brandt

Holger Behme

Drittes Physikalisches Inst. der Univ. Goettingen, Buergerstr. 42-44, D-37073 Goettingen, Germany

A multilevel neural network has been developed for psychoacoustic preprocessing of speech, segmentation, segment classification, and recognition. Various neurophysiological and psychoacoustical results have been taken into account. The network relies heavily on unsupervised learning. Operating on increasing and nonlinear time scales, each level of the network extracts segments from the stream of input data, classifies them using topology-conserving vector quantizers (self-organizing feature maps (SOFM) and the "neural gas" algorithm (NGA)), and passes the results to the next level. Modified learning algorithms for feature maps have been developed to achieve better representation of low-energy consonants and transient parts in the first level. On higher levels, variants of the NGA are used. Depending on the level, the segmentation algorithm uses either the topological relationships within the SOFM or statistical information extracted from the training data. Within each segment, dynamic time normalization is achieved by appropriate temporal integration. The output of the topmost level is passed to a recognition network containing a linguistic model to perform continuous speech recognition. Results using this purely neural and in many parts self-organizing method will be presented and compared to classical methods. [Work supported by BMFT.]
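
The abstract does not give the update rules of its vector quantizers, so the following is only a minimal sketch of the generic rank-based neural gas update as commonly described in the literature, not the modified variants developed by the authors. The function names (`neural_gas`, `quantize`, `sq_dist`), the annealing schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def neural_gas(data, n_units=4, n_epochs=30,
               eps=(0.5, 0.01), lam=(2.0, 0.1), seed=0):
    """Train a codebook with the generic neural gas rule (illustrative only).

    eps and lam are hypothetical (start, end) pairs for the step size and
    the neighbourhood range; both are annealed exponentially.
    """
    rng = random.Random(seed)
    # Initialise the codebook from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(data, n_units)]
    t_max = n_epochs * len(data)
    t = 0
    for _ in range(n_epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = t / t_max
            step = eps[0] * (eps[1] / eps[0]) ** frac
            rng_width = lam[0] * (lam[1] / lam[0]) ** frac
            # Rank all units by distance to the input (rank 0 = closest).
            ranked = sorted(codebook, key=lambda w: sq_dist(w, x))
            for k, w in enumerate(ranked):
                # Every unit moves toward x, weighted by its distance rank.
                h = math.exp(-k / rng_width)
                for d in range(len(w)):
                    w[d] += step * h * (x[d] - w[d])
            t += 1
    return codebook


def quantize(codebook, x):
    """Return the index of the codebook vector closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))
```

In a hierarchy of the kind described, the index returned by `quantize` for each frame or segment would serve as the class label handed to the next level; how the authors actually encode and pass these results is not stated in the abstract.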