Wu Chou, Tatsuo Matsuoka, Biing-Hwang Juang, Chin-Hui Lee
AT&T Bell Labs., 600 Mountain Ave., Murray Hill, NJ 07974
A new, accurate string hypothesization algorithm is proposed for finding multiple string hypotheses in speech recognition. The algorithm differs from conventional N-best search algorithms in that it allows the same long-term language (bigram) models and the same set of subword models, including the interword models, to be used in both the forward tree search and the backward tree-trellis stack decoding. The proposed A*-based backward tree-trellis stack decoding can handle interword units at word boundaries, including the boundaries of one-phone words. The interword context dependency is therefore preserved exactly in both the forward and the backward multiple-string hypothesis search. Search efficiency is maximized by applying the same high-resolution acoustic and language models in both search directions. When search heuristics are used, the proposed approach provides more accurate string model matching than the conventional time-synchronous beam search decoder. The proposed algorithm was tested on 24,000 connected digit strings recorded over the telephone network and collected from ten regions across the United States. Using a set of interword subword unit models, the test results showed that 62.4% of the string errors can be corrected if the second-best string hypothesis is used to replace the wrong strings. The string error rate decreased from 2.75% (657 string errors) to 1.0% (247 string errors) when the top two string hypotheses were used.
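The forward/backward idea described above can be illustrated with a minimal sketch. It is not the authors' decoder: it assumes a small, already-built word lattice (a DAG with integer nodes in topological order) instead of frame-level subword and interword models, and the names n_best_paths, lattice, and top2 are hypothetical. A forward pass records the best score from the start to every node, and a backward A* stack search uses those scores as an exact heuristic, so complete strings are produced in rank order.

```python
import heapq
from collections import defaultdict

NEG_INF = float("-inf")

def n_best_paths(edges, start, goal, n):
    """Return up to n best start->goal label sequences, best first.

    edges: (src, dst, label, log_score) tuples. Toy assumption: nodes are
    integers in topological order (src < dst); this is not the paper's
    data structure.
    """
    incoming = defaultdict(list)
    for src, dst, label, score in edges:
        incoming[dst].append((src, label, score))

    # Forward pass: best log score from start to each node (Viterbi-style).
    forward = defaultdict(lambda: NEG_INF)
    forward[start] = 0.0
    for src, dst, label, score in sorted(edges):  # sorted by src first
        if forward[src] + score > forward[dst]:
            forward[dst] = forward[src] + score

    # Backward pass: A* stack decoding from the goal. A partial path is
    # ranked by its backward score plus the exact forward score of the
    # node it has reached, so finished paths pop in rank order.
    stack = [(-forward[goal], 0, goal, 0.0, ())]
    counter = 1  # tie-breaker so heap entries never compare further
    results = []
    while stack and len(results) < n:
        neg_total, _, node, back, labels = heapq.heappop(stack)
        if node == start:
            results.append((list(labels), -neg_total))
            continue
        for src, label, score in incoming[node]:
            if forward[src] == NEG_INF:
                continue  # src cannot be reached from start
            new_back = back + score
            entry = (-(forward[src] + new_back), counter, src,
                     new_back, (label,) + labels)
            heapq.heappush(stack, entry)
            counter += 1
    return results

# Hypothetical two-word digit lattice with made-up log scores.
lattice = [
    (0, 1, "one", -1.0), (0, 1, "nine", -1.5),
    (1, 2, "five", -0.5), (1, 2, "nine", -2.0),
]
top2 = n_best_paths(lattice, start=0, goal=2, n=2)
print(top2)  # [(['one', 'five'], -1.5), (['nine', 'five'], -2.0)]
```

In this toy setting the second entry of top2 plays the role of the second-best string hypothesis that the experiments use to recover from an incorrect top hypothesis.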