### ASA 124th Meeting New Orleans 1992 October

## 3aSP2. Statistical grammar inference.

**Yves Schabes**

*Dept. of Comput. Inform. Sci., Univ. of Pennsylvania, Philadelphia, PA 19104-6389*
Language can be spoken, written, printed, or encoded in numerous different
ways. Like any form of communication, each of these encodings can be analyzed
statistically for the purpose of comparative judgment and predictive power.
Early language models such as Markov models and N-gram models [C. E. Shannon,
Bell Syst. Tech. J. 27(3), 379--423 (1948)], although efficient in practice,
were quickly refuted in theory, since they are unable to capture long-distance
dependencies or to describe the syntax of natural languages hierarchically.
Stochastic context-free grammar [T. Booth, in 10th Annual IEEE Symp. on
Switching and Automata Theory (1969)] is a hierarchical model that assigns a
probability to each context-free rewriting rule. However, none of these
proposals performs as well as the simpler Markov models, because of the
difficulty of capturing lexical information [Jelinek et al., Tech. Rep. RC
16374 (72684), IBM, Yorktown Heights, NY 10598 (1990); K. Lari and S. J.
Young, Comput. Speech Lang. 4, 35--56 (1990)]. This is the case even if
supervised training is used [F. Pereira and Y. Schabes, in ACL 1992].
Stochastic lexicalized tree-adjoining grammar (SLTAG) has recently been
suggested as the basis for a training algorithm [Y. Schabes, in COLING 1992].
The parameters of an SLTAG correspond to the probabilities of combining two
structures, each associated with a word. This system reconciles abstract
structure and concrete data in natural language processing, since it is
lexically sensitive and yet hierarchical. The need for a lexical yet
hierarchical statistical language model is partially corroborated by
preliminary experiments, which show that SLTAG enables better statistical
modeling than its purely hierarchical counterpart, the stochastic context-free
grammar. The experiments also show that SLTAG captures lexical distributions
as well as bigram models do, while maintaining the hierarchical nature of
language.
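To make the two contrasted model families concrete, the following is a minimal illustrative sketch (not from the abstract itself) of (a) maximum-likelihood estimation of bigram probabilities, the lexically sensitive but flat model, and (b) a stochastic context-free grammar, where a derivation's probability is the product of its rule probabilities. The toy corpus and toy grammar are invented for illustration only.

```python
from collections import defaultdict

def train_bigram(corpus):
    """Estimate bigram probabilities P(w2 | w1) by maximum likelihood:
    count(w1, w2) / count(w1), with <s> marking sentence starts."""
    pair_counts = defaultdict(int)
    unigram_counts = defaultdict(int)
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        for w1, w2 in zip(tokens, tokens[1:]):
            pair_counts[(w1, w2)] += 1
            unigram_counts[w1] += 1
    return {(w1, w2): c / unigram_counts[w1]
            for (w1, w2), c in pair_counts.items()}

# Hypothetical toy corpus, for illustration only.
corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
probs = train_bigram(corpus)
# P(dog | the) = 1/2 on this corpus: lexical, but no hierarchy.

# A toy stochastic CFG: each rewriting rule carries a probability, and the
# probability of a derivation is the product of the rules it uses.
scfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("the", "dog")): 0.5,
    ("NP", ("the", "cat")): 0.5,
    ("VP", ("barks",)): 0.5,
    ("VP", ("sleeps",)): 0.5,
}

def derivation_prob(rules):
    """Multiply the probabilities of the rules used in a derivation."""
    p = 1.0
    for r in rules:
        p *= scfg[r]
    return p

# "the dog barks": S -> NP VP, NP -> the dog, VP -> barks
p = derivation_prob([("S", ("NP", "VP")),
                     ("NP", ("the", "dog")),
                     ("VP", ("barks",))])
# p = 1.0 * 0.5 * 0.5 = 0.25: hierarchical, but the rule probabilities
# ignore which words appear elsewhere in the sentence.
```

SLTAG's parameters sit between these two extremes: like the bigram counts they are conditioned on word pairs, but like the SCFG rules they attach to hierarchical structures.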