Abstract

Combining error-driven pruning and classification for partial parsing. Claire Cardie, Scott Mardis, and David Pierce. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 87-96, 1999.

We present a new approach to partial parsing of natural language texts that relies on machine learning methods. The approach combines corpus-based grammar induction with a very simple pattern-matching algorithm and an optional constituent verification step. The grammar induction algorithm acquires a set of rules for each level of linguistic analysis using a new technique for error-driven pruning of treebank grammars. The constituent verification step employs standard inductive learning techniques as an additional precision-enhancing device. We evaluate the approach on four data sets and find that performance is very good (93% precision and recall) for applications that require or prefer fairly simple constituent bracketing. As the complexity of the partial parsing task increases, however, our approach lags the performance of competing approaches. We explain these differences in terms of the knowledge sources employed by each method and describe a number of features that make the approach attractive for large-scale, practical NLP applications.
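To make the pipeline described above concrete, the following is a minimal sketch of the general idea: read candidate rules off a bracketed corpus, apply them with a simple longest-match pattern matcher, and discard rules that do more harm than good on held-out data. It is not the authors' implementation. The function names (`extract_rules`, `match`, `prune`), the net-benefit scoring, the threshold, and the toy part-of-speech data are all assumptions made for illustration, and the optional constituent verification step is omitted.

```python
from collections import Counter

def extract_rules(corpus):
    """Collect every bracketed POS-tag sequence as a candidate rule."""
    rules = set()
    for tags, spans in corpus:
        for start, end in spans:
            rules.add(tuple(tags[start:end]))
    return rules

def match(tags, rules):
    """Left-to-right longest-match bracketing; returns (start, end, rule) triples."""
    max_len = max((len(r) for r in rules), default=0)
    found, i = [], 0
    while i < len(tags):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(tags) - i), 0, -1):
            cand = tuple(tags[i:i + n])
            if cand in rules:
                found.append((i, i + n, cand))
                i += n
                break
        else:
            i += 1  # no rule starts at this position
    return found

def prune(rules, held_out, threshold=1):
    """Keep rules whose net benefit (correct minus incorrect brackets) on
    held-out data reaches the threshold. Scoring scheme is an assumption;
    rules that never fire score 0 and are dropped at the default threshold."""
    benefit = Counter()
    for tags, spans in held_out:
        gold = set(spans)
        for start, end, rule in match(tags, rules):
            benefit[rule] += 1 if (start, end) in gold else -1
    return {r for r in rules if benefit[r] >= threshold}

# Hypothetical toy data: (POS tags, gold constituent spans as [start, end)).
train = [
    (["DT", "NN", "VBD", "DT", "JJ", "NN"], [(0, 2), (3, 6)]),
    (["VBG", "NN", "VBD"], [(0, 2)]),
    (["NN", "VBD"], [(0, 1)]),
]
held_out = [
    (["DT", "NN", "VBD", "VBG", "NN"], [(0, 2), (4, 5)]),
    (["DT", "JJ", "NN", "VBD", "NN"], [(0, 3), (4, 5)]),
]

rules = extract_rules(train)
pruned = prune(rules, held_out)
brackets = [(s, e) for s, e, _ in match(held_out[0][0], pruned)]
```

On this toy data the rule `("VBG", "NN")` produces an incorrect bracket on the first held-out sentence and is pruned, after which the matcher recovers both gold spans for that sentence.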
