Semantic Parsing on an Open Domain Corpus

Semantic parsing is the task of automatically constructing formal meaning representations of natural language sentences. Until recently, approaches usually relied on large hand-crafted resources such as semantic lexicons and composition rules. In recent years, researchers have begun to develop probabilistic methods for semantic parsing that learn a statistical parsing model from semantically annotated corpora. One such approach is λ-WASP, a supervised statistical system that performed well on a comparatively small, closed domain. However, its applicability in an open-domain, "real-world" environment has not yet been established. The subject of this thesis is the application of λ-WASP to the Groningen Meaning Bank, an open-domain corpus providing deep semantic annotation. Although we select only short, simple sentences for the evaluation, and experiment only on a domain that is known to the system from training, the results we obtain are poor. λ-WASP is not able to parse more than half of the sentences in the evaluation set. Not one of the meaning representations created by λ-WASP during evaluation is correct, and precision and recall at the lexical level are low. Furthermore, we find that λ-WASP cannot handle larger amounts of data because of its cubic running time.