Learning to Model Text Structure
Regina Barzilay
 
The natural language processing community has struggled for years to develop computational models of text structure.  Such models are essential both for interpretation of human-written text and for evaluation of machine-generated text.
 
In this talk, I will present our first steps towards learning to model text structure. I will describe two models that are induced from a large collection of unannotated texts. The first model captures the notion of text cohesion by considering connectivity patterns characteristic of well-formed texts. These patterns are inferred from a matrix that combines distributional and syntactic information about text entities. The second model captures the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. I will present an effective method for learning content models, utilizing a novel adaptation of algorithms for Hidden Markov Models. To conclude my talk, I will show how these text models can be effectively integrated into natural language generation and summarization systems.
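As a rough illustration of the kind of entity matrix the first model builds on, the sketch below constructs a toy grid of grammatical roles per entity per sentence and counts role transitions across adjacent sentences. The input, entity names, and role labels (S/O/X/-) are illustrative assumptions for this note, not the authors' actual data or implementation.

```python
from collections import Counter

# Toy input: each sentence is a list of (entity, grammatical_role) pairs,
# where the role is S (subject), O (object), or X (other).
# These sentences and entities are invented for illustration.
SENTENCES = [
    [("Microsoft", "S"), ("suit", "O")],
    [("Microsoft", "S"), ("markets", "O")],
    [("suit", "S"), ("Netscape", "X")],
]

def build_entity_grid(sentences):
    """Return {entity: [role per sentence]}; '-' marks absence."""
    entities = sorted({e for sent in sentences for e, _ in sent})
    grid = {e: [] for e in entities}
    for sent in sentences:
        roles = dict(sent)
        for e in entities:
            grid[e].append(roles.get(e, "-"))
    return grid

def transition_counts(grid):
    """Count role-to-role transitions for each entity across adjacent sentences."""
    counts = Counter()
    for roles in grid.values():
        for a, b in zip(roles, roles[1:]):
            counts[(a, b)] += 1
    return counts

grid = build_entity_grid(SENTENCES)
# grid["Microsoft"] == ["S", "S", "-"]: a salient entity kept in subject position
counts = transition_counts(grid)
```

Distributions over such role transitions are one plausible way to characterize the connectivity patterns of well-formed texts; the syntactic roles here stand in for the richer syntactic information the talk refers to.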
 
This is joint work with Mirella Lapata and Lillian Lee.
