Learning to Model Text Structure
Regina Barzilay
The natural
language processing community has struggled for years to develop
computational models of text structure. Such models are essential
both for interpretation of human-written text and for evaluation of
machine-generated text.
In this talk, I will present our first steps towards learning to
model text structure. I will describe two models that are induced from
a large collection of unannotated texts. The first model captures the
notion of text cohesion by
considering connectivity patterns characteristic of well-formed
texts. These patterns are inferred from a matrix that combines
distributional and syntactic information about text entities. The
second model captures the content structure of texts within a specific
domain, in terms of the topics the texts address and the order in which
these topics appear. I will present an effective method for learning
content models, utilizing a novel adaptation of algorithms for Hidden
Markov Models. To conclude my talk, I will show how these text models
can be effectively integrated into natural language generation and
summarization systems.
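The cohesion model described above can be sketched as an entity grid: a matrix with one row per entity and one column per sentence, recording each entity's syntactic role, from which role-transition probabilities are read off as connectivity features. The sentences, entities, and role labels below are invented for illustration, and the roles are given by hand; in the actual model they come from a syntactic parser:

```python
from collections import Counter

# Hypothetical pre-annotated input: for each sentence, the grammatical
# role of each entity it mentions (S = subject, O = object, X = other).
# In the real model these roles are produced by a syntactic parser.
sentences = [
    {"Microsoft": "S", "suit": "O"},
    {"Microsoft": "O", "government": "S"},
    {"suit": "S"},
]

entities = sorted({e for sent in sentences for e in sent})

# Entity grid: one row per entity, one column per sentence;
# "-" marks sentences in which the entity does not occur.
grid = {e: [sent.get(e, "-") for sent in sentences] for e in entities}

# Connectivity features: relative frequencies of role transitions
# of length 2 (e.g. S -> O, O -> -) across consecutive sentences.
transitions = Counter()
for roles in grid.values():
    for a, b in zip(roles, roles[1:]):
        transitions[(a, b)] += 1
total = sum(transitions.values())
probs = {t: n / total for t, n in transitions.items()}
```

A well-formed text tends to realize salient entities in prominent roles across adjacent sentences, so transition distributions like these separate coherent orderings from scrambled ones.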
This is joint work with Mirella Lapata and Lillian Lee.
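A content model of this kind can be sketched as a small HMM whose hidden states are topics, whose transitions encode topic order, and whose emissions generate sentences; Viterbi decoding then recovers the most likely topic sequence for a document. All topics, probabilities, and vocabulary below are invented for illustration, and the emissions are simple unigram distributions; the actual models induce their states from unannotated text and use richer per-state language models:

```python
import math

# Hypothetical hand-set parameters for illustration only.
topics = ["location", "damage"]
start = {"location": 0.8, "damage": 0.2}
trans = {
    "location": {"location": 0.3, "damage": 0.7},
    "damage": {"location": 0.1, "damage": 0.9},
}
# Per-topic unigram emission probabilities; unseen words get a small floor.
emit = {
    "location": {"quake": 0.2, "struck": 0.2, "city": 0.6},
    "damage": {"buildings": 0.5, "collapsed": 0.3, "quake": 0.2},
}

def sent_logprob(topic, words):
    """Log probability of a sentence's words under a topic's unigram model."""
    return sum(math.log(emit[topic].get(w, 1e-6)) for w in words)

def viterbi(doc):
    """Most likely topic sequence for a document (a list of token lists)."""
    V = [{t: math.log(start[t]) + sent_logprob(t, doc[0]) for t in topics}]
    back = []
    for words in doc[1:]:
        col, ptr = {}, {}
        for t in topics:
            best = max(topics, key=lambda p: V[-1][p] + math.log(trans[p][t]))
            ptr[t] = best
            col[t] = (V[-1][best] + math.log(trans[best][t])
                      + sent_logprob(t, words))
        V.append(col)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(topics, key=lambda t: V[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

doc = [["quake", "struck", "city"], ["buildings", "collapsed"]]
```

Here `viterbi(doc)` assigns the first sentence to the "location" topic and the second to "damage", reflecting both the emission fit and the preferred topic order.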