The ability to perform monolingual text-to-text generation is an important step in solving many natural language processing problems. For example, when generating novel text at the sentence level, abstractive summarization systems may need to compress sentences or fuse multiple sentences together; the evaluation of translation systems may require additional paraphrases to use as reference gold standards; and answers to questions may need to be generated automatically from extracted sentences.
The community of researchers examining monolingual text-to-text generation has grown steadily in recent years, introducing the need for a focused venue to communicate results in this area. To this end, we proposed and organised this workshop at ACL with endorsement from SIGGEN. We hope that this is the first of many text-to-text generation workshops to come.
We were excited to receive 18 submissions, which were judged in accordance with the standard reviewing practices of the ACL 2011 main conference. As we intended that the workshop serve as a new forum for the community, our aim in the selection process was to choose high-quality papers which would spark discussion amongst the participants.
We selected seven long papers and four short papers. Together, they tackle a diverse range of research questions: reflecting upon the scope of what might be generated in a text-to-text process, examining new generation methods, and addressing the ever-challenging issue of evaluation.
Learning to simplify sentences using Wikipedia
In this paper we examine the sentence simplification problem as an English-to-English translation problem, utilizing a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia. This data set contains the ...
Web-based validation for contextual targeted paraphrasing
In this work, we present a scenario where contextual targeted paraphrasing of sub-sentential phrases is performed automatically to support the task of text revision. Candidate paraphrases are obtained from a preexisting repertoire and validated in the ...
An unsupervised alignment algorithm for text simplification corpus construction
We present a method for the sentence-level alignment of short simplified text to the original text from which they were adapted. Our goal is to align a medium-sized corpus of parallel text, consisting of short news texts in Spanish with their simplified ...
Comparing phrase-based and syntax-based paraphrase generation
Paraphrase generation can be regarded as machine translation where source and target language are the same. We use the Moses statistical machine translation toolkit for paraphrasing, comparing phrase-based to syntax-based approaches. Data is derived ...
Text specificity and impact on quality of news summaries
In our work we use an existing classifier to quantify and analyze the level of specific and general content in news documents and their human and automatic summaries. We discover that while human abstracts contain a more balanced mix of general and ...
Towards strict sentence intersection: decoding and evaluation strategies
We examine the task of strict sentence intersection: a variant of sentence fusion in which the output must only contain the information present in all input sentences and nothing more. Our proposed approach involves alignment and generalization over the ...
Learning to fuse disparate sentences
We present a system for fusing sentences which are drawn from the same source document but have different content. Unlike previous work, our approach is supervised, training on real-world examples of sentences fused by professional journalists in the ...
Framework for abstractive summarization using text-to-text generation
We propose a new, ambitious framework for abstractive summarization, which aims at selecting the content of a summary not from sentences, but from an abstract representation of the source documents. This abstract representation relies on the concept of ...
Creating disjunctive logical forms from aligned sentences for grammar-based paraphrase generation
We present a method of creating disjunctive logical forms (DLFs) from aligned sentences for grammar-based paraphrase generation using the OpenCCG broad coverage surface realizer. The method takes as input word-level alignments of two sentences that are ...
Paraphrastic sentence compression with a character-based metric: tightening without deletion
We present a substitution-only approach to sentence compression which "tightens" a sentence by reducing its character length. Replacing phrases with shorter paraphrases yields paraphrastic compressions as short as 60% of the original length. In support ...
Evaluating sentence compression: pitfalls and suggested remedies
This work surveys existing evaluation methodologies for the task of sentence compression, identifies their shortcomings, and proposes alternatives. In particular, we examine the problems of evaluating paraphrastic compression and comparing the output of ...
Proceedings of the Workshop on Monolingual Text-To-Text Generation


