Abstract
Extractive summarization, which automatically selects representative sentences from a text or spoken document to concisely convey its key information, has recently attracted a surge of attention from scholars and practitioners. Language modeling (LM) approaches to sentence selection have proven effective for unsupervised extractive summarization. However, a major difficulty facing the LM approach is how to model each sentence and estimate its model parameters accurately for a given text or spoken document. We extend this line of research and make the following contributions. First, we propose a position-aware language modeling framework that uses position-specific information at various granularities to better estimate the sentence models involved in the summarization process. Second, we explore different ways to integrate positional cues into relevance models through a pseudo-relevance feedback procedure. Third, we extensively evaluate various models derived from the proposed framework alongside several well-established unsupervised methods. Empirical evaluation on a broadcast news summarization task demonstrates the performance merits of the proposed summarization methods.
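To make the LM approach to sentence selection concrete, the following is a minimal sketch of one common formulation, not the authors' position-aware framework. Each sentence is treated as a unigram model and ranked by the log-likelihood of generating the whole document. The function name `rank_sentences`, the interpolation weight `lam`, the whitespace tokenizer, and the use of the document itself as the background model are all assumptions made for illustration.

```python
import math
from collections import Counter


def rank_sentences(sentences, lam=0.5):
    """Rank sentences by log P(D | S) under a smoothed unigram sentence model.

    Assumptions (illustrative only): whitespace tokenization, and a
    background model estimated from the document itself. In practice the
    background model would usually come from a large external corpus.
    `lam` must be below 1 so that every document word has non-zero mass.
    """
    tokenized = [s.lower().split() for s in sentences]
    doc_counts = Counter(w for toks in tokenized for w in toks)
    doc_len = sum(doc_counts.values())

    scores = []
    for toks in tokenized:
        sent_counts = Counter(toks)
        sent_len = len(toks)
        log_lik = 0.0
        for word, count in doc_counts.items():
            # Maximum-likelihood estimate of the word in this sentence.
            p_sent = sent_counts[word] / sent_len if sent_len else 0.0
            # Background probability used for smoothing.
            p_bg = count / doc_len
            # Linear interpolation of the sentence and background models.
            log_lik += count * math.log(lam * p_sent + (1 - lam) * p_bg)
        scores.append(log_lik)

    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in order]


if __name__ == "__main__":
    doc = [
        "the storm hit the coast",
        "storm warnings were issued for the coast",
        "officials spoke",
    ]
    for sentence, score in rank_sentences(doc):
        print(f"{score:8.2f}  {sentence}")
```

A summary would then be formed by taking the top-ranked sentences up to a length budget. Positional cues, as proposed in the abstract, would alter how the sentence models are estimated; this sketch does not attempt that.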
A Position-Aware Language Modeling Framework for Extractive Broadcast News Speech Summarization