Abstract
Text summarization is a significant task in natural language processing that automatically condenses a text into a summary. Summarization systems for short and long English texts, and for short Chinese texts, have benefited from advances in neural encoder-decoder models because large datasets are available. Research on long Chinese text summarization, however, has been limited to datasets of only a few hundred instances. This article explores the long Chinese text summarization task. First, we construct the first large-scale long Chinese text summarization corpus, the Long Chinese Summarization of Police Inquiry Record Text (LCSPIRT). Based on this corpus, we propose a sequence-to-sequence (Seq2Seq) model that incorporates a global encoding process with an attention mechanism. Our model achieves competitive results on the LCSPIRT corpus compared with several benchmark methods.
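The "global encoding with attention" idea can be sketched roughly as follows: a gate computed from a window of neighboring encoder states filters each encoder output before the decoder attends over them, so each position's representation reflects its context rather than a single time step. The minimal NumPy sketch below is an illustration under assumed shapes and randomly initialized weights (the function names, window size `k`, and dimensions are assumptions for exposition, not the paper's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_encoding_gate(H, W_conv, k=3):
    """Filter encoder states H (T x d) with a convolution-style gate.

    Each position's gate is computed from a window of k neighboring
    states, so the gate carries contextual (global) information.
    """
    T, d = H.shape
    pad = k // 2
    Hp = np.pad(H, ((pad, pad), (0, 0)))      # zero-pad along time
    G = np.empty_like(H)
    for t in range(T):
        window = Hp[t:t + k].reshape(-1)      # flattened k*d context
        G[t] = sigmoid(W_conv @ window)       # gate values in (0, 1)
    return H * G                              # gated encoder outputs

def attention(q, H):
    """Dot-product attention of a decoder state q over encoder states H."""
    weights = softmax(H @ q)
    return weights @ H, weights

rng = np.random.default_rng(0)
T, d, k = 6, 8, 3
H = rng.standard_normal((T, d))               # encoder hidden states
W_conv = rng.standard_normal((d, k * d))      # gate (conv filter) weights
H_gated = global_encoding_gate(H, W_conv, k)
q = rng.standard_normal(d)                    # a decoder query state
context, weights = attention(q, H_gated)
print(context.shape, float(weights.sum()))
```

Because the gate lies in (0, 1), it can only attenuate encoder outputs, which is the filtering effect global encoding relies on; a real implementation would learn `W_conv` jointly with the rest of the Seq2Seq model.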
Global Encoding for Long Chinese Text Summarization