Editorial Notes
The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected Version of Record was published on May 18, 2023. For reference purposes, the VoR may still be accessed via the Supplemental Material section on this citation page.
Abstract
Over the past few years, researchers have shown great interest in sentiment analysis and document summarization, primarily because large volumes of information are available in textual form and this data has proven useful for real-world applications and challenges. Sentiment analysis of a document helps the user comprehend the content’s emotional intent. Abstractive summarization algorithms generate a condensed version of the text, whose sentiment can then be determined through sentiment analysis. Recent research in abstractive summarization concentrates on neural network-based models rather than conjunction-based approaches, which can improve overall efficiency. Neural network models such as the attention mechanism have been applied to complex tasks with promising results. The proposed work presents a novel framework that incorporates a part-of-speech (POS) tagging feature into the word embedding layer, which is then used as the input to the attention mechanism. With the POS feature as part of the input layer, the framework can deal with words carrying contextual and morphological information. POS tagging is relevant here because it relies strongly on the language’s syntactic, contextual, and morphological information. The work has three main elements: pre-processing, the POS tagging feature in the embedding phase, and its incorporation into the attention mechanism. The word embedding conveys the semantics of a word, while the POS tag indicates how significant the word is in the context of the content, which corresponds to the syntactic information. The proposed work was carried out in Malayalam, one of the prominent Indian languages; a widely used and accepted English-language dataset was translated into Malayalam for the experiments. The proposed framework achieves a ROUGE score of 28, outperforming the baseline models.
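The core idea described in the abstract, concatenating a POS-tag embedding with each word embedding before the result enters the attention mechanism, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary size, tag-set size, embedding dimensions, and the simple dot-product attention are all hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a real system would use the Malayalam vocabulary
# and its POS tag set.
VOCAB, POS_TAGS = 100, 12
D_WORD, D_POS = 16, 4

word_emb = rng.normal(size=(VOCAB, D_WORD))   # word embedding table
pos_emb = rng.normal(size=(POS_TAGS, D_POS))  # POS-tag embedding table

def encode(word_ids, pos_ids):
    """Concatenate word and POS embeddings per token: semantic
    information from the word vector, syntactic information from
    the tag vector."""
    return np.concatenate([word_emb[word_ids], pos_emb[pos_ids]], axis=-1)

def attention(query, keys):
    """Plain dot-product attention over the encoder states,
    returning a context vector as a weighted sum."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys

# Toy sentence: three word ids paired with three POS-tag ids.
tokens = np.array([3, 17, 42])
tags = np.array([1, 0, 5])
H = encode(tokens, tags)          # shape (3, D_WORD + D_POS) = (3, 20)
ctx = attention(H.mean(axis=0), H)  # context vector, shape (20,)
print(H.shape, ctx.shape)
```

In a full sequence-to-sequence summarizer, the decoder's hidden state would play the role of the query at each step, but the point of the sketch is only that the attention mechanism now sees syntactic (POS) information alongside the semantic word vectors.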
Supplemental Material
Version of Record for "Abstractive Summarization of Text Document in Malayalam Language: Enhancing Attention Model Using POS Tagging Feature" by Nambiar et al., ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, No. 2 (TALLIP 22:2).