Chinese Short Text Classification with Mutual-Attention Convolutional Neural Networks

Published: 04 August 2020

Abstract

Methods that combine word-level and character-level features can effectively boost performance on Chinese short text classification. However, many existing works concatenate the two levels of features with little further processing, which loses feature information. In this work, we propose a novel framework called Mutual-Attention Convolutional Neural Networks, which integrates word-level and character-level features without losing much feature information. We first generate two matrices carrying aligned information from the two feature levels by multiplying the word and character features with a trainable matrix. We then stack these matrices into a three-dimensional tensor. Finally, we generate the integrated features with a convolutional neural network. Extensive experiments on six public datasets demonstrate that our framework improves on current methods.
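The abstract's three steps (align the two feature levels through a trainable matrix, stack the aligned matrices into a tensor, convolve) can be sketched in NumPy. This is a minimal illustration of one plausible reading of the pipeline, not the paper's implementation: the toy dimensions, the softmax alignment of character features onto the word sequence, and the random convolution kernel are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy dimensions (assumed): 5 words, 9 characters, embedding size 8.
n_words, n_chars, d = 5, 9, 8
W = rng.normal(size=(n_words, d))   # word-level features of a sentence
C = rng.normal(size=(n_chars, d))   # character-level features of the same sentence
U = rng.normal(size=(d, d))         # trainable interaction matrix (random here)

# Step 1: mutual-attention scores between every word and every character,
# obtained by multiplying the two feature matrices through U.
scores = W @ U @ C.T                          # shape (n_words, n_chars)

# Character features re-aligned to the word sequence via the attention weights,
# so both matrices now share the shape (n_words, d).
C_aligned = softmax(scores, axis=1) @ C       # shape (n_words, d)

# Step 2: stack the two aligned matrices into a 3-D, two-channel tensor.
stacked = np.stack([W, C_aligned])            # shape (2, n_words, d)

# Step 3: a single 1-D convolution over the word axis (window width 3),
# standing in for the framework's CNN feature extractor.
kernel = rng.normal(size=(2, 3, d))           # (channels, width, embedding)
out = np.array([(stacked[:, i:i + 3, :] * kernel).sum()
                for i in range(n_words - 2)]) # shape (n_words - 2,)
```

A real model would learn `U` and `kernel` by backpropagation and use many kernels of several widths; the sketch only fixes the tensor shapes each step produces.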

