Chinese Short Text Classification with Mutual-Attention Convolutional Neural Networks

Abstract
Methods that combine word-level and character-level features can effectively boost performance on Chinese short text classification. However, many existing approaches concatenate the two levels of features with little further processing, which discards useful feature information. In this work, we propose a novel framework, Mutual-Attention Convolutional Neural Networks, that integrates word-level and character-level features without losing too much of that information. We first generate two matrices carrying aligned information from the two feature levels by multiplying the word and character features with a trainable matrix. We then stack these matrices into a three-dimensional tensor and generate the integrated features with a convolutional neural network. Extensive experiments on six public datasets demonstrate improved performance of our framework over current methods.
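The alignment step described above can be illustrated with a minimal sketch. The abstract does not specify the exact operations, so the shapes, the bilinear affinity `W @ M @ C.T`, and the softmax-weighted alignment below are assumptions chosen only to show how two same-sized matrices can be produced and stacked into a three-dimensional tensor for a downstream CNN:

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def mutual_attention_align(W, C, M):
    """Align word- and character-level features via a trainable matrix.

    W: (n_words, d) word-level features
    C: (n_chars, d) character-level features
    M: (d, d) trainable interaction matrix (random here, for illustration)
    Returns two (n_words, d) matrices that can be stacked into a
    (2, n_words, d) tensor for a convolutional layer.
    """
    A = W @ M @ C.T                     # (n_words, n_chars) affinity scores
    C_aligned = softmax(A, axis=1) @ C  # character features re-expressed per word
    return W, C_aligned                 # same shape, aligned on the word grid


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))    # 5 words, embedding dimension 8
C = rng.normal(size=(12, 8))   # 12 characters
M = rng.normal(size=(8, 8))    # trainable in a real model

Wa, Ca = mutual_attention_align(W, C, M)
tensor = np.stack([Wa, Ca])    # (2, 5, 8): two-channel input for the CNN
print(tensor.shape)            # (2, 5, 8)
```

Stacking the aligned matrices as channels, rather than concatenating them along the feature axis, is what lets a convolution mix the two feature levels position by position.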