Abstract
The need to capture intra-word information in natural language processing (NLP) tasks has inspired research into learning word representations at the word, character, and morpheme levels, but little attention has been given to syllables from a syllabic alphabet. Motivated by the success of compositional models on morphologically rich languages, we present a convolutional long short-term memory (Conv-LSTM) model that constructs Swahili word representation vectors from syllables. The unified architecture addresses the agglutinative and polysemous nature of Swahili by extracting high-level syllable features with a convolutional neural network (CNN) and then composing quality word embeddings with a long short-term memory (LSTM) network. The word embeddings are validated on a syllable-aware language model (perplexity of 31.267) and a part-of-speech (POS) tagging task (98.78% accuracy), both yielding results very competitive with state-of-the-art models in their respective domains. We further validate the language model on Xhosa and Shona, which are also syllabic-based languages. The novelty of the study lies in constructing quality word embeddings from syllables with a hybrid model that does not use the max-over-time pooling common in CNNs, and in exploiting these embeddings for POS tagging. The study thus contributes to the processing of agglutinative and syllabic-based languages: quality word embeddings built from syllable embeddings, and a robust Conv-LSTM model that learns syllables not only for language modeling and POS tagging, but also for other downstream NLP tasks.
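The composition described above (CNN over syllable embeddings, then an LSTM over all convolved positions rather than a max-pooled summary) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: all dimensions, parameter initializations, and the example syllable ids are hypothetical, and the wide-convolution and gate layout are common textbook choices assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not taken from the paper).
V, d_syl, d_conv, d_word = 10, 8, 12, 16  # syllable vocab, syllable emb, conv filters, word emb
k = 3                                     # convolution width (syllable n-gram size)

E = rng.normal(0, 0.1, (V, d_syl))             # syllable embedding table
W_conv = rng.normal(0, 0.1, (d_conv, k, d_syl))
b_conv = np.zeros(d_conv)

# LSTM parameters, gates stacked in the order [i, f, o, g].
W_x = rng.normal(0, 0.1, (4 * d_word, d_conv))
W_h = rng.normal(0, 0.1, (4 * d_word, d_word))
b = np.zeros(4 * d_word)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_features(syllable_ids):
    """Slide width-k filters over the syllable sequence and keep ALL
    positions (no max-over-time pooling), so the LSTM sees a sequence."""
    X = E[syllable_ids]                         # (T0, d_syl)
    X = np.pad(X, ((k - 1, k - 1), (0, 0)))     # wide convolution
    T = X.shape[0] - k + 1
    out = np.empty((T, d_conv))
    for t in range(T):
        window = X[t:t + k]                     # (k, d_syl)
        out[t] = np.tanh((W_conv * window).sum(axis=(1, 2)) + b_conv)
    return out

def lstm_word_embedding(feats):
    """Run an LSTM over the convolved syllable features; the final
    hidden state serves as the composed word representation."""
    h = np.zeros(d_word)
    c = np.zeros(d_word)
    for x in feats:
        z = W_x @ x + W_h @ h + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                       # cell state update
        h = o * np.tanh(c)                      # hidden state
    return h

# e.g. "matokeo" syllabified as ma-to-ke-o, mapped to toy ids.
word = [1, 4, 7, 2]
vec = lstm_word_embedding(conv_features(word))
print(vec.shape)  # (16,)
```

Because no pooling collapses the convolved sequence, words with different syllable counts simply yield longer or shorter feature sequences for the LSTM, which still emits a fixed-size embedding.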
Learning Syllables Using Conv-LSTM Model for Swahili Word Representation and Part-of-speech Tagging