Abstract
An approach is proposed for Chinese spelling error detection and correction that uses an inverted index list with a rescoring mechanism. The inverted index list maps words to the desired sentence and represents the nodes of lattices constructed through character expansion, in which each character is expanded according to predefined sets of phonologically and visually similar characters. Pruning based on a contextual dependency confidence measure markedly reduces the search space and the computational complexity. The mapping between the original input and the desired input is obtained with a scoring mechanism that combines a class-based language model and a maximum entropy correction model containing character, word, and contextual features. The proposed method was evaluated on the data sets provided by the SIGHAN-7 bake-off. The experimental results show that the method achieved acceptable recall or precision in error sentence detection and error location detection, and that it outperformed other approaches in error location detection and correction.
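To make the pipeline in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It expands each input character with a confusion set, records which lexicon words appear in the resulting lattice in an inverted index (word to spans), and picks the best-scoring path. The confusion sets, lexicon, log-probabilities, and the names `expand`, `build_inverted_index`, and `best_correction` are all hypothetical. The score is a toy unigram log-probability with a fixed substitution penalty, standing in for the class-based language model and maximum entropy correction model; the confidence-based pruning step is omitted.

```python
from collections import defaultdict
from itertools import product
import math

# Hypothetical confusion sets: each character maps to phonologically or
# visually similar characters. Real sets would be far larger.
CONFUSION = {"在": {"再"}, "再": {"在"}, "見": {"現"}}

# Toy word log-probabilities (assumption); the keys double as the lexicon.
LOGP = {"再見": -2.0, "現在": -2.5, "在": -4.0, "見": -5.0, "再": -4.5}
LEXICON = set(LOGP)
CHANGE_PENALTY = 1.0  # cost per substituted character (assumption)


def expand(sentence):
    """Build a lattice: one column of candidate characters per position."""
    return [sorted({ch} | CONFUSION.get(ch, set())) for ch in sentence]


def build_inverted_index(lattice, lexicon, max_len=2):
    """Map each lexicon word found in the lattice to its (start, end) spans."""
    index = defaultdict(list)
    for start in range(len(lattice)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(lattice):
                break
            for chars in product(*lattice[start:end]):
                word = "".join(chars)
                if word in lexicon:
                    index[word].append((start, end))
    return index


def best_correction(sentence):
    """Return the highest-scoring word sequence covering the whole input."""
    index = build_inverted_index(expand(sentence), LEXICON)
    by_start = defaultdict(list)
    for word, spans in index.items():
        for start, end in spans:
            by_start[start].append((end, word))

    n = len(sentence)
    # best[i] holds (score, text) of the best path covering sentence[:i].
    best = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for start in range(n):
        score, text = best[start]
        if score == -math.inf:
            continue
        for end, word in by_start[start]:
            changes = sum(a != b for a, b in zip(word, sentence[start:end]))
            cand = score + LOGP[word] - CHANGE_PENALTY * changes
            if cand > best[end][0]:
                best[end] = (cand, text + word)
    # Fall back to the input when no lexicon path covers it.
    return best[n][1] if best[n][0] > -math.inf else sentence


print(best_correction("在見"))
```

In this toy setting the two-character word 再見 outscores the unchanged single-character path even after the substitution penalty, which is the effect rescoring is meant to capture. A realistic system would also need the pruning step, since enumerating every combination with `product` grows quickly with span length and confusion-set size.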
Chinese Spelling Checker Based on an Inverted Index List with a Rescoring Mechanism