Abstract
User-generated text in social media communication (SMC) is mainly characterized by its non-standard form. It may contain code-switching (CS) text, a widespread phenomenon in SMC, as well as noisy elements common in written conversations (abbreviations, symbols, emoticons) and misspelled words. All of these factors are an obstacle to text mining applications. Common text mining tools are designed for the standard use of standard languages and cannot handle other forms, especially text written in social media. To overcome these problems, we present a solution for normalizing the non-standard use of standard and non-standard languages (dialects) in SMC text using existing resources and tools. The main processing step is CS normalization from multiple languages to one language through a machine translation-like approach. This step relies on a linguistic approach to CS that automatically identifies the translation source and target languages, without human intervention. The remaining processing operations concern the normalization of SMC special expressions and the spelling correction of out-of-vocabulary words. To preserve the meaning of the code-switched sentence across translation, we adopt a knowledge-based approach to word sense translation disambiguation, reinforced with a multilingual vertical context. All of these processes are embedded in what we refer to as the machine normalization system. Our solution can be used as a front end to text mining pipelines, enabling the analysis of noisy SMC text. Our experiments show that the system outperforms the considered baselines.
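The order of operations described above can be illustrated with a minimal sketch. It is not the authors' system: the dictionaries, the French-to-English word list, and the function names (`expand_smc`, `identify_language`, `correct_spelling`, `normalize`) are all hypothetical toy stand-ins. The target language is passed in by hand rather than identified automatically, word sense disambiguation is omitted, and spelling correction uses the similarity matching in Python's standard `difflib` module rather than any method named in the abstract.

```python
import difflib

# Toy resources: every entry here is a hypothetical illustration.
SMC_EXPRESSIONS = {"u": "you", "2moro": "tomorrow", "thx": "thanks"}
LEXICONS = {
    "en": {"see", "you", "tomorrow", "thanks", "my", "friend", "the", "at", "house"},
    "fr": {"demain", "merci", "maison", "ami"},
}
FR_TO_EN = {"demain": "tomorrow", "merci": "thanks", "maison": "house", "ami": "friend"}


def expand_smc(token):
    """Replace an SMC special expression (abbreviation) with its standard form."""
    return SMC_EXPRESSIONS.get(token, token)


def identify_language(token):
    """Return the first language whose lexicon contains the token, else None."""
    for lang, vocab in LEXICONS.items():
        if token in vocab:
            return lang
    return None


def correct_spelling(token):
    """Map an out-of-vocabulary token to its closest in-vocabulary word, if any."""
    vocab = sorted(set().union(*LEXICONS.values()))
    matches = difflib.get_close_matches(token, vocab, n=1, cutoff=0.8)
    return matches[0] if matches else token


def normalize(sentence, target="en"):
    """Normalize a noisy, code-switched sentence into the target language."""
    out = []
    for raw in sentence.lower().split():
        token = expand_smc(raw)
        if identify_language(token) is None:
            token = correct_spelling(token)
        # Word-for-word lookup stands in for the translation step.
        if identify_language(token) == "fr" and target == "en":
            token = FR_TO_EN.get(token, token)
        out.append(token)
    return " ".join(out)


print(normalize("see u 2moro at the maison"))
```

With these toy resources, `normalize("see u 2moro at the maison")` yields `see you tomorrow at the house`. A word-for-word lookup like this cannot choose between competing senses of a word, which is the gap the word sense translation disambiguation step is meant to fill.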
Machine Normalization: Bringing Social Media Text from Non-Standard to Standard Form