Identifying and Analyzing Different Aspects of English-Hindi Code-Switching in Twitter

Published: 23 July 2019

Abstract

Code-switching, the juxtaposition of linguistic units from two or more languages in a single utterance, has recently become very common in text, thanks to social media and other computer-mediated forms of communication. In this exploratory study of English-Hindi code-switching on Twitter, we automatically create a large corpus of code-switched tweets and devise techniques to identify the relationship between successive components in a code-switched tweet. More specifically, we identify the pragmatic functions that characterize the relation between successive components, such as narrative-evaluative, negative reinforcement, and translation or semantically equivalent statements. We analyze the differences and similarities between switching patterns in code-switched and monolingual multi-component tweets. Narrative-evaluative switching (non-opinion to opinion or vice versa) strongly dominates in both code-switched and monolingual multi-component tweets, occurring in around 40% of cases. Polarity switching appears to be a prevalent phenomenon (10%) specifically in code-switched tweets, where it is three to four times more frequent than in monolingual multi-component tweets, and where the preference for expressing negative sentiment in Hindi is approximately twice that in English. Positive reinforcement appears to be an important pragmatic function for English multi-component tweets, whereas negative reinforcement plays a key role for Devanagari multi-component tweets. Our results also indicate that the extent and nature of code-switching depend strongly on the topic of discussion (sports, politics, etc.).

