Abstract
Polyglots habitually switch languages when searching for information on the Web. Studies in information retrieval (IR) and multilingual information retrieval (MLIR) suggest that the topic of search is part of the reason for this switching. Unlike survey-based studies, this study uses query and click-through logs. It exploits the querying and result-selection behaviour of users of a Swahili MLIR system to explore how the topic of a search (query) is associated with language preferences, referred to here as topic-language preferences. The article is based on a carefully controlled study in which Swahili-speaking Web users in Tanzania interacted with a guided multilingual search engine. Statistical analysis of the query and click-through logs revealed that language preferences may be associated with the topics of search. The results also suggest that language preferences are not static; they vary over the course of a Web search, from query to result selection. For most topics, users either showed no significant language preference or preferred to query in Kiswahili, and then shifted to either English or no language preference when selecting (clicking on) results. The findings might give researchers more insight into developing better MLIR systems that support particular types of users in particular scenarios.
Index Terms
Exploring Topic-language Preferences in Multilingual Swahili Information Retrieval in Tanzania