research-article

Exploring Topic-language Preferences in Multilingual Swahili Information Retrieval in Tanzania

Published: 12 August 2021

Abstract

Habitual switching of languages is a common behaviour among polyglots when searching for information on the Web. Studies in information retrieval (IR) and multilingual information retrieval (MLIR) suggest that part of the reason for such regular switching is the topic of search. Unlike survey-based studies, this study uses query and click-through logs. It exploits the querying and results selection behaviour of users of a Swahili MLIR system to explore how the topic of a search (query) is associated with language preferences, referred to here as topic-language preferences. The article is based on a carefully controlled study of Swahili-speaking Web users in Tanzania who interacted with a guided multilingual search engine. Statistical analysis of the query and click-through logs revealed that language preferences may be associated with the topics of search. The results also suggest that language preferences are not static; they vary over the course of a Web search, from query to results selection. For most topics, users either showed no significant language preference or preferred to query in Kiswahili and then shifted to English, or to no language preference, when selecting (clicking on) results. The findings of this study might give researchers more insight into developing better MLIR systems that support certain types of users in certain scenarios.

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 6
November 2021
439 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3476127

Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 12 August 2021
• Revised: 1 October 2020
• Received: 1 June 2020
• Accepted: 1 March 2020

Qualifiers

• research-article
• Refereed
