research-article

Context-aware Urdu Information Retrieval System

Published: 02 April 2023

Abstract

The World Wide Web (WWW) plays a vital role in sharing dynamic knowledge in every field of life. Information on the web comprises a huge amount of data in different forms: structured, semi-structured, and, for a small portion, entirely unstructured. Because of this volume, searching large textual collections for a specific topic or retrieving precise information is a challenging task, and it leads to the problem of word sense ambiguity (WSA). An Urdu-language information retrieval system that uses techniques from the Web Semantic Search Engine architecture is proposed to retrieve relevant information efficiently and to address WSA. For single-word queries, the proposed system achieves an average precision ratio of 96%, compared with 74% and 75%, the latter for Google. For long text queries, the system reaches 92% accuracy, outperforming well-known search engines such as Bing and Google, which reach 16.50% and 16%, respectively. For single-word queries, the recall ratio of the proposed system is 32.25%, compared with 25% each for Bing and Google. The recall ratio for long text queries also improves, to 6.38% compared with 6.20% and 4.8% for Bing and Google, respectively. The results show that the proposed system gives better and more efficient results than the existing systems for the Urdu language.
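The abstract reports precision and recall ratios without stating how they are computed. As a rough illustration only, the sketch below uses the standard set-based definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant) and assumes that "average" means the mean over queries; the article may define these quantities differently. The function names and the toy document IDs are hypothetical and do not come from the article.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document IDs returned by the search system
    relevant:  document IDs judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average(values):
    """Arithmetic mean of an iterable; 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy data: (retrieved IDs, relevant IDs) for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], ["d1", "d2", "d3", "d9", "d10", "d11"]),
    (["d5", "d6"], ["d5", "d6", "d7", "d8"]),
]

scores = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
avg_precision = average(p for p, _ in scores)  # mean of per-query precision
avg_recall = average(r for _, r in scores)     # mean of per-query recall

print(f"average precision: {avg_precision:.2%}")
print(f"average recall:    {avg_recall:.2%}")
```

Under these definitions, a system can score high precision and low recall at the same time when it returns few documents, most of them relevant, while many relevant documents remain unretrieved.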

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 3
      March 2023
      570 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3579816

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 2 April 2023
      • Online AM: 14 October 2022
      • Accepted: 26 November 2021
      • Revised: 26 October 2021
      • Received: 29 April 2021