Abstract
The World Wide Web (WWW) plays a vital role in sharing dynamic knowledge in every field of life. Information on the web comprises a huge amount of data in structured, semi-structured, and unstructured forms. Because of this volume, searching large textual collections for a specific topic or retrieving precise information is a challenging task. These challenges are closely tied to the problem of word sense ambiguity (WSA). We propose an Urdu-language information retrieval system that applies techniques from the Web Semantic Search Engine architecture to retrieve relevant information efficiently and to address WSA. For single-word queries, the proposed system achieves an average precision ratio of 96%, compared with 74% for Bing and 75% for Google. For long-text queries, our system outperforms these widely used search engines with 92% accuracy, against 16.50% for Bing and 16% for Google. For single-word queries, its recall ratio is 32.25%, compared with 25% for both Bing and Google. Recall for long-text queries also improves, reaching 6.38% compared with 6.20% for Bing and 4.8% for Google. These results show that the proposed system returns better and more efficient results than existing systems for the Urdu language.
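The precision and recall ratios reported above can be illustrated with a minimal sketch. It assumes the standard set-based definitions: precision is relevant retrieved documents divided by all retrieved documents, and recall is relevant retrieved documents divided by all relevant documents, each macro-averaged over queries. The paper's exact evaluation protocol may differ. The helper names `precision_recall` and `average_ratios` and the document IDs are hypothetical.

```python
from typing import Iterable, Sequence


def precision_recall(retrieved: Sequence[str], relevant: Iterable[str]) -> tuple[float, float]:
    """Set-based precision and recall for a single query (assumed definitions).

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Duplicate IDs in `retrieved` are collapsed before counting.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty result lists or empty relevance judgements.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def average_ratios(
    runs: Sequence[tuple[Sequence[str], Iterable[str]]],
) -> tuple[float, float]:
    """Macro-average precision and recall over queries: each query counts equally."""
    if not runs:
        return 0.0, 0.0
    pairs = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
    avg_precision = sum(p for p, _ in pairs) / len(pairs)
    avg_recall = sum(r for _, r in pairs) / len(pairs)
    return avg_precision, avg_recall


if __name__ == "__main__":
    # Hypothetical runs: (documents returned by the engine, documents judged relevant).
    runs = [
        (["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"]),
        (["d7", "d8"], ["d7", "d8", "d9", "d10"]),
    ]
    p, r = average_ratios(runs)
    print(f"average precision ratio: {p:.2%}, average recall ratio: {r:.2%}")
    # prints "average precision ratio: 75.00%, average recall ratio: 58.33%"
```

Here "average precision" means the mean of per-query precision. It is not the rank-sensitive average precision (AP) metric used in TREC-style evaluation, which also accounts for the order of results.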
Context-aware Urdu Information Retrieval System