
An ontology-driven approach for semantic information retrieval on the Web

Published: 30 July 2009

Abstract

The concept of relevance is a hot topic in the information retrieval process. In recent years the extreme growth of digital documents has brought to light the need for novel approaches and more efficient techniques that improve the accuracy of IR systems and take into account real users' information needs. In this article we propose a novel metric to measure the semantic relatedness between words. Our approach is based on ontologies represented using a general knowledge base for dynamically building a semantic network. This network is based on linguistic properties, and it is combined with our metric to create a measure of semantic relatedness. In this way we obtain an efficient strategy to rank digital documents from the Internet according to the user's domain of interest. The proposed methods, metrics, and techniques are implemented in a system for information retrieval on the Web. Experiments are performed on a test set built using a directory service that holds information about the analyzed documents. Compared with other similar systems, the obtained results show an effective improvement.

Published in

ACM Transactions on Internet Technology, Volume 9, Issue 3
July 2009
89 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/1552291

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Accepted: 1 October 2009
• Published: 30 July 2009
• Received: 1 October 2008

Qualifiers

• research-article
• Research
• Refereed
