Abstract
Relevance is a central concept in information retrieval (IR). In recent years, the rapid growth in the number of digital documents has exposed the need for novel approaches and more efficient techniques that improve the accuracy of IR systems by taking real users' information needs into account. In this article we propose a novel metric for measuring the semantic relatedness between words. Our approach relies on ontologies, represented by means of a general knowledge base, to dynamically build a semantic network. This network is based on linguistic properties and is combined with our metric to produce a measure of semantic relatedness. The result is an efficient strategy for ranking digital documents from the Internet according to the user's domain of interest. The proposed methods, metrics, and techniques are implemented in a system for information retrieval on the Web. Experiments are performed on a test set built using a directory service that holds information about the analyzed documents. Compared with similar systems, the results show an improvement.
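The abstract does not give the metric itself, so the following is only a minimal sketch of the general idea: scoring word relatedness over a semantic network and using that score to rank documents against a domain word. Everything in it is an assumption made for illustration: the toy network, the per-relation edge costs, the exponential mapping from path cost to a score, and the function names `build_graph`, `path_cost`, `relatedness`, and `rank_documents`. It should not be read as the metric proposed in the article.

```python
import heapq
import math

# Hypothetical toy semantic network. The words and the per-relation
# costs are invented for illustration only.
EDGES = [
    ("car", "automobile", 0.0),  # synonymy
    ("car", "vehicle", 1.0),     # hypernymy
    ("truck", "vehicle", 1.0),   # hypernymy
    ("vehicle", "wheel", 2.0),   # meronymy
    ("banana", "fruit", 1.0),    # hypernymy
]


def build_graph(edges):
    """Build an undirected adjacency list from (word, word, cost) triples."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def path_cost(graph, source, target):
    """Cheapest path cost between two words (Dijkstra); inf if unreachable."""
    if source not in graph or target not in graph:
        return math.inf
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, step in graph[node]:
            new_cost = cost + step
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return math.inf


def relatedness(graph, w1, w2, alpha=0.5):
    """Map path cost to a score in [0, 1]; alpha is an assumed decay rate."""
    cost = path_cost(graph, w1, w2)
    return 0.0 if math.isinf(cost) else math.exp(-alpha * cost)


def rank_documents(graph, domain_word, documents):
    """Sort document names by mean relatedness of their terms to domain_word."""
    def score(terms):
        if not terms:
            return 0.0
        return sum(relatedness(graph, domain_word, t) for t in terms) / len(terms)

    return sorted(documents, key=lambda name: score(documents[name]), reverse=True)


if __name__ == "__main__":
    graph = build_graph(EDGES)
    docs = {
        "doc_a": ["truck", "wheel"],
        "doc_b": ["banana", "fruit"],
        "doc_c": ["automobile", "vehicle"],
    }
    print(rank_documents(graph, "car", docs))
```

In this sketch, words with no connecting path score zero, so documents whose terms lie outside the user's domain sink to the bottom of the ranking. A real system would draw the network from a lexical knowledge base rather than a hand-written edge list.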
Index Terms
An ontology-driven approach for semantic information retrieval on the Web