DOI: 10.1145/1807085.1807097
tutorial

From information to knowledge: harvesting entities and relationships from web sources

Published: 6 June 2010

ABSTRACT

A major trend in search engines is to advance their functionality to a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and by progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project, among others. The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, their mutual relations, and their temporal contexts, with both high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting.
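To make the kind of knowledge base described above more concrete, here is a minimal sketch under stated assumptions: facts are stored as subject-predicate-object triples with an optional validity interval in years, and class-membership facts are harvested from text with a single Hearst-style "X such as Y" pattern. The `Fact`, `KnowledgeBase`, and `extract_instances` names, the predicate names `type` and `worksAt`, and the naive plural stripping are hypothetical illustrations; this is not the YAGO-NAGA implementation or that of any other system named above, and a realistic harvester would need far more robust extraction and cleaning.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """A subject-predicate-object triple with an optional validity interval (in years)."""
    subject: str
    predicate: str
    obj: str
    valid_from: Optional[int] = None
    valid_to: Optional[int] = None

    def holds_in(self, year: int) -> bool:
        # An open bound (None) means the fact is not restricted on that side.
        if self.valid_from is not None and year < self.valid_from:
            return False
        if self.valid_to is not None and year > self.valid_to:
            return False
        return True


class KnowledgeBase:
    """Hypothetical in-memory fact store; duplicates collapse because Fact is hashable."""

    def __init__(self) -> None:
        self.facts: set[Fact] = set()

    def add(self, fact: Fact) -> None:
        self.facts.add(fact)

    def query(self, subject=None, predicate=None, obj=None, year=None) -> list[Fact]:
        # None acts as a wildcard; `year` keeps only facts valid in that year.
        hits = [
            f for f in self.facts
            if (subject is None or f.subject == subject)
            and (predicate is None or f.predicate == predicate)
            and (obj is None or f.obj == obj)
            and (year is None or f.holds_in(year))
        ]
        return sorted(hits, key=lambda f: (f.subject, f.predicate, f.obj))


# Hearst-style pattern "<class> such as <list of names>", up to the next '.' or ';'.
SUCH_AS = re.compile(r"(\w+) such as ([^.;]+)")
# A name is a run of capitalized tokens, e.g. "Niels Bohr".
NAME = re.compile(r"[A-Z][\w-]*(?: [A-Z][\w-]*)*")


def extract_instances(sentence: str) -> list[Fact]:
    facts = []
    for match in SUCH_AS.finditer(sentence):
        cls = match.group(1).lower()
        if cls.endswith("s"):  # naive singularization: "physicists" -> "physicist"
            cls = cls[:-1]
        # Split the enumeration on commas (optionally followed by "and") or on "and".
        for item in re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(2)):
            name = NAME.match(item.strip())
            if name:  # skip list items that do not start with a capitalized name
                facts.append(Fact(name.group(0), "type", cls))
    return facts


kb = KnowledgeBase()
for fact in extract_instances(
    "Physicists such as Albert Einstein, Max Planck and Niels Bohr shaped quantum theory."
):
    kb.add(fact)
# Time-scoped facts: the same subject/predicate pair holds different objects over time.
kb.add(Fact("Albert Einstein", "worksAt", "ETH Zurich", 1912, 1914))
kb.add(Fact("Albert Einstein", "worksAt", "Institute for Advanced Study", 1933, 1955))

print([f.subject for f in kb.query(predicate="type", obj="physicist")])
# prints ['Albert Einstein', 'Max Planck', 'Niels Bohr']
print([f.obj for f in kb.query(subject="Albert Einstein", predicate="worksAt", year=1940)])
# prints ['Institute for Advanced Study']
```

Keeping the time scope on each fact, rather than overwriting an old value with a new one, lets the same subject-predicate pair carry different objects in different periods, which is one simple way to represent the temporal contexts mentioned in the abstract.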

Published in

PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2010
350 pages
ISBN: 9781450300339
DOI: 10.1145/1807085

Copyright © 2010 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 476 of 1,835 submissions (26%)
