ABSTRACT
Search engines are increasingly being advanced toward a more expressive semantic level of functionality. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and by progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project, among others. The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting.