ABSTRACT
In contrast to the "bag of words" representation used in most information retrieval applications, we propose a model that represents a web page as sets of named entities of multiple types. Specifically, four types of named entities are extracted: person, geographic location, organization, and time. The relations among these entities are also extracted, weighted, classified, and labeled. On top of this model, we demonstrate several applications. In particular, we introduce the notion of person-activity, which comprises four elements: person, location, time, and activity. Using this notion on a reasonably large set of web pages, we show how a person's activities can be characterized by time and location, which gives a good picture of the mobility of the person in question.
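As a rough illustration of the model described above, the following Python sketch shows one possible way to hold a page's typed entity sets, its weighted and labeled relations, and person-activity records. All class, field, and function names here (PageEntities, Relation, PersonActivity, mobility_timeline) are hypothetical and chosen for this sketch only; the abstract does not specify the authors' data structures, extraction method, or weighting scheme. The sketch also assumes that times are stored as ISO-style date strings, so that sorting them as text matches chronological order.

```python
from dataclasses import dataclass, field

# The four entity types named in the abstract.
ENTITY_TYPES = ("person", "location", "organization", "time")


@dataclass(frozen=True)
class Relation:
    """A weighted, labeled relation between two entities (hypothetical layout)."""
    source: str
    target: str
    label: str
    weight: float


@dataclass(frozen=True)
class PersonActivity:
    """The four elements of a person-activity: person, location, time, activity."""
    person: str
    location: str
    time: str  # assumption: ISO-style date string such as "2007-05-08"
    activity: str


@dataclass
class PageEntities:
    """A web page represented as one set of entity names per entity type."""
    url: str
    entities: dict = field(
        default_factory=lambda: {t: set() for t in ENTITY_TYPES}
    )
    relations: list = field(default_factory=list)

    def add_entity(self, entity_type: str, name: str) -> None:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[entity_type].add(name)

    def add_relation(self, source: str, target: str, label: str, weight: float) -> None:
        self.relations.append(Relation(source, target, label, weight))


def mobility_timeline(activities, person):
    """Return (time, location) pairs for one person, ordered by time string."""
    return sorted((a.time, a.location) for a in activities if a.person == person)
```

Given a list of PersonActivity records gathered from many pages, mobility_timeline(records, "Some Person") would return that person's locations in time order, which is one simple way to read off the mobility described in the abstract.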
Representing a web page as sets of named entities of multiple types: a model and some preliminary applications