DOI: 10.5555/2491748.2491752
research-article

Towards a fair comparison between name disambiguation approaches

ABSTRACT

Searching for information about people in search engines is a common and straightforward task that is often hampered by name ambiguity. While users are interested in information about a single person, results pages usually mix many people who share the same name. There are several approaches to personal name disambiguation; however, it remains a challenge to understand the impact of each approach in isolation. In this paper, we present a plugin-based framework for comparing name disambiguation approaches and identifying the most promising ones. The framework enabled us to combine different approaches in search of good combinations for this task and to compare state-of-the-art solutions on a common dataset. Preliminary results indicate that biographical information has a greater impact on clustering, and they favor comprehensive texts over metadata alone and TF-IDF over more complex approaches.
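As a rough illustration of the TF-IDF clustering idea mentioned above, the following minimal sketch groups result pages for the same name by textual similarity. It is not the framework described in the paper: the whitespace tokenizer, the single-link grouping, the 0.1 similarity threshold, the function names, and the sample pages are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster(docs, threshold=0.1):
    """Single-link clustering: pages more similar than `threshold` are merged.

    The threshold is an arbitrary value chosen for this toy example.
    Returns one cluster label per document, numbered by first appearance.
    """
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(j)] = find(i)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(docs))]


# Hypothetical result pages for the ambiguous query "john smith".
pages = [
    "john smith professor of physics at the university",
    "john smith physics professor publishes quantum research",
    "john smith guitarist releases new rock album",
    "rock guitarist john smith announces album tour",
]
print(cluster(pages))  # [0, 0, 1, 1]
```

Because the queried name occurs on every page, its inverse document frequency is zero, so the grouping is driven entirely by the surrounding context words.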

