DOI: 10.1145/2480362.2480388
research-article

A supervised machine learning classification algorithm for research articles

Published: 18 March 2013

ABSTRACT

Automatically classifying research articles into one or more fields of science is of primary importance for scientific databases and digital libraries. A sophisticated classification strategy makes searching more effective and helps users locate similar relevant items. Although most publishing services require authors to categorize their articles themselves, older documents may remain unclassified, and the taxonomy may change over time. In this work we address this problem by introducing a machine learning algorithm that combines several parameters and metadata of a research article. In particular, our model uses the training set to correlate keywords, authors, co-authorship, and publishing journals with a number of labels of the taxonomy. It then applies this information to classify the remaining documents. Experiments on a large dataset of about 1.5 million articles demonstrate that, in this specific application, our model outperforms the AdaBoost.MH and SVM methods.
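The abstract does not give the scoring details, so the following is only a minimal sketch of the general idea it describes: counting how often metadata features (keywords, authors, co-author pairs, journal) co-occur with taxonomy labels in a training set, then using those counts to label unseen articles. The article schema, the function names `features`, `train`, and `classify`, the relative-frequency scoring, and the toy data are all assumptions made for illustration; this is not the authors' algorithm.

```python
from collections import Counter, defaultdict


def features(article):
    """Yield (field, value) metadata features of one article (hypothetical schema)."""
    for kw in article.get("keywords", []):
        yield ("keyword", kw.lower())
    authors = sorted(article.get("authors", []))
    for author in authors:
        yield ("author", author)
    # Co-authorship: every unordered pair of authors on the same article.
    for i, first in enumerate(authors):
        for second in authors[i + 1:]:
            yield ("coauthors", (first, second))
    if article.get("journal"):
        yield ("journal", article["journal"])


def train(labeled_articles):
    """Count how often each metadata feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for article, labels in labeled_articles:
        for feat in features(article):
            for label in labels:
                counts[feat][label] += 1
    return counts


def classify(counts, article, top_k=1):
    """Score labels by summing each known feature's relative label frequency.

    This scoring rule is an assumption; the abstract does not specify one.
    """
    scores = Counter()
    for feat in features(article):
        label_counts = counts.get(feat)
        if not label_counts:
            continue  # feature never seen in training
        total = sum(label_counts.values())
        for label, n in label_counts.items():
            scores[label] += n / total
    return [label for label, _ in scores.most_common(top_k)]


# Made-up toy training data, for illustration only.
training = [
    ({"keywords": ["SVM", "text categorization"],
      "authors": ["A. Author", "B. Author"],
      "journal": "Example ML Journal"}, ["Machine Learning"]),
    ({"keywords": ["query processing"],
      "authors": ["C. Author"],
      "journal": "Example DB Journal"}, ["Databases"]),
]
model = train(training)
print(classify(model, {"keywords": ["svm"], "authors": ["A. Author"]}))
# prints ['Machine Learning']
```

Raising `top_k` returns several labels per article, which matches the abstract's setting of assigning an article to one or more fields. An article whose features were never seen in training gets an empty result rather than a guess.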

    • Published in

      SAC '13: Proceedings of the 28th Annual ACM Symposium on Applied Computing
      March 2013
      2124 pages
      ISBN:9781450316569
      DOI:10.1145/2480362

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SAC '13 Paper Acceptance Rate: 255 of 1,063 submissions, 24%
      Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
