DOI: 10.1145/319759.319789

A Web text mining approach based on self-organizing map

Published: 01 November 1999

ABSTRACT

Web text mining is a new topic in knowledge discovery research. It aims to help people discover knowledge from large quantities of semi-structured or unstructured text on the web. Several approaches, including pure and hybrid information retrieval (IR) methods, have been proposed to tackle this problem. Among them, combining the Self-Organizing Map (SOM) method with the principles of the vector space model appears to be a promising alternative to traditional, purely IR-based methods in this problem domain. In this paper, a novel SOM-based method for web text mining using a Chinese corpus is presented. The SOM is used to generate two maps, the word cluster map and the document cluster map, which reveal the relationships among words and among documents, respectively. The search process incorporates these two maps and effectively finds the relevant documents according to the keywords specified in the query. Conceptually associated web documents are found not only through the specified keywords but also through the related words identified by the word cluster map.
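The two-map idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no algorithmic details, so the binary term vectors, the 2x2 map size, the learning schedule, the toy English corpus (the paper uses a Chinese corpus), and the names `train_som`, `best_unit`, and `search` are all assumptions made for illustration.

```python
import math
import random

def best_unit(weights, v):
    """Index of the unit whose weight vector is closest to v (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], v)))

def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a small rectangular SOM; returns one weight vector per unit."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                     # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighborhood
        for v in vectors:
            b_row, b_col = divmod(best_unit(weights, v), cols)
            for u, w in enumerate(weights):
                u_row, u_col = divmod(u, cols)
                d2 = (u_row - b_row) ** 2 + (u_col - b_col) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights

# Hypothetical toy corpus: document id -> list of keywords.
docs = {
    "d1": ["neural", "network", "map"],
    "d2": ["neural", "map", "cluster"],
    "d3": ["stock", "market", "price"],
    "d4": ["market", "price", "trade"],
}
vocab = sorted({w for words in docs.values() for w in words})
doc_ids = sorted(docs)

# Vector space model (assumed binary weights): rows are documents, columns are words.
doc_vecs = {d: [1.0 if w in docs[d] else 0.0 for w in vocab] for d in doc_ids}
word_vecs = {w: [1.0 if w in docs[d] else 0.0 for d in doc_ids] for w in vocab}

# Document cluster map and word cluster map: each item is assigned to a map unit.
doc_som = train_som(list(doc_vecs.values()), 2, 2)
word_som = train_som(list(word_vecs.values()), 2, 2)
doc_map = {d: best_unit(doc_som, v) for d, v in doc_vecs.items()}
word_map = {w: best_unit(word_som, v) for w, v in word_vecs.items()}

def search(keywords):
    """Expand the query via the word map, then collect documents via the document map."""
    units = {word_map[k] for k in keywords if k in word_map}
    expanded = set(keywords) | {w for w, u in word_map.items() if u in units}
    hits = {d for d in doc_ids if expanded & set(docs[d])}
    hit_units = {doc_map[d] for d in hits}
    return sorted(d for d in doc_ids if doc_map[d] in hit_units)
```

Calling `search(["neural"])` returns every document that contains the keyword, plus any documents reached through words sharing its word-map unit or through documents sharing a document-map unit. Which extra documents appear depends on how the toy maps happen to organize, so no particular clustering is claimed here.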


          Published in

          WIDM '99: Proceedings of the 2nd international workshop on Web information and data management
          November 1999
          76 pages
          ISBN:1581132212
          DOI:10.1145/319759

          Copyright © 1999 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

