10.5555/778212.778338
chapter

Industry: text mining with self-organizing maps

Published: 01 January 2002

ABSTRACT

Today's information age is characterized by the constant, massive production and dissemination of written information. Coping with this situation requires more powerful tools for exploring, searching, and organizing the available mass of information. This need is our starting point for applying data mining techniques to unstructured information such as that found in text archives. Users benefit in particular from clustering techniques that uncover similar documents and bring these similarities to their attention. In our approach to text mining, we suggest using self-organizing maps to analyze a document archive. The benefit of this approach is the intuitive visualization of document similarities, which results from the spatial ordering of the documents within the self-organizing map. We augment the basic capabilities of the neural network with a data description technique that, based on the features learned by the map, automatically selects the most descriptive features of the input patterns mapped onto a particular unit of the map, thus making the associations between the various clusters within the map explicit. We demonstrate the benefits of this approach using a real-world document archive composed of articles from Time magazine.
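The approach described in the abstract can be illustrated with a minimal sketch in Python. It is not the chapter's implementation: the toy vocabulary, the term-count vectors, the 2 x 2 grid, the decay schedules, and the function names `train_som`, `best_unit`, and `label_units` are all assumptions made for illustration. The labeling step is a deliberate simplification that ranks terms by the weight a unit has learned for them; the authors' data description technique may use a different criterion.

```python
import math
import random


def best_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(vectors, rows, cols, epochs=200, seed=0):
    """Train a tiny self-organizing map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1.0 - frac)                               # decaying learning rate
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5   # shrinking neighborhood
        x = vectors[rng.randrange(len(vectors))]              # pick a random document
        bmu = best_unit(weights, x)
        for unit, w in weights.items():
            d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))           # Gaussian neighborhood
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])                # move unit toward document
    return weights


def label_units(weights, vectors, vocab, top_n=2):
    """Label each occupied unit with the top_n terms it weights most heavily.

    Simplified stand-in for a labeling technique, not the chapter's method.
    """
    occupied = {best_unit(weights, x) for x in vectors}
    labels = {}
    for unit in occupied:
        w = weights[unit]
        ranked = sorted(range(len(vocab)), key=lambda i: w[i], reverse=True)
        labels[unit] = [vocab[i] for i in ranked[:top_n]]
    return labels


# Hypothetical toy data: term counts for four short documents.
vocab = ["election", "vote", "oil", "price"]
docs = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]

weights = train_som(docs, rows=2, cols=2)
labels = label_units(weights, docs, vocab)
for unit in sorted(labels):
    print(unit, labels[unit])
```

On real archives, documents would typically be represented by weighted term vectors over a much larger vocabulary and mapped onto a larger grid; the sketch only shows how spatial placement on the map and per-unit labels fit together.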


Published in

Handbook of data mining and knowledge discovery
January 2002
1025 pages
ISBN: 0195118316
Editors: Willi Klösgen, Jan M. Zytkow
Publisher: Oxford University Press, Inc., United States