ABSTRACT
Today's information age is characterized by the constant, massive production and dissemination of written information. Coping with this situation requires more powerful tools for exploring, searching, and organizing the available mass of information. This need is our starting point for applying data mining techniques to unstructured information as found in text archives. Users benefit in particular from clustering techniques that uncover similar documents and bring these similarities to their attention. In our approach to text mining, we suggest using self-organizing maps to analyze a document archive. The benefit of this approach is the intuitive visualization of document similarities, which results from the spatial ordering of the documents within the self-organizing map. We augment the basic capabilities of the neural network with a data description technique that, based on the features learned by the map, automatically selects the most descriptive features of the input patterns mapped onto a particular unit of the map, thus making the associations between the various clusters within the map explicit. We demonstrate the benefits of this approach on a real-world document archive consisting of articles from Time magazine.
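As a rough illustration of the two ideas named in the abstract, the following minimal sketch trains a tiny self-organizing map on term-weight vectors and then picks descriptive terms for each unit. It is not the authors' implementation. The function names (train_som, best_unit, label_units), the linearly decaying learning rate and neighborhood radius, the min_weight threshold, and the toy term vectors are all assumptions made for this example. The labeling rule used here (prefer terms whose learned weight is high and whose deviation from the mapped documents is small) is one plausible reading of "selects the most descriptive features", stated as an assumption.

```python
import math
import random


def best_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=500, lr0=0.5, radius0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                      # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)    # assumed linear shrinking
        x = rng.choice(data)
        bmu = best_unit(weights, x)
        for unit, w in weights.items():
            # Gaussian neighborhood around the best-matching unit on the grid.
            d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights


def label_units(weights, data, terms, top_k=2, min_weight=0.3):
    """Assumed labeling rule: for each unit with mapped documents, keep terms
    whose learned weight is at least min_weight and rank them by how little
    the mapped documents deviate from that weight."""
    mapped = {}
    for x in data:
        mapped.setdefault(best_unit(weights, x), []).append(x)
    labels = {}
    for unit, docs in mapped.items():
        w = weights[unit]
        deviation = [sum((x[i] - w[i]) ** 2 for x in docs)
                     for i in range(len(terms))]
        candidates = [i for i in range(len(terms)) if w[i] >= min_weight]
        candidates.sort(key=lambda i: deviation[i])
        labels[unit] = [terms[i] for i in candidates[:top_k]]
    return labels


# Hypothetical toy data: each row is a document, each column a term weight.
terms = ["election", "senate", "orbit", "rocket"]
docs = [
    [0.9, 0.8, 0.0, 0.0],
    [0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.8],
    [0.0, 0.0, 0.8, 0.9],
]

if __name__ == "__main__":
    som = train_som(docs, rows=2, cols=2)
    for unit, unit_labels in sorted(label_units(som, docs, terms).items()):
        print(unit, unit_labels)
```

In a real setting the document vectors would come from a term-weighting scheme applied to the archive, and the map would be far larger than this 2 x 2 grid; the sketch only shows how a trained map and a per-unit labeling step fit together.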