ABSTRACT
Web text mining is a new topic in knowledge discovery research. It aims to help people discover knowledge from large quantities of semi-structured or unstructured text on the web. Several approaches, including pure and hybrid information retrieval (IR) methods, have been proposed to tackle this problem. Among them, combining the Self-Organizing Map (SOM) with the principles of the vector space model appears to be a promising alternative to traditional, purely IR-based methods in this problem domain. In this paper, a novel SOM-based method for web text mining, using a Chinese corpus, is presented. The SOM is used to generate two maps, the word cluster map and the document cluster map, which reveal the relationships among words and among documents, respectively. The search process incorporates both maps and effectively finds the relevant documents according to the keywords specified in the query. Conceptually associated web documents are found not only through the specified keywords but also through the related words identified by the word cluster map.
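The two-map idea can be illustrated with a minimal sketch. This is not the paper's implementation: the toy English corpus, the binary occurrence vectors, the 2x2 grid, the training schedule, and all names (`train_som`, `cluster_map`, `search`) are assumptions made for illustration only, and the abstract does not specify how the two maps are combined at query time.

```python
import math
import random

def best_matching_unit(weights, v):
    """Return the grid node whose weight vector is closest to v (squared Euclidean)."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], v)))

def train_som(vectors, rows, cols, epochs=200, seed=0):
    """Train a small rectangular SOM; returns weight vectors keyed by (row, col)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                               # learning rate decays linearly
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5   # neighbourhood shrinks, never zero
        for v in rng.sample(vectors, len(vectors)):           # visit inputs in random order
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights

def cluster_map(labels, vectors, weights):
    """Group labels by the SOM node that their vector maps to."""
    clusters = {}
    for label, v in zip(labels, vectors):
        clusters.setdefault(best_matching_unit(weights, v), []).append(label)
    return clusters

# Hypothetical toy corpus (the paper uses a Chinese corpus; these tokens are placeholders).
docs = {
    "d1": ["som", "neural", "map"],
    "d2": ["neural", "map", "cluster"],
    "d3": ["stock", "market", "price"],
    "d4": ["market", "price", "trade"],
}
doc_ids = sorted(docs)
vocab = sorted({t for toks in docs.values() for t in toks})

# Vector space model with binary weights (an assumption):
# a document is a vector over words, a word is a vector over documents.
doc_vectors = [[1.0 if t in docs[d] else 0.0 for t in vocab] for d in doc_ids]
word_vectors = [[1.0 if t in docs[d] else 0.0 for d in doc_ids] for t in vocab]

word_clusters = cluster_map(vocab, word_vectors, train_som(word_vectors, 2, 2))
doc_clusters = cluster_map(doc_ids, doc_vectors, train_som(doc_vectors, 2, 2))

def search(keywords):
    """Expand the query via the word map, then widen the hits via the document map."""
    query = set(keywords)
    expanded = set(query)
    for members in word_clusters.values():
        if query & set(members):
            expanded |= set(members)          # add words that share a node with a keyword
    hits = {d for d, toks in docs.items() if expanded & set(toks)}
    related = set(hits)
    for members in doc_clusters.values():
        if hits & set(members):
            related |= set(members)           # add documents that share a node with a hit
    return expanded, sorted(related)
```

Calling `search(["neural"])` returns the expanded keyword set together with the matching document ids. Words with identical occurrence patterns, such as "neural" and "map" in this toy corpus, always land on the same node, so one retrieves documents mentioning the other. Which other words share a node depends on the random initialisation and the grid size.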
Index Terms
- A Web text mining approach based on self-organizing map