ABSTRACT
Multidimensional Scaling (MDS) algorithms allow us to visualize high-dimensional object relationships in an intuitive way. An interesting application of MDS algorithms is the visualization of the semantic relations among documents or terms in textual databases.
However, the MDS algorithms proposed in the literature exhibit low discriminant power. The unsupervised nature of the algorithms and the 'curse of dimensionality' favor overlap among different topics in the map. This problem can be overcome by taking into account that many textual collections frequently provide a categorization for a small subset of documents.
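One facet of the 'curse of dimensionality' is distance concentration: as the dimension grows, the nearest and farthest neighbours of a point become almost equally distant, so purely distance-based (unsupervised) maps struggle to keep topics apart. The sketch below illustrates this effect in Python with NumPy. It assumes synthetic, uniformly distributed points rather than real document vectors, and the helper `relative_contrast` is a hypothetical name introduced only for this illustration.

```python
import numpy as np


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of the Euclidean distances from a random query point.

    Hypothetical helper on synthetic uniform data (not textual data).
    Values near 0 mean all points look roughly equidistant from the query.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))   # n_points samples in the unit hypercube
    query = rng.random(dim)              # one random query point
    dist = np.linalg.norm(data - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()


if __name__ == "__main__":
    # The contrast shrinks as the dimension grows.
    for d in (2, 10, 100, 1000):
        print(d, round(relative_contrast(500, d), 3))
```

Running the loop shows the contrast falling steadily with the dimension, which is one reason why additional information, such as the category labels of a few documents, can help separate topics.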
In this paper we define new semi-supervised dissimilarity measures that better reflect the semantic classes of the textual collection by considering the a priori categorization of a subset of documents. These dissimilarities are then incorporated into the Torgerson MDS algorithm to improve the separation among topics in the map. The experimental results show that the proposed model outperforms well-known unsupervised alternatives.
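Torgerson's algorithm (classical MDS) double-centres the matrix of squared dissimilarities and embeds the objects using the eigenvectors with the largest eigenvalues. The Python/NumPy sketch below shows that step fed by a semi-supervised dissimilarity. The cosine base dissimilarity and the weighting rule in `semi_supervised_dissimilarity` (shrinking dissimilarities between documents known to share a category and pushing apart those known to differ, via the hypothetical parameters `shrink` and `expand`) are illustrative assumptions, not the measures defined in the paper.

```python
import numpy as np


def cosine_dissimilarity(X: np.ndarray) -> np.ndarray:
    """Return 1 - cosine similarity between the rows (documents) of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero for empty docs
    D = 1.0 - np.clip(Xn @ Xn.T, -1.0, 1.0)
    np.fill_diagonal(D, 0.0)
    return D


def semi_supervised_dissimilarity(D, labels, shrink=0.5, expand=0.5):
    """Hypothetical rule (assumption): adjust D using the labelled subset.

    labels[i] is a class id >= 0, or -1 if document i is unlabelled.
    Pairs involving an unlabelled document keep their original dissimilarity.
    """
    D = np.array(D, dtype=float, copy=True)
    labels = np.asarray(labels)
    known = labels >= 0
    both = np.outer(known, known)                  # both documents labelled
    same = labels[:, None] == labels[None, :]
    d_max = D.max()
    same_mask = both & same
    diff_mask = both & ~same
    D[same_mask] = shrink * D[same_mask]                           # pull same class together
    D[diff_mask] = D[diff_mask] + expand * (d_max - D[diff_mask])  # push classes toward d_max
    np.fill_diagonal(D, 0.0)
    return D


def torgerson_mds(D, k=2):
    """Classical (Torgerson) MDS: embed a dissimilarity matrix in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                    # double-centred squared dissimilarities
    evals, evecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]              # indices of the k largest eigenvalues
    scale = np.sqrt(np.maximum(evals[top], 0.0))   # clip negatives (non-Euclidean D)
    return evecs[:, top] * scale


if __name__ == "__main__":
    # Toy term-document matrix: 4 documents x 3 terms.
    # Docs 0 and 1 belong to class 0, doc 2 to class 1, doc 3 is unlabelled.
    X = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])
    labels = np.array([0, 0, 1, -1])
    coords = torgerson_mds(semi_supervised_dissimilarity(cosine_dissimilarity(X), labels))
    print(coords.shape)  # (4, 2)
```

When the input matrix holds exact Euclidean distances, `torgerson_mds` reproduces them up to a rotation or reflection; for adjusted dissimilarities the embedding is only approximate, which is why negative eigenvalues are clipped.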