10.5555/1771622.1771654
Article

A partially supervised metric multidimensional scaling algorithm for textual data visualization

Published: 06 September 2007

ABSTRACT

Multidimensional scaling (MDS) algorithms allow us to visualize relationships among high-dimensional objects in an intuitive way. An interesting application of MDS is the visualization of semantic relations among documents or terms in textual databases.

However, the MDS algorithms proposed in the literature exhibit low discriminant power. Their unsupervised nature and the 'curse of dimensionality' cause different topics to overlap in the map. This problem can be overcome by exploiting the fact that many textual collections provide a categorization for a small subset of documents.

In this paper we define new semi-supervised dissimilarity measures that better reflect the semantic classes of the textual collection by taking into account the a priori categorization of a subset of documents. These dissimilarities are then incorporated into the Torgerson MDS algorithm to improve the separation among topics in the map. The experimental results show that the proposed model outperforms well-known unsupervised alternatives.
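The pipeline described above can be sketched roughly as follows. The abstract does not give the form of the semi-supervised measures, so `adjust_with_labels` is only a hypothetical stand-in (it shrinks dissimilarities between documents sharing a known label and stretches those between documents with different known labels), not the authors' measure. The function names, the `shrink` and `stretch` parameters, the cosine dissimilarity, and the toy data are all assumptions made for illustration. Only `torgerson_mds` follows a standard, well-known procedure: classical MDS by double centering and eigendecomposition.

```python
import numpy as np

def cosine_dissimilarity(X):
    """Pairwise 1 - cosine similarity between the rows of X (documents)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)   # avoid division by zero
    D = 1.0 - np.clip(Xn @ Xn.T, -1.0, 1.0)
    np.fill_diagonal(D, 0.0)
    return D

def adjust_with_labels(D, labels, shrink=0.5, stretch=1.5):
    """Hypothetical semi-supervised rescaling (an assumption, not the paper's measure).

    labels holds a class id per document, or -1 when the document is unlabeled.
    Pairs involving an unlabeled document are left unchanged.
    """
    D = D.copy()
    labeled = labels >= 0
    both = labeled[:, None] & labeled[None, :]
    same = both & (labels[:, None] == labels[None, :])
    diff = both & (labels[:, None] != labels[None, :])
    D[same] *= shrink    # pull same-class documents together
    D[diff] *= stretch   # push different-class documents apart
    np.fill_diagonal(D, 0.0)
    return D

def torgerson_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared dissimilarities
    w, V = np.linalg.eigh(B)                     # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]                # indices of the k largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)           # guard against small negative values
    return V[:, idx] * np.sqrt(w_top)

# Toy term-frequency matrix: 6 documents x 4 terms (made-up data).
X = np.array([
    [3, 1, 0, 0],
    [2, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 2, 2],
    [1, 1, 1, 1],
    [0, 0, 1, 3],
], dtype=float)
labels = np.array([0, 0, 1, 1, -1, -1])          # last two documents are unlabeled

D = cosine_dissimilarity(X)
Y_unsupervised = torgerson_mds(D, k=2)
Y_semi = torgerson_mds(adjust_with_labels(D, labels), k=2)
```

Plotting the rows of `Y_unsupervised` and `Y_semi` side by side gives a rough feel for how label information can change topic separation in the map. A rescaled matrix need not be Euclidean, which is why the sketch clips negative eigenvalues before taking square roots.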


Published in

IDA'07: Proceedings of the 7th international conference on Intelligent data analysis
September 2007
380 pages
ISBN: 9783540748243
Editors: Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

Publisher: Springer-Verlag, Berlin, Heidelberg