DOI: 10.5555/1771622.1771663
Article

Soft topographic map for clustering and classification of bacteria

Published: 06 September 2007

ABSTRACT

This work presents a new method for clustering a bacteria taxonomy and building a topographic representation of it. The method is based on the analysis of stable parts of the genome, the so-called "housekeeping genes". It generates topographic maps of the bacteria taxonomy in which the relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences, and the topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene were retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
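To make the idea of an evolutionary distance between aligned sequences concrete, the following is a minimal sketch that assumes the Jukes-Cantor substitution model, one common choice for nucleotide data. The abstract does not state which distance model or alignment settings the authors used, so the function names, the gap handling, and the toy sequences are illustrative assumptions, not the paper's implementation. The input is assumed to be already aligned, with `-` marking gaps.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of mismatched sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3), with p the p-distance."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at saturation; report an infinite distance.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise distances, zero on the diagonal."""
    n = len(aligned)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jukes_cantor_distance(aligned[i], aligned[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist


if __name__ == "__main__":
    # Toy aligned fragments, not real 16S rRNA data.
    toy = ["ACGTACGT", "ACGTACGA", "AC-TACGT"]
    for row in distance_matrix(toy):
        print(["%.4f" % value for value in row])
```

A pairwise matrix of this kind is the sort of proximity data that a map-based clustering method can take as input in place of feature vectors.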


Published in

IDA'07: Proceedings of the 7th international conference on Intelligent data analysis
September 2007
380 pages
ISBN: 9783540748243
Editors: Michael R. Berthold, John Shawe-Taylor, Nada Lavrač
Publisher: Springer-Verlag, Berlin, Heidelberg
