ABSTRACT
In this work, a new method for clustering and building a topographic representation of a bacterial taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called "housekeeping genes". It generates topographic maps of the bacterial taxonomy, in which relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. The topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene were retrieved from the NCBI public database. In the experimental tests, the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
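The abstract does not say how the evolutionary distances are derived from the aligned sequences. As a minimal sketch, assuming the Jukes-Cantor substitution model (a common choice for nucleotide data, not confirmed here as the authors' exact procedure), a pairwise distance matrix could be computed from already-aligned sequences as follows. The function names `p_distance`, `jukes_cantor`, and `distance_matrix` are hypothetical and introduced only for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p).

    Assumption: this model stands in for whatever evolutionary
    distance the paper actually uses.
    """
    p = p_distance(seq_a, seq_b)
    if p == 0.0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise Jukes-Cantor distances."""
    n = len(aligned)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = jukes_cantor(aligned[i], aligned[j])
    return dist
```

A matrix of this kind is the sort of dissimilarity input that a topographic mapping algorithm for non-vectorial data would consume; the sequences passed in are assumed to have been aligned beforehand.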