
Design and evaluation of decentralized online clustering

Published: 1 October 2012

Abstract

Ensuring the efficient and robust operation of distributed computational infrastructures is critical, given that their scale and overall complexity are growing at an alarming rate and that their management is rapidly exceeding human capability. Clustering analysis can be used to find patterns and trends in system operational data, as well as to highlight deviations from these patterns. Such analysis can be essential for verifying the correctness and efficiency of the system's operation, and for discovering specific situations of interest, such as anomalies or faults, that require appropriate management actions.
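The idea of using clustering to surface both the dominant patterns in operational data and the deviations from them can be illustrated with a small grid-density sketch. This is a generic, centralized illustration under stated assumptions, not the DOC algorithm evaluated in this work: the function `grid_cluster`, its `cell_width` and `min_points` parameters, and the sample (CPU, memory) readings are all hypothetical. Points are bucketed into grid cells, cells holding enough points are treated as dense, neighboring dense cells are merged into clusters, and points left in sparse cells are reported as candidate anomalies.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_width=0.1, min_points=3):
    """Hypothetical grid-density clustering (a generic sketch, not DOC).

    Returns (clusters, anomalies): clusters is a list of sorted lists of
    point indices; anomalies lists the indices of points in sparse cells.
    """
    # Assign each point to the integer grid cell that contains it.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(int(x // cell_width) for x in p)].append(i)

    # A cell is "dense" when it holds at least min_points points.
    dense = {c for c, members in cells.items() if len(members) >= min_points}

    # Merge neighboring dense cells (diagonals included) by breadth-first search.
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cell = queue.popleft()
            members.extend(cells[cell])
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(a + b for a, b in zip(cell, offset))
                if neighbor in dense and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(members))

    # Points in sparse cells deviate from the dominant patterns.
    anomalies = sorted(i for c, m in cells.items() if c not in dense for i in m)
    return clusters, anomalies


# Hypothetical normalized (CPU, memory) utilization samples from monitored nodes.
samples = [
    (0.22, 0.33), (0.24, 0.35), (0.26, 0.36), (0.28, 0.34),  # cell (2, 3)
    (0.32, 0.34), (0.35, 0.36), (0.37, 0.33),                # cell (3, 3)
    (0.72, 0.83), (0.74, 0.85), (0.76, 0.84),                # cell (7, 8)
    (0.95, 0.12),                                            # isolated reading
]
clusters, anomalies = grid_cluster(samples, cell_width=0.1, min_points=3)
print(clusters)   # [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9]]
print(anomalies)  # [10]
```

Here the two adjacent dense cells merge into one operating pattern, the high-utilization group forms a second pattern, and the lone reading is flagged. The sketch counts cells in a single process; in the decentralized, online setting this work targets, those counts would have to be gathered across nodes, which the sketch does not attempt.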

This work analyzes the automated application of clustering to online system management, focusing on how suitable different clustering approaches are for the online analysis of system data in a distributed environment, with minimal prior knowledge and within a timeframe that allows clustering results to be interpreted and acted on in time. For this purpose, we evaluate DOC (Decentralized Online Clustering), a clustering algorithm designed to support data analysis for autonomic management, and compare it to existing, widely used clustering algorithms. The comparative evaluations show that DOC strikes a good balance among the trade-offs inherent in the challenges of this type of online management.

            Published in

              ACM Transactions on Autonomous and Adaptive Systems, Volume 7, Issue 3
              September 2012
              130 pages
              ISSN:1556-4665
              EISSN:1556-4703
              DOI:10.1145/2348832

              Copyright © 2012 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 October 2012
              • Accepted: 1 December 2011
              • Revised: 1 October 2011
              • Received: 1 December 2010

              Qualifiers

              • research-article
              • Research
              • Refereed
