Abstract
Ensuring the efficient and robust operation of distributed computational infrastructures is critical, given that their scale and overall complexity are growing at an alarming rate and that their management is rapidly exceeding human capability. Clustering analysis can be used to find patterns and trends in system operational data and to highlight deviations from these patterns. Such analysis can be essential for verifying the correctness and efficiency of system operation, as well as for discovering specific situations of interest, such as anomalies or faults, that require appropriate management actions.
This work analyzes the automated application of clustering to online system management. It examines how suitable different clustering approaches are for the online analysis of system data in a distributed environment, with minimal prior knowledge and within a timeframe that allows the timely interpretation of and response to clustering results. For this purpose, we evaluate DOC (Decentralized Online Clustering), a clustering algorithm designed to support data analysis for autonomic management, and compare it to existing and widely used clustering algorithms. The comparative evaluations show that DOC achieves a good balance among the trade-offs inherent in this type of online management.
Index Terms
Design and evaluation of decentralized online clustering