skip to main content
article
Free access

OPTICS: ordering points to identify the clustering structure

Published: 01 June 1999 Publication History

Abstract

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrary shaped clusters), but also the intrinsic clustering structure. For medium sized data sets, the cluster-ordering can be represented graphically and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure offering additional insights into the distribution and correlation of the data.

References

[1]
Agrawal R., Gehrke J., Gunopulos D., Raghavan P.: "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", Proc. ACM SIGMOD'98 Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 94- 105.
[2]
Ankerst M., Keim D. A., Kriegel H.-P.: "'Circle Segments': A Technique for Visually Exploring Large Multidimensional Data Sets", Proc. Visualization'96, Hot Topic Session, San Francisco, CA, 1996.
[3]
Berchthold S., Keim D., Kriegel H.-P.: "The X-Tree: An Index Structure for High-Dimensional Data", 22nd Conf. on Very Large Data Bases, Bombay, India, 1996, pp. 28-39.
[4]
Beckmann N., Kriegel H.-P., Schneider R., Seeger B.: "The R*-tree: An ti',fficient and Robust Access Method for Points and Rectangles", Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, ACM Press, New York, 1990, pp. 322-331.
[5]
Ciaccia P., Patella M., Zezula P.: "M-tree: An Efficient Access Method for Similarity Search in Metric Spaces ", Proc. 23rd Int. Conf. on Very Large Data Bases, Athens, Greece, 1997, pp. 426-435.
[6]
Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density- Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.
[7]
Ester M., Kriegel H.-P., Sander J., Wimmer M., Xu X.: "Incremental Clustering for Mining in a Data Warehousing Environment", Proc. 24th Int. Conf. on Very I.,arge Data Bases, New York, NY, 1998, pp. 323-333.
[8]
Ester M., Kriegel H.-P., Xu X.: "Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification", Proc. 4th Int. Symp. on Laxge Spatial Databases, Portland, ME, 1995, in: Lecture Notes in Computer Science, Vol. 951, Springer, 1995, pp. 6'7-82.
[9]
Grossman A., Morlet J.: "Decomposition oj'functions into wavelets of constant shapes and related tr~msforms". Mathematics and Physics: Lectures on Recent Restdts, World Scientific, Singapore, 1985.
[10]
Guha S., Rastogi R., Shim K.: "CURE: A~ Efficient Clustering Algorithms for Large Databases", P-oc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 73-84.
[11]
Hinneburg A., Keim D.: "An Efficient Approa~:h to Clustering in Large Multimedia Databases with Noise"~, Proc. 4th Int. Conf. on Knowledge Discovery & Data Milling, New York City, NY, 1998.
[12]
Hattori K., Torii Y.: "Effective algorithms for Jhe nearest neighbor method in the clustering problem", Patt~.,rn Recognition, 1993, Vol. 26, No. 5, pp. 741-746.
[13]
Huang Z.: "A Fast Clustering Algorithm to C,'uster Very Large Categorical Data Sets in Data Mining", 1)roc. SIG- OD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tech. Report 97-07, UBC, Dept. of CS, 1997.
[14]
Jain A. K., Dubes R. C.: "Algorithms for Clustering Data," Prentice-Hall, Inc., 1988.
[15]
Keim D. A.: "Pixel-oriented Database Visualizations", in: SIGMOD RECORD, Special Issue on Inform~tion Visualization, Dezember 1996.
[16]
Keim D. A.: "Databases and Visualization", t'roc. Tutorial ACM SIGMOD Int. Conf. on Managemen: of Data, Montreal, Canada, 1996, p. 543.
[17]
Knorr E. M., Ng R.T.: "Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining," IEEE Trans. on Knowledge and Data Engineeri}lg, Vol. 8, No. 6, December 1996, pp. 884-897.
[18]
Kaufman L., Rousseeuw E J.: "Finding GrouFs in Data: An Introduction to Cluster Analysis", John Wiley & Sons, 1990.
[19]
MacQueen, J.: "Some Methods for Classification and Analysis of Multivariate Observations", 5th Berkeley Synap. Math. Statist. Prob., Vol. 1, pp. 281-297.
[20]
Ng R. T., Han J.: "Efficient and Effective Clustering Methods for Spatial Data Mining", Proc. 20th Int. Conf. on Very Large Data Bases, Santiago, Chile, Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 144-155.
[21]
Press W. H.,Teukolsky S. A., Vetterling W. T., Flannery B. E: "Numerical Recipes in C", 2nd ed., Cambridl,ye University Press, 1992.
[22]
Richards A. J.: "Remote Sensing Digital hnag~: Analysis. An Introduction", 1983, Berlin, Springer Verlag.
[23]
Schikuta E.: "'Grid clustering: An efficient hierarchical clustering method for very large data sets". Proc. 13th Int. Conf. on Pattern Recognition, Vol 2, 1996, pp. 101-105.
[24]
Schikuta E., Erhart M.: "The bang-clustering system: Grid-based data analysis". Proc. Sec. Int. Symp. IDA-97, Vol. 1280 LNCS, London, UK, Springer-Verlag, 1 c,'97.
[25]
Sheikholeslami G., Chatterjee S., Zhang A.: "'lCaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases", Proc. 24th Int. Conf. on Very I,arge Data Bases, New York, NY, 1998, pp. 428 - 439.
[26]
Sibson R.: "SLINK: an optimally efficient alggrithm for the single-link cluster method".The Comp. Journal, Vol. }'~ 6, No. 1, 1973, pp. 30-34.
[27]
Zhang T., Ramakrishnan R., Linvy M.: "BIRCH: An Efficient Data Clustering Method for Very Large Dtttabase,s". Proc. ACM SIGMOD Int. Conf. on Management of Data, ACM Press, New York, 1996, pp. 103- i 14.

Cited By

View all
  • (2025)Advanced data-driven FBG sensor-based pavement monitoring system using multi-sensor data fusion and an unsupervised learning approachMeasurement10.1016/j.measurement.2024.115821242(115821)Online publication date: Jan-2025
  • (2025)Integrating supervised and unsupervised learning approaches to unveil critical process inputsComputers & Chemical Engineering10.1016/j.compchemeng.2024.108857192(108857)Online publication date: Jan-2025
  • (2024)Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutionsPeerJ Computer Science10.7717/peerj-cs.228610(e2286)Online publication date: 29-Aug-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 28, Issue 2
June 1999
599 pages
ISSN:0163-5808
DOI:10.1145/304181
Issue’s Table of Contents
  • cover image ACM Conferences
    SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
    June 1999
    604 pages
    ISBN:1581130848
    DOI:10.1145/304182
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1999
Published in SIGMOD Volume 28, Issue 2

Check for updates

Author Tags

  1. cluster analysis
  2. database mining
  3. visualization

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3,047
  • Downloads (Last 6 weeks)343
Reflects downloads up to 06 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2025)Advanced data-driven FBG sensor-based pavement monitoring system using multi-sensor data fusion and an unsupervised learning approachMeasurement10.1016/j.measurement.2024.115821242(115821)Online publication date: Jan-2025
  • (2025)Integrating supervised and unsupervised learning approaches to unveil critical process inputsComputers & Chemical Engineering10.1016/j.compchemeng.2024.108857192(108857)Online publication date: Jan-2025
  • (2024)Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutionsPeerJ Computer Science10.7717/peerj-cs.228610(e2286)Online publication date: 29-Aug-2024
  • (2024)AutoSCAN: automatic detection of DBSCAN parameters and efficient clustering of data in overlapping density regionsPeerJ Computer Science10.7717/peerj-cs.192110(e1921)Online publication date: 14-Mar-2024
  • (2024)Spaceborne laser altimetry data processing and applicationPoint Cloud Generation and Its Applications [Working Title]10.5772/intechopen.1004899Online publication date: 26-Sep-2024
  • (2024)Black Hole Clustering: Gravity-Based Approach with No Predetermined ParametersData Science Journal10.5334/dsj-2024-02723Online publication date: 7-May-2024
  • (2024)Machine learning approaches for automatic classification of single-particle mass spectrometry dataAtmospheric Measurement Techniques10.5194/amt-17-299-202417:1(299-313)Online publication date: 16-Jan-2024
  • (2024)An Investigation on the Use of Clustering Algorithms for Data Preprocessing in Breast Cancer DiagnosisTürk Doğa ve Fen Dergisi10.46810/tdfd.136439713:1(70-77)Online publication date: 26-Mar-2024
  • (2024)Zanthoxylum infructescence detection based on adaptive density clusteringJournal of Agricultural Engineering10.4081/jae.2024.1568Online publication date: 26-Mar-2024
  • (2024)Safety Monitoring Method for the Uplift Pressure of Concrete Dams Based on Optimized Spatiotemporal Clustering and the Bayesian Panel Vector Autoregressive ModelWater10.3390/w1608119016:8(1190)Online publication date: 22-Apr-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media