DOI: 10.5555/3001460.3001507 (Conference Proceedings article)

A density-based algorithm for discovering clusters in large spatial databases with noise

ABSTRACT

Clustering algorithms are attractive for the task of class identification in spatial databases. However, applying them to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN, which relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
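The density-based notion of clusters described above can be illustrated with a short sketch. It assumes the formulation usually given for DBSCAN, with a neighborhood radius (`eps` here) and a minimum number of points inside that radius (`min_pts` here); reading the abstract's "only one input parameter" as fixing `min_pts` and choosing only `eps` is an assumption of this sketch. Neighborhoods are found by a brute-force linear scan, O(n²) overall, rather than through a spatial index such as the R*-tree (reference 1), so it says nothing about the efficiency results. The helper `sorted_k_dist` is a hypothetical name for one way to support choosing `eps`: it lists each point's distance to its k-th nearest neighbor in descending order, and a sharp drop in that list hints at a radius separating noise from cluster points.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
UNVISITED = 0  # cluster ids start at 1, so 0 marks "not yet processed"
NOISE = -1     # label for points that belong to no cluster


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx], including idx itself.

    Brute-force linear scan (assumption); a spatial index would replace this in practice.
    """
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (1, 2, ...) or NOISE.

    Sketch of the usual DBSCAN expansion; the eps/min_pts names are assumptions.
    """
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may later be claimed as a border point
            continue
        cluster_id += 1  # unvisited core point: start a new cluster
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


def sorted_k_dist(points: Sequence[Point], k: int) -> List[float]:
    """Each point's distance to its k-th nearest other point, sorted in descending order.

    Hypothetical helper for picking eps; requires len(points) > k.
    """
    k_dists = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(others[k - 1])
    return sorted(k_dists, reverse=True)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [1, 1, 1, 1, 2, 2, 2, 2, -1]
    print(sorted_k_dist(pts, k=3)[:2])
```

In the usage example the two squares form clusters 1 and 2, the isolated point at (50, 50) is labeled noise, and that same point sits at the top of the k-dist list, far above every cluster point. A point first labeled noise can still be absorbed later as a border point when a core point's expansion reaches it.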

References

  1. Beckmann N., Kriegel H.-P., Schneider R., and Seeger B. 1990. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, pp. 322-331.
  2. Brinkhoff T., Kriegel H.-P., Schneider R., and Seeger B. 1994. Efficient Multi-Step Processing of Spatial Joins. Proc. ACM SIGMOD Int. Conf. on Management of Data, Minneapolis, MN, pp. 197-208.
  3. Ester M., Kriegel H.-P., and Xu X. 1995. A Database Interface for Clustering in Large Spatial Databases. Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining, Montreal, Canada. AAAI Press.
  4. García J.A., Fdez-Valdivia J., Cortijo F.J., and Molina R. 1994. A Dynamic Approach for Clustering Data. Signal Processing, Vol. 44, No. 2, pp. 181-196.
  5. Gueting R.H. 1994. An Introduction to Spatial Database Systems. The VLDB Journal 3(4):357-399.
  6. Jain A.K. 1988. Algorithms for Clustering Data. Prentice Hall.
  7. Kaufman L., and Rousseeuw P.J. 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.
  8. Matheus C.J., Chan P.K., and Piatetsky-Shapiro G. 1993. Systems for Knowledge Discovery in Databases. IEEE Transactions on Knowledge and Data Engineering 5(6):903-913.
  9. Ng R.T., and Han J. 1994. Efficient and Effective Clustering Methods for Spatial Data Mining. Proc. 20th Int. Conf. on Very Large Data Bases, Santiago, Chile, pp. 144-155.
  10. Stonebraker M., Frew J., Gardels K., and Meredith J. 1993. The SEQUOIA 2000 Storage Benchmark. Proc. ACM SIGMOD Int. Conf. on Management of Data, Washington, DC, pp. 2-11.
