DOI: 10.1145/342009.335388
ARTICLE

LOF: identifying density-based local outliers

ABSTRACT

For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances, or outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful but cannot otherwise be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be practical.
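
To make the idea of a "degree of being an outlier" concrete, the following is a minimal brute-force sketch of how a LOF score could be computed. It assumes the standard definitions from the body of the paper, which the abstract itself does not spell out: the k-distance of an object, the reachability distance reach-dist_k(p, o) = max(k-distance(o), d(p, o)), and the local reachability density as the inverse of the mean reachability distance. The function name `local_outlier_factor`, the parameter `k` (standing in for MinPts), and the choice of Euclidean distance are illustrative assumptions, not the authors' implementation, and duplicate points are not handled.

```python
import numpy as np


def local_outlier_factor(points, k):
    """Brute-force LOF scores for an (n, d) array; k stands in for MinPts.

    Illustrative sketch only: O(n^2) memory and time, Euclidean distance,
    and no special handling of duplicate points (which give zero distances).
    """
    X = np.asarray(points, dtype=float)
    n = X.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise distances; an object is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    # k-distance: distance from each object to its k-th nearest neighbor.
    k_dist = np.sort(dist, axis=1)[:, k - 1]

    # k-distance neighborhood; objects tied at the k-distance are all included.
    neighbors = [np.flatnonzero(dist[i] <= k_dist[i]) for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance
    # from the object to the members of its neighborhood.
    lrd = np.empty(n)
    for i, nb in enumerate(neighbors):
        reach = np.maximum(k_dist[nb], dist[i, nb])
        lrd[i] = 1.0 / reach.mean()

    # LOF: mean ratio of the neighbors' densities to the object's own density.
    return np.array([lrd[nb].mean() / lrd[i] for i, nb in enumerate(neighbors)])


if __name__ == "__main__":
    # A small 3x3 grid of points plus one isolated point far from the grid.
    grid = [(x, y) for x in range(3) for y in range(3)]
    scores = local_outlier_factor(grid + [(10, 10)], k=3)
    print(np.round(scores, 2))
```

Objects deep inside a cluster get scores near 1, while an object whose neighborhood is much denser than its own surroundings gets a score well above 1, which is what makes the measure a degree rather than a binary label.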

