Research Article
DOI: 10.1145/2783258.2783405

Estimating Local Intrinsic Dimensionality

Published: 10 August 2015

ABSTRACT

This paper is concerned with the estimation of a local measure of intrinsic dimensionality (ID) recently proposed by Houle. The local model can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled in terms of a continuous random variable. This form of intrinsic dimensionality can be particularly useful in search, classification, outlier detection, and other contexts in machine learning, databases, and data mining, as it has been shown to be equivalent to a measure of the discriminative power of similarity functions. Several estimators of local ID are proposed and analyzed based on extreme value theory, using maximum likelihood estimation (MLE), the method of moments (MoM), probability weighted moments (PWM), and regularly varying functions (RV). An experimental evaluation is also provided, using both real and artificial data.
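
For intuition, the following is a minimal Python sketch of two of the estimator families listed above (MLE and MoM). It assumes the MLE takes a Hill-type form computed from the k nearest-neighbor distances of a query point, and that the method of moments solves the first-moment equation of the distance model F(r) = (r/w)^m, with w the largest observed neighbor distance. The function names and the synthetic check are illustrative only and are not taken from the paper.

# Illustrative sketch (not from the paper): local ID estimators computed
# from the k nearest-neighbor distances of a single query point.
# Assumed forms: a Hill-type MLE, ID = -( (1/k) * sum_i ln(r_i / w) )^(-1),
# and a method-of-moments estimate from the first moment of the
# distance model F(r) = (r/w)^m, giving m = mean(r) / (w - mean(r)).

import numpy as np

def id_mle(distances):
    """Hill-type maximum likelihood estimate of local intrinsic dimensionality."""
    r = np.sort(np.asarray(distances, dtype=float))
    w = r[-1]                      # largest observed neighbor distance
    log_ratios = np.log(r / w)     # final term ln(w/w) = 0 contributes nothing
    return -1.0 / np.mean(log_ratios)

def id_mom(distances):
    """Method-of-moments estimate: m = mu / (w - mu), with mu the mean distance."""
    r = np.asarray(distances, dtype=float)
    w = r.max()
    mu = r.mean()
    return mu / (w - mu)

# Sanity check on synthetic data: if neighbor distances follow F(r) = r^d
# on [0, 1] (uniform sampling in a d-dimensional ball), both estimates
# should be close to d.
rng = np.random.default_rng(0)
d, k = 8, 1000
r = rng.uniform(size=k) ** (1.0 / d)
print("MLE estimate:", round(id_mle(r), 2))
print("MoM estimate:", round(id_mom(r), 2))

In the uniform-ball example both estimates should come out close to d = 8; in practice the distances would be obtained from a nearest-neighbor index over the data set rather than a synthetic draw.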

References

  1. A. A. Balkema and L. de Haan. Residual Life Time at Great Age. The Annals of Probability, 2:792--804, 1974.
  2. N. Bingham, C. Goldie, and J. Teugels. Regular variation, volume 27. Cambridge University Press, 1989.
  3. N. Boujemaa, J. Fauqueur, M. Ferecatu, F. Fleuret, V. Gouet, B. LeSaux, and H. Sahbi. IKONA: Interactive Specific and Generic Image Retrieval. In MMCBIR, 2001.
  4. C. Bouveyron, G. Celeux, and S. Girard. Intrinsic dimension estimation by maximum likelihood in isotropic probabilistic PCA. Pattern Recogn. Lett., 32.
  5. J. Bruske and G. Sommer. Intrinsic dimensionality estimation with optimally topology preserving maps. PAMI, 20.
  6. F. Camastra and A. Vinciarelli. Estimating the intrinsic dimension of data with a fractal-based method. PAMI, 24.
  7. S. Coles. An Introduction to Statistical Modeling of Extreme Values. 2001.
  8. J. Costa and A. Hero. Entropic graphs for manifold learning. In Asilomar Conf. on Signals, Sys. and Comput., pages 316--320, Vol. 1, 2003.
  9. T. de Vries, S. Chawla, and M. E. Houle. Finding local anomalies in very high dimensional space. In ICDM, pages 128--137, 2010.
  10. R. A. Fisher and L. H. C. Tippett. Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample. Math. Proc. Cambridge Phil. Soc., 24:180--190, 1928.
  11. M. I. Fraga Alves, L. de Haan, and T. Lin. Estimation of the parameter controlling the speed of convergence in extreme value theory. Math. Methods of Stat., 12.
  12. M. I. Fraga Alves, M. I. Gomes, and L. de Haan. A new class of semiparametric estimators of the second order parameter. Portugaliae Mathematica, 60:193--213, 2003.
  13. B. V. Gnedenko. Sur la Distribution Limite du Terme Maximum d'une Série Aléatoire. Ann. Math., 44:423--453, 1943.
  14. A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded Geometries, Fractals, and Low-Distortion Embeddings. In FOCS, pages 534--543, 2003.
  15. M. Hein and J.-Y. Audibert. Intrinsic dimensionality estimation of submanifolds in R^d. In ICML, pages 289--296, 2005.
  16. B. M. Hill. A simple general approach to inference about the tail of a distribution. Ann. Stat., 3(5):1163--1174, 1975.
  17. M. E. Houle. Dimensionality, Discriminability, Density & Distance Distributions. In ICDMW, pages 468--473, 2013.
  18. M. E. Houle. Inlierness, Outlierness, Hubness and Discriminability: an Extreme-Value-Theoretic Foundation. Technical Report 2015-002E, NII, 2015.
  19. M. E. Houle, H. Kashima, and M. Nett. Generalized Expansion Dimension. In ICDMW, pages 587--594, 2012.
  20. M. E. Houle, X. Ma, M. Nett, and V. Oria. Dimensional Testing for Multi-Step Similarity Search. In ICDM, pages 299--308, 2012.
  21. M. E. Houle, X. Ma, V. Oria, and J. Sun. Efficient algorithms for similarity search in axis-aligned subspaces. In SISAP, pages 1--12, 2014.
  22. M. E. Houle and M. Nett. Rank-based similarity search: Reducing the dimensional dependence. PAMI, 37(1):136--150, 2015.
  23. H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg. Searching in One Billion Vectors: Re-rank with Source Coding. In ICASSP, pages 861--864, 2011.
  24. I. Jolliffe. Principal Component Analysis. 1986.
  25. D. R. Karger and M. Ruhl. Finding Nearest Neighbors in Growth-Restricted Metrics. In STOC, pages 741--750, 2002.
  26. J. Karhunen and J. Joutsensalo. Representation and separation of signals using nonlinear PCA type learning. Neural Networks, 7(1):113--127, 1994.
  27. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278--2324, 1998.
  28. J. Pickands, III. Statistical Inference Using Extreme Order Statistics. Ann. Stat., 3:119--131, 1975.
  29. C. R. Rao. Linear statistical inference and its applications. 1973.
  30. S. T. Roweis and L. K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500):2323--2326, 2000.
  31. A. Rozza, G. Lombardi, C. Ceruti, E. Casiraghi, and P. Campadelli. Novel high intrinsic dimensionality estimators. Machine Learning Journal, 89(1--2):37--65, 2012.
  32. B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5):1299--1319, 1998.
  33. U. Shaft and R. Ramakrishnan. Theory of nearest neighbors indexability. ACM Trans. Database Syst., 31(3):814--838, 2006.
  34. F. Takens. On the numerical determination of the dimension of an attractor. 1985.
  35. J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319--2323, 2000.
  36. J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319--2323, 2000.
  37. J. Venna and S. Kaski. Local Multidimensional Scaling. Neural Networks, 19(6--7):889--899, 2006.
  38. P. Verveer and R. Duin. An evaluation of intrinsic dimensionality estimators. PAMI, 17(1):81--86, 1995.
  39. J. von Brünken, M. E. Houle, and A. Zimek. Intrinsic Dimensional Outlier Detection in High-Dimensional Data. Technical Report 2015-003E, NII, 2015.

Published in

KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015, 2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 10 August 2015

Qualifiers

research-article

Acceptance Rates

KDD '15 paper acceptance rate: 160 of 819 submissions, 20%
Overall KDD acceptance rate: 1,133 of 8,635 submissions, 13%
