ABSTRACT
This paper is concerned with the estimation of a local measure of intrinsic dimensionality (ID) recently proposed by Houle. The local model can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled in terms of a continuous random variable. This form of intrinsic dimensionality is particularly useful in search, classification, outlier detection, and other contexts in machine learning, databases, and data mining, as it has been shown to be equivalent to a measure of the discriminative power of similarity functions. Several estimators of local ID, grounded in extreme value theory, are proposed and analyzed: maximum likelihood estimation (MLE), the method of moments (MoM), probability-weighted moments (PWM), and regularly varying functions (RV). An experimental evaluation on both real and artificial data is also provided.
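To make the estimators concrete: in the local model, the distance from a query point to its neighbors is a continuous random variable, and the local ID plays the role of a tail index governing the distribution of small distances. Under this model the MLE coincides with the classical Hill estimator (Hill, 1975) applied to the k nearest-neighbor distances; and for the idealized case F(r) = (r/w)^m on [0, w], solving E[R] = mw/(m+1) for m yields a method-of-moments estimate. The following is a minimal NumPy sketch under those assumptions, not the paper's reference implementation; the function names and the synthetic sanity check are ours.

```python
import numpy as np

def lid_mle(dists):
    """Hill-type MLE of local intrinsic dimensionality.

    dists: distances from a query point to its k nearest neighbors.
    The largest distance serves as the tail threshold w.
    """
    r = np.sort(np.asarray(dists, dtype=float))
    w = r[-1]
    # ID_hat = -(1/k * sum_i ln(r_i / w))^{-1}; the i = k term is zero.
    return -1.0 / np.mean(np.log(r / w))

def lid_mom(dists):
    """Method-of-moments estimate under the idealized model F(r) = (r/w)^m.

    E[R] = m*w/(m+1), so solving for m gives m = mean / (w - mean).
    """
    r = np.asarray(dists, dtype=float)
    w = r.max()
    mu = r.mean()
    return mu / (w - mu)

# Sanity check: distances with CDF F(r) = r^m on [0, 1] have local ID
# exactly m, and can be drawn by inverse-CDF sampling.
rng = np.random.default_rng(0)
m_true = 8.0
r = rng.uniform(size=1000) ** (1.0 / m_true)
print(lid_mle(r), lid_mom(r))  # both should be close to 8.0
```

On such synthetic data both estimates land near the true m = 8; on real data one would apply them to the k smallest distances from each query point, with the choice of k trading locality against estimator variance.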
REFERENCES
- A. A. Balkema and L. de Haan. Residual Life Time at Great Age. The Annals of Probability, 2:792--804, 1974.
- N. Bingham, C. Goldie, and J. Teugels. Regular Variation, volume 27. Cambridge University Press, 1989.
- N. Boujemaa, J. Fauqueur, M. Ferecatu, F. Fleuret, V. Gouet, B. LeSaux, and H. Sahbi. IKONA: Interactive Specific and Generic Image Retrieval. In MMCBIR, 2001.
- C. Bouveyron, G. Celeux, and S. Girard. Intrinsic dimension estimation by maximum likelihood in isotropic probabilistic PCA. Pattern Recogn. Lett., 32.
- J. Bruske and G. Sommer. Intrinsic dimensionality estimation with optimally topology preserving maps. PAMI, 20.
- F. Camastra and A. Vinciarelli. Estimating the intrinsic dimension of data with a fractal-based method. PAMI, 24.
- S. Coles. An Introduction to Statistical Modeling of Extreme Values. 2001.
- J. Costa and A. Hero. Entropic graphs for manifold learning. In Asilomar Conf. on Signals, Sys. and Comput., pages 316--320, Vol. 1, 2003.
- T. de Vries, S. Chawla, and M. E. Houle. Finding local anomalies in very high dimensional space. In ICDM, pages 128--137, 2010.
- R. A. Fisher and L. H. C. Tippett. Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample. Math. Proc. Cambridge Phil. Soc., 24:180--190, 1928.
- M. I. Fraga Alves, L. de Haan, and T. Lin. Estimation of the parameter controlling the speed of convergence in extreme value theory. Math. Methods of Stat., 12.
- M. I. Fraga Alves, M. I. Gomes, and L. de Haan. A new class of semiparametric estimators of the second order parameter. Portugaliae Mathematica, 60:193--213, 2003.
- B. V. Gnedenko. Sur la Distribution Limite du Terme Maximum d'une Série Aléatoire. Ann. Math., 44:423--453, 1943.
- A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded Geometries, Fractals, and Low-Distortion Embeddings. In FOCS, pages 534--543, 2003.
- M. Hein and J.-Y. Audibert. Intrinsic dimensionality estimation of submanifolds in R^d. In ICML, pages 289--296, 2005.
- B. M. Hill. A simple general approach to inference about the tail of a distribution. Ann. Stat., 3(5):1163--1174, 1975.
- M. E. Houle. Dimensionality, Discriminability, Density & Distance Distributions. In ICDMW, pages 468--473, 2013.
- M. E. Houle. Inlierness, Outlierness, Hubness and Discriminability: An Extreme-Value-Theoretic Foundation. Technical Report 2015-002E, NII, 2015.
- M. E. Houle, H. Kashima, and M. Nett. Generalized Expansion Dimension. In ICDMW, pages 587--594, 2012.
- M. E. Houle, X. Ma, M. Nett, and V. Oria. Dimensional Testing for Multi-Step Similarity Search. In ICDM, pages 299--308, 2012.
- M. E. Houle, X. Ma, V. Oria, and J. Sun. Efficient algorithms for similarity search in axis-aligned subspaces. In SISAP, pages 1--12, 2014.
- M. E. Houle and M. Nett. Rank-based similarity search: Reducing the dimensional dependence. PAMI, 37(1):136--150, 2015.
- H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg. Searching in One Billion Vectors: Re-rank with Source Coding. In ICASSP, pages 861--864, 2011.
- I. Jolliffe. Principal Component Analysis. 1986.
- D. R. Karger and M. Ruhl. Finding Nearest Neighbors in Growth-Restricted Metrics. In STOC, pages 741--750, 2002.
- J. Karhunen and J. Joutsensalo. Representation and separation of signals using nonlinear PCA type learning. Neural Networks, 7(1):113--127, 1994.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278--2324, 1998.
- J. Pickands, III. Statistical Inference Using Extreme Order Statistics. Ann. Stat., 3:119--131, 1975.
- C. R. Rao. Linear Statistical Inference and Its Applications. 1973.
- S. T. Roweis and L. K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500):2323--2326, 2000.
- A. Rozza, G. Lombardi, C. Ceruti, E. Casiraghi, and P. Campadelli. Novel high intrinsic dimensionality estimators. Machine Learning Journal, 89(1--2):37--65, 2012.
- B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5):1299--1319, 1998.
- U. Shaft and R. Ramakrishnan. Theory of nearest neighbors indexability. ACM Trans. Database Syst., 31(3):814--838, 2006.
- F. Takens. On the numerical determination of the dimension of an attractor. 1985.
- J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319--2323, 2000.
- J. Venna and S. Kaski. Local Multidimensional Scaling. Neural Networks, 19(6--7):889--899, 2006.
- P. Verveer and R. Duin. An evaluation of intrinsic dimensionality estimators. PAMI, 17(1):81--86, 1995.
- J. von Brünken, M. E. Houle, and A. Zimek. Intrinsic Dimensional Outlier Detection in High-Dimensional Data. Technical Report 2015-003E, NII, 2015.