Scale-sensitive dimensions, uniform convergence, and learnability

Published: 01 July 1997

Abstract

Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distribution-free convergence property of means to expectations uniformly over classes of random variables. Classes of real-valued functions enjoying such a property are also known as uniform Glivenko-Cantelli classes. In this paper, we prove, through a generalization of Sauer's lemma that may be interesting in its own right, a new characterization of uniform Glivenko-Cantelli classes. Our characterization yields Dudley, Giné, and Zinn's previous characterization as a corollary. Furthermore, it is the first based on a simple combinatorial quantity generalizing the Vapnik-Chervonenkis dimension. We apply this result to obtain the weakest combinatorial condition known to imply PAC learnability in the statistical regression (or "agnostic") framework. Furthermore, we find a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire. These results show that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
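The combinatorial quantity alluded to in the abstract is a scale-sensitive generalization of the VC dimension: a set of points is γ-shattered by a class F if, for some vector of witness values, every above/below sign pattern can be realized by some f in F with a margin of at least γ. As a rough illustration only (the names, the brute-force strategy, and the restriction to finite classes on finite domains are my own, not the paper's), such a dimension can be computed exhaustively like this:

```python
from itertools import combinations, product

def gamma_shattered(F, S, gamma):
    """Check whether the point tuple S is gamma-shattered by the finite
    function class F (a list of callables on the domain).

    Witness values only matter up to which functions clear them by gamma,
    so it suffices to search the finite grid of critical values f(x) +/- gamma.
    """
    cands = [sorted({f(x) - gamma for f in F} | {f(x) + gamma for f in F})
             for x in S]
    for r in product(*cands):          # candidate witness vector
        # every +/- pattern must be realized by some f with margin gamma
        if all(
            any(
                all((f(x) >= ri + gamma) if b else (f(x) <= ri - gamma)
                    for x, ri, b in zip(S, r, bits))
                for f in F
            )
            for bits in product([0, 1], repeat=len(S))
        ):
            return True
    return False

def fat_shattering(F, domain, gamma):
    """Brute-force fat-shattering dimension of F at scale gamma:
    the size of the largest gamma-shattered subset of the domain."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(gamma_shattered(F, S, gamma) for S in combinations(domain, k)):
            d = k
    return d
```

For example, the four {0, 1}-valued functions on a two-point domain γ-shatter both points for any γ ≤ 1/2 (witness 1/2 at each point) and nothing at any larger scale; the brute-force search is exponential and meant only to make the definition concrete.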

References

  1. ALON, N., AND MILMAN, V. 1983. Embedding of l_∞^k in finite-dimensional Banach spaces. Israel J. Math. 45, 265-280.
  2. ASSOUAD, P., AND DUDLEY, R. 1989. Minimax nonparametric estimation over classes of sets. Preprint.
  3. BARTLETT, P., AND LONG, P. 1995. More theorems about scale-sensitive dimensions and learning. In Proceedings of the 8th Annual Conference on Computational Learning Theory. ACM, New York, pp. 392-401.
  4. BARTLETT, P., LONG, P., AND WILLIAMSON, R. 1996. Fat-shattering and the learnability of real-valued functions. J. Comput. Syst. Sci. 52, 3, 434-452.
  5. BEN-DAVID, S., CESA-BIANCHI, N., HAUSSLER, D., AND LONG, P. 1995. Characterizations of learnability for classes of {0, ..., n}-valued functions. J. Comput. Syst. Sci. 50, 1, 74-86.
  6. BLUMER, A., EHRENFEUCHT, A., HAUSSLER, D., AND WARMUTH, M. 1989. Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36, 4 (Oct.), 929-965.
  7. COLLINS, K., SHOR, P., AND STEMBRIDGE, J. 1987. A lower bound for {0, 1, *} tournament codes. Disc. Math. 63, 15-19.
  8. DUDLEY, R. 1984. A course on empirical processes. In Lecture Notes in Mathematics, vol. 1097. Springer-Verlag, New York, pp. 2-142.
  9. DUDLEY, R., GINÉ, E., AND ZINN, J. 1991. Uniform and universal Glivenko-Cantelli classes. J. Theoret. Prob. 4, 485-510.
  10. GINÉ, E., AND ZINN, J. 1984. Some limit theorems for empirical processes. Ann. Prob. 12, 929-989.
  11. GUYON, I., VAPNIK, V., BOSER, B., BOTTOU, L., AND SOLLA, S. 1991. Structural risk minimization for character recognition. In Proceedings of the 1991 Conference on Advances in Neural Information Processing Systems, pp. 471-479.
  12. HAUSSLER, D. 1992. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inf. Comput. 100, 1, 78-150.
  13. HAUSSLER, D., AND LONG, P. 1995. A generalization of Sauer's lemma. J. Combin. Theory, Ser. A 71, 219-240.
  14. KEARNS, M., AND SCHAPIRE, R. 1994. Efficient distribution-free learning of probabilistic concepts. J. Comput. Syst. Sci. 48, 3, 464-497.
  15. MILMAN, V. 1982. Some remarks about embedding of l_1^k in finite-dimensional spaces. Israel J. Math. 43, 129-138.
  16. POLLARD, D. 1990. Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 2. Institute of Mathematical Statistics and American Statistical Association.
  17. RISSANEN, J. 1978. Modeling by shortest data description. Automatica 14, 465-471.
  18. SAUER, N. 1972. On the density of families of sets. J. Combin. Theory, Ser. A 13, 145-147.
  19. SHELAH, S. 1972. A combinatorial problem: Stability and order for models and theories in infinitary languages. Pac. J. Math. 41, 247-261.
  20. SIMON, H. 1994. Bounds on the number of examples needed for learning functions. In Proceedings of the 1st Euro-COLT Workshop. The Institute of Mathematics and Its Applications, pp. 83-94.
  21. VAN LINT, J. 1985. {0, 1, *} distance problems in combinatorics. In London Mathematical Society Lecture Note Series, vol. 103. Cambridge University Press, Cambridge, England, pp. 113-135.
  22. VAPNIK, V. 1982. Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York.
  23. VAPNIK, V. 1989. Inductive principles of the search for empirical dependencies. In Proceedings of the 2nd Annual Workshop on Computational Learning Theory, pp. 1-21.
  24. VAPNIK, V., AND CHERVONENKIS, A. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Prob. Applic. 16, 2, 264-280.
  25. VAPNIK, V., AND CHERVONENKIS, A. 1981. Necessary and sufficient conditions for the uniform convergence of means to mathematical expectations. Theory Prob. Applic. 26, 3, 532-553.


    Reviews

    Richard L. Frautschi

    Inspired by Valiant's PAC learning model, the authors, using discretization techniques, establish a distribution-free convergence of means to expectations uniformly over classes of random variables. Classes of real-valued functions with this property are known as uniform Glivenko-Cantelli classes. The paper pioneers a simple combinatorial quantity, generalized from the Vapnik-Chervonenkis dimension, that implies learnability in the statistical regression framework. With learnability thus construed probabilistically, the accuracy parameter determines the effective complexity of the learner's hypothesis class. Yet, can the authors' refinements of quantitative elegance be transposed to pragmatic demonstrations?
