research-article · DOI: 10.1145/1374376.1374384

Algorithms for subset selection in linear regression

Published: 17 May 2008

ABSTRACT

We study the problem of selecting a subset of k random variables to observe that will yield the best linear prediction of another variable of interest, given the pairwise correlations between the observation variables and the predictor variable. Under approximation-preserving reductions, this problem is equivalent to the "sparse approximation" problem of approximating signals concisely. The subset selection problem is NP-hard in general; in this paper, we propose and analyze exact and approximation algorithms for several special cases of practical interest. Specifically, we give an FPTAS when the covariance matrix has constant bandwidth, and exact algorithms when the associated covariance graph, consisting of edges for pairs of variables with non-zero correlation, forms a tree or has a large (known) independent set. Furthermore, we give an exact algorithm when the variables can be embedded into a line such that the covariance decreases exponentially in the distance, and a constant-factor approximation when the variables have no "conditional suppressor variables". Much of our reasoning is based on perturbation results for the R² multiple correlation measure, which is frequently used as a natural goodness-of-fit statistic. It lies at the core of our FPTAS, and also allows us to extend our exact algorithms to approximation algorithms when the matrix "nearly" falls into one of the above classes. We also use our perturbation analysis to prove approximation guarantees for the widely used "Forward Regression" heuristic under the assumption that the observation variables are nearly independent.
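As a rough illustration of the Forward Regression heuristic the abstract refers to, the sketch below greedily adds, at each step, the variable that most increases the R² of the least-squares fit. This is a minimal hypothetical implementation for intuition only, not the paper's code or analysis; function names and test data are made up:

```python
import numpy as np

def r_squared(X, y):
    """R² of the least-squares fit of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def forward_regression(X, y, k):
    """Greedy forward selection: repeatedly add the column of X that
    yields the largest R² when regressing y on the selected columns."""
    _, d = X.shape
    selected = []
    for _ in range(k):
        remaining = [j for j in range(d) if j not in selected]
        best = max(remaining,
                   key=lambda j: r_squared(X[:, selected + [j]], y))
        selected.append(best)
    return selected
```

Note that this greedy heuristic can perform badly in the presence of suppressor variables, which is why the approximation guarantee in the paper requires the observation variables to be nearly independent.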


• Published in
  STOC '08: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing
  May 2008, 712 pages
  ISBN: 9781605580470
  DOI: 10.1145/1374376
  Copyright © 2008 ACM
  Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates

STOC '08 paper acceptance rate: 80 of 325 submissions (25%). Overall acceptance rate: 1,469 of 4,586 submissions (32%).
