ABSTRACT
We study the problem of selecting a subset of k random variables to observe that will yield the best linear prediction of another variable of interest, given the pairwise correlations between the observation variables and the variable to be predicted. Under approximation-preserving reductions, this problem is equivalent to the "sparse approximation" problem of approximating signals concisely. The subset selection problem is NP-hard in general; in this paper, we propose and analyze exact and approximation algorithms for several special cases of practical interest. Specifically, we give an FPTAS when the covariance matrix has constant bandwidth, and exact algorithms when the associated covariance graph, consisting of edges for pairs of variables with non-zero correlation, forms a tree or has a large (known) independent set. Furthermore, we give an exact algorithm when the variables can be embedded into a line such that the covariance decreases exponentially in the distance, and a constant-factor approximation when the variables have no "conditional suppressor variables". Much of our reasoning is based on perturbation results for the R² multiple correlation measure, which is frequently used as a natural goodness-of-fit statistic. It lies at the core of our FPTAS, and also allows us to extend our exact algorithms to approximation algorithms when the covariance matrix "nearly" falls into one of the above classes. We also use our perturbation analysis to prove approximation guarantees for the widely used "Forward Regression" heuristic under the assumption that the observation variables are nearly independent.
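To make the objective and the heuristic concrete, here is a minimal illustrative sketch (not the paper's implementation) of the two ingredients named above: the R² measure of a subset S, which for unit-variance variables equals b_S^T C_S^{-1} b_S where C is the covariance matrix of the observation variables and b holds their covariances with the target, and the greedy "Forward Regression" heuristic that repeatedly adds the variable giving the largest R² gain. The function names and the brute-force per-step refit are choices made for clarity here, not from the paper.

```python
import numpy as np

def r_squared(C, b, S):
    """R^2 of the best linear predictor of the target from the variables in S.

    Assumes unit-variance variables, so R^2(S) = b_S^T C_S^{-1} b_S, where C is
    the covariance matrix of the observation variables and b their covariances
    with the target variable.
    """
    S = list(S)
    if not S:
        return 0.0
    bS = b[S]
    CS = C[np.ix_(S, S)]            # principal submatrix indexed by S
    return float(bS @ np.linalg.solve(CS, bS))

def forward_regression(C, b, k):
    """Greedy Forward Regression: add, k times, the variable whose inclusion
    most increases R^2 of the selected subset."""
    n = len(b)
    chosen = []
    for _ in range(k):
        rest = [i for i in range(n) if i not in chosen]
        best = max(rest, key=lambda i: r_squared(C, b, chosen + [i]))
        chosen.append(best)
    return chosen
```

When the observation variables are exactly independent (C = I), each variable's R² contribution is simply its squared correlation with the target, so the greedy choice is optimal; the paper's perturbation analysis is what extends guarantees to the "nearly independent" case.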
Algorithms for subset selection in linear regression