Sampling from large matrices: An approach through geometric functional analysis

Published: 01 July 2007

Abstract

We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size $O(r \log r)$ with a small error in the spectral norm, where $r = \|A\|_F^2 / \|A\|_2^2$ is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best-known sample complexity for an approximation algorithm for MAX-2CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
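The numerical rank defined in the abstract can be illustrated with a minimal sketch. The function names below (`frobenius_sq`, `spectral_norm_sq`, `numerical_rank`) are hypothetical and not taken from the paper, and the spectral norm is only estimated by plain power iteration, which is an assumption of this sketch rather than the authors' method.

```python
import math

def frobenius_sq(A):
    """Squared Frobenius norm: the sum of squares of all entries of A."""
    return sum(x * x for row in A for x in row)

def spectral_norm_sq(A, iters=200):
    """Estimate ||A||_2^2 (the largest eigenvalue of A^T A) by power iteration.

    Assumes the fixed start vector is not orthogonal to the top right
    singular vector of A; a randomized start would be more robust.
    """
    n_cols = len(A[0])
    v = [1.0 + 0.01 * j for j in range(n_cols)]
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        # w = A^T (A v)
        w = [sum(row[j] * y for row, y in zip(A, Av)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient ||A v||^2 for the unit vector v
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return sum(y * y for y in Av)

def numerical_rank(A):
    """Return ||A||_F^2 / ||A||_2^2 for a matrix given as a list of rows."""
    top = spectral_norm_sq(A)
    if top == 0.0:
        raise ValueError("no nonzero direction found (zero matrix or unlucky start)")
    return frobenius_sq(A) / top
```

For the 3×3 identity this returns a value close to 3, matching the rank. For the diagonal matrix with entries 1 and 0.001 it returns a value close to 1 although the rank is 2, which is the stability property the abstract refers to.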

Reviews

Bruce E. Litow

This paper explores randomized sampling of matrices by submatrices. This is not a new topic, but the method employed is new and interesting. The main results are somewhat involved, but the key ideas used throughout the paper are these (where A is a finite-dimensional matrix, not necessarily square):

(1) Use of numerical rank, which exhibits stability not shared by the rank. This is defined as $r = \|A\|_F^2 / \|A\|_2^2$, where the numerator is the squared Frobenius norm, which is the sum of the squares of the singular values of A, and the denominator is the squared ℓ2 (spectral) norm, that is, the square of the maximum singular value. The sampling parameter (number of rows needed) is bounded above by $O(r \log r)$. (See Theorem 1.1 of the paper.) The O-notation hides $1/(\eta^4 \Delta)$, where $0 < \eta, \Delta < 1$, $1 - 2\exp(-O(1/\Delta))$ is the probability of sampling success (O-notation here indicates an absolute constant), and η determines the error.

(2) A law of large numbers for operator-valued random variables. This is the central contribution of the paper and represents an approach distinct from linear algebra techniques. This also allows for a natural notion of row or column sampling of A.

(3) A series of tail distribution bounds on expected values of sequences of vectors. These results can undoubtedly be used in areas not covered in this paper, and so assume independent interest.

Although the paper is not self-contained, the citations are sufficient for further exploration and the presentation is crisp and quite clear.

Online Computing Reviews Service
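The row-sampling idea mentioned in the review can be sketched as follows. This is a minimal illustration that assumes rows are drawn independently with replacement, with probability proportional to their squared Euclidean length, and then rescaled; the helper names `sample_rows` and `gram` are hypothetical. Under that assumption the Gram matrix of the sample has expectation equal to the Gram matrix of A, which is the kind of average a law of large numbers for operator-valued random variables controls.

```python
import math
import random

def sample_rows(A, d, rng):
    """Draw d rows of A with replacement, row i with probability
    p_i = ||a_i||^2 / ||A||_F^2, and rescale each by 1 / sqrt(d * p_i)."""
    sq_norms = [sum(x * x for x in row) for row in A]
    total = sum(sq_norms)  # ||A||_F^2
    picks = rng.choices(range(len(A)), weights=sq_norms, k=d)
    S = []
    for i in picks:
        scale = math.sqrt(total / (d * sq_norms[i]))  # equals 1 / sqrt(d * p_i)
        S.append([scale * x for x in A[i]])
    return S

def gram(M):
    """Return M^T M for a matrix given as a list of rows."""
    cols = len(M[0])
    return [[sum(row[j] * row[k] for row in M) for k in range(cols)]
            for j in range(cols)]

# Usage: compare the sampled Gram matrix with the exact one.
rng = random.Random(0)
A = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
S = sample_rows(A, 7, rng)
exact, approx = gram(A), gram(S)
```

With this rescaling the trace of `gram(S)` always equals the squared Frobenius norm of A, and for a rank-one A the sampled Gram matrix reproduces the exact one up to rounding. For general A the two agree only approximately, with the error shrinking as the number of sampled rows d grows.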


Published In

Journal of the ACM  Volume 54, Issue 4
July 2007
176 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/1255443

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 2007
Published in JACM Volume 54, Issue 4

Author Tags

  1. Monte-Carlo methods
  2. Randomized algorithms
  3. low-rank approximations
  4. massive data sets
  5. singular-value decompositions

Qualifiers

  • Article

Citations

Cited By

  • (2024) Synthetic instruments in DiD designs with unmeasured confounding. SSRN Electronic Journal. DOI: 10.2139/ssrn.4716511. Online publication date: 2024
  • (2024) Optimal Matrix Sketching over Sliding Windows. Proceedings of the VLDB Endowment 17:9 (2149-2161). DOI: 10.14778/3665844.3665847. Online publication date: 1-May-2024
  • (2024) Tensor Network-Based Lightweight Energy Forecasting for Virtual Power Plant. 2024 13th International Conference on Renewable Energy Research and Applications (ICRERA) (469-474). DOI: 10.1109/ICRERA62673.2024.10815236. Online publication date: 9-Nov-2024
  • (2024) Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (28474-28484). DOI: 10.1109/CVPR52733.2024.02690. Online publication date: 16-Jun-2024
  • (2024) Interlacing Polynomial Method for the Column Subset Selection Problem. International Mathematics Research Notices 2024:9 (7798-7819). DOI: 10.1093/imrn/rnae001. Online publication date: 25-Jan-2024
  • (2024) Sublinear Time Eigenvalue Approximation via Random Sampling. Algorithmica 86:6 (1764-1829). DOI: 10.1007/s00453-024-01208-5. Online publication date: 1-Jun-2024
  • (2023) The edge of orthogonality. Proceedings of the 40th International Conference on Machine Learning (29063-29081). DOI: 10.5555/3618408.3619615. Online publication date: 23-Jul-2023
  • (2023) Bootstrapping the operator norm in high dimensions: Error estimation for covariance matrices and sketching. Bernoulli 29:1. DOI: 10.3150/22-BEJ1463. Online publication date: 1-Feb-2023
  • (2023) Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure. Proceedings of the ACM on Measurement and Analysis of Computing Systems 7:2 (1-60). DOI: 10.1145/3589973. Online publication date: 22-May-2023
  • (2023) Streaming Euclidean Max-Cut: Dimension vs Data Reduction. Proceedings of the 55th Annual ACM Symposium on Theory of Computing (170-182). DOI: 10.1145/3564246.3585170. Online publication date: 2-Jun-2023
