ABSTRACT
A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ2 distance to p. We give time- and sample-efficient algorithms for this problem.
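To make the definitions above concrete, here is a minimal Python sketch of the representation (k intervals plus k values) and of the ℓ2 fitting objective. It is not the paper's algorithm: it builds the empirical distribution from samples and then runs a textbook dynamic program in O(k n^2) time, which is not sub-linear. The function names `empirical`, `expand`, `l2_distance`, and `best_k_histogram` are hypothetical, and the domain is assumed to be 0-indexed, {0, ..., n-1}, with half-open intervals [lo, hi).

```python
from itertools import accumulate
from math import inf, sqrt


def empirical(samples, n):
    """Empirical distribution over {0, ..., n-1} from a list of samples."""
    counts = [0] * n
    for s in samples:
        counts[s] += 1
    return [c / len(samples) for c in counts]


def expand(intervals, values, n):
    """Expand k half-open intervals [lo, hi) and k values into a length-n vector."""
    p = [0.0] * n
    for (lo, hi), v in zip(intervals, values):
        p[lo:hi] = [v] * (hi - lo)
    return p


def l2_distance(p, q):
    """Euclidean (l2) distance between two vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def best_k_histogram(p, k):
    """Piecewise-constant fit with k pieces minimizing squared l2 error to p.

    Illustrative exact dynamic program, NOT the sub-linear algorithm of the
    paper. On each piece the optimal value is the mean of p, so the result
    sums to 1 whenever p does.
    """
    n = len(p)
    k = min(k, n)
    S = [0.0] + list(accumulate(p))                 # prefix sums
    Q = [0.0] + list(accumulate(x * x for x in p))  # prefix sums of squares

    def cost(i, j):
        # Squared error of replacing p[i:j] by its mean.
        s = S[j] - S[i]
        return Q[j] - Q[i] - s * s / (j - i)

    # dp[m][j]: best error covering p[:j] with exactly m pieces.
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = dp[m - 1][i] + cost(i, j)
                if c < dp[m][j]:
                    dp[m][j], cut[m][j] = c, i

    # Walk the cut points backwards to recover intervals and values.
    intervals, values, j = [], [], n
    for m in range(k, 0, -1):
        i = cut[m][j]
        intervals.append((i, j))
        values.append((S[j] - S[i]) / (j - i))
        j = i
    return intervals[::-1], values[::-1]
```

In use, one would call `best_k_histogram(empirical(samples, n), k)` and compare the result to a reference distribution with `l2_distance(expand(intervals, values, n), p)`. Fitting the empirical distribution directly requires enough samples for it to be close to p, which is the cost a sub-linear approach aims to avoid.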
We further provide algorithms that distinguish distributions that are k-histograms from distributions that are ε-far from every k-histogram in the ℓ1 distance and the ℓ2 distance, respectively.
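For reference, the notion of being ε-far used in the testing problem is usually formalized as follows. This is the standard property-testing definition, given here as an assumption about the intended meaning rather than a quotation from the paper; the symbol $\mathcal{H}_k$ for the set of k-histograms over [n] is notation introduced only for this sketch.

```latex
\[
  \operatorname{dist}_{\ell_r}(p, \mathcal{H}_k)
    \;=\; \min_{q \in \mathcal{H}_k} \lVert p - q \rVert_r ,
  \qquad
  \lVert p - q \rVert_r
    \;=\; \Bigl( \sum_{i \in [n]} \lvert p(i) - q(i) \rvert^{r} \Bigr)^{1/r},
  \qquad r \in \{1, 2\}.
\]
\[
  p \text{ is } \varepsilon\text{-far from } \mathcal{H}_k \text{ in } \ell_r
  \iff
  \operatorname{dist}_{\ell_r}(p, \mathcal{H}_k) > \varepsilon .
\]
```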
Supplemental Material
This is an erratum for our PODS 2012 paper "Approximating and Testing k-Histogram Distributions in Sub-linear Time".