Abstract
In this article, we consider the problem of testing properties of joint distributions under the Conditional Sampling framework. In the standard sampling model, the sample complexity of testing properties of joint distributions is exponential in the dimension, resulting in algorithms that are inefficient in practice. While recent results achieve efficient algorithms for product distributions with significantly smaller sample complexity, no efficient algorithm is expected when the marginals are not independent.
In this article, we initiate the study of conditional sampling in the multidimensional setting. We propose a subcube conditional sampling model where the tester can condition on an (adaptively) chosen subcube of the domain. Due to its simplicity, this model is potentially implementable in many practical applications, particularly when the distribution is a joint distribution over Σ^n for some set Σ.
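To make the model concrete, here is a minimal sketch of a subcube-conditional sampling oracle. It assumes that a subcube is a product A_1 × ... × A_n with each A_i ⊆ Σ, and that the distribution is given explicitly as a dictionary, which is feasible only for toy dimensions. The function name `subcube_conditional_sample` and its parameters are hypothetical and are not taken from the article.

```python
import random


def subcube_conditional_sample(dist, subcube, rng=random):
    """Draw one sample from `dist` conditioned on lying in `subcube`.

    dist:    dict mapping length-n tuples over Sigma to probabilities
             (an explicit toy representation, assumed for illustration).
    subcube: list of n sets; the i-th set A_i restricts coordinate i.
    rng:     the `random` module or a `random.Random` instance.
    """
    # Keep only the points of positive mass that fall inside A_1 x ... x A_n.
    restricted = [
        (x, p)
        for x, p in dist.items()
        if p > 0 and all(x[i] in subcube[i] for i in range(len(subcube)))
    ]
    if not restricted:
        raise ValueError("subcube has zero probability mass")
    points, weights = zip(*restricted)
    # random.choices normalizes the weights, which yields the conditional law.
    return rng.choices(points, weights=weights, k=1)[0]


# Toy joint distribution over {0, 1}^2 with dependent coordinates.
toy = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
# Condition on the first coordinate being 1; the second is unrestricted.
sample = subcube_conditional_sample(toy, [{1}, {0, 1}], random.Random(0))
```

A real tester would not hold the distribution explicitly; it would only call such an oracle, choosing each subcube adaptively based on the samples seen so far.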
We present algorithms for various fundamental properties of distributions in the subcube-conditioning model and prove that the sample complexity is polynomial in the dimension n (and not exponential as in the traditional model). We present an algorithm for testing identity to a known distribution using Õ(n^2) subcube-conditional samples, an algorithm for testing identity between two unknown distributions using Õ(n^5) subcube-conditional samples, and an algorithm for testing identity to a product distribution using Õ(n^5) subcube-conditional samples.
At the core of our technique is an elegant chain rule that can be proved using basic techniques of probability theory, yet is powerful enough to avoid the curse of dimensionality.
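The abstract does not state the chain rule itself. As an illustration only, the sketch below checks the standard chain rule of probability, P(x) = ∏_i P(x_i | x_1, ..., x_{i-1}), on a toy distribution; the rule proved in the article may take a different form. Each conditional marginal is computed over a subcube that fixes the first i-1 coordinates and leaves the rest free, which is the kind of conditioning the subcube model allows. The helper names are hypothetical.

```python
from math import isclose, prod


def conditional_marginal(dist, prefix, value):
    """Return P(x_i = value | x_1..x_{i-1} = prefix), with i = len(prefix) + 1.

    Assumes `prefix` has positive probability under `dist`.
    """
    i = len(prefix)
    mass_prefix = sum(p for x, p in dist.items() if x[:i] == prefix)
    mass_both = sum(
        p for x, p in dist.items() if x[:i] == prefix and x[i] == value
    )
    return mass_both / mass_prefix


def chain_rule_probability(dist, x):
    """Rebuild P(x) as a product of one-dimensional conditional marginals."""
    return prod(
        conditional_marginal(dist, x[:i], x[i]) for i in range(len(x))
    )


# Toy joint distribution over {0, 1}^2 with dependent coordinates.
toy = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
# The product of conditionals agrees with the joint probability at each point.
agrees = all(isclose(chain_rule_probability(toy, x), p) for x, p in toy.items())
```

The point of such a decomposition is that an n-dimensional quantity is expressed through n one-dimensional conditionals, each of which lives on a domain of size |Σ| rather than |Σ|^n.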
Index Terms
Property Testing of Joint Distributions using Conditional Samples