Abstract
We study the basic problem of approximating the size of an unknown set S in a known universe U. We consider two versions of the problem. In both versions, the algorithm can specify subsets T⊆U. In the first version, which we refer to as the group query or subset query version, the algorithm is told whether T∩S is nonempty. In the second version, which we refer to as the subset sampling version, if T∩S is nonempty, then the algorithm receives a uniformly selected element from T∩S. We study the difference between these two versions both when the algorithm is adaptive and when it is nonadaptive. Our main focus is on a natural family of allowed subsets, which correspond to intervals, as well as variants of this family.
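The two query models described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the class name `HiddenSet`, the helper `interval`, and the choice of an integer universe are all assumptions made for illustration only.

```python
import random
from typing import Iterable, Optional, Set


class HiddenSet:
    """Hypothetical wrapper exposing the two query types for a hidden set S."""

    def __init__(self, hidden: Iterable[int], seed: Optional[int] = None) -> None:
        self._hidden: Set[int] = set(hidden)
        self._rng = random.Random(seed)

    def group_query(self, subset: Iterable[int]) -> bool:
        """Group (subset) query: report only whether T ∩ S is nonempty."""
        return any(x in self._hidden for x in subset)

    def subset_sample(self, subset: Iterable[int]) -> Optional[int]:
        """Subset sampling: a uniform element of T ∩ S, or None if it is empty."""
        hits = sorted(set(subset) & self._hidden)
        return self._rng.choice(hits) if hits else None


def interval(lo: int, hi: int) -> range:
    """Interval query set T = {lo, ..., hi} (inclusive); integer universe assumed."""
    return range(lo, hi + 1)


oracle = HiddenSet({3, 8, 21}, seed=0)
print(oracle.group_query(interval(0, 5)))     # True, since 3 lies in [0, 5]
print(oracle.group_query(interval(9, 20)))    # False, no hidden element in [9, 20]
print(oracle.subset_sample(interval(0, 10)))  # either 3 or 8
```

A subset sample reveals strictly more than a group query on the same T: a returned element certifies that T∩S is nonempty and also names a member of S, which is the gap between the two versions that the abstract sets out to study.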
The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling