1
October 2017
Journal of the ACM (JACM): Volume 64 Issue 6, November 2017
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 51, Downloads (12 Months): 205, Downloads (Overall): 205
Full text available:
PDF
We show that a class of statistical properties of distributions, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear sized sample. Specifically, given a sample consisting of independent draws from any distribution over ...
Keywords:
unseen species, entropy estimation, Statistical property estimation, distinct elements
2
June 2017
STOC 2017: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 41, Downloads (12 Months): 278, Downloads (Overall): 278
Full text available:
PDF
The vast majority of theoretical results in machine learning and statistics assume that the training data is a reliable reflection of the phenomena to be learned. Similarly, most learning techniques used in practice are brittle to the presence of large amounts of biased or malicious data. Motivated by this, we ...
Keywords:
high-dimensional statistics, outlier removal, robust learning
3
December 2016
NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 0, Downloads (12 Months): 0, Downloads (Overall): 0
Full text available:
PDF
We consider a crowdsourcing model in which n workers are asked to rate the quality of n items previously generated by other workers. An unknown set of αn workers generate reliable ratings, while the remaining workers may behave arbitrarily and possibly adversarially. The manager of the experiment can also ...
4
June 2016
STOC '16: Proceedings of the forty-eighth annual ACM symposium on Theory of Computing
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 6, Downloads (12 Months): 106, Downloads (Overall): 225
Full text available:
PDF
We consider the following basic learning task: given independent draws from an unknown distribution over a discrete support, output an approximation of the distribution that is as accurate as possible in L1 distance (equivalently, total variation distance, or "statistical distance"). Perhaps surprisingly, it is often possible to "denoise" the empirical ...
Keywords:
Good-Turing frequency estimation, instance optimality, the unseen species problem, Distribution learning, property estimation
5
December 2015
NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2
Publisher: MIT Press
We consider the problem of testing whether two unequal-sized samples were drawn from identical distributions, versus distributions that differ significantly. Specifically, given a target error parameter ε > 0, m1 independent draws from an unknown distribution p with discrete support, and m2 draws from an unknown distribution q ...
6
May 2015
Journal of the ACM (JACM): Volume 62 Issue 2, May 2015
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 9, Downloads (12 Months): 83, Downloads (Overall): 294
Full text available:
PDF
Given a set of n d-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random with the exception of two vectors that have Pearson correlation coefficient ρ (Hamming distance d · (1−ρ)/2), how quickly can one find the two correlated vectors? We present an algorithm ...
Keywords:
Correlations, parity with noise, locality sensitive hashing, metric embedding, approximate closest pair, asymmetric embeddings, learning juntas, nearest neighbor
7
October 2014
FOCS '14: Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science
Publisher: IEEE Computer Society
We consider the problem of verifying the identity of a distribution: Given the description of a distribution over a discrete support p = (p1, p2, …, pn), how many samples (independent draws) must one obtain from an unknown distribution, q, to distinguish, with high probability, the case that p = q ...
Keywords:
Property Testing, Identity Testing, Instance Optimal, Automated Theorem Proving, Cauchy-Schwarz inequality
8
October 2014
FOCS '14: Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science
Publisher: IEEE Computer Society
We show that, if truth assignments on n variables reproduce through recombination so that satisfaction of a particular Boolean function confers a small evolutionary advantage, then a polynomially large population over polynomially many generations (polynomial in n and the inverse of the initial satisfaction probability) will end up almost certainly ...
Keywords:
evolution, algorithms, Boolean functions
9
June 2014
ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32
Publisher: JMLR.org
This work provides simple algorithms for multiclass (and multilabel) prediction in settings where both the number of examples n and the data dimension d are relatively large. These robust and parameter-free algorithms are essentially iterative least-squares updates and very versatile both in theory and in practice. On the theoretical ...
10
June 2014
ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32
Publisher: JMLR.org
We study the effectiveness of learning low degree polynomials using neural networks by the gradient descent method. While neural networks have been shown to have great expressive power, and gradient descent has been widely used in practice for learning neural networks, few theoretical guarantees are known for such methods. In ...
11
January 2014
SODA '14: Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms
Publisher: Society for Industrial and Applied Mathematics
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 1, Downloads (12 Months): 11, Downloads (Overall): 82
Full text available:
PDF
We study the question of learning a sparse multivariate polynomial over the real domain. In particular, for some unknown polynomial f(x) of degree d and k monomials, we show how to reconstruct f, within error ε, given only a set of examples x_i drawn uniformly ...
12
January 2014
SODA '14: Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms
Publisher: Society for Industrial and Applied Mathematics
Bibliometrics:
Citation Count: 11
Downloads (6 Weeks): 1, Downloads (12 Months): 29, Downloads (Overall): 91
Full text available:
PDF
We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions p and q over an n-element set, we wish to distinguish whether p = q versus p is at least ε-far from q, in either ℓ1 or ℓ2 ...
13
December 2013
NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2
Publisher: Curran Associates Inc.
Recently, Valiant and Valiant [1, 2] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear sized sample. Specifically, given a sample consisting of independent draws from ...
14
January 2013
SODA '13: Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms
Publisher: Society for Industrial and Applied Mathematics
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 0, Downloads (12 Months): 7, Downloads (Overall): 35
Full text available:
PDF
We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems that involve testing and estimating the L1 (total variation) distance between two k-modal distributions p and q over the discrete domain {1,..., n}. More precisely, we consider the following four ...
15
October 2012
FOCS '12: Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science
Publisher: IEEE Computer Society
Given a set of $n$ $d$-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random with the exception of two vectors that have Pearson correlation $\rho$ (Hamming distance $d\cdot \frac{1-\rho}{2}$), how quickly can one find the two correlated vectors? We present an algorithm which, for ...
Keywords:
Correlation, closest pair, nearest neighbor, locality sensitive hashing, learning parity with noise, learning juntas, metric embedding
16
June 2012
Journal of the ACM (JACM): Volume 59 Issue 3, June 2012
Publisher: ACM
Bibliometrics:
Citation Count: 10
Downloads (6 Weeks): 4, Downloads (12 Months): 23, Downloads (Overall): 338
Full text available:
PDF
This article provides new worst-case bounds for the size and treewidth of the result Q(D) of a conjunctive query Q applied to a database D. We derive bounds for the result size |Q(D)| in terms of structural properties of Q, both ...
Keywords:
Database theory, conjunctive queries, size bounds, treewidth
17
February 2012
Communications of the ACM: Volume 55 Issue 2, February 2012
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 2, Downloads (12 Months): 120, Downloads (Overall): 1,541
18
January 2012
In this dissertation, we apply the computational perspective to three basic statistical questions which underlie and abstract several of the challenges encountered in the analysis of today's large datasets. Estimating Statistical Properties: Given a sample drawn from an unknown distribution, and a specific statistical property of the distribution that we ...
19
October 2011
FOCS '11: Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science
Publisher: IEEE Computer Society
For a broad class of practically relevant distribution properties, which includes entropy and support size, nearly all of the proposed estimators have an especially simple form. Given a set of independent samples from a discrete distribution, these estimators tally the vector of summary statistics, the number of domain elements ...
Keywords:
Property Testing, Entropy Estimation, L1 Estimation, Duality
20
June 2011
PODC '11: Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2, Downloads (12 Months): 10, Downloads (Overall): 97
Full text available:
PDF
Under many distributed protocols, the prescribed behavior for participants is to behave greedily, i.e., to repeatedly "best respond" to the others' actions. We present recent work (Proc. ICS'11) where we tackle the following general question: "When is it best for a long-sighted participant to adhere to a distributed greedy protocol?" ...
Keywords:
greedy protocols, game theory

