Yoram Singer

Bibliometrics: publication history
Average citations per article: 65.66
Citation count: 7,288
Publication count: 111
Publication years: 1993–2016
Available for download: 37
Average downloads per article: 800.22
Downloads (cumulative): 29,608
Downloads (12 months): 1,690
Downloads (6 weeks): 167

111 results found

Result 1 – 20 of 111

1
December 2016 NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 1,   Downloads (Overall): 1

Full text available: PDF
We develop a general duality between neural networks and compositional kernel Hilbert spaces. We introduce the notion of a computation skeleton, an acyclic graph that succinctly describes both a family of neural networks and a kernel space. Random neural networks are generated from a skeleton through node replication followed by ...
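
The duality is concrete enough to compute in the simplest case. Below is a minimal sketch of the kernel dual to one particular skeleton, a fully connected ReLU network of a given depth, using the known arc-cosine closed form for the ReLU dual activation; the function names and the unit-norm assumption are ours, not the paper's notation.

```python
import numpy as np

def relu_dual(rho):
    """Dual activation of ReLU (degree-1 arc-cosine kernel): maps the
    correlation of two inputs to their correlation after a wide ReLU layer."""
    theta = np.arccos(np.clip(rho, -1.0, 1.0))
    return (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

def compositional_kernel(x, xp, depth=3):
    """Kernel dual to a depth-`depth` fully connected ReLU skeleton,
    assuming unit-norm inputs: compose the dual activation layer by layer."""
    rho = float(x @ xp)
    for _ in range(depth):
        rho = relu_dual(rho)
    return rho
```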

2
June 2016 ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48
Publisher: JMLR.org
Bibliometrics:
Citation Count: 6

We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. We derive ...
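
Algorithmic stability can be probed numerically: train along the same sample path on two datasets that differ in a single example and compare the resulting models. This is a sketch under our own assumptions (least-squares loss, a parameter-distance proxy for stability), not the paper's experiment.

```python
import numpy as np

def sgm(X, y, eta=0.01, steps=500, seed=0):
    """Plain stochastic gradient method on least squares, fixed sample path."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        i = rng.integers(len(y))
        w -= eta * (X[i] @ w - y[i]) * X[i]
    return w

rng = np.random.default_rng(1)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=5), rng.normal()     # neighboring dataset
print(np.linalg.norm(sgm(X, y) - sgm(X2, y2)))      # small gap ~ stability
```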

3
January 2016 The Journal of Machine Learning Research: Volume 17 Issue 1, January 2016
Publisher: JMLR.org
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 3,   Downloads (12 Months): 13,   Downloads (Overall): 25

Full text available: PDF
Matrix approximation is a common tool in recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the partially observed matrix is low-rank. In this paper, we propose, analyze, and experiment with two procedures, one parallel and the other global, for constructing local matrix ...
Keywords: kernel smoothing, recommender systems, matrix approximation, collaborative filtering, non-parametric methods
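
A toy sketch of the local idea on a fully observed matrix: fit one low-rank model per randomly chosen anchor entry, weight entries by their kernel similarity to the anchor, and blend the local fits. The cosine distances, the Epanechnikov kernel, and the crude SVD-of-weighted-matrix fit are our stand-ins, not the paper's estimators.

```python
import numpy as np

def epanechnikov(d, h):
    """Smoothing kernel: weight falls off with distance d, zero beyond h."""
    return np.maximum(1.0 - (d / h) ** 2, 0.0)

def local_low_rank(M, n_anchors=10, rank=2, h=0.8, seed=0):
    """Blend several anchored low-rank fits (assumes nonzero rows/columns)."""
    rng = np.random.default_rng(seed)
    rows = M / np.linalg.norm(M, axis=1, keepdims=True)
    cols = M.T / np.linalg.norm(M.T, axis=1, keepdims=True)
    d_row = 1.0 - rows @ rows.T                    # cosine distances, rows
    d_col = 1.0 - cols @ cols.T                    # cosine distances, columns
    est = np.zeros(M.shape)
    wsum = np.full(M.shape, 1e-12)
    for _ in range(n_anchors):
        i, j = rng.integers(M.shape[0]), rng.integers(M.shape[1])
        W = np.outer(epanechnikov(d_row[i], h), epanechnikov(d_col[j], h))
        U, s, Vt = np.linalg.svd(W * M, full_matrices=False)  # crude local fit
        local = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        est += W * local
        wsum += W
    return est / wsum       # kernel-weighted blend of the local models
```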

4
April 2014 WWW '14: Proceedings of the 23rd international conference on World wide web
Publisher: ACM
Bibliometrics:
Citation Count: 27
Downloads (6 Weeks): 13,   Downloads (12 Months): 127,   Downloads (Overall): 1,079

Full text available: PDF
Personalized recommendation systems are used in a wide variety of applications such as electronic commerce, social networks, web search, and more. Collaborative filtering approaches to recommendation systems typically assume that the rating matrix (e.g., movie ratings by viewers) is low-rank. In this paper, we examine an alternative approach in which ...
Keywords: collaborative filtering, ranking, recommender systems

5
September 2013 ECMLPKDD'13: Proceedings of the 2013 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 0

We describe a new, simplified, and general analysis of a fusion of Nesterov's accelerated gradient with parallel coordinate descent. The resulting algorithm, which we call BOOM, for boosting with momentum, enjoys the merits of both techniques. Namely, BOOM retains the momentum and convergence properties of the accelerated gradient ...
Keywords: accelerated gradient, coordinate descent, boosting
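
For reference, the momentum half of that fusion, sketched as plain Nesterov-accelerated gradient descent on least squares; BOOM replaces the full gradient step below with parallel coordinate descent steps, a replacement this simplified sketch omits.

```python
import numpy as np

def accelerated_gradient(X, y, steps=200):
    """FISTA-style accelerated gradient for min_w 0.5 * ||Xw - y||^2."""
    L = np.linalg.norm(X, 2) ** 2       # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    z, t = w.copy(), 1.0                # extrapolated point, momentum scalar
    for _ in range(steps):
        w_prev, w = w, z - X.T @ (X @ z - y) / L    # gradient step at z
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = w + ((t - 1) / t_next) * (w - w_prev)   # momentum extrapolation
        t = t_next
    return w
```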

6
June 2013 ICML'13: Proceedings of the 30th International Conference on Machine Learning - Volume 28
Publisher: JMLR.org
Bibliometrics:
Citation Count: 5

Matrix approximation is a common tool in recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the partially observed matrix is of low-rank. We propose a new matrix approximation model where we assume instead that the matrix is locally of low-rank, leading to ...

7
July 2011 EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 6,   Downloads (Overall): 56

Full text available: PDF
We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. This setting generalizes the classical maximum entropy problems as it relaxes the standard constraints on the observed values. We tackle the ...
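
The relaxed problem is small enough to state directly. A sketch with a generic solver, assuming a prior distribution q, an observed distribution p_hat, and slack eps; the abstract is truncated before the paper's own method, so a black-box SLSQP call stands in for it here.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_maxnorm(q, p_hat, eps):
    """min_p KL(p || q) s.t. p is a distribution and |p_i - p_hat_i| <= eps."""
    kl = lambda p: float(np.sum(p * np.log(np.maximum(p, 1e-12) / q)))
    bounds = [(max(0.0, v - eps), min(1.0, v + eps)) for v in p_hat]
    cons = ({"type": "eq", "fun": lambda p: p.sum() - 1.0},)
    res = minimize(kl, p_hat, method="SLSQP", bounds=bounds, constraints=cons)
    return res.x
```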

8
July 2011 The Journal of Machine Learning Research: Volume 12, 2/1/2011
Publisher: JMLR.org
Bibliometrics:
Citation Count: 329
Downloads (6 Weeks): 20,   Downloads (12 Months): 172,   Downloads (Overall): 1,670

Full text available: PDF
We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm ...
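
The diagonal member of this family (AdaGrad) admits a very short sketch; `grad` is a placeholder gradient oracle and the hyperparameters are illustrative.

```python
import numpy as np

def adagrad(grad, x0, eta=0.1, eps=1e-8, steps=1000):
    """Per-coordinate steps shrink with the accumulated squared gradients,
    so rare but predictive features keep a comparatively large step size."""
    x = np.array(x0, dtype=float)
    g_sq = np.zeros_like(x)             # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        g_sq += g * g
        x -= eta * g / (np.sqrt(g_sq) + eps)
    return x
```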

9
March 2011 Mathematical Programming: Series A and B: Volume 127 Issue 1, March 2011
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 0

We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $\epsilon$ is $\tilde{O}(1/\epsilon)$, where each iteration operates on a single training ...
Keywords: SVM, Stochastic gradient descent

10
March 2011 Mathematical Programming: Series A and B - Special Issue on "Optimization and Machine Learning" (eds. Alexandre d'Aspremont, Francis Bach, Inderjit S. Dhillon, Bin Yu): Volume 127 Issue 1, March 2011
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 125

We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $\epsilon$ is $\tilde{O}(1/\epsilon)$, where each iteration operates on a single training ...
Keywords: Stochastic gradient descent
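
A compact sketch of the single-example update the abstract describes (this algorithm is Pegasos); the final projection onto a ball of radius 1/sqrt(lam) keeps the iterates bounded, and the hyperparameters are illustrative.

```python
import numpy as np

def pegasos(X, y, lam=0.1, steps=100_000, seed=0):
    """Stochastic sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss,
    touching a single training example per iteration."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, steps + 1):
        i = rng.integers(len(y))
        eta = 1.0 / (lam * t)           # step-size schedule from the analysis
        active = y[i] * (X[i] @ w) < 1.0
        w *= 1.0 - eta * lam            # gradient of the regularizer
        if active:
            w += eta * y[i] * X[i]      # sub-gradient of the hinge loss
        radius = 1.0 / np.sqrt(lam)     # projection onto a bounding ball
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```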

11
September 2010 Machine Learning: Volume 80 Issue 2-3, September 2010
Publisher: Kluwer Academic Publishers
Bibliometrics:
Citation Count: 5

Boosting algorithms build highly accurate prediction mechanisms from a collection of low-accuracy predictors. To do so, they employ the notion of weak-learnability. The starting point of this paper is a proof which shows that weak learnability is equivalent to linear separability with ℓ1 margin. The equivalence is a direct ...
Keywords: Boosting, Linear separability, Margin, Minimax theorem

12
December 2009 NIPS'09: Proceedings of the 22nd International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 8

We describe, analyze, and experiment with a new framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term ...

13
December 2009 NIPS'09: Proceedings of the 22nd International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 20

Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a ...

14
December 2009 The Journal of Machine Learning Research: Volume 10, 12/1/2009
Publisher: JMLR.org
Bibliometrics:
Citation Count: 106
Downloads (6 Weeks): 0,   Downloads (12 Months): 19,   Downloads (Overall): 417

Full text available: PDF
We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while ...
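
With an ℓ1 regularizer the instantaneous problem in the second phase has a closed form (soft thresholding), which makes the two-phase iteration a few lines; `grad` is a placeholder oracle for the empirical-loss gradient.

```python
import numpy as np

def soft_threshold(v, tau):
    """argmin_w 0.5 * ||w - v||^2 + tau * ||w||_1, solved coordinate-wise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def forward_backward_l1(grad, w0, eta=0.1, lam=0.01, steps=1000):
    """Alternate an unconstrained gradient step with the instantaneous
    regularized problem; the ℓ1 prox zeroes small coordinates, producing
    sparse iterates."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        v = w - eta * grad(w)               # phase 1: gradient step
        w = soft_threshold(v, eta * lam)    # phase 2: closed-form prox step
    return w
```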

15
November 2009 IEEE Transactions on Information Theory: Volume 55 Issue 11, November 2009
Publisher: IEEE Press
Bibliometrics:
Citation Count: 3

Context trees are a popular and effective tool for tasks such as compression, sequential prediction, and language modeling. We present an algebraic perspective of context trees for the task of individual sequence prediction. Our approach stems from a generalization of the notion of margin used for linear predictors. By exporting ...
Keywords: context trees, online learning, shifting bounds, perceptron

16
June 2009 ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
Publisher: ACM
Bibliometrics:
Citation Count: 24
Downloads (6 Weeks): 2,   Downloads (12 Months): 16,   Downloads (Overall): 424

Full text available: PDF
We derive generalizations of AdaBoost and related gradient-based coordinate descent methods that incorporate sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back-pruning through regularization and give an automatic stopping criterion ...
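
One concrete instance of such a sparsity-penalized coordinate step, sketched for logistic loss with an ℓ1 penalty; the cyclic coordinate choice and the curvature bound are our simplifications, not the paper's template.

```python
import numpy as np

def l1_coordinate_descent(X, y, lam=0.1, rounds=500):
    """Each round takes a soft-thresholded step on one coordinate; the
    penalty both prunes weights back to zero and halts forward induction
    of coordinates that never pay for themselves."""
    n, d = X.shape
    w = np.zeros(d)
    L = 0.25 * (X ** 2).sum(axis=0) + 1e-12   # per-coordinate curvature bound
    for t in range(rounds):
        j = t % d                              # cyclic coordinate choice
        margins = y * (X @ w)
        g = -np.sum(y * X[:, j] / (1.0 + np.exp(margins)))  # partial derivative
        z = w[j] - g / L[j]
        w[j] = np.sign(z) * max(abs(z) - lam / L[j], 0.0)   # soft threshold
    return w
```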

17
July 2008 ICML '08: Proceedings of the 25th international conference on Machine learning
Publisher: ACM
Bibliometrics:
Citation Count: 169
Downloads (6 Weeks): 13,   Downloads (12 Months): 148,   Downloads (Overall): 1,257

Full text available: PDF
We describe efficient algorithms for projecting a vector onto the ℓ1-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the ℓ ...
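
The exact projection has a short sort-based implementation; the sketch below runs in O(n log n) time, whereas the paper's O(n) expected-time method replaces the full sort with randomized pivoting.

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= z}."""
    if np.abs(v).sum() <= z:
        return v.copy()                        # already inside the ball
    u = np.sort(np.abs(v))[::-1]               # magnitudes, descending
    cumsum = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > cumsum - z)[0][-1]
    theta = (cumsum[rho] - z) / (rho + 1.0)    # shared shrinkage threshold
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```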

18
June 2008 The Journal of Machine Learning Research: Volume 9, 6/1/2008
Publisher: JMLR.org
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 2,   Downloads (Overall): 119

Full text available: PDF
We describe and analyze an algorithmic framework for online classification where each online trial consists of multiple prediction tasks that are tied together. We tackle the problem of updating the online predictor by defining a projection problem in which each prediction task corresponds to a single linear constraint. These constraints ...

19
January 2008 SIAM Journal on Computing: Volume 37 Issue 5, January 2008
Publisher: Society for Industrial and Applied Mathematics
Bibliometrics:
Citation Count: 56

The Perceptron algorithm, despite its simplicity, often performs well in online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernel functions. However, a common difficulty encountered when implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may ...
Keywords: kernel methods, learning theory, the Perceptron algorithm, online classification
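
The memory issue and the simplest remedy fit in a short sketch: a kernel perceptron that stores at most a fixed budget of examples and forgets the oldest one on overflow. The paper's algorithm additionally shrinks the hypothesis before each removal so that a mistake bound survives; that scaling is omitted here.

```python
def budget_kernel_perceptron(stream, kernel, budget=50):
    """Kernel perceptron whose hypothesis stores at most `budget` examples;
    `stream` yields (x, y) pairs with y in {-1, +1}."""
    support = []                               # retained (x, y) pairs
    mistakes = 0
    for x, y in stream:
        score = sum(yi * kernel(xi, x) for xi, yi in support)
        if y * score <= 0:                     # mistake: add to the hypothesis
            mistakes += 1
            support.append((x, y))
            if len(support) > budget:
                support.pop(0)                 # forget the oldest example
    return support, mistakes
```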

20
December 2007 Machine Learning: Volume 69 Issue 2-3, December 2007
Publisher: Kluwer Academic Publishers
Bibliometrics:
Citation Count: 27

We describe a novel framework for the design and analysis of online learning algorithms based on the notion of duality in constrained optimization. We cast a sub-family of universal online bounds as an optimization problem. Using the weak duality theorem we reduce the process of online learning to the task ...
Keywords: Mistake bounds, Regret bounds, Online learning, Duality


