 Aviv Tamar

Bibliometrics
Average citations per article: 4.33
Citation count: 78
Publication count: 18
Publication years: 2011-2018
Available for download: 8
Average downloads per article: 130.13
Downloads (cumulative): 1,041
Downloads (12 months): 494
Downloads (6 weeks): 57
18 results found


1
December 2018 NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3,   Downloads (12 Months): 4,   Downloads (Overall): 4

Full text available: PDF
In recent years, deep generative models have been shown to 'imagine' convincing high-dimensional observations such as images, audio, and even video, learning directly from raw data. In this work, we ask how to imagine goal-directed visual plans – a plausible sequence of observations that transition a dynamical system from its ...

2
December 2017 NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 21
Downloads (6 Weeks): 1,   Downloads (12 Months): 5,   Downloads (Overall): 5

Full text available: PDF
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present ...

3
December 2017 NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 0,   Downloads (12 Months): 2,   Downloads (Overall): 2

Full text available: PDF
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods ...

4
November 2017 HotNets-XVI: Proceedings of the 16th ACM Workshop on Hot Topics in Networks
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 33,   Downloads (12 Months): 370,   Downloads (Overall): 853

Full text available: PDF
Recently, much attention has been devoted to the question of whether/when traditional network protocol design, which relies on the application of algorithmic insights by human experts, can be replaced by a data-driven (i.e., machine learning) approach. We explore this question in the context of the arguably most fundamental networking task: ...

5
August 2017 IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence
Publisher: AAAI Press
Bibliometrics:
Citation Count: 0

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of ...
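As a rough illustration of the idea in this abstract, the sketch below runs value iteration on a 2-D grid by repeatedly combining a reward map with convolution-like backups and maximizing over an action channel. The shapes, the 3x3 kernels, and the use of plain NumPy are illustrative assumptions, not the paper's architecture or code (in a VIN the kernels would be learned end-to-end).

import numpy as np

def value_iteration_module(reward_map, action_kernels, num_iterations=20, gamma=0.99):
    """Approximate value iteration on a grid via convolution-like backups (illustrative).

    reward_map:     (H, W) per-cell rewards.
    action_kernels: (A, 3, 3) one kernel per action, spreading value from neighbouring
                    cells (these play the role of the learned filters in a VIN).
    """
    H, W = reward_map.shape
    value = np.zeros((H, W))
    for _ in range(num_iterations):
        padded = np.pad(value, 1, mode="constant")
        q_channels = []
        for kernel in action_kernels:
            # Correlate the padded value map with this action's kernel.
            q = np.array([[np.sum(kernel * padded[i:i + 3, j:j + 3]) for j in range(W)]
                          for i in range(H)])
            q_channels.append(reward_map + gamma * q)
        # Bellman backup: max over the action channel.
        value = np.max(np.stack(q_channels), axis=0)
    return value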

6
August 2017 ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
Publisher: JMLR.org
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 20,   Downloads (12 Months): 98,   Downloads (Overall): 98

Full text available: PDF
For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih ...
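For intuition about adding constraints to policy search, here is a minimal primal-dual (Lagrangian) sketch: ascend the return while a dual variable prices constraint violations. It is a generic stand-in under assumed estimator interfaces, not the specific algorithm developed in the paper; all names are hypothetical.

def primal_dual_policy_step(theta, lam, rollouts, constraint_limit,
                            return_grad, cost_and_grad,
                            lr_theta=1e-2, lr_lam=1e-2):
    """One primal-dual step for constrained policy search (hypothetical interfaces).

    return_grad:   fn(theta, rollouts) -> gradient estimate of the expected return.
    cost_and_grad: fn(theta, rollouts) -> (expected constraint cost, its gradient).
    """
    g_return = return_grad(theta, rollouts)
    expected_cost, g_cost = cost_and_grad(theta, rollouts)

    # Primal ascent on the Lagrangian: return minus lam times constraint cost.
    theta = theta + lr_theta * (g_return - lam * g_cost)
    # Dual ascent: raise the multiplier while the safety constraint is violated.
    lam = max(0.0, lam + lr_lam * (expected_cost - constraint_limit))
    return theta, lam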

7
December 2016 NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 18
Downloads (6 Weeks): 0,   Downloads (12 Months): 2,   Downloads (Overall): 5

Full text available: PDF
We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of ...

8
February 2016 AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Bibliometrics:
Citation Count: 1

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm (Sutton, Mahmood, and White 2015), which encompasses the original ETD(λ), as well as several other off-policy evaluation algorithms as special cases. We call this ...
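A simplified linear ETD(0) step is sketched below to give intuition for the followon/emphasis weighting that the generalized family in this abstract builds on; the lambda = 0 simplification and the variable names are assumptions made for illustration, not the paper's algorithm.

import numpy as np

def etd0_update(theta, followon, phi, phi_next, reward, rho, rho_prev, interest,
                gamma=0.99, alpha=0.01):
    """One off-policy emphatic-TD(0) step with linear features (illustrative sketch).

    theta:          value-function weight vector.
    followon:       followon trace F from the previous step (start it at 0).
    phi, phi_next:  feature vectors of the current and next state.
    rho, rho_prev:  importance-sampling ratios for the current and previous actions.
    interest:       user-chosen interest i(s) in the current state.
    """
    followon = gamma * rho_prev * followon + interest     # accumulate emphasis
    td_error = reward + gamma * (phi_next @ theta) - phi @ theta
    theta = theta + alpha * followon * rho * td_error * phi
    return theta, followon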

9
January 2016 The Journal of Machine Learning Research: Volume 17 Issue 1, January 2016
Publisher: JMLR.org
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 0,   Downloads (12 Months): 8,   Downloads (Overall): 20

Full text available: PDF
In Markov decision processes (MDPs), the variance of the reward-to-go is a natural measure of uncertainty about the long term performance of a policy, and is important in domains such as finance, resource allocation, and process control. Currently however, there is no tractable procedure for calculating it in large scale ...
Keywords: Markov decision processes, reinforcement learning, simulation, temporal differences, variance estimation
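One tractable route to this quantity, in the spirit of this line of work, is to learn the value J and the second moment M of the reward-to-go with TD-style updates and set Var = M - J^2. The tabular sketch below is an illustrative assumption, not the paper's large-scale, function-approximation procedure.

import numpy as np

def variance_td_step(J, M, s, r, s_next, gamma=0.99, alpha=0.05):
    """One TD update for the value J and second moment M of the reward-to-go (tabular)."""
    j_target = r + gamma * J[s_next]
    m_target = r ** 2 + 2.0 * gamma * r * J[s_next] + gamma ** 2 * M[s_next]
    J[s] += alpha * (j_target - J[s])
    M[s] += alpha * (m_target - M[s])
    return J, M

def reward_to_go_variance(J, M):
    # Clip at zero in case sampling noise makes the estimate slightly negative.
    return np.maximum(M - J ** 2, 0.0)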

10
December 2015 NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1
Publisher: MIT Press
Bibliometrics:
Citation Count: 1

Several authors have recently developed risk-sensitive policy gradient methods that augment the standard expected cost minimization problem with a measure of variability in cost. These studies have focused on specific risk-measures, such as the variance or conditional value at risk (CVaR). In this work, we extend the policy gradient method ...

11
December 2015 NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1
Publisher: MIT Press
Bibliometrics:
Citation Count: 1

In this paper, we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR) objective, as opposed to a standard risk-neutral expectation. We refer to such a problem as CVaR ...

12
November 2015 Foundations and Trends® in Machine Learning: Volume 8 Issue 5-6, November 2015
Publisher: Now Publishers Inc.
Bibliometrics:
Citation Count: 2

Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms. In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm. The major incentives for incorporating Bayesian reasoning in RL are: 1) it provides an elegant approach ...

13
January 2015 AAAI'15: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Bibliometrics:
Citation Count: 4

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the gradient of ...
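The sketch below illustrates a sampling-based CVaR gradient in the conditional-expectation spirit of the abstract: estimate the tail quantile (VaR) from sampled costs, then average score-function terms over the tail. The alpha convention (alpha = worst fraction of trajectories), the normalization, and all names are illustrative assumptions, not the paper's exact estimator.

import numpy as np

def cvar_policy_gradient(costs, score_grads, alpha=0.05):
    """Sampling-based estimate of the gradient of the alpha-tail CVaR of the cost.

    costs:       (N,) sampled trajectory costs C_i.
    score_grads: (N, d) rows grad_theta log pi_theta(trajectory_i).
    alpha:       fraction of worst trajectories that defines the tail.
    """
    costs = np.asarray(costs, dtype=float)
    score_grads = np.asarray(score_grads, dtype=float)
    var_estimate = np.quantile(costs, 1.0 - alpha)   # empirical VaR (tail threshold)
    tail = costs >= var_estimate                     # worst alpha-fraction of samples
    n_tail = max(int(tail.sum()), 1)
    # Conditional-expectation form: average (C_i - VaR) * score over tail samples.
    weights = (costs[tail] - var_estimate)[:, None]
    return (weights * score_grads[tail]).sum(axis=0) / n_tail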

14
June 2014 ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32
Publisher: JMLR.org
Bibliometrics:
Citation Count: 3

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm. Previous studies showed that robust MDPs, based on a minimax approach to handling uncertainty, can be solved using dynamic programming for small to medium sized problems. However, due to the "curse of dimensionality", MDPs that ...
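For reference, a toy robust value iteration over a small, finite uncertainty set of transition models is sketched below, with the worst case taken per state-action pair; the paper's contribution is scaling such minimax backups to large MDPs, which this sketch deliberately omits. All names and the finite uncertainty set are illustrative assumptions.

import numpy as np

def robust_value_iteration(P_models, rewards, gamma=0.95, iters=200):
    """Robust value iteration with a finite, rectangular uncertainty set (toy sketch).

    P_models: (K, A, S, S) candidate transition models P_k(s' | s, a).
    rewards:  (S, A) immediate rewards.
    """
    V = np.zeros(P_models.shape[-1])
    for _ in range(iters):
        # Expected next-state value under each candidate model: shape (K, A, S).
        next_values = np.einsum('kast,t->kas', P_models, V)
        worst_case = next_values.min(axis=0)        # adversarial model per (a, s)
        Q = rewards + gamma * worst_case.T          # (S, A) minimax Bellman backup
        V = Q.max(axis=1)                           # greedy max over actions
    return V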

15
June 2013 ICML'13: Proceedings of the 30th International Conference on Machine Learning - Volume 28
Publisher: JMLR.org
Bibliometrics:
Citation Count: 0

In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose variants of both TD(0) and LSTD(λ) with linear function ...

16
June 2012 ICML'12: Proceedings of the 29th International Conference on Machine Learning
Publisher: Omnipress
Bibliometrics:
Citation Count: 0

Managing risk in dynamic decision problems is of cardinal importance in many fields such as finance and process control. The most common approach to defining risk is through various variance related criteria such as the Sharpe Ratio or the standard deviation adjusted reward. It is known that optimizing many of ...

17
June 2012 The Journal of Machine Learning Research: Volume 13, 3/1/2012
Publisher: JMLR.org
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0,   Downloads (12 Months): 2,   Downloads (Overall): 43

Full text available: PDF
In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal ...
Keywords: Markov decision processes, stochastic approximation, temporal difference, hybrid model-based model-free algorithms, reinforcement learning
Also published in:
January 2012  The Journal of Machine Learning Research: Volume 13 Issue 1, January 2012

18
June 2011 ICML'11: Proceedings of the 28th International Conference on Machine Learning
Publisher: Omnipress
Bibliometrics:
Citation Count: 0

In reinforcement learning an agent uses online feedback from the environment and prior knowledge in order to adaptively select an effective policy. Model free approaches address this task by directly mapping external and internal states to actions, while model based methods attempt to construct a model of the environment, followed ...


