54 research outputs found
Regularized fitted Q-iteration: application to planning
We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model complexity. The algorithm is presented in detail for the case when the function space is a reproducing kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
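The iteration the abstract describes can be sketched in a few lines. In the sketch below, ordinary ridge (L2-penalized) regression over Gaussian RBF features stands in for the paper's RKHS-penalized least squares, and the one-dimensional toy MDP and all names are invented for illustration; this is not the authors' implementation:

```python
import numpy as np

def fitted_q_iteration(samples, n_iters=50, gamma=0.95, lam=1e-2):
    """Fitted Q-iteration with penalized (ridge) least-squares regression.

    samples: (state, action, reward, next_state) tuples from a generative
    model; states are scalars in [0, 1], actions are 0 or 1 (toy setup).
    """
    centers = np.linspace(0.0, 1.0, 10)

    def phi(s):
        # Gaussian RBF feature vector for a scalar state
        return np.exp(-((s - centers) ** 2) / 0.02)

    A = np.array([t[1] for t in samples])
    R = np.array([t[2] for t in samples])
    Phi = np.array([phi(t[0]) for t in samples])
    Phi2 = np.array([phi(t[3]) for t in samples])
    d = len(centers)
    w = np.zeros((2, d))  # one weight vector per action

    for _ in range(n_iters):
        # Bellman targets under the current Q estimate
        q_next = np.stack([Phi2 @ w[a] for a in (0, 1)], axis=1)
        y = R + gamma * q_next.max(axis=1)
        # Ridge-penalized least squares, fitted separately per action
        for a in (0, 1):
            X = Phi[A == a]
            w[a] = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y[A == a])
    return w, phi
```

The penalty `lam` plays the role of the regularization term that controls model complexity; the paper's data-dependent penalties would select it from the samples rather than fix it in advance.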
Projective simulation for artificial intelligence
We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
Comment: 22 pages, 18 figures. Close to published version, with footnotes retained.
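A minimal version of the clip-network random walk can be sketched as follows. The two-symbol task, the one-hop walk from percept clip to action clip, and the additive weight update are simplifying assumptions for illustration, not the authors' full model:

```python
import random

class ProjectiveSimulationAgent:
    """Toy projective-simulation agent: a random walk over a clip network.

    Clips are elementary memory nodes; a percept clip starts the walk and
    an action clip ends it. Edge weights (h-values) set hop probabilities
    and are bumped when the resulting action is rewarded.
    """

    def __init__(self, percepts, actions):
        self.actions = list(actions)
        # h-values start uniform at 1.0 for every percept-action edge
        self.h = {(p, a): 1.0 for p in percepts for a in actions}
        self.last = None

    def act(self, percept):
        # One-hop walk from the percept clip to an action clip, hopping
        # with probability proportional to the h-values.
        weights = [self.h[(percept, a)] for a in self.actions]
        r = random.random() * sum(weights)
        for a, w in zip(self.actions, weights):
            r -= w
            if r <= 0:
                self.last = (percept, a)
                return a
        self.last = (percept, self.actions[-1])
        return self.actions[-1]

    def learn(self, reward):
        # Reinforce the edge traversed on the last walk
        if self.last is not None:
            self.h[self.last] += reward
```

In a toy task where the rewarded action simply matches the percept, the h-values along rewarded edges grow and the walk increasingly selects them.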
Cyclic animation using Partial Differential Equations
This work presents an efficient and fast method for achieving cyclic animation using Partial Differential Equations (PDEs). The boundary-value nature associated with elliptic PDEs offers a fast analytic solution technique for setting up a framework for this type of animation. The surface of a given character is thus created from a set of pre-determined curves, which are used as boundary conditions so that a number of PDEs can be solved. Two different approaches to cyclic animation are presented here. The first consists of attaching the set of curves to a skeletal system holding the animation for cyclic motions linked to a set of mathematical expressions; the second exploits the spine associated with the analytic solution of the PDE as a driving mechanism to achieve cyclic animation, which is also manipulated mathematically. The first of these approaches is implemented within a framework related to cyclic motions inherent to human-like characters, whereas the spine-based approach is focused on modelling the undulatory movement observed in fish when swimming. The proposed method is fast and accurate. Additionally, the animation can either be used in the PDE-based surface representation of the model or transferred to the original mesh model by means of a point-to-point map. Thus, the user is offered the choice of using either of these two animation representations of the same object, the selection depending on the computing resources, such as storage and memory capacity, associated with each particular application.
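The boundary-value idea can be illustrated with a small sketch. Here a numerically relaxed Laplace equation stands in for the fourth-order elliptic PDE and its analytic solution used in the paper, and the boundary curves are driven by periodic expressions of time so the resulting surface deformation is cyclic; the curve shapes are invented for illustration:

```python
import numpy as np

def pde_surface(bnd_u0, bnd_u1, bnd_v0, bnd_v1, n=20, iters=2000):
    """Fill a surface patch from four boundary curves by relaxing
    Laplace's equation (a second-order stand-in for the fourth-order
    elliptic PDE of PDE-based geometry; purely illustrative).

    Each boundary is an (n, 3) array of points; returns an (n, n, 3) patch.
    """
    X = np.zeros((n, n, 3))
    X[0, :], X[-1, :] = bnd_u0, bnd_u1
    X[:, 0], X[:, -1] = bnd_v0, bnd_v1
    for _ in range(iters):
        # Jacobi sweep: each interior point moves to its neighbours' mean
        X[1:-1, 1:-1] = 0.25 * (X[:-2, 1:-1] + X[2:, 1:-1]
                                + X[1:-1, :-2] + X[1:-1, 2:])
    return X

def cyclic_frames(t_values, n=20):
    """Cyclic animation: drive one boundary curve with an expression that
    is periodic in t, so the surface deforms and returns to its start."""
    frames = []
    u = np.linspace(0.0, 1.0, n)
    for t in t_values:
        amp = 0.2 * np.sin(2 * np.pi * t)  # periodic in t, period 1
        top = np.stack([u, np.ones(n), amp * np.sin(np.pi * u)], axis=1)
        bottom = np.stack([u, np.zeros(n), np.zeros(n)], axis=1)
        left = np.stack([np.zeros(n), u, np.zeros(n)], axis=1)
        right = np.stack([np.ones(n), u, np.zeros(n)], axis=1)
        frames.append(pde_surface(bottom, top, left, right, n))
    return frames
```

Because the boundary expressions are periodic in `t`, the frame at `t = 1` coincides with the frame at `t = 0`, which is what makes the animation cyclic.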
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
We consider the problem of finding a near-optimal policy in continuous-space, discounted Markovian Decision Problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where in successive iterations the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set, and the discounted-average concentrability of the future-state distribution. To the best of our knowledge this is the first theoretical reinforcement learning result for off-policy control learning over continuous state spaces using a single trajectory.
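The evaluation step can be sketched for a linear function class: pick the weights minimizing the empirical Bellman residual along the trajectory, then act greedily and repeat. With stochastic transitions this plain empirical residual is biased (precisely the issue the paper's modified loss addresses), so the toy sketch below, with its invented deterministic two-state example, is only faithful in the deterministic case:

```python
import numpy as np

def brm_policy_eval(traj, policy, phi, gamma=0.9):
    """Evaluate a fixed policy by minimizing the empirical Bellman
    residual over a linear function class, from a single trajectory.

    traj: (s, a, r, s_next) tuples; phi(s, a) -> feature vector.
    Minimizes sum_t (phi(s,a)@w - r - gamma * phi(s', policy(s'))@w)^2.
    """
    D = np.array([phi(s, a) - gamma * phi(s2, policy(s2))
                  for s, a, r, s2 in traj])
    R = np.array([r for _, _, r, _ in traj])
    w, *_ = np.linalg.lstsq(D, R, rcond=None)
    return w

def policy_iteration(traj, phi, actions, n_iters=5, gamma=0.9):
    """Approximate policy iteration: evaluate with Bellman-residual
    minimization, then make the policy greedy, and repeat."""
    w = np.zeros(len(phi(traj[0][0], actions[0])))
    for _ in range(n_iters):
        greedy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w = brm_policy_eval(traj, greedy, phi, gamma)
    return w
```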
Deep Reinforcement Learning: An Overview
In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most widely used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have successfully been combined with the reinforcement learning framework.
Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201
Continuous-state reinforcement learning with fuzzy approximation
Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Well-understood RL algorithms with good convergence and consistency properties exist. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation structure similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We show that the resulting algorithm converges. We also give a modified, serial variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.
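The combination the abstract describes can be sketched as follows, assuming a known deterministic model and a triangular fuzzy partition of a scalar state space (the membership degrees sum to one, which is what keeps the iteration a contraction). The details are illustrative rather than the paper's exact formulation:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Triangular fuzzy membership degrees of scalar x over sorted
    centers; at most two degrees are nonzero and they sum to 1."""
    mu = np.zeros(len(centers))
    if x <= centers[0]:
        mu[0] = 1.0
        return mu
    if x >= centers[-1]:
        mu[-1] = 1.0
        return mu
    j = int(np.searchsorted(centers, x)) - 1
    t = (x - centers[j]) / (centers[j + 1] - centers[j])
    mu[j], mu[j + 1] = 1.0 - t, t
    return mu

def fuzzy_q_iteration(f, rho, centers, actions, gamma=0.9, n_iters=200):
    """Model-based fuzzy Q-value iteration: theta[i, k] approximates the
    Q-value of action k at the core state of fuzzy set i.

    f(x, u) -> next state and rho(x, u) -> reward form the known,
    deterministic model (a simplifying assumption for this sketch).
    """
    theta = np.zeros((len(centers), len(actions)))
    for _ in range(n_iters):
        new = np.empty_like(theta)
        for i, x in enumerate(centers):
            for k, u in enumerate(actions):
                mu = triangular_memberships(f(x, u), centers)
                # Fuzzy interpolation of Q at the next state, then the max
                new[i, k] = rho(x, u) + gamma * np.max(mu @ theta)
        theta = new
    return theta
```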
Experiments in predicting the German stock index DAX with density estimating neural networks
Learning and Tracking Cyclic Human Motion
We present methods for learning and tracking human motion in video. We estimate a statistical model of typical activities from a large set of 3D periodic human motion data by segmenting these data automatically into "cycles". Then the mean and the principal components of the cycles are computed using a new algorithm that accounts for missing information and enforces smooth transitions between cycles. The learned temporal model provides a prior probability distribution over human motions that can be used in a Bayesian framework for tracking human subjects in complex monocular video sequences and recovering their 3D motion.
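The learning step can be sketched as follows, assuming complete and aligned cycles; the paper's algorithm additionally handles missing information and enforces smooth transitions between cycles, which this sketch omits:

```python
import numpy as np

def learn_cycle_model(cycles, n_components=2, n_samples=50):
    """Learn a statistical model of cyclic motion: resample each cycle to
    a common length, then take the mean cycle and the principal
    components of the mean-centered cycles via SVD.

    cycles: list of (T_i, d) arrays, each one period of motion.
    Returns (mean, components): shapes (n_samples*d,), (k, n_samples*d).
    """
    def resample(c):
        # Linearly resample every coordinate track to n_samples points
        t_old = np.linspace(0.0, 1.0, len(c))
        t_new = np.linspace(0.0, 1.0, n_samples)
        return np.column_stack([np.interp(t_new, t_old, c[:, j])
                                for j in range(c.shape[1])]).ravel()

    X = np.array([resample(c) for c in cycles])
    mean = X.mean(axis=0)
    # Principal components = right singular vectors of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]
```

The mean and components define a low-dimensional linear model of a cycle, which is the kind of prior over motions that the Bayesian tracker can then exploit.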
