54 research outputs found
Regularized fitted Q-iteration: application to planning
We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model complexity. The algorithm is presented in detail for the case when the function space is a reproducing kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
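The iteration the abstract describes can be sketched in a few lines. In the sketch below, ordinary ridge (L2-penalized) regression over Gaussian RBF features stands in for the paper's RKHS-penalized least squares, and the one-dimensional toy MDP and all names are invented for illustration; this is not the authors' implementation:

```python
import numpy as np

def fitted_q_iteration(samples, n_iters=50, gamma=0.95, lam=1e-2):
    """Fitted Q-iteration with penalized (ridge) least-squares regression.

    samples: (state, action, reward, next_state) tuples from a generative
    model; states are scalars in [0, 1], actions are 0 or 1 (toy setup).
    """
    centers = np.linspace(0.0, 1.0, 10)

    def phi(s):
        # Gaussian RBF feature vector for a scalar state
        return np.exp(-((s - centers) ** 2) / 0.02)

    A = np.array([t[1] for t in samples])
    R = np.array([t[2] for t in samples])
    Phi = np.array([phi(t[0]) for t in samples])
    Phi2 = np.array([phi(t[3]) for t in samples])
    d = len(centers)
    w = np.zeros((2, d))  # one weight vector per action

    for _ in range(n_iters):
        # Bellman targets under the current Q estimate
        q_next = np.stack([Phi2 @ w[a] for a in (0, 1)], axis=1)
        y = R + gamma * q_next.max(axis=1)
        # Ridge-penalized least squares, fitted separately per action
        for a in (0, 1):
            X = Phi[A == a]
            w[a] = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y[A == a])
    return w, phi
```

The penalty `lam` plays the role of the regularization term that controls model complexity; the paper's data-dependent penalties would select it from the samples rather than fix it in advance.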
Projective simulation for artificial intelligence
We propose a model of a learning agent whose interaction with the environment is governed by a simulation-based projection, which allows the agent to project itself into future situations before it takes real action. Projective simulation is based on a random walk through a network of clips, which are elementary patches of episodic memory. The network of clips changes dynamically, both due to new perceptual input and due to certain compositional principles of the simulation process. During simulation, the clips are screened for specific features which trigger factual action of the agent. The scheme is different from other, computational, notions of simulation, and it provides a new element in an embodied cognitive science approach to intelligent action and learning. Our model provides a natural route for generalization to quantum-mechanical operation and connects the fields of reinforcement learning and quantum computation.
Comment: 22 pages, 18 figures. Close to published version, with footnotes retained.
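A minimal version of the clip-network random walk can be sketched as follows. The two-symbol task, the one-hop walk from percept clip to action clip, and the additive weight update are simplifying assumptions for illustration, not the authors' full model:

```python
import random

class ProjectiveSimulationAgent:
    """Toy projective-simulation agent: a random walk over a clip network.

    Clips are elementary memory nodes; a percept clip starts the walk and
    an action clip ends it. Edge weights (h-values) set hop probabilities
    and are bumped when the resulting action is rewarded.
    """

    def __init__(self, percepts, actions):
        self.actions = list(actions)
        # h-values start uniform at 1.0 for every percept-action edge
        self.h = {(p, a): 1.0 for p in percepts for a in actions}
        self.last = None

    def act(self, percept):
        # One-hop walk from the percept clip to an action clip, hopping
        # with probability proportional to the h-values.
        weights = [self.h[(percept, a)] for a in self.actions]
        r = random.random() * sum(weights)
        for a, w in zip(self.actions, weights):
            r -= w
            if r <= 0:
                self.last = (percept, a)
                return a
        self.last = (percept, self.actions[-1])
        return self.actions[-1]

    def learn(self, reward):
        # Reinforce the edge traversed on the last walk
        if self.last is not None:
            self.h[self.last] += reward
```

In a toy task where the rewarded action simply matches the percept, the h-values along rewarded edges grow and the walk increasingly selects them.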
Cyclic animation using Partial Differential Equations
This work presents an efficient and fast method for achieving cyclic animation using Partial Differential Equations (PDEs). The boundary-value nature associated with elliptic PDEs offers a fast analytic solution technique for setting up a framework for this type of animation. The surface of a given character is thus created from a set of pre-determined curves, which are used as boundary conditions so that a number of PDEs can be solved. Two different approaches to cyclic animation are presented here. The first consists of attaching the set of curves to a skeletal system holding the animation for cyclic motions linked to a set of mathematical expressions; the second exploits the spine associated with the analytic solution of the PDE as a driving mechanism to achieve cyclic animation, which is also manipulated mathematically. The first of these approaches is implemented within a framework related to cyclic motions inherent to human-like characters, whereas the spine-based approach is focused on modelling the undulatory movement observed in fish when swimming. The proposed method is fast and accurate. Additionally, the animation can either be used in the PDE-based surface representation of the model or transferred to the original mesh model by means of a point-to-point map. Thus, the user is offered the choice of using either of these two animation representations of the same object, the selection depending on the computing resources, such as storage and memory capacity, associated with each particular application.
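The boundary-value idea can be illustrated with a small sketch. Here a numerically relaxed Laplace equation stands in for the fourth-order elliptic PDE and its analytic solution used in the paper, and the boundary curves are driven by periodic expressions of time so the resulting surface deformation is cyclic; the curve shapes are invented for illustration:

```python
import numpy as np

def pde_surface(bnd_u0, bnd_u1, bnd_v0, bnd_v1, n=20, iters=2000):
    """Fill a surface patch from four boundary curves by relaxing
    Laplace's equation (a second-order stand-in for the fourth-order
    elliptic PDE of PDE-based geometry; purely illustrative).

    Each boundary is an (n, 3) array of points; returns an (n, n, 3) patch.
    """
    X = np.zeros((n, n, 3))
    X[0, :], X[-1, :] = bnd_u0, bnd_u1
    X[:, 0], X[:, -1] = bnd_v0, bnd_v1
    for _ in range(iters):
        # Jacobi sweep: each interior point moves to its neighbours' mean
        X[1:-1, 1:-1] = 0.25 * (X[:-2, 1:-1] + X[2:, 1:-1]
                                + X[1:-1, :-2] + X[1:-1, 2:])
    return X

def cyclic_frames(t_values, n=20):
    """Cyclic animation: drive one boundary curve with an expression that
    is periodic in t, so the surface deforms and returns to its start."""
    frames = []
    u = np.linspace(0.0, 1.0, n)
    for t in t_values:
        amp = 0.2 * np.sin(2 * np.pi * t)  # periodic in t, period 1
        top = np.stack([u, np.ones(n), amp * np.sin(np.pi * u)], axis=1)
        bottom = np.stack([u, np.zeros(n), np.zeros(n)], axis=1)
        left = np.stack([np.zeros(n), u, np.zeros(n)], axis=1)
        right = np.stack([np.ones(n), u, np.zeros(n)], axis=1)
        frames.append(pde_surface(bottom, top, left, right, n))
    return frames
```

Because the boundary expressions are periodic in `t`, the frame at `t = 1` coincides with the frame at `t = 0`, which is what makes the animation cyclic.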
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
We consider the problem of finding a near-optimal policy in continuous-space, discounted Markovian Decision Problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where in successive iterations the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set, and the discounted-average concentrability of the future-state distribution. To the best of our knowledge this is the first theoretical reinforcement learning result for off-policy control learning over continuous state spaces using a single trajectory.
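The evaluation step can be sketched for a linear function class: pick the weights minimizing the empirical Bellman residual along the trajectory, then act greedily and repeat. With stochastic transitions this plain empirical residual is biased (precisely the issue the paper's modified loss addresses), so the toy sketch below, with its invented deterministic two-state example, is only faithful in the deterministic case:

```python
import numpy as np

def brm_policy_eval(traj, policy, phi, gamma=0.9):
    """Evaluate a fixed policy by minimizing the empirical Bellman
    residual over a linear function class, from a single trajectory.

    traj: (s, a, r, s_next) tuples; phi(s, a) -> feature vector.
    Minimizes sum_t (phi(s,a)@w - r - gamma * phi(s', policy(s'))@w)^2.
    """
    D = np.array([phi(s, a) - gamma * phi(s2, policy(s2))
                  for s, a, r, s2 in traj])
    R = np.array([r for _, _, r, _ in traj])
    w, *_ = np.linalg.lstsq(D, R, rcond=None)
    return w

def policy_iteration(traj, phi, actions, n_iters=5, gamma=0.9):
    """Approximate policy iteration: evaluate with Bellman-residual
    minimization, then make the policy greedy, and repeat."""
    w = np.zeros(len(phi(traj[0][0], actions[0])))
    for _ in range(n_iters):
        greedy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w = brm_policy_eval(traj, greedy, phi, gamma)
    return w
```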
Deep Reinforcement Learning: An Overview
In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most widely used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have successfully been combined with the reinforcement learning framework.
Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201
Continuous-state reinforcement learning with fuzzy approximation
Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Well-understood RL algorithms with good convergence and consistency properties exist. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation structure similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We show that the resulting algorithm converges. We also give a modified, serial variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.
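The combination the abstract describes can be sketched as follows, assuming a known deterministic model and a triangular fuzzy partition of a scalar state space (the membership degrees sum to one, which is what keeps the iteration a contraction). The details are illustrative rather than the paper's exact formulation:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Triangular fuzzy membership degrees of scalar x over sorted
    centers; at most two degrees are nonzero and they sum to 1."""
    mu = np.zeros(len(centers))
    if x <= centers[0]:
        mu[0] = 1.0
        return mu
    if x >= centers[-1]:
        mu[-1] = 1.0
        return mu
    j = int(np.searchsorted(centers, x)) - 1
    t = (x - centers[j]) / (centers[j + 1] - centers[j])
    mu[j], mu[j + 1] = 1.0 - t, t
    return mu

def fuzzy_q_iteration(f, rho, centers, actions, gamma=0.9, n_iters=200):
    """Model-based fuzzy Q-value iteration: theta[i, k] approximates the
    Q-value of action k at the core state of fuzzy set i.

    f(x, u) -> next state and rho(x, u) -> reward form the known,
    deterministic model (a simplifying assumption for this sketch).
    """
    theta = np.zeros((len(centers), len(actions)))
    for _ in range(n_iters):
        new = np.empty_like(theta)
        for i, x in enumerate(centers):
            for k, u in enumerate(actions):
                mu = triangular_memberships(f(x, u), centers)
                # Fuzzy interpolation of Q at the next state, then the max
                new[i, k] = rho(x, u) + gamma * np.max(mu @ theta)
        theta = new
    return theta
```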
Experiments in predicting the German stock index DAX with density estimating neural networks
Learning and Tracking Cyclic Human Motion
We present methods for learning and tracking human motion in video. We estimate a statistical model of typical activities from a large set of 3D periodic human motion data by segmenting these data automatically into "cycles". Then the mean and the principal components of the cycles are computed using a new algorithm that accounts for missing information and enforces smooth transitions between cycles. The learned temporal model provides a prior probability distribution over human motions that can be used in a Bayesian framework for tracking human subjects in complex monocular video sequences and recovering their 3D motion.
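The learning step can be sketched as follows, assuming complete and aligned cycles; the paper's algorithm additionally handles missing information and enforces smooth transitions between cycles, which this sketch omits:

```python
import numpy as np

def learn_cycle_model(cycles, n_components=2, n_samples=50):
    """Learn a statistical model of cyclic motion: resample each cycle to
    a common length, then take the mean cycle and the principal
    components of the mean-centered cycles via SVD.

    cycles: list of (T_i, d) arrays, each one period of motion.
    Returns (mean, components): shapes (n_samples*d,), (k, n_samples*d).
    """
    def resample(c):
        # Linearly resample every coordinate track to n_samples points
        t_old = np.linspace(0.0, 1.0, len(c))
        t_new = np.linspace(0.0, 1.0, n_samples)
        return np.column_stack([np.interp(t_new, t_old, c[:, j])
                                for j in range(c.shape[1])]).ravel()

    X = np.array([resample(c) for c in cycles])
    mean = X.mean(axis=0)
    # Principal components = right singular vectors of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]
```

The mean and components define a low-dimensional linear model of a cycle, which is the kind of prior over motions that the Bayesian tracker can then exploit.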
