Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation
The control of nonlinear dynamical systems remains a major challenge for
autonomous agents. Current trends in reinforcement learning (RL) focus on
complex representations of dynamics and policies, which have yielded impressive
results in solving a variety of hard control tasks. However, this newfound
sophistication and these extremely over-parameterized models have come at the
cost of an overall reduction in our ability to interpret the resulting policies. In
this paper, we take inspiration from the control community and apply the
principles of hybrid switching systems in order to break down complex dynamics
into simpler components. We exploit the rich representational power of
probabilistic graphical models and derive an expectation-maximization (EM)
algorithm for learning a sequence model to capture the temporal structure of
the data and automatically decompose nonlinear dynamics into stochastic
switching linear dynamical systems. Moreover, we show how this framework of
switching models enables extracting hierarchies of Markovian and
auto-regressive locally linear controllers from nonlinear experts in an
imitation learning scenario.
Comment: 2nd Annual Conference on Learning for Dynamics and Control
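To make the model class concrete, here is a minimal generative sketch of the switching linear dynamical system into which the abstract decomposes nonlinear dynamics; the state dimension, mode count, and sticky transition matrix are illustrative assumptions, not the authors' implementation.

```python
# Minimal generative sketch of a switching linear dynamical system (SLDS):
# a discrete Markov mode z_t selects which local linear model drives the
# continuous state x_t. All names and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)

K, d = 3, 2                                    # number of modes, state dimension
Pi = np.full((K, K), 0.05) + 0.85 * np.eye(K)  # "sticky" mode-transition matrix
Pi /= Pi.sum(axis=1, keepdims=True)

# One local linear dynamics model (A_k, b_k) per discrete mode k.
A = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(K)]
b = [0.1 * rng.standard_normal(d) for _ in range(K)]
noise_std = 0.01

def simulate(T=200):
    """Sample a trajectory: z_t evolves as a Markov chain, and the
    continuous state follows the active mode's linear dynamics."""
    z = np.zeros(T, dtype=int)
    x = np.zeros((T, d))
    x[0] = rng.standard_normal(d)
    for t in range(T - 1):
        z[t + 1] = rng.choice(K, p=Pi[z[t]])
        x[t + 1] = A[z[t + 1]] @ x[t] + b[z[t + 1]] \
                   + noise_std * rng.standard_normal(d)
    return z, x

z, x = simulate()
print("mode usage:", np.bincount(z, minlength=K))
```

An EM procedure of the kind the abstract derives would treat z as latent and alternate between inferring mode posteriors and re-fitting the per-mode linear parameters.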
CRC 1114 Report - Membrane Deformation by N-BAR Proteins: Extraction of membrane geometry and protein diffusion characteristics from MD simulations
We describe simulations of proteins and artificial pseudo-molecules
interacting with and shaping lipid bilayer membranes. We extract protein
diffusion parameters, membrane deformation profiles and the elastic properties
of the membrane models used, in preparation for calculations based on a
large-scale continuum model.
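As a concrete illustration of the diffusion-parameter extraction the report mentions, here is a hedged sketch that estimates a lateral diffusion coefficient from in-plane trajectory data via the Einstein relation MSD(t) ≈ 4Dt; the array layout, units, and fitting window are assumptions, since the report does not specify its estimator.

```python
# Hedged sketch: estimating a lateral (2-D, in-membrane) diffusion
# coefficient from a trajectory via the Einstein relation MSD(t) ~ 4 D t.
# The trajectory format and units are placeholder assumptions.
import numpy as np

def lateral_diffusion_coefficient(xy, dt):
    """xy: (T, 2) in-plane positions [nm]; dt: frame spacing [ns].
    Returns D in nm^2/ns from a linear fit of MSD against lag time."""
    T = len(xy)
    lags = np.arange(1, T // 4)              # avoid noisy long-lag estimates
    msd = np.array([np.mean(np.sum((xy[lag:] - xy[:-lag]) ** 2, axis=1))
                    for lag in lags])
    slope, _ = np.polyfit(lags * dt, msd, 1)  # MSD = 4 D t + offset
    return slope / 4.0

# Synthetic check: 2-D Brownian walk with known D = 0.5 nm^2/ns.
rng = np.random.default_rng(1)
dt, D_true = 0.1, 0.5
steps = rng.normal(scale=np.sqrt(2 * D_true * dt), size=(10_000, 2))
xy = np.cumsum(steps, axis=0)
print(f"estimated D ~ {lateral_diffusion_coefficient(xy, dt):.3f} nm^2/ns")
```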
Convergence & Competition: United Ways and Community Foundations - A National Inquiry
This U.S. report summarizes key findings of research commissioned to support the active dialogue among leaders of United Ways and community foundations about their respective roles in community philanthropy and what the options for strategic co-existence -- if not full-fledged cooperation -- will look like in the coming years.
Interactive television or enhanced television? The Dutch users' interest in applications of ITV via set-top boxes
This paper is both an analysis of the phenomenon of interactive television, with background concepts of interactivity and television, and a report of an empirical investigation among Dutch users of set-top-box ITV. In the analytic part a distinction is made between levels of interactivity in the applications of ITV. Activities labelled as selection, customisation, transaction and reaction reveal low levels of interactivity. They may be called ‘enhanced television’: extensions of existing television programmes that keep their linear character. Activities called production and conversation have the potential for higher interactivity. They may lead to ‘real’ interactive television, as the user input makes a difference to programmes. It is suggested that so-called hybrid ITV (TV combined with telephone and email reply channels) and (broadband) Internet ITV offer better opportunities for high interactivity than set-top-box ITV.
The empirical investigation shows that the demand of subscribers to set-top-box ITV in the Netherlands matches supply. They favour the less interactive applications of selection and reaction. Other striking results are that young subscribers appreciate interactive applications more than older ones do, and that subscribers with a low level of education prefer these applications more than highly educated subscribers do. No significant gender differences were found.
Data-efficient Domain Randomization with Bayesian Optimization
When learning policies for robot control, the required real-world data is
typically prohibitively expensive to acquire, so learning in simulation is a
popular strategy. Unfortunately, such policies are often not transferable to the
real world due to a mismatch between the simulation and reality, called the
'reality gap'. Domain randomization methods tackle this problem by randomizing
the physics simulator (source domain) during training according to a
distribution over domain parameters in order to obtain more robust policies
that are able to overcome the reality gap. Most domain randomization approaches
sample the domain parameters from a fixed distribution. This solution is
suboptimal in the context of sim-to-real transferability, since it yields
policies that have been trained without explicitly optimizing for the reward on
the real system (target domain). Additionally, a fixed distribution assumes
there is prior knowledge about the uncertainty over the domain parameters. In
this paper, we propose Bayesian Domain Randomization (BayRn), a black-box
sim-to-real algorithm that solves tasks efficiently by adapting the domain
parameter distribution during learning given sparse data from the real-world
target domain. BayRn uses Bayesian optimization to search the space of source
domain distribution parameters for settings that lead to a policy maximizing
the real-world objective, allowing for adaptive distributions during policy
optimization. We experimentally validate the proposed approach in sim-to-sim as
well as in sim-to-real experiments, comparing against three baseline methods on
two robotic tasks. Our results show that BayRn is able to perform sim-to-real
transfer, while significantly reducing the required prior knowledge.
Comment: Accepted at RA-L / ICRA
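The loop the abstract describes can be sketched as Bayesian optimization over a domain distribution parameter, here 1-D for clarity; train_policy and real_return are stand-ins for "train in randomized simulation" and "sparse real-world evaluation", and the tiny GP surrogate and UCB acquisition are assumptions rather than the paper's code.

```python
# Hedged sketch of a BayRn-style loop: Bayesian optimization over a domain
# distribution parameter phi, guided by sparse target-domain returns.
import numpy as np

rng = np.random.default_rng(0)

def train_policy(phi):
    # Placeholder for training a policy under randomized dynamics ~ phi.
    return phi

def real_return(policy):
    # Placeholder target domain: return peaks when phi matches the
    # (unknown) true parameter 0.7; evaluations are sparse and noisy.
    return -(policy - 0.7) ** 2 + 0.01 * rng.standard_normal()

def gp_posterior(X, y, Xq, ls=0.2, noise=1e-3):
    """Tiny 1-D Gaussian-process regression with an RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xq)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = 1.0 - np.sum(Ks * sol, axis=0)      # prior variance is 1 for RBF
    return mu, np.sqrt(np.maximum(var, 1e-12))

phis = list(rng.uniform(0, 1, 3))             # initial random designs
returns = [real_return(train_policy(p)) for p in phis]
grid = np.linspace(0, 1, 200)
for _ in range(10):
    mu, sd = gp_posterior(np.array(phis), np.array(returns), grid)
    phi_next = grid[np.argmax(mu + 2.0 * sd)]  # UCB acquisition
    phis.append(phi_next)
    returns.append(real_return(train_policy(phi_next)))
print(f"best phi ~ {phis[int(np.argmax(returns))]:.2f}")
```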
f-Divergence constrained policy improvement
To ensure stability of learning, state-of-the-art generalized policy
iteration algorithms augment the policy improvement step with a trust region
constraint bounding the information loss. The size of the trust region is
commonly determined by the Kullback-Leibler (KL) divergence, which not only
captures the notion of distance well but also yields closed-form solutions. In
this paper, we consider a more general class of f-divergences and derive the
corresponding policy update rules. The generic solution is expressed through
the derivative of the convex conjugate function to f and includes the KL
solution as a special case. Within the class of f-divergences, we further focus
on a one-parameter family of α-divergences to study effects of the
choice of divergence on policy improvement. Previously known as well as new
policy updates emerge for different values of α. We show that every type
of policy update comes with a compatible policy evaluation resulting from the
chosen f-divergence. Interestingly, the mean-squared Bellman error minimization
is closely related to policy evaluation with the Pearson χ²-divergence
penalty, while the KL divergence results in the soft-max policy update and a
log-sum-exp critic. We carry out asymptotic analysis of the solutions for
different values of α and demonstrate the effects of using different
divergence functions on a multi-armed bandit problem and on common standard
reinforcement learning problems.
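Since the abstract's inline symbols were lost in extraction, the following hedged LaTeX sketch reconstructs the update it describes; the constraint direction and multiplier names follow common REPS-style trust-region formulations and are assumptions rather than quotations from the paper.

```latex
% Hedged reconstruction of the f-divergence-constrained policy improvement
% step; constants and constraint direction are assumed, not quoted.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
\max_{\pi}\; \mathbb{E}_{s\sim\mu,\,a\sim\pi}\!\left[A(s,a)\right]
\quad \text{s.t.} \quad
D_f\!\left(\pi \,\|\, \pi_k\right)
  = \mathbb{E}_{\pi_k}\!\left[f\!\left(\tfrac{\pi}{\pi_k}\right)\right]
  \le \epsilon .
\]
Stationarity of the Lagrangian expresses the generic update through the
derivative of the convex conjugate $f^{*}$ (using $(f^{*})' = (f')^{-1}$):
\[
\pi_{k+1}(a\mid s) \;\propto\;
  \pi_k(a\mid s)\,(f^{*})'\!\left(\frac{A(s,a)-\lambda(s)}{\eta}\right),
\]
with multipliers $\eta \ge 0$ for the trust region and $\lambda(s)$ for
normalization. For $f(x)=x\log x$ (the KL case), $(f^{*})'(y)=e^{\,y-1}$,
recovering the soft-max update $\pi_{k+1}\propto\pi_k\exp(A/\eta)$.
\end{document}
```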
