
    Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation

    The control of nonlinear dynamical systems remains a major challenge for autonomous agents. Current trends in reinforcement learning (RL) focus on complex representations of dynamics and policies, which have yielded impressive results on a variety of hard control tasks. However, this added sophistication and the accompanying extremely over-parameterized models come at the cost of an overall reduction in our ability to interpret the resulting policies. In this paper, we take inspiration from the control community and apply the principles of hybrid switching systems to break complex dynamics down into simpler components. We exploit the rich representational power of probabilistic graphical models and derive an expectation-maximization (EM) algorithm for learning a sequence model that captures the temporal structure of the data and automatically decomposes nonlinear dynamics into stochastic switching linear dynamical systems. Moreover, we show how this framework of switching models enables extracting hierarchies of Markovian and auto-regressive locally linear controllers from nonlinear experts in an imitation learning scenario. (Comment: 2nd Annual Conference on Learning for Dynamics and Control)
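    As a concrete picture of the model class the paper decomposes dynamics into, the sketch below generates data from a switching linear dynamical system: a discrete mode follows a Markov chain and selects which local linear dynamics drive the continuous state. All names, dimensions, and parameter values are illustrative assumptions, and the EM-based fitting described in the abstract is not shown.

```python
# Minimal generative sketch of a switching linear dynamical system (sLDS).
# A discrete mode z_t follows a Markov chain; the continuous state x_t evolves
# under the linear dynamics of the active mode. Values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

K, D, T = 2, 2, 200                              # modes, state dim, horizon
Pi = np.array([[0.95, 0.05],                     # mode transition matrix
               [0.10, 0.90]])
A = [np.array([[0.99, 0.10], [0.00, 0.99]]),     # per-mode linear dynamics
     np.array([[0.90, -0.20], [0.20, 0.90]])]
Q = 0.01 * np.eye(D)                             # process noise covariance

z = np.zeros(T, dtype=int)
x = np.zeros((T, D))
x[0] = rng.normal(size=D)
for t in range(1, T):
    z[t] = rng.choice(K, p=Pi[z[t - 1]])                         # next mode
    x[t] = A[z[t]] @ x[t - 1] + rng.multivariate_normal(np.zeros(D), Q)
```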

    CRC 1114 Report - Membrane Deformation by N-BAR Proteins: Extraction of membrane geometry and protein diffusion characteristics from MD simulations

    We describe simulations of proteins and artificial pseudo-molecules interacting with and shaping lipid bilayer membranes. We extract protein diffusion parameters, membrane deformation profiles, and the elastic properties of the membrane models used, in preparation for calculations based on a large-scale continuum model.
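    One of the quantities mentioned above, a protein diffusion coefficient, is commonly estimated from a trajectory's mean-squared displacement via MSD(τ) ≈ 2·d·D·τ for d-dimensional lateral diffusion. The sketch below is a generic NumPy illustration with synthetic data, not the analysis pipeline used in the report.

```python
# Generic lateral diffusion estimate from the mean-squared displacement (MSD)
# of a single 2D trajectory: MSD(tau) ~ 4 * D * tau. Inputs are synthetic.
import numpy as np

def msd(traj):
    """MSD for each lag time; traj has shape (T, dims)."""
    T = len(traj)
    out = np.zeros(T)
    for lag in range(1, T):
        disp = traj[lag:] - traj[:-lag]
        out[lag] = np.mean(np.sum(disp**2, axis=1))
    return out

def diffusion_coefficient(traj, dt, dims=2):
    """Fit MSD(tau) = 2 * dims * D * tau over the first half of the lags."""
    lags = np.arange(len(traj)) * dt
    m = msd(traj)
    half = len(traj) // 2
    slope = np.polyfit(lags[1:half], m[1:half], 1)[0]
    return slope / (2 * dims)

# Synthetic 2D random walk with true D = 1.0 (per-step variance = 2 * D * dt).
rng = np.random.default_rng(1)
dt = 0.01
steps = rng.normal(scale=np.sqrt(2 * 1.0 * dt), size=(1000, 2))
traj = np.cumsum(steps, axis=0)
print(diffusion_coefficient(traj, dt))   # close to 1.0
```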

    Convergence & Competition: United Ways and Community Foundations - A National Inquiry

    This U.S. report summarizes key findings of research commissioned to support the active dialogue among leaders of United Ways and community foundations about their respective roles in community philanthropy and what the options for strategic co-existence, if not full-fledged cooperation, will look like in the coming years.

    Interactive television or enhanced television?: Dutch users' interest in applications of ITV via set-top boxes

    This paper is both an analysis of the phenomenon of interactive television, drawing on background concepts of interactivity and television, and a report of an empirical investigation among Dutch users of set-top-box ITV. In the analytic part a distinction is made between levels of interactivity in the applications of ITV. Activities labelled selection, customisation, transaction and reaction exhibit low levels of interactivity. They may be called 'enhanced television': extensions of existing television programmes that keep their linear character. Activities called production and conversation have the potential for higher interactivity. They may lead to 'real' interactive television, as the user input makes a difference to programmes. It is suggested that so-called hybrid ITV (TV combined with telephone and email reply channels) and (broadband) Internet ITV offer better opportunities for high interactivity than set-top-box ITV. The empirical investigation shows that the demand of subscribers to set-top-box ITV in the Netherlands matches the supply. They favour the less interactive applications of selection and reaction. Other striking results are that young subscribers appreciate interactive applications more than older ones do, and that subscribers with a low level of education prefer these applications more than highly educated subscribers.

    Data-efficient Domain Randomization with Bayesian Optimization

    When learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies are often not transferable to the real world due to a mismatch between simulation and reality, called the 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) during training according to a distribution over domain parameters, in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This is suboptimal for sim-to-real transfer, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes prior knowledge about the uncertainty over the domain parameters. In this paper, we propose Bayesian Domain Randomization (BayRn), a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning, given sparse data from the real-world target domain. BayRn uses Bayesian optimization to search the space of source-domain distribution parameters for those that lead to a policy maximizing the real-world objective, allowing the distribution to adapt during policy optimization. We experimentally validate the proposed approach in sim-to-sim as well as sim-to-real experiments, comparing against three baseline methods on two robotic tasks. Our results show that BayRn is able to perform sim-to-real transfer while significantly reducing the required prior knowledge. (Comment: Accepted at RA-L / ICRA)
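    The abstract describes an outer loop in which Bayesian optimization proposes parameters of the source-domain distribution, a policy is trained under that randomization, and its return on the target system drives the next proposal. Below is a rough, schematic sketch of such a loop; the two helper functions are toy stand-ins (not the paper's code), and scikit-optimize's gp_minimize is used here as a generic Bayesian optimizer.

```python
# Schematic BayRn-style outer loop: Bayesian optimization over the parameters
# of a domain-parameter distribution (here, mean and std of a single mass).
# The helpers are toy stand-ins for an RL training run and target-system rollouts.
from skopt import gp_minimize

TARGET_MASS = 1.3   # unknown in practice; used only to fake a target system here

def train_policy_with_randomization(mass_mean, mass_std):
    # Toy stand-in: the "policy" is just the distribution it was trained under.
    return (mass_mean, mass_std)

def evaluate_on_target(policy):
    # Toy stand-in: reward is high when the randomization centers on the target mass.
    mass_mean, mass_std = policy
    return -((mass_mean - TARGET_MASS) ** 2) - 0.1 * mass_std

def negative_target_return(phi):
    policy = train_policy_with_randomization(*phi)
    return -evaluate_on_target(policy)        # gp_minimize minimizes

space = [(0.5, 2.0),     # bounds on the mass mean
         (0.01, 0.5)]    # bounds on the mass std

result = gp_minimize(negative_target_return, space, n_calls=15, random_state=0)
print("best distribution parameters:", result.x)
```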

    f-Divergence constrained policy improvement

    To ensure stability of learning, state-of-the-art generalized policy iteration algorithms augment the policy improvement step with a trust-region constraint bounding the information loss. The size of the trust region is commonly determined by the Kullback-Leibler (KL) divergence, which not only captures the notion of distance well but also yields closed-form solutions. In this paper, we consider the more general class of f-divergences and derive the corresponding policy update rules. The generic solution is expressed through the derivative of the convex conjugate of f and includes the KL solution as a special case. Within the class of f-divergences, we further focus on a one-parameter family of α-divergences to study the effect of the choice of divergence on policy improvement. Previously known as well as new policy updates emerge for different values of α. We show that every type of policy update comes with a compatible policy evaluation resulting from the chosen f-divergence. Interestingly, mean-squared Bellman error minimization is closely related to policy evaluation with the Pearson χ² divergence penalty, while the KL divergence results in the soft-max policy update and a log-sum-exp critic. We carry out an asymptotic analysis of the solutions for different values of α and demonstrate the effects of using different divergence functions on a multi-armed bandit problem and on standard reinforcement learning problems.
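    For reference, the standard definitions behind these quantities can be written as follows (generic conventions; the paper's exact parameterization of the α-family may differ):

```latex
% f-divergence between a policy \pi and a reference q (f convex, f(1) = 0):
D_f(\pi \,\|\, q) = \sum_a q(a)\, f\!\left(\frac{\pi(a)}{q(a)}\right)

% Familiar special cases via the generator f:
%   f(t) = t \log t      \Rightarrow  \mathrm{KL}(\pi \,\|\, q)
%   f(t) = (t - 1)^2     \Rightarrow  \text{Pearson } \chi^2 \text{ divergence}

% One common parameterization of the \alpha-family (conventions differ):
f_\alpha(t) = \frac{t^\alpha - \alpha t + \alpha - 1}{\alpha(\alpha - 1)},
\qquad
\lim_{\alpha \to 1} f_\alpha(t) = t \log t - t + 1 \quad (\mathrm{KL}),
\qquad
f_2(t) = \tfrac{1}{2}(t - 1)^2 \quad (\text{Pearson } \chi^2 \text{, up to scale})
```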