
    Model-Based Reinforcement Learning with Continuous States and Actions

    Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
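
    As a minimal sketch of this extension (the toy dynamics, all names, and the use of interpolation in place of the paper's GP value-function model are illustrative assumptions): fit a GP to observed transitions, then run an approximate Bellman recursion over a finite set of support states.

```python
# Hedged sketch of GPDP with a *learned* GP transition model, on a toy 1-D
# system. Interpolation over support states stands in for the paper's GP
# value-function model; names and constants are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Dynamics unknown to the agent: s' = 0.9*s + 0.5*a + noise.
S = rng.uniform(-1, 1, size=(200, 1))
A = rng.uniform(-1, 1, size=(200, 1))
S_next = 0.9 * S + 0.5 * A + 0.01 * rng.standard_normal((200, 1))

# Step 1: build a GP model of the transition dynamics from data.
dynamics_gp = GaussianProcessRegressor().fit(np.hstack([S, A]), S_next.ravel())

# Step 2: approximate dynamic programming over a finite set of support states.
support_states = np.linspace(-1, 1, 21)
actions = np.linspace(-1, 1, 11)
reward = lambda s, a: -(s**2 + 0.1 * a**2)     # quadratic cost as reward
V = np.zeros_like(support_states)

for _ in range(20):                            # Bellman backups
    V = np.array([
        max(reward(s, a)
            + np.interp(dynamics_gp.predict([[s, a]])[0], support_states, V)
            for a in actions)
        for s in support_states
    ])
```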

    Multi-Task Policy Search for Robotics

    © 2014 IEEE. Learning policies that generalize across multiple tasks is an important and challenging research topic in reinforcement learning and robotics. Training individual policies for every single potential task is often impractical, especially for continuous task variations, requiring more principled approaches to share and transfer knowledge among similar tasks. We present a novel approach for learning a nonlinear feedback policy that generalizes across multiple tasks. The key idea is to define a parametrized policy as a function of both the state and the task, which allows learning a single policy that generalizes across multiple known and unknown tasks. Applications of our novel approach to reinforcement and imitation learning in real-robot experiments are shown.
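
    The key idea admits a very small sketch (the feature map, names, and dimensions below are assumptions for illustration, not the authors' implementation): the policy takes the task descriptor as an extra input, so one parameter vector serves all tasks.

```python
# Sketch: a single parametrized policy over (state, task), so one theta
# generalizes across tasks instead of one policy per task. Illustrative only.
import numpy as np

def features(state, task):
    """Joint (state, task) features; a simple second-order expansion here."""
    z = np.concatenate([state, task])
    return np.concatenate([[1.0], z, np.outer(z, z).ravel()])

def policy(state, task, theta):
    """Nonlinear feedback control as a linear function of joint features."""
    return features(state, task) @ theta

# theta is then optimized by policy search over rollouts from *many* tasks.
state = np.array([0.1, -0.3])   # e.g., joint angle and velocity
task = np.array([0.5])          # e.g., a target position describing the task
theta = np.zeros(features(state, task).shape)
u = policy(state, task, theta)
```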

    An Experimental Evaluation of Bayesian Optimization on Bipedal Locomotion

    © 2014 IEEE. The design of gaits and corresponding control policies for bipedal walkers is a key challenge in robot locomotion. Even when a viable controller parametrization already exists, finding near-optimal parameters can be daunting. The use of automatic gait optimization methods greatly reduces the need for human expertise and time-consuming design processes. Many different approaches to automatic gait optimization have been suggested to date. However, no extensive comparison among them has yet been performed. In this paper, we present some common methods for automatic gait optimization in bipedal locomotion, and analyze their strengths and weaknesses. We experimentally evaluated these gait optimization methods on a bipedal robot, in more than 1800 experimental evaluations. In particular, we analyzed Bayesian optimization in different configurations, including various acquisition functions.
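
    Bayesian optimization itself follows a simple loop, sketched below with a GP surrogate and an expected-improvement acquisition function (the objective walk_cost is a made-up stand-in for one hardware rollout; the setup is illustrative, not the paper's protocol).

```python
# Sketch of a Bayesian optimization loop for gait tuning: GP surrogate plus
# expected improvement; each iteration corresponds to one robot experiment.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def walk_cost(x):                       # stand-in for a real rollout on hardware
    return (x - 0.3) ** 2 + 0.05 * np.random.randn()

X = list(np.random.uniform(0, 1, 5))    # initial random gait parameters
Y = [walk_cost(x) for x in X]

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(X)[:, None], Y)
    cand = np.linspace(0, 1, 200)[:, None]
    mu, sd = gp.predict(cand, return_std=True)
    imp = min(Y) - mu                   # improvement over best cost (minimizing)
    z = imp / np.maximum(sd, 1e-9)
    ei = imp * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    x_next = float(cand[np.argmax(ei), 0])
    X.append(x_next)
    Y.append(walk_cost(x_next))
```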

    Data-efficient learning of feedback policies from image pixels using deep dynamical models

    Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy (torques) from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight, and is an important step toward fully autonomous end-to-end learning from pixels to torques.
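
    The joint training the abstract emphasizes can be sketched in a few lines (architecture, sizes, and names below are illustrative assumptions, not the authors' exact model): the reconstruction and prediction losses are minimized together, so the embedding is shaped for prediction, not just compression.

```python
# Sketch of a deep dynamical model: image autoencoder trained *jointly* with
# a latent transition model, so long-term predictions for MPC stay accurate.
import torch
import torch.nn as nn

PIXELS, LATENT, CTRL = 64 * 64, 3, 1

encoder = nn.Sequential(nn.Linear(PIXELS, 128), nn.ReLU(), nn.Linear(128, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, PIXELS))
dynamics = nn.Linear(LATENT + CTRL, LATENT)     # predicts the next latent state

opt = torch.optim.Adam(
    [*encoder.parameters(), *decoder.parameters(), *dynamics.parameters()],
    lr=1e-3)

def joint_loss(img_t, u_t, img_next):
    z_t = encoder(img_t)
    z_next = dynamics(torch.cat([z_t, u_t], dim=-1))
    recon = nn.functional.mse_loss(decoder(z_t), img_t)        # autoencoding term
    pred = nn.functional.mse_loss(decoder(z_next), img_next)   # prediction term
    return recon + pred                                        # joint objective

# One training step on a random (image, torque, next-image) batch:
img_t, u, img_n = torch.rand(8, PIXELS), torch.rand(8, CTRL), torch.rand(8, PIXELS)
opt.zero_grad(); joint_loss(img_t, u, img_n).backward(); opt.step()
```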

    Effect of osmotic potential of activator solution and temperature on viability and vigour of wheat seed

    An experiment was conducted to investigate whether seed priming with polyethylene glycol (PEG) activator solutions affects the viability and vigour of a deteriorating wheat (Triticum aestivum L. cv. Caxton) seed lot. Seeds were subjected to PEG priming solutions with varying osmotic potentials (-1, -2, -3 MPa) at temperatures of 15 or 20°C for 6, 12, 18 or 24 h and compared to the non-primed control. The highest germination percentage at first and final count, length of plumule, and dry weight of seedling were all associated with the -1 MPa/20°C/6 h treatment (92%, 94%, 9.2 cm, 0.0133 mg, respectively) compared to the control (82.5%, 86%, 7.8 cm, 0.0112 mg, respectively). The best values of coefficient of velocity of germination (CVG), mean germination time (MGT) and germination rate index (GRI) were associated with the -2 MPa/15°C/24 h treatment. There were significant interactions between the factors under study, and whilst most positive effects decreased with incubation time, the opposite was true for the -2 MPa treatment at 15°C, where an initial decrease in germination after 6 h was restored with longer incubation times. Significant correlations were found between most of the characteristics under study, although these did not always account for a high percentage of variation; CVG and MGT, however, were very highly correlated. It was concluded that 6 h in -1 MPa PEG at 20°C resulted in significantly improved germination percentage, whilst 24 h at -2 MPa and 15°C was optimal for the best CVG and MGT. The highest speed of germination was not associated with the highest germination percentage.
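
    The germination indices used above have standard definitions in seed science; as a quick reference, the sketch below computes MGT, CVG, and GRI from daily counts of newly germinated seeds (the counts are invented for illustration, and the GRI formulation shown is one common variant).

```python
# Standard germination indices from daily counts of newly germinated seeds.
import numpy as np

days   = np.array([1, 2, 3, 4, 5, 6, 7])     # days after sowing
counts = np.array([0, 5, 30, 40, 10, 5, 2])  # seeds newly germinated each day

mgt = (counts * days).sum() / counts.sum()        # mean germination time (days)
cvg = 100 * counts.sum() / (counts * days).sum()  # coefficient of velocity of germination
gri = (counts / days).sum()                       # germination rate index (one variant)

print(f"MGT = {mgt:.2f} d, CVG = {cvg:.1f}, GRI = {gri:.1f}")
```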

    Serum parathyroid hormone levels and renal handling of phosphorus in patients with chronic renal disease

    In eight patients with advanced renal insufficiency (inulin clearance 1.4-9.1 ml/min), concentrations of serum calcium (S[Ca]) and phosphorus (S[P]) were maintained normal (S[Ca] > 9.0 mg/100 ml, S[P] < 3.5 mg/100 ml) for at least 20 consecutive days with phosphate-binding antacids and oral calcium carbonate. The initial serum levels of immunoreactive parathyroid hormone (S-PTH) were elevated in three (426-9230 pg/ml), normal in four (one after subtotal parathyroidectomy), and not available in one. The initial fractional excretion of filtered phosphorus was high in all and ranged from 0.45-1.05. Following sustained normocalcemia and normophosphatemia, S-PTH was reduced below control levels in all patients, being normal in six and elevated in two. The fractional excretion of filtered phosphorus decreased below control levels in all patients; it remained high in six (of which five had normal S-PTH) and was normal in two (of which one had elevated S-PTH). The observed relationship between S-PTH and the fractional excretion of phosphorus could either reflect the inability of the radioimmunoassay for PTH employed to measure a circulating molecular species of PTH which was present, in which case the actual levels of S-PTH were higher than those measured, and/or it could be indicative of the presence of additional important factor(s) (other than S-PTH) which inhibit tubular reabsorption of phosphorus in advanced chronic renal failure. © 1972 by The Endocrine Society.
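
    For reference, the fractional excretion of filtered phosphorus is conventionally the ratio of phosphate clearance to glomerular filtration rate (inulin clearance here); the sketch below shows that arithmetic with invented values and is an assumption about the paper's exact formula.

```python
# Fractional excretion of filtered phosphorus as a clearance ratio, C_P / C_In.
# All values are invented for illustration.

def fractional_excretion_p(urine_p, urine_flow, serum_p, inulin_clearance):
    """Fraction of filtered phosphorus escaping tubular reabsorption."""
    phosphate_clearance = urine_p * urine_flow / serum_p   # ml/min
    return phosphate_clearance / inulin_clearance

# e.g., U_P = 15 mg/100 ml, V = 1.0 ml/min, S_P = 3.0 mg/100 ml, C_In = 5 ml/min
print(fractional_excretion_p(15.0, 1.0, 3.0, 5.0))         # -> 1.0
```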

    Approximate Dynamic Programming with Gaussian Processes

    In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and strongly depends on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate globally optimal closed-loop policy. In GPDP, value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal state feedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes. A binary classifier selects one Gaussian process to predict the optimal control signal. We show that GPDP is able to yield an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing up, a complex nonlinear control problem.
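
    The final policy construction can be sketched as follows (the toy discontinuous controls and the regime split are illustrative assumptions): two GPs are trained on the GPDP outcomes for different regimes, and a binary classifier decides which GP predicts the control at a query state.

```python
# Sketch: a possibly discontinuous global policy from two GPs + a classifier.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LogisticRegression

states = np.linspace(-1, 1, 40)[:, None]
# Stand-in for GPDP's optimal controls, with a jump at s = 0:
u_opt = np.where(states.ravel() < 0, 1.0 + states.ravel(), states.ravel() - 1.0)

regime = (states.ravel() >= 0).astype(int)
gp_lo = GaussianProcessRegressor().fit(states[regime == 0], u_opt[regime == 0])
gp_hi = GaussianProcessRegressor().fit(states[regime == 1], u_opt[regime == 1])
clf = LogisticRegression().fit(states, regime)    # selects which GP to query

def policy(s):
    s = np.atleast_2d(s)
    gp = gp_hi if clf.predict(s)[0] == 1 else gp_lo
    return gp.predict(s)[0]

print(policy(-0.05), policy(0.05))   # jumps across the discontinuity
```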

    Evaluation of Coulomb potential in a triclinic cell with periodic boundary conditions

    Lekner and Sperb's work on the evaluation of Coulomb energies and forces under periodic boundary conditions is generalized, making it possible to use a triclinic unit cell in 3D simulations rather than only an orthorhombic cell. The expressions obtained take a form similar to those previously derived by Lekner and Sperb for the special case of an orthorhombic cell.
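
    The geometric generalization can be seen in a short sketch: in a triclinic cell, periodic images are generated by a full 3x3 cell matrix rather than a diagonal one. The truncated direct sum below only illustrates how triclinic images enter; it is not the Lekner/Sperb summation, whose series are what make the conditionally convergent Coulomb sum tractable.

```python
# Periodic Coulomb images in a triclinic cell: image shifts come from integer
# combinations of three general lattice vectors (rows of `cell`). Naive
# truncated sum for illustration only -- not the Lekner/Sperb method.
import itertools
import numpy as np

cell = np.array([[10.0, 0.0, 0.0],    # triclinic lattice vectors as rows
                 [ 2.0, 9.0, 0.0],
                 [ 1.0, 1.5, 8.0]])
charges = np.array([1.0, -1.0])
positions = np.array([[0.0, 0.0, 0.0],
                      [2.5, 2.5, 2.5]])

def coulomb_energy_truncated(n_images=3):
    e = 0.0
    rng3 = range(-n_images, n_images + 1)
    for n in itertools.product(rng3, repeat=3):
        shift = np.array(n) @ cell               # Cartesian image translation
        for i in range(len(charges)):
            for j in range(len(charges)):
                if n == (0, 0, 0) and i == j:
                    continue                     # skip the self term
                r = np.linalg.norm(positions[i] - positions[j] + shift)
                e += 0.5 * charges[i] * charges[j] / r
    return e

print(coulomb_energy_truncated())
```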