
    Dynamic Weights in Multi-Objective Deep Reinforcement Learning

    Many real-world decision problems are characterized by multiple conflicting objectives which must be balanced based on their relative importance. In the dynamic weights setting the relative importance changes over time, and specialized algorithms that deal with such change, such as the tabular Reinforcement Learning (RL) algorithm of Natarajan and Tadepalli (2005), are required. However, this earlier work is not feasible for RL settings that necessitate the use of function approximators. We generalize across weight changes and high-dimensional inputs by proposing a multi-objective Q-network whose outputs are conditioned on the relative importance of objectives, and we introduce Diverse Experience Replay (DER) to counter the inherent non-stationarity of the dynamic weights setting. We perform an extensive experimental evaluation, compare our methods to adapted algorithms from deep multi-task/multi-objective reinforcement learning, and show that our proposed network in combination with DER dominates these adapted algorithms across weight-change scenarios and problem domains.
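
    As an illustration of the conditioning described above, the sketch below shows how such a weight-conditioned multi-objective Q-network might look in PyTorch. The class name, layer sizes, and scalarized action selection are illustrative assumptions, not the authors' architecture.

    # Minimal sketch (not the authors' code): a Q-network whose outputs
    # are conditioned on the relative importance of the objectives.
    import torch
    import torch.nn as nn

    class ConditionedMOQNetwork(nn.Module):
        """Maps (state, weight vector) to one Q-value per action and
        objective; the weights are part of the input, so a single
        network can generalize across weight changes."""

        def __init__(self, state_dim, n_actions, n_objectives, hidden=128):
            super().__init__()
            self.n_actions = n_actions
            self.n_objectives = n_objectives
            self.net = nn.Sequential(
                nn.Linear(state_dim + n_objectives, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_actions * n_objectives),
            )

        def forward(self, state, weights):
            # Condition on the objective weights by concatenating them
            # to the state features.
            x = torch.cat([state, weights], dim=-1)
            return self.net(x).view(-1, self.n_actions, self.n_objectives)

        def greedy_action(self, state, weights):
            q = self.forward(state, weights)                 # (B, A, O)
            scalarized = (q * weights.unsqueeze(1)).sum(-1)  # w . Q(s, a)
            return scalarized.argmax(dim=-1)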

    Computing Convex Coverage Sets for Faster Multi-objective Coordination

    In this article, we propose new algorithms for multi-objective coordination graphs (MO-CoGs). Key to the efficiency of these algorithms is that they compute a convex coverage set (CCS) instead of a Pareto coverage set (PCS). Not only is a CCS a sufficient solution set for a large class of problems, it also has important characteristics that facilitate more efficient solutions. We propose two main algorithms for computing a CCS in MO-CoGs. Convex multi-objective variable elimination (CMOVE) computes a CCS by performing a series of agent eliminations, which can be seen as solving a series of local multi-objective subproblems. Variable elimination linear support (VELS) iteratively identifies the single weight vector w that can lead to the maximal possible improvement on a partial CCS and calls variable elimination to solve a scalarized instance of the problem for w. VELS is faster than CMOVE for small and medium numbers of objectives and can compute an ε-approximate CCS in a fraction of the runtime. In addition, we propose variants of these methods that employ AND/OR tree search instead of variable elimination to achieve memory efficiency. We analyze the runtime and space complexities of these methods, prove their correctness, and compare them empirically against a naive baseline and an existing PCS method, both in terms of memory usage and runtime. Our results show that, by focusing on the CCS, these methods achieve much better scalability in the number of agents than the current state of the art.
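
    A small self-contained illustration of why the CCS is the right target for linear scalarizations: a value vector belongs to the CCS only if some weight vector makes it optimal, which can be checked with one linear program per vector. SciPy's linprog is a real routine; the pruning function itself is a hypothetical sketch, not CMOVE or VELS.

    # Illustrative sketch (not CMOVE or VELS): prune a set of value
    # vectors to a convex coverage set (CCS). A vector is kept only if
    # some weight w (w >= 0, sum(w) = 1) makes it maximal.
    import numpy as np
    from scipy.optimize import linprog

    def prune_to_ccs(vectors):
        vectors = [np.asarray(v, dtype=float) for v in vectors]
        n_obj = len(vectors[0])
        ccs = []
        for i, u in enumerate(vectors):
            others = [v for j, v in enumerate(vectors) if j != i]
            if not others:
                ccs.append(u.tolist())
                continue
            # Variables [w_1, ..., w_n, d]: maximize d subject to
            # w.(u - v) >= d for every other v, i.e. w.(v - u) + d <= 0.
            c = np.zeros(n_obj + 1)
            c[-1] = -1.0                      # linprog minimizes, so -d
            A_ub = np.array([np.append(v - u, 1.0) for v in others])
            b_ub = np.zeros(len(others))
            A_eq = np.append(np.ones(n_obj), 0.0).reshape(1, -1)
            b_eq = np.array([1.0])
            bounds = [(0, None)] * n_obj + [(None, None)]
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                          bounds=bounds, method="highs")
            if res.success and -res.fun >= -1e-9:   # best d >= 0, up to tolerance
                ccs.append(u.tolist())
        return ccs

    # [0.4, 0.4] is dropped: no linear weighting makes it optimal.
    print(prune_to_ccs([[1, 0], [0, 1], [0.6, 0.6], [0.4, 0.4]]))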

    Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

    In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer- and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and find that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs, via a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam.
    Comment: AAMAS 2018; source code at https://github.com/lmzintgraf/gp_pref_elici
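
    The sketch below illustrates one ingredient of the ordered strategies: a full ranking from the user decomposes into the pairwise comparisons that a preference GP consumes, and virtual comparisons to the nadir and ideal points inject the monotonicity information mentioned above. The helper is a hypothetical illustration, not the released code.

    # Hypothetical helper (not the paper's code): turn a user ranking
    # into the (winner, loser) pairs a pairwise preference model uses.
    def ranking_to_comparisons(ranked_items, nadir, ideal):
        """ranked_items is ordered from most to least preferred."""
        comparisons = []
        for i, better in enumerate(ranked_items):
            for worse in ranked_items[i + 1:]:
                comparisons.append((better, worse))
        # Virtual comparisons encoding monotonicity: everything beats
        # the nadir point and loses to the ideal point.
        for item in ranked_items:
            comparisons.append((item, nadir))
            comparisons.append((ideal, item))
        return comparisons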

    Structure in the Value Function of Two-Player Zero-Sum Games of Incomplete Information

    Zero-sum stochastic games provide a rich model for competitive decision making. However, under general forms of state uncertainty as considered in the Partially Observable Stochastic Game (POSG), such decision-making problems are still not very well understood. This paper makes a contribution to the theory of zero-sum POSGs (zs-POSGs) by characterizing structure in their value function. In particular, we introduce a new formulation of the value function for zs-POSGs as a function of the "plan-time sufficient statistics" (roughly speaking, the information distribution in the POSG), which has the potential to enable generalization over such information distributions. We further delineate this generalization capability by proving a structural result on the shape of the value function: it exhibits concavity and convexity with respect to appropriately chosen marginals of the statistic space. This result is a key precursor for developing solution methods that may be able to exploit such structure. Finally, we show how these results allow us to reduce a zs-POSG to a "centralized" model with shared observations, thereby transferring results for the latter, narrower class to games with individual (private) observations.
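
    Stated loosely and in approximate notation (a paraphrase, not the paper's exact theorem): writing \sigma^1 for the maximizing player's marginal of the plan-time statistic, with the associated conditional held fixed, the concavity part reads

    V_t^*\bigl(\lambda\,\sigma^1 + (1-\lambda)\,\bar{\sigma}^1\bigr)
        \;\ge\; \lambda\,V_t^*(\sigma^1) + (1-\lambda)\,V_t^*(\bar{\sigma}^1),
        \qquad 0 \le \lambda \le 1,

    and, symmetrically, the value is convex in the minimizing player's marginal.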

    An analytical packet/flow-level modelling approach for wireless LANs with Quality-of-Service support

    We present an analytical packet/flow-level modelling approach for the performance analysis of IEEE 802.11e WLANs, where we explicitly take into account QoS differentiation mechanisms based on minimum contention window sizes and Arbitration InterFrame Space (AIFS) values, as included in the Enhanced Distributed Channel Access (EDCA) protocol of the 802.11e standard. We first enhance the packet-level approach previously used for best-effort WLANs to include traffic classes with different QoS requirements. The packet-level model yields service weights that discriminate among traffic classes. From these observations, the packet/flow-level model for 802.11e is the generalized discriminatory processor-sharing (GDPS) queueing model, in which the state-dependent system capacity is distributed among active traffic classes according to state-dependent priority weights. Extensive simulations show that the discriminatory processor-sharing model closely represents the flow behavior of 802.11e.
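
    As a sketch of the capacity split in such a GDPS model (a hypothetical helper, not the paper's analysis code): in each state, the capacity C(n) is divided over the active flows in proportion to state-dependent class weights.

    # Sketch: per-flow service rates under generalized discriminatory
    # processor sharing (GDPS). n_flows[k] is the number of active
    # class-k flows, weights[k] the class-k priority weight, and
    # capacity the state-dependent system capacity C(n).
    def gdps_rates(n_flows, weights, capacity):
        total = sum(g * n for g, n in zip(weights, n_flows))
        if total == 0:
            return [0.0] * len(n_flows)
        # Each class-k flow is served at rate C(n) * g_k / sum_j g_j n_j.
        return [capacity * g / total if n > 0 else 0.0
                for g, n in zip(weights, n_flows)]

    # Example: two classes with weights 2:1; class-1 flows are served
    # twice as fast as class-2 flows, and the rates sum to the capacity.
    print(gdps_rates([3, 2], [2.0, 1.0], capacity=8.0))  # [2.0, 1.0]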

    Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

    Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies, and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
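
    The sketch below illustrates the core mechanism under assumed option names: which options may be initiated depends only on the option that was executed previously, so the top-level policy stays memoryless while still carrying exactly one step of memory. It is an illustration, not the authors' implementation.

    # Hypothetical OOIs for a delivery-style task: each entry lists the
    # options that may start after the given previously executed option.
    OOI = {
        None:       ["go_to_A", "go_to_B"],   # episode start
        "go_to_A":  ["pick_up", "go_to_B"],
        "go_to_B":  ["drop_off", "go_to_A"],
        "pick_up":  ["go_to_B"],              # carrying: head to B
        "drop_off": ["go_to_A"],
    }

    def select_option(top_level_policy, observation, previous_option):
        # The memoryless top-level policy chooses only among options
        # whose initiation set contains the previously executed option.
        candidates = OOI[previous_option]
        return top_level_policy(observation, candidates)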

    Variational Multi-Objective Coordination

    In this paper, we propose variational optimistic linear support (VOLS), a novel algorithm that finds bounded approximate solutions for multi-objective coordination graphs (MO-CoGs). VOLS builds and improves upon an existing exact algorithm called variable elimination linear support (VELS). Like VELS, VOLS solves a MO-CoG as a series of scalarized single-objective coordination graphs. We improve upon VELS in two important ways. Firstly, where VELS uses a single-objective solver called variable elimination (VE) as a subroutine, VOLS uses a variational method called weighted mini-buckets (WMB). Because variational methods scale much better than VE, VOLS can be used to solve much larger MO-CoGs than was previously possible. Furthermore, we show that because WMB computes bounded approximations, so does VOLS. Secondly, we leverage the insight that VOLS can hot-start each call to WMB by reusing the reparameterizations output by WMB on earlier calls. We show empirically that VOLS scales much better than VELS and introduces only negligible error. Our experimental results indicate that the reuse of reparameterizations keeps the runtime low and the approximation quality high.
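
    The sketch below shows the OLS-style outer loop that VOLS shares with VELS, with hypothetical stand-ins for the scalarized solver (weighted mini-buckets in VOLS) and the corner-weight computation; the reuse of reparameterizations appears as the warm_start threaded through successive calls. It is a sketch of the scheme, not the authors' implementation.

    # Sketch of an OLS-style outer loop with hot-starting. The two
    # callables are hypothetical stand-ins:
    #   corner_weights(partial_ccs) -> [(weight, max_improvement), ...]
    #   solve_scalarized(weight, warm_start) -> (value_tuple, warm_start')
    def vols_outer_loop(solve_scalarized, corner_weights, eps):
        partial_ccs = []
        warm_start = None          # reparameterization reused across calls
        while True:
            candidates = corner_weights(partial_ccs)
            if not candidates:
                break
            w, max_improvement = max(candidates, key=lambda c: c[1])
            if max_improvement < eps:
                break              # bounded eps-approximate CCS reached
            value, warm_start = solve_scalarized(w, warm_start)
            if value in partial_ccs:
                break              # no new value vector: converged
            partial_ccs.append(value)
        return partial_ccs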