Dynamic Weights in Multi-Objective Deep Reinforcement Learning
Many real-world decision problems are characterized by multiple conflicting objectives that must be balanced based on their relative importance. In the dynamic weights setting, this relative importance changes over time, and specialized algorithms that deal with such change, such as the tabular Reinforcement Learning (RL) algorithm of Natarajan and Tadepalli (2005), are required. However, this earlier work is not feasible for RL settings that necessitate the use of function approximators. We generalize across weight changes and high-dimensional inputs by proposing a multi-objective Q-network whose outputs are conditioned on the relative importance of the objectives, and we introduce Diverse Experience Replay (DER) to counter the inherent non-stationarity of the dynamic weights setting. We perform an extensive experimental evaluation, comparing our methods to adapted algorithms from Deep Multi-Task/Multi-Objective Reinforcement Learning, and show that our proposed network in combination with DER dominates these adapted algorithms across weight-change scenarios and problem domains.
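To make the conditioning concrete, the sketch below shows what such a weight-conditioned multi-objective Q-network could look like in PyTorch. The class name, layer sizes, and scalarization helper are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class WeightConditionedQNetwork(nn.Module):
    """Sketch of a multi-objective Q-network whose outputs are
    conditioned on the current weight vector (hypothetical
    architecture, not the authors' exact design)."""

    def __init__(self, state_dim, n_actions, n_objectives, hidden=128):
        super().__init__()
        self.n_actions = n_actions
        self.n_objectives = n_objectives
        # Simplest conditioning choice: append the weight vector to
        # the state so a single network generalizes across weights.
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions * n_objectives),
        )

    def forward(self, state, weights):
        x = torch.cat([state, weights], dim=-1)
        # One Q-value per (action, objective) pair.
        return self.net(x).view(-1, self.n_actions, self.n_objectives)

    def scalarized_q(self, state, weights):
        # Linear scalarization Q_w(s, a) = w . Q(s, a), used for
        # greedy action selection under the current weights.
        q = self(state, weights)
        return (q * weights.unsqueeze(1)).sum(dim=-1)
```

Because the weights are an input rather than baked into the learning targets, the same network can be queried under new weights after a change, which is what enables generalization across weight-change scenarios.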
Computing Convex Coverage Sets for Faster Multi-objective Coordination
In this article, we propose new algorithms for multi-objective coordination graphs (MO-CoGs). Key to the efficiency of these algorithms is that they compute a convex coverage set (CCS) instead of a Pareto coverage set (PCS). Not only is a CCS a sufficient solution set for a large class of problems, it also has important characteristics that facilitate more efficient solutions. We propose two main algorithms for computing a CCS in MO-CoGs. Convex multi-objective variable elimination (CMOVE) computes a CCS by performing a series of agent eliminations, which can be seen as solving a series of local multi-objective subproblems. Variable elimination linear support (VELS) iteratively identifies the single weight vector w that can lead to the maximal possible improvement on a partial CCS, and calls variable elimination to solve a scalarized instance of the problem for w. VELS is faster than CMOVE for small and medium numbers of objectives and can compute an ε-approximate CCS in a fraction of the runtime. In addition, we propose variants of these methods that employ AND/OR tree search instead of variable elimination to achieve memory efficiency. We analyze the runtime and space complexities of these methods, prove their correctness, and compare them empirically against a naive baseline and an existing PCS method, both in terms of memory usage and runtime. Our results show that, by focusing on the CCS, these methods achieve much better scalability in the number of agents than the current state of the art.
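The CCS-versus-PCS distinction is easy to illustrate on a flat set of value vectors. The toy sketch below (all names hypothetical; unlike CMOVE and VELS, it ignores the graph structure) keeps only vectors that are optimal under some linear scalarization.

```python
import numpy as np

def approx_ccs(vectors, n_weights=101):
    """Approximate CCS pruning for two objectives: keep every value
    vector that maximizes the scalarization w . v for at least one
    sampled weight vector w. Illustrative only; the paper's methods
    operate on coordination graphs, not flat vector sets."""
    vectors = np.asarray(vectors, dtype=float)
    keep = set()
    for w1 in np.linspace(0.0, 1.0, n_weights):
        w = np.array([w1, 1.0 - w1])
        keep.add(int(np.argmax(vectors @ w)))
    return vectors[sorted(keep)]

# (0.45, 0.45) is Pareto-optimal (it would be in a PCS), but no linear
# weighting prefers it, so it is excluded from the CCS.
print(approx_ccs([(0.0, 1.0), (0.45, 0.45), (1.0, 0.0)]))
```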
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill that gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer- and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and find that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs, by using a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam. (Comment: AAMAS 2018; source code at https://github.com/lmzintgraf/gp_pref_elici)
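Two ingredients from the abstract lend themselves to a short sketch: decomposing a user ranking into the pairwise comparisons a GP preference model consumes, and adding virtual comparisons against the nadir and ideal points. The helpers below are hypothetical illustrations, not code from the linked repository.

```python
from itertools import combinations
import numpy as np

def ranking_to_pairwise(ranked):
    """Expand a ranking (best first) into (winner, loser) pairs,
    the input format of pairwise-comparison preference models."""
    return list(combinations(ranked, 2))

def virtual_comparisons(solution_set):
    """Virtual comparisons encoding monotonicity: every real value
    vector beats the nadir point and loses to the ideal point."""
    vs = np.asarray(solution_set, dtype=float)
    nadir, ideal = vs.min(axis=0), vs.max(axis=0)
    pairs = [(tuple(v), tuple(nadir)) for v in vs]   # v preferred to nadir
    pairs += [(tuple(ideal), tuple(v)) for v in vs]  # ideal preferred to v
    return pairs
```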
Structure in the Value Function of Two-Player Zero-Sum Games of Incomplete Information
Zero-sum stochastic games provide a rich model for competitive decision making. However, under general forms of state uncertainty, as considered in the Partially Observable Stochastic Game (POSG), such decision-making problems are still not well understood. This paper contributes to the theory of zero-sum POSGs by characterizing structure in their value function. In particular, we introduce a new formulation of the value function for zs-POSGs as a function of the "plan-time sufficient statistics" (roughly speaking, the information distribution in the POSG), which has the potential to enable generalization over such information distributions. We further delineate this generalization capability by proving a structural result on the shape of the value function: it exhibits concavity and convexity with respect to appropriately chosen marginals of the statistic space. This result is a key precursor for developing solution methods that may be able to exploit such structure. Finally, we show how these results allow us to reduce a zs-POSG to a "centralized" model with shared observations, thereby transferring results for the latter, narrower class to games with individual (private) observations.
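A hedged restatement of the structural result in formulas; the notation is ours, and attributing concavity to the maximizing player's marginal (and convexity to the minimizing player's) is our reading of "appropriately chosen marginals".

```latex
% Illustrative notation: the plan-time statistic \sigma is a joint
% distribution over the players' private information, factored from
% either player's perspective.
\sigma(\theta^1, \theta^2)
  = \sigma^1(\theta^1)\,\sigma^{2|1}(\theta^2 \mid \theta^1)
  = \sigma^2(\theta^2)\,\sigma^{1|2}(\theta^1 \mid \theta^2)
% The structural result, informally: holding each conditional fixed,
\sigma^1 \mapsto V_t^*\bigl(\sigma^1 \cdot \sigma^{2|1}\bigr)
  \ \text{is concave,}
\qquad
\sigma^2 \mapsto V_t^*\bigl(\sigma^2 \cdot \sigma^{1|2}\bigr)
  \ \text{is convex.}
```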
An analytical packet/flow-level modelling approach for wireless LANs with Quality-of-Service support
We present an analytical packet/flow-level modelling approach for the performance analysis of IEEE 802.11e WLANs, in which we explicitly take into account QoS differentiation mechanisms based on minimum contention window sizes and Arbitration InterFrame Space (AIFS) values, as included in the Enhanced Distributed Channel Access (EDCA) protocol of the 802.11e standard. We first enhance the packet-level approach previously used for best-effort WLANs to include traffic classes with different QoS requirements. This packet-level analysis yields service weights that discriminate among traffic classes. From these observations, the resulting packet/flow-level model for 802.11e is a generalized discriminatory processor-sharing (GDPS) queueing model, in which the state-dependent system capacity is distributed among active traffic classes according to state-dependent priority weights. Extensive simulations show that the discriminatory processor-sharing model closely represents the flow behavior of 802.11e.
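The GDPS sharing rule itself is simple to state. The sketch below computes per-class service rates in one system state; the function and its arguments are illustrative, and in the model above both the weights and the state-dependent capacity come out of the packet-level analysis.

```python
def gdps_rates(n_flows, weights, capacity):
    """Per-class service rates in a (generalized) discriminatory
    processor-sharing queue: class k receives a share of the
    state-dependent capacity proportional to weights[k] * n_flows[k].
    Illustrative helper, not the paper's model code."""
    total = sum(w * n for w, n in zip(weights, n_flows))
    if total == 0:
        return [0.0] * len(n_flows)
    return [capacity * w * n / total for w, n in zip(weights, n_flows)]

# Example: two traffic classes with 3 and 1 active flows, priority
# weights 2:1, and capacity 10 Mbit/s in this state.
print(gdps_rates([3, 1], [2.0, 1.0], 10.0))  # [8.57..., 1.42...]
```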
Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets
Many real-world reinforcement learning problems have a hierarchical nature,
and often exhibit some degree of partial observability. While hierarchy and
partial observability are usually tackled separately (for instance by combining
recurrent neural networks and options), we show that addressing both problems
simultaneously is simpler and more efficient in many cases. More specifically,
we make the initiation set of options conditional on the previously-executed
option, and show that options with such Option-Observation Initiation Sets
(OOIs) are at least as expressive as Finite State Controllers (FSCs), a
state-of-the-art approach for learning in POMDPs. OOIs are easy to design based
on an intuitive description of the task, lead to explainable policies and keep
the top-level and option policies memoryless. Our experiments show that OOIs
allow agents to learn optimal policies in challenging POMDPs, while being much
more sample-efficient than a recurrent neural network over options.
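A minimal sketch of the OOI idea, assuming options are plain objects whose initiation sets are given as sets of option names (all names hypothetical):

```python
class Option:
    """Option with an Option-Observation Initiation Set (OOI):
    whether it may start depends only on which option has just
    terminated, keeping the policies themselves memoryless."""

    def __init__(self, name, policy, initiation_set):
        self.name = name
        self.policy = policy                  # memoryless: obs -> action
        self.initiation_set = initiation_set  # names of options after
                                              # which this one may start

    def can_start(self, previous_option):
        return previous_option in self.initiation_set

def available_options(options, previous_option):
    # The top-level policy chooses only among options whose OOI
    # contains the option that just finished, which encodes one step
    # of memory without any recurrence.
    return [o for o in options if o.can_start(previous_option)]
```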
Variational Multi-Objective Coordination
In this paper, we propose variational optimistic linear support (VOLS), a novel algorithm that finds bounded approximate solutions for multi-objective coordination graphs (MO-CoGs). VOLS builds and improves upon an existing exact algorithm called variable elimination linear support (VELS). Like VELS, VOLS solves a MO-CoG as a series of scalarized single-objective coordination graphs. We improve upon VELS in two important ways. First, where VELS uses a single-objective solver called variable elimination (VE) as a subroutine, VOLS uses a variational method called weighted mini-buckets (WMB). Because variational methods scale much better than VE, VOLS can be used to solve much larger MO-CoGs than was previously possible. Furthermore, we show that because WMB computes bounded approximations, so does VOLS. Second, we leverage the insight that VOLS can hot-start each call to WMB by reusing the reparameterizations output by WMB on earlier calls. We show empirically that VOLS scales much better than VELS and introduces only negligible error. Our experimental results indicate that the reuse of reparameterizations keeps the runtime low and the approximation quality high.
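The outer loop shared by VELS and VOLS can be sketched for two objectives as follows. This is an illustrative skeleton, not the published algorithm: solve_scalarized stands in for the scalarized solver (variable elimination in VELS, weighted mini-buckets in VOLS), and warm models the reused reparameterizations.

```python
import numpy as np

def linear_support_2obj(solve_scalarized, eps=1e-9, max_calls=50):
    """Two-objective linear-support skeleton (illustrative).
    solve_scalarized(w, warm) -> (value_vector, warm) solves one
    scalarized instance, optionally hot-started from `warm`."""
    ccs, warm = [], None
    queue = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # extrema first
    while queue and max_calls > 0:
        max_calls -= 1
        w = queue.pop()
        v, warm = solve_scalarized(w, warm)  # hot-started call
        v = np.asarray(v, dtype=float)
        if any(np.allclose(v, u) for u in ccs):
            continue  # this weight yields a known vector: no improvement
        # New candidate corner weights: where the scalarized values of
        # the new vector and an existing one intersect.
        for u in ccs:
            d = u - v
            if abs(d[0] - d[1]) > eps:
                w1 = -d[1] / (d[0] - d[1])
                if 0.0 < w1 < 1.0:
                    queue.append(np.array([w1, 1.0 - w1]))
        ccs.append(v)
    return ccs
```

Hot-starting pays off because successive corner weights tend to lie close to previously solved ones, so the earlier reparameterizations are good initializations, matching the observation that reuse keeps the runtime low.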
