Global Versus Local Constructive Function Approximation for On-Line Reinforcement Learning
In order to scale to problems with large or continuous state-spaces, reinforcement learning algorithms need to be combined with function approximation techniques. The majority of work on function approximation for reinforcement learning has so far focused either on global function approximation with a static structure (such as multi-layer perceptrons), or on constructive architectures using locally responsive units. The former, whilst achieving some notable successes, has also been shown to fail on some relatively simple tasks. The locally constructive approach has been shown to be more stable, but may scale poorly to higher-dimensional inputs, as it requires a dramatic increase in resources. This paper explores the use of two constructive algorithms using non-locally responsive neurons based on the popular Cascade-Correlation supervised-learning algorithm. The algorithms are applied within the Sarsa reinforcement learning algorithm, and their performance is compared against both a multi-layer perceptron and a locally constructive algorithm (the Resource Allocating Network) across three reinforcement learning tasks. It is shown that the globally constructive algorithms are less stable, but that on some tasks they can achieve similar performance to the locally constructive approach, whilst generating much more compact solutions.
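For readers unfamiliar with the base algorithm, the sketch below shows the general shape of Sarsa combined with a differentiable function approximator. The linear model, the epsilon-greedy exploration, and the toy environment API are illustrative assumptions; the paper's cascade-style constructive networks and its benchmark tasks are not reproduced here.

```python
import numpy as np

# Minimal sketch of Sarsa with a generic function approximator.
# The linear model is a stand-in for any differentiable approximator;
# the environment is assumed to expose reset() -> features and
# step(action) -> (features, reward, done).

class LinearQ:
    def __init__(self, n_features, n_actions, lr=0.01):
        self.w = np.zeros((n_actions, n_features))
        self.lr = lr

    def value(self, features, action):
        return self.w[action] @ features

    def update(self, features, action, td_error):
        # Gradient of a linear approximator w.r.t. its weights is the feature vector.
        self.w[action] += self.lr * td_error * features

def epsilon_greedy(q, features, epsilon):
    n_actions = q.w.shape[0]
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax([q.value(features, a) for a in range(n_actions)]))

def sarsa_episode(env, q, epsilon=0.1, gamma=0.99):
    """One on-policy Sarsa episode using the assumed toy environment API."""
    s = env.reset()
    a = epsilon_greedy(q, s, epsilon)
    done = False
    while not done:
        s_next, r, done = env.step(a)
        a_next = epsilon_greedy(q, s_next, epsilon)
        target = r if done else r + gamma * q.value(s_next, a_next)
        q.update(s, a, target - q.value(s, a))
        s, a = s_next, a_next
```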
Softmax exploration strategies for multiobjective reinforcement learning
Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature and examines their incorporation into multiobjective Q-learning. In particular, this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which affect the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation.
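As context for the proposed operators, the following sketch shows the plain scalarise-then-softmax baseline that such extensions build on: vector-valued Q-estimates are collapsed with linear scalarisation weights and an action is sampled from a Boltzmann distribution. The function name, weights, and temperature are illustrative assumptions; the paper's softmax–epsilon variant is not reproduced here.

```python
import numpy as np

def softmax_action(q_vectors, weights, temperature=1.0):
    """Softmax action selection over vector-valued Q-estimates.

    q_vectors: array of shape (n_actions, n_objectives)
    weights:   linear-scalarisation weights, shape (n_objectives,)

    This is a plain scalarise-then-softmax baseline, not the paper's
    softmax-epsilon extension.
    """
    scalar_q = q_vectors @ weights                      # collapse objectives to one value per action
    prefs = (scalar_q - scalar_q.max()) / temperature   # subtract max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return np.random.choice(len(scalar_q), p=probs)

# Example: two actions, two objectives, equal weighting.
q = np.array([[1.0, 0.2],
              [0.4, 0.9]])
a = softmax_action(q, weights=np.array([0.5, 0.5]), temperature=0.5)
```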
Participant observation of griefing in a journey through the World of Warcraft
Through the ethnographic method of participant observation in World of Warcraft, this paper aims to document various actions that may be considered griefing among the Massively Multiplayer Online Role-Playing Game community. Griefing as a term can be very subjective, so witnessing the anti-social and intentional actions first-hand can be used as a means to understand this subjectivity among players, as well as to produce a thorough account of some of the toxic behavior in this genre. The participant observation was conducted across several years and expansions of World of Warcraft, and the author became familiar with many griefing-related actions, although some of these were perceived as acceptable game-play elements.
Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans via intentional misuse (AI-misuse) or through accidents (AI-accidents). In respect of AI-accidents, there is an increasing effort focused on developing algorithms and paradigms that ensure AI systems are aligned to what humans intend, e.g. AI systems that yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that alignment to human intent is insufficient for safe AI systems and that preservation of the long-term agency of humans may be a more robust standard, and one that needs to be separated explicitly and a priori during optimization. We argue that AI systems can reshape human intention and discuss the lack of biological and psychological mechanisms that protect humans from loss of agency. We provide the first formal definition of agency-preserving AI-human interactions, which focuses on forward-looking agency evaluations, and argue that AI systems - not humans - must be increasingly tasked with making these evaluations. We show how agency loss can occur in simple environments containing embedded agents that use temporal-difference learning to make action recommendations. Finally, we propose a new area of research called "agency foundations" and pose four initial topics designed to improve our understanding of agency in AI-human interactions: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.
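As a rough illustration of the kind of embedded agent the abstract refers to, the sketch below gives a minimal tabular temporal-difference learner that issues action recommendations. The class name, environment, and reward signal are assumptions, not the paper's setup. If the reward were simply "the human accepted the recommendation", the learned values would tend to track whatever keeps acceptance high rather than what the human originally intended, which is the style of agency-loss dynamic the abstract describes.

```python
import numpy as np
from collections import defaultdict

# Minimal tabular Q-learning "recommender" sketch. How states, rewards,
# and the human's responses are defined is left open; this is not the
# paper's embedded-agent environment.

class TDRecommender:
    def __init__(self, n_actions, lr=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(lambda: np.zeros(n_actions))
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon

    def recommend(self, state):
        """Suggest an action for the human: epsilon-greedy over learned values."""
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.q[state]))
        return int(np.argmax(self.q[state]))

    def learn(self, state, action, reward, next_state):
        # One-step temporal-difference (Q-learning) update.
        target = reward + self.gamma * self.q[next_state].max()
        self.q[state][action] += self.lr * (target - self.q[state][action])
```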
Position: intent-aligned AI systems must optimize for agency preservation
A central approach to AI-safety research has been to generate aligned AI systems: i.e. systems that do not deceive users and yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that truthful AIs aligned solely to human intent are insufficient and that preservation of the long-term agency of humans may be a more robust standard that may need to be separated and explicitly optimized for. We discuss the science of intent and control and how human intent can be manipulated, and we provide a formal definition of agency-preserving AI-human interactions focusing on forward-looking explicit agency evaluations. Our work points to a novel pathway for human harm in AI-human interactions and proposes solutions to this challenge. Copyright 2024 by the author(s).
Explainable reinforcement learning for broad-XAI: a conceptual framework and survey
Broad-XAI moves away from interpreting individual decisions based on a single datum and aims to integrate explanations from multiple machine learning algorithms into a coherent explanation of an agent's behaviour, aligned to the communication needs of the explainee. Reinforcement Learning (RL) methods, we propose, provide a potential backbone for the cognitive model required for the development of Broad-XAI. RL represents a suite of approaches that have had increasing success in solving a range of sequential decision-making problems. However, these algorithms operate as black-box problem solvers, obfuscating their decision-making policy behind a complex array of values and functions. EXplainable RL (XRL) aims to develop techniques to extract concepts from the agent's perception of the environment; its intrinsic/extrinsic motivations and beliefs; and its Q-values, goals and objectives. This paper aims to introduce the Causal XRL Framework (CXF), which unifies the current XRL research and uses RL as a backbone for the development of Broad-XAI. CXF is designed to incorporate many standard RL extensions and to be integrated with external ontologies and communication facilities so that the agent can answer questions that explain the outcomes of its decisions. This paper aims to: establish XRL as a distinct branch of XAI; introduce a conceptual framework for XRL; review existing approaches to explaining agent behaviour; and identify opportunities for future research. Finally, this paper discusses how additional information can be extracted and ultimately integrated into models of communication, facilitating the development of Broad-XAI. © 2023, The Author(s)
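As a toy illustration of one of the ideas above, the snippet below contrasts the Q-value behind a chosen action with the runner-up to produce a minimal, human-readable justification. The state, action names, and values are hypothetical and are not drawn from the CXF framework itself.

```python
# Illustrative-only sketch: a very simple XRL-style explanation built by
# comparing the chosen action's estimated return with the next-best option.
q_values = {"turn_left": 0.42, "turn_right": 0.87, "brake": 0.15}  # hypothetical values

chosen = max(q_values, key=q_values.get)
runner_up = sorted(q_values, key=q_values.get, reverse=True)[1]
margin = q_values[chosen] - q_values[runner_up]

print(f"Chose '{chosen}' because its estimated return exceeds "
      f"'{runner_up}' by {margin:.2f}.")
```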
Assessing the impact of griefing in MMORPGs using self-determination theory
Toxic behavior has been impacting players in online multiplayer environments since their inception. Griefing is a type of toxic behavior that focuses on player-to-player in-game disruption and is quite prevalent. However, research into the extent of its impact is still scarce. The present study investigated the impact on the psychological needs of autonomy, competence, and relatedness, as defined by self-determination theory, for players that perform griefing (the griefers) and those subjected to griefing (the griefed). A sample of 656 respondents from massively multiplayer online role-playing game communities participated in the study. The results showed that for the majority of players there was no change to their wellbeing, but that when there was a change, the griefed players were in general impacted more negatively, and the perpetrators were impacted more positively. Significant associations also revealed that the magnitude of impact increased as the player was subjected to or performed griefing more frequently. © 2024 The Author
The impact of environmental stochasticity on value-based multiobjective reinforcement learning
A common approach to addressing multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism, often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multiobjective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim which may arise in some applications, maximising SER subject to satisfying constraints on the variation in return, and show that this may require different solutions than either ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multiobjective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics. This includes a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions. © 2021, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature
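The ESR/SER distinction is easy to see numerically: ESR applies the scalarisation function inside the expectation over returns, while SER applies it to the expected return. The sketch below makes this concrete; the scalarisation function and the sampled returns are illustrative assumptions, not values from the paper's experiments.

```python
import numpy as np

# Minimal numerical sketch of the ESR/SER distinction.
# ESR = E[ f(return) ]   versus   SER = f( E[return] )

def scalarise(ret):
    """A nonlinear scalarisation (product of the two objectives), chosen
    only so that ESR and SER visibly differ; any nonlinear f behaves similarly."""
    return ret[0] * ret[1]

# Sampled vector-valued returns from a stochastic policy/environment:
# each episode yields one of these two outcomes with equal probability.
returns = np.array([[1.0, 0.0],
                    [0.0, 1.0]])

esr = np.mean([scalarise(r) for r in returns])   # E[f(return)]  -> 0.0
ser = scalarise(returns.mean(axis=0))            # f(E[return])  -> 0.25

print(f"ESR = {esr}, SER = {ser}")
```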
- …
