Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
While recent advances in deep reinforcement learning have allowed autonomous
learning agents to succeed at a variety of complex tasks, existing algorithms
generally require large amounts of training data. One way to increase the speed at
which agents are able to learn to perform tasks is by leveraging the input of
human trainers. Although such input can take many forms, real-time,
scalar-valued feedback is especially useful in situations where it proves
difficult or impossible for humans to provide expert demonstrations. Previous
approaches have shown the usefulness of human input provided in this fashion
(e.g., the TAMER framework), but they have thus far not considered
high-dimensional state spaces or employed the use of deep learning. In this
paper, we do both: we propose Deep TAMER, an extension of the TAMER framework
that leverages the representational power of deep neural networks in order to
learn complex tasks in just a short amount of time with a human trainer. We
demonstrate Deep TAMER's success by using it and just 15 minutes of
human-provided feedback to train an agent that performs better than humans on
the Atari game of Bowling - a task that has proven difficult for even
state-of-the-art reinforcement learning methods.
Comment: 9 pages, 6 figures
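The core idea the abstract describes can be illustrated with a minimal sketch: a human reward model H(s, a) is regressed onto the trainer's scalar feedback, and the agent acts greedily with respect to it. Deep TAMER replaces the linear model below with a deep neural network over high-dimensional states; everything here (the linear features, the class interface, the toy feedback loop) is an illustrative assumption, not the authors' implementation.

```python
# Sketch of a TAMER-style update (hypothetical, pure Python):
# regress a human reward model H(s, a) onto scalar feedback, act greedily.
class TamerAgent:
    def __init__(self, n_features, n_actions, lr=0.1):
        self.lr = lr
        self.n_actions = n_actions
        # one weight vector per action: H(s, a) = w[a] . features(s)
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def predict(self, features, action):
        return sum(wi * fi for wi, fi in zip(self.w[action], features))

    def act(self, features):
        # act greedily with respect to the predicted human reward
        return max(range(self.n_actions), key=lambda a: self.predict(features, a))

    def update(self, features, action, human_feedback):
        # one SGD step on the squared error between predicted and given feedback
        err = human_feedback - self.predict(features, action)
        self.w[action] = [wi + self.lr * err * fi
                          for wi, fi in zip(self.w[action], features)]

# toy usage: the trainer rewards action 1 and punishes action 0 in one state
agent = TamerAgent(n_features=2, n_actions=2)
for _ in range(50):
    agent.update([1.0, 0.0], 1, +1.0)   # positive human feedback for action 1
    agent.update([1.0, 0.0], 0, -1.0)   # negative human feedback for action 0
```

The point of the sketch is that no environment reward appears anywhere: the only learning signal is the human's scalar feedback, which is why such agents can learn in minutes rather than from millions of environment steps.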
Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time
This paper investigates how to utilize different forms of human interaction
to safely train autonomous systems in real-time by learning from both human
demonstrations and interventions. We implement two components of the
Cycle-of-Learning for Autonomous Systems, which is our framework for combining
multiple modalities of human interaction. The current effort employs human
demonstrations to teach a desired behavior via imitation learning, then
leverages intervention data to correct for undesired behaviors produced by the
imitation learner to teach novel tasks to an autonomous agent safely, after
only minutes of training. We demonstrate this method in an autonomous perching
task using a quadrotor with continuous roll, pitch, yaw, and throttle commands
and imagery captured from a downward-facing camera in a high-fidelity simulated
environment. Our method improves task completion performance for the same
amount of human interaction when compared to learning from demonstrations
alone, while also requiring on average 32% less data to achieve that
performance. This provides evidence that combining multiple modes of human
interaction can increase both the training speed and overall performance of
policies for autonomous systems.
Comment: 9 pages, 6 figures
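The two components described above can be sketched as a simple data-aggregation loop: fit a policy to human demonstrations, then, whenever the human intervenes, append the corrective (state, action) pairs to the same dataset and refit. The 1-nearest-neighbor "policy" below is a stand-in assumption for the paper's imitation learner, chosen only to keep the sketch self-contained.

```python
# Hypothetical sketch of combining demonstrations with interventions.
def nearest_neighbor_policy(dataset, state):
    # pick the action whose recorded state is closest to the query state
    def dist(s):
        return sum((a - b) ** 2 for a, b in zip(s, state))
    return min(dataset, key=lambda pair: dist(pair[0]))[1]

# 1) learn a desired behavior from human demonstrations
dataset = [((0.0, 0.0), "hover"), ((1.0, 0.0), "descend")]

# 2) the human intervenes: near state (0.9, 0.5) the imitation learner
#    behaved undesirably, so the corrective pairs are aggregated into
#    the same dataset and the policy is refit
interventions = [((0.9, 0.5), "climb")]
dataset.extend(interventions)
```

Because interventions are collected exactly where the imitation learner fails, each corrective sample is more informative than an extra demonstration, which is consistent with the reported 32% reduction in data for the same performance.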
Violence Attribution Errors Among Low-Risk and High-Risk Offenders
Juvenile offenders have numerous factors that contribute to their delinquency, including family dysfunction, drug and alcohol abuse, negative peer influences, and social cognitive development. One area of social cognitive development linked to deviant behavior is attributional biases. Based on the prior research of Daley and Onwuegbuzie (2004), the purpose of the present concurrent mixed methods study was to explore the differences in the frequency of violence attribution errors among juvenile delinquents; the extent that peer-victimization, self-esteem, and demographic variables predict violence attribution errors among juveniles; and the differences in the types of violence attribution errors between incarcerated (high-risk) and probation (low-risk) juvenile delinquents.
The results indicated that juvenile offenders made violence attribution errors more than 50% of the time when evaluating the behavior of others, suggesting that even low-risk offenders are at substantial risk of committing high-risk offenses in the future. The results of the multiple regression analysis indicated that 5 variables (i.e., attitude towards the violent acts of others, verbal victimization, attacks on property, social relationships, and morals) statistically predicted the number of violence attribution errors a youth made (F(21, 88) = 2.28, p = .004). Further, with regard to the typology of reasons for violence attributions, the same 7 emergent themes were extracted for all 3 offender samples: self-control, violation of rights, provocation, irresponsibility, poor judgment, fate, and conflict resolution. Findings are discussed relative to the literature on attributional bias and offender behavior.
Development of a Practical Visual-Evoked Potential-Based Brain-Computer Interface
There are many different neuromuscular disorders that disrupt the normal communication pathways between the brain and the rest of the body. These diseases often leave patients in a 'locked-in' state, rendering them unable to communicate with their environment despite having cognitively normal brain function. Brain-computer interfaces (BCIs) are augmentative communication devices that establish a direct link between the brain and a computer. Visual evoked potential (VEP)-based BCIs, which depend on salient visual stimuli, are amongst the fastest BCIs available and provide the highest communication rates of any BCI modality. However, the majority of research focuses solely on improving raw BCI performance; thus, most visual BCIs still suffer from a myriad of practical issues that make them impractical for everyday use. The focus of this dissertation is on the development of novel advancements and solutions that increase the practicality of VEP-based BCIs. The presented work shows the results of several studies that relate to characterizing and optimizing visual stimuli, improving ergonomic design, reducing visual irritation, and implementing a practical VEP-based BCI using an extensible software framework and mobile device platforms.
Spectral Transfer Learning Using Information Geometry for a User-Independent Brain-Computer Interface
Recent advances in signal processing and machine learning have enabled the application of Brain-Computer Interface (BCI) technologies to fields such as medicine, industry, and recreation. However, BCIs still require frequent calibration sessions due to the intra- and inter-individual variability of brain signals, which makes calibration suppression through transfer learning an area of increasing interest for the development of practical BCI systems. In this paper, we present an unsupervised transfer method (spectral transfer using information geometry, STIG), which ranks and combines unlabeled predictions from an ensemble of information geometry classifiers built on data from individual training subjects. The STIG method is validated in both offline and real-time feedback analysis during a rapid serial visual presentation (RSVP) task. For detection of single-trial event-related potentials (ERPs), the proposed method can significantly outperform existing calibration-free techniques, as well as traditional within-subject calibration techniques when limited data are available. This method demonstrates that unsupervised transfer learning for single-trial detection in ERP-based BCIs can be achieved without costly training data, representing a step forward in the overall goal of a practical user-independent BCI system.
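The general idea of ranking classifiers from unlabeled predictions alone can be sketched with the spectral meta-learner construction (a related technique, not the authors' exact STIG algorithm): under a conditional-independence assumption, more reliable classifiers agree more with each other, so the leading eigenvector of the prediction covariance matrix approximates per-classifier reliabilities, which then weight a combined vote. Everything below is an illustrative sketch under those assumptions.

```python
# Illustrative spectral-meta-learner-style sketch (not the exact STIG method):
# rank an ensemble of binary classifiers using only their unlabeled predictions.
def spectral_weights(preds):
    # preds[i][j] in {-1, +1}: classifier i's label for unlabeled sample j
    m, n = len(preds), len(preds[0])
    means = [sum(row) / n for row in preds]
    # sample covariance between classifiers' prediction vectors
    cov = [[sum((preds[i][k] - means[i]) * (preds[j][k] - means[j])
                for k in range(n)) / n
            for j in range(m)] for i in range(m)]
    # leading eigenvector via power iteration ~ per-classifier reliability
    v = [1.0] * m
    for _ in range(100):
        v = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    # fix the overall sign so weights are predominantly positive
    if sum(v) < 0:
        v = [-x for x in v]
    return v

def weighted_vote(preds, weights, j):
    # reliability-weighted combination of the ensemble's predictions
    s = sum(w * preds[i][j] for i, w in enumerate(weights))
    return 1 if s >= 0 else -1
```

Note that an anti-correlated classifier receives a negative weight, so the combination automatically inverts its predictions rather than merely ignoring it.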
COA-GPT: Generative Pre-trained Transformers for Accelerated Course of Action Development in Military Operations
The development of Courses of Action (COAs) in military operations is
traditionally a time-consuming and intricate process. Addressing this
challenge, this study introduces COA-GPT, a novel algorithm employing Large
Language Models (LLMs) for rapid and efficient generation of valid COAs.
COA-GPT incorporates military doctrine and domain expertise into LLMs through
in-context learning, allowing commanders to input mission information - in both
text and image formats - and receive strategically aligned COAs for review and
approval. Uniquely, COA-GPT not only accelerates COA development, producing
initial COAs within seconds, but also facilitates real-time refinement based on
commander feedback. This work evaluates COA-GPT in a military-relevant scenario
within a militarized version of the StarCraft II game, comparing its
performance against state-of-the-art reinforcement learning algorithms. Our
results demonstrate COA-GPT's superiority in generating strategically sound
COAs more swiftly, with added benefits of enhanced adaptability and alignment
with commander intentions. COA-GPT's capability to rapidly adapt and update
COAs during missions presents a transformative potential for military planning,
particularly in addressing planning discrepancies and capitalizing on emergent
windows of opportunity.
Comment: Accepted at the NATO Science and Technology Organization Symposium
(ICMCIS) organized by the Information Systems Technology (IST) Panel,
IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 2024
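The in-context-learning setup the abstract describes amounts to packing doctrine, mission information, and (for refinement) commander feedback into a single prompt for the LLM. The section headings, function name, and overall prompt layout below are assumptions for illustration, not the authors' interface; the resulting string would be sent to an LLM chat-completion endpoint.

```python
# Hypothetical sketch of a COA-GPT-style prompt assembly via in-context learning.
def build_coa_prompt(doctrine, mission_text, commander_feedback=None):
    sections = [
        "## Doctrine and constraints",
        doctrine,
        "## Mission information",
        mission_text,
        "## Task",
        "Propose a course of action (COA) consistent with the doctrine above.",
    ]
    if commander_feedback:
        # real-time refinement: fold the commander's feedback into the context
        sections += [
            "## Commander feedback on the previous COA",
            commander_feedback,
            "Revise the COA accordingly.",
        ]
    return "\n".join(sections)

prompt = build_coa_prompt(
    doctrine="Maintain unit cohesion; avoid collateral damage.",
    mission_text="Seize objective BRAVO before 0600.",
    commander_feedback="Shift the support-by-fire position north.",
)
```

Because refinement only appends feedback to the context and re-queries the model, each revision cycle costs one LLM call, which is what makes within-seconds COA updates feasible compared with retraining a reinforcement learning policy.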
