Mean-field games among teams
In this paper, we present a model of a game among teams. Each team consists
of a homogeneous population of agents. Agents within a team are cooperative
while the teams compete with other teams. The dynamics and the costs are
coupled through the empirical distribution (or the mean field) of the state of
agents in each team. This mean-field is assumed to be observed by all agents.
Agents have asymmetric information (also called a non-classical information
structure). We propose a mean-field based refinement of the Team-Nash
equilibrium of the game, which we call mean-field Markov perfect equilibrium
(MF-MPE). We identify a dynamic programming decomposition to characterize
MF-MPE. We then consider the case where each team has a large number of players
and present a mean-field approximation which approximates the game among
large-population teams as a game among infinite-population teams. We show that
MF-MPE of the game among teams of infinite population is easier to compute and
is an ε-approximate MF-MPE of the game among teams of finite population.
Comment: 20 pages
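To make the coupling concrete: the mean field here is the empirical state distribution within each team, and the ε-guarantee bounds the loss from playing the infinite-population equilibrium in the finite game. The sketch below uses our own illustrative notation (N_i agents in team i, team cost J^i, strategy profile σ), not necessarily the paper's:

```latex
% Mean field of team i at time t: the empirical distribution of the
% states X_t^{i,1}, ..., X_t^{i,N_i} of its N_i agents (our notation).
\mu^i_t(x) = \frac{1}{N_i} \sum_{j=1}^{N_i} \mathbf{1}\{ X^{i,j}_t = x \}

% An \varepsilon-approximate MF-MPE: under the strategy profile
% (\sigma^1, \dots, \sigma^K), no team can reduce its cost by more
% than \varepsilon through a unilateral deviation.
J^i(\sigma^i, \sigma^{-i}) \le \inf_{\tilde{\sigma}^i} J^i(\tilde{\sigma}^i, \sigma^{-i}) + \varepsilon
```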
Clinical Disease Activity and Radiological Damage in Early Rheumatoid Arthritis
Disease progression in rheumatoid arthritis (RA) is assessed by standard clinical, radiological and functional measures. Clinical disease activity in RA is graded as no disease (remission), low, moderate or high disease activity, based on validated criteria. Radiological progression in RA is monitored by serial x-rays of hands and feet and by quantification of structural damage using various scoring methods; this has proved to be a valuable outcome measure in RA studies. RA patients with active disease usually develop progressive radiological damage. However, it has been shown that clinical disease activity may not correlate with radiological damage, particularly in early RA. This thesis was therefore mainly aimed at testing the hypothesis that "radiological damage can progress despite clinical disease inactivity or remission" and at investigating possible underlying mechanisms, including disease heterogeneity, treatment effect and scoring methodology.

Disease progression, outcomes and prognostic factors were analysed in an inception cohort of early RA (Early Rheumatoid Arthritis Study/ERAS). In this study of early RA patients, sustained remission was less frequent than remission at individual time points, and baseline variables such as gender, duration of symptoms, disease activity (DAS) and health assessment questionnaire (HAQ) scores showed predictive value for sustained remission. Structural damage on x-rays progressed despite clinical disease inactivity or remission in a subgroup of patients, and disease heterogeneity was the most likely explanation for the disconnect between clinical disease activity and radiological damage in the ERAS cohort. The study also found that scoring methods, as well as the reading order of x-ray films, could influence measured radiographic progression in early RA, particularly at the individual level. Male sex, rheumatoid factor (RF) and radiographic damage at baseline showed prognostic value in predicting radiographic progression despite remission. Patients with persistent clinical disease inactivity showed better radiological, surgical, functional and other outcomes than those with relapsing-remitting or persistent disease activity. There was no significant difference in functional and other outcomes between patients in remission with x-ray progression and those in remission without x-ray progression.

Therefore, x-rays of hands and feet at regular intervals are valuable in determining true disease progression in early RA, even during clinical disease inactivity. Scoring methodology in itself could influence the type of radiographic progression observed in RA studies, and sustained disease inactivity in RA is more favourable than relapsing-remitting disease.
Loss of IP3 receptor function in neuropeptide-secreting neurons leads to obesity in adult Drosophila
Background: Intracellular calcium signaling regulates a variety of cellular and physiological processes. The inositol 1,4,5-trisphosphate receptor (IP3R) is a ligand-gated calcium channel present on the membranes of endoplasmic reticular stores. In previous work we have shown that Drosophila mutants for the IP3R (itpr^ku) become unnaturally obese as adults, with excessive storage of lipids on a normal diet. While the phenotype manifests in cells of the fat body, genetic studies suggest dysregulation of a neurohormonal axis.
Results: We show that knockdown of the IP3R, either in all neurons or in peptidergic neurons alone, mimics known itpr mutant phenotypes. The peptidergic neuron domain includes, but is not restricted to, the medial neurosecretory cells as well as the stomatogastric nervous system. Conversely, expression of an itpr+ cDNA in the same set of peptidergic neurons rescues the metabolic defects of itpr^ku mutants. Transcript levels of CG5932 (magro), a gene encoding a gastric lipase that is known to regulate triacylglyceride storage, can be regulated by itpr knockdown and over-expression in peptidergic neurons. Thus, the observed itpr mutant phenotypes of starvation resistance, increased body weight, elevated lipid storage and hyperphagia derive primarily from peptidergic neurons.
Conclusions: The present study shows that itpr function in peptidergic neurons is not only necessary but also sufficient for maintaining normal lipid metabolism in Drosophila. Our results suggest that intracellular calcium signaling in peptidergic neurons affects lipid metabolism by both cell-autonomous and non-cell-autonomous mechanisms.
Counterfactual Explanation Policies in RL
As Reinforcement Learning (RL) agents are increasingly employed in diverse
decision-making problems using reward preferences, it becomes important to
ensure that the policies these frameworks learn, which map observations to a
probability distribution over the possible actions, are explainable. However,
there is little to no work on systematically understanding these complex
policies in a contrastive manner, i.e., what minimal changes to the policy
would improve or worsen its performance to a desired level. In this work, we
present COUNTERPOL, the first framework to analyze RL policies using
counterfactual explanations in the form of minimal changes to the policy that
lead to the desired outcome. We do so by incorporating counterfactuals in
supervised learning in RL with the target outcome regulated using desired
return. We establish a theoretical connection between COUNTERPOL and widely
used trust region-based policy optimization methods in RL. Extensive empirical
analysis shows the efficacy of COUNTERPOL in generating explanations for
(un)learning skills while keeping close to the original policy. Our results on
five different RL environments with diverse state and action spaces demonstrate
the utility of counterfactual explanations, paving the way for new frontiers in
designing and developing counterfactual policies.
Comment: ICML Workshop on Counterfactuals in Minds and Machines, 202
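A minimal sketch of the idea as we read the abstract: search for a policy close to the original whose estimated return matches a desired target. All names here (policy_net, estimate_return, the KL proximity term and its weight) are illustrative assumptions, not the paper's actual method or API; estimate_return is assumed to be a differentiable return estimator, and the networks are assumed to output action logits.

```python
import copy

import torch
import torch.nn.functional as F

def counterfactual_policy(policy_net, states, estimate_return, target_return,
                          lam=1.0, steps=500, lr=1e-3):
    """Optimize a copy of `policy_net` so its estimated return approaches
    `target_return` while its action distribution stays close to the
    original on a batch of `states` (illustrative sketch)."""
    cf = copy.deepcopy(policy_net)
    opt = torch.optim.Adam(cf.parameters(), lr=lr)
    with torch.no_grad():
        ref = F.softmax(policy_net(states), dim=-1)   # frozen original policy
    for _ in range(steps):
        # (1) counterfactual outcome term: push the differentiable return
        #     estimate toward the desired target return
        outcome = (estimate_return(cf) - target_return) ** 2
        # (2) proximity term: KL between original and counterfactual action
        #     distributions, echoing the trust-region connection above
        kl = F.kl_div(F.log_softmax(cf(states), dim=-1), ref,
                      reduction="batchmean")
        loss = outcome + lam * kl
        opt.zero_grad()
        loss.backward()
        opt.step()
    return cf
```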
SARC: Soft Actor Retrospective Critic
The two-time-scale nature of SAC, an actor-critic algorithm, means that at
any given time the critic estimate has not converged for the actor; but since
the critic learns faster than the actor, eventual consistency between the two
is ensured. Various strategies have been introduced in the literature to learn
better gradient estimates and thereby achieve better convergence. Since
gradient estimates depend upon the critic, we posit
that improving the critic can provide a better gradient estimate for the actor
at each time. Utilizing this, we propose Soft Actor Retrospective Critic
(SARC), where we augment the SAC critic loss with another loss term -
retrospective loss - leading to faster critic convergence and consequently,
better policy gradient estimates for the actor. An existing implementation of
SAC can be easily adapted to SARC with minimal modifications. Through extensive
experimentation and analysis, we show that SARC provides consistent improvement
over SAC on benchmark environments. We plan to open-source the code and all
experiment data at: https://github.com/sukritiverma1996/SARC.
Comment: Accepted at RLDM 202
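A minimal sketch of what an augmented critic loss could look like, assuming the generic retrospective-loss form (pull the current prediction closer to the target than to a past snapshot of the network); the exact term, the weights kappa and retro_weight, and the clamping below are our guesses, not SARC's published recipe.

```python
import torch
import torch.nn.functional as F

def sarc_critic_loss(critic, past_critic, states, actions, targets,
                     kappa=2.0, retro_weight=0.1):
    """SAC critic loss plus a retrospective term (weights are assumptions)."""
    q = critic(states, actions)
    with torch.no_grad():
        q_past = past_critic(states, actions)   # frozen earlier snapshot
    td_loss = F.mse_loss(q, targets)            # standard SAC critic regression
    # Retrospective term: make the current estimate closer to the Bellman
    # target than to the past estimate (margin scaled by kappa > 1).
    # Clamping at zero to keep the term non-negative is our choice.
    retro = (kappa * (q - targets).abs()
             - (q - q_past).abs()).clamp(min=0.0).mean()
    return td_loss + retro_weight * retro
```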
Explaining RL Decisions with Trajectories
Explanation is a key component for the adoption of reinforcement learning
(RL) in many real-world decision-making problems. In the literature, the
explanation is often provided by saliency attribution to the features of the RL
agent's state. In this work, we propose a complementary approach to these
explanations, particularly for offline RL, where we attribute the policy
decisions of a trained RL agent to the trajectories encountered by it during
training. To do so, we encode trajectories in offline training data
individually as well as collectively (encoding a set of trajectories). We then
attribute policy decisions to a set of trajectories in this encoded space by
estimating the sensitivity of the decision with respect to that set. Further,
we demonstrate the effectiveness of the proposed approach in terms of quality
of attributions as well as practical scalability in diverse environments that
involve both discrete and continuous state and action spaces such as
grid-worlds, video games (Atari) and continuous control (MuJoCo). We also
conduct a human study on a simple navigation task to observe how participants'
understanding of the task compares with the data attributed for a trained RL
policy.
Keywords: Explainable AI, Verifiability of AI Decisions, Explainable RL.
Comment: Published at the International Conference on Learning Representations
(ICLR), 202
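A minimal sketch of the attribution recipe as we read it: embed trajectories, group them, and score each group by how much the trained agent's decision at a state shifts when that group is held out. Here encode, train_policy, the k-means grouping, and the Euclidean sensitivity measure are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def attribute_decision(trajectories, encode, train_policy, state, n_clusters=5):
    """Score each trajectory cluster by how much removing it changes the
    trained agent's decision at `state` (illustrative sketch)."""
    emb = np.stack([encode(t) for t in trajectories])        # per-trajectory embedding
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)
    base = np.asarray(train_policy(trajectories).act(state)) # decision on full data
    scores = {}
    for c in range(n_clusters):
        kept = [t for t, l in zip(trajectories, labels) if l != c]
        alt = np.asarray(train_policy(kept).act(state))      # decision without cluster c
        scores[c] = float(np.linalg.norm(base - alt))        # sensitivity to cluster c
    return scores  # larger score -> cluster mattered more for this decision
```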
Behavior Optimized Image Generation
The last few years have witnessed great success in image generation, which
has crossed the acceptance threshold of aesthetics, making it directly
applicable to personal and commercial applications. However, images, especially
in marketing and advertising applications, are often created as a means to an
end rather than for aesthetic concerns alone. The goal can be increasing sales,
getting more clicks, likes, or image sales (in the case of stock businesses).
Therefore, the generated images need to perform well on these key performance
indicators (KPIs), in addition to being aesthetically good. In this paper, we
make the first endeavor to answer the question: "How can one infuse knowledge
of the end goal within the image generation process itself to create not just
better-looking images but also 'better-performing' images?" We
propose BoigLLM, an LLM that understands both image content and user behavior.
BoigLLM knows how an image should look to get a certain required KPI. We show
that BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 in this
task, demonstrating that while these state-of-the-art models can understand
images, they lack information on how these images perform in the real world. To
generate actual pixels of behavior-conditioned images, we train a
diffusion-based model (BoigSD) to align with a proposed BoigLLM-defined reward.
We show the performance of the overall pipeline on two datasets covering two
different behaviors: a stock dataset with the number of forward actions as the
KPI and a dataset containing tweets with the total likes as the KPI, denoted as
BoigBench. To advance research in the direction of utility-driven image
generation and understanding, we release BoigBench, a benchmark dataset
containing 168 million enterprise tweets with their media, brand account names,
time of post, and total likes.
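One simple way to align a diffusion model with such a reward is reward-weighted fine-tuning: upweight the denoising loss on high-reward (high-KPI) samples. The sketch below assumes a diffusers-style UNet and noise scheduler, pre-computed latents and text embeddings, and a softmax weighting; it illustrates the general technique, not BoigSD's actual training recipe, which the abstract does not specify.

```python
import torch
import torch.nn.functional as F

def reward_weighted_step(unet, noise_scheduler, optimizer,
                         latents, text_emb, rewards):
    """One reward-weighted denoising step: high-KPI samples pull harder."""
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy = noise_scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    # Per-sample denoising loss, then weight by the normalized behavior
    # reward, e.g. a BoigLLM-defined score or an observed KPI such as likes.
    per_sample = F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))
    weights = torch.softmax(rewards, dim=0)
    loss = (weights * per_sample).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```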
