DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation
In this paper, we propose a novel approach called DIffusion-guided DIversity
(DIDI) for offline behavioral generation. The goal of DIDI is to learn a
diverse set of skills from a mixture of label-free offline data. We achieve
this by leveraging diffusion probabilistic models as priors to guide the
learning process and regularize the policy. By optimizing a joint objective
that incorporates diversity and diffusion-guided regularization, we encourage
the emergence of diverse behaviors while maintaining the similarity to the
offline data. Experimental results in four decision-making domains (Push,
Kitchen, Humanoid, and D4RL tasks) show that DIDI is effective in discovering
diverse and discriminative skills. We also introduce skill stitching and skill
interpolation, which highlight the generalist nature of the learned skill
space. Further, by incorporating an extrinsic reward function, DIDI enables
reward-guided behavior generation, facilitating the learning of diverse and
optimal behaviors from sub-optimal data. Comment: ICML202
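To make the joint objective concrete, below is a minimal sketch of a DIDI-style loss combining a skill-diversity term with diffusion-guided regularization. It assumes a one-hot skill code, a skill discriminator, and a pretrained diffusion prior exposed through a hypothetical diffusion_prior(state) call returning a denoised action proposal; the paper's actual architectures and objective differ in detail.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SkillPolicy(nn.Module):
        """Skill-conditioned policy pi(a | s, z)."""
        def __init__(self, state_dim, skill_dim, action_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + skill_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, action_dim), nn.Tanh(),
            )

        def forward(self, state, skill):
            return self.net(torch.cat([state, skill], dim=-1))

    def didi_style_loss(policy, discriminator, diffusion_prior, state, skill, beta=1.0):
        """Diversity term (skill discriminability) plus diffusion-guided regularization."""
        action = policy(state, skill)
        # Diversity: a discriminator should recover the skill from (s, a).
        skill_logits = discriminator(torch.cat([state, action], dim=-1))
        diversity_loss = F.cross_entropy(skill_logits, skill.argmax(dim=-1))
        # Guidance: keep actions close to proposals from a prior trained on the offline data.
        with torch.no_grad():
            prior_action = diffusion_prior(state)  # hypothetical denoising call
        guidance_loss = F.mse_loss(action, prior_action)
        return diversity_loss + beta * guidance_loss

    # Toy usage with random tensors and a dummy prior standing in for a trained diffusion model.
    policy = SkillPolicy(state_dim=10, skill_dim=4, action_dim=3)
    discriminator = nn.Sequential(nn.Linear(10 + 3, 64), nn.ReLU(), nn.Linear(64, 4))
    dummy_prior = lambda s: torch.zeros(s.shape[0], 3)
    states = torch.randn(8, 10)
    skills = F.one_hot(torch.randint(0, 4, (8,)), 4).float()
    loss = didi_style_loss(policy, discriminator, dummy_prior, states, skills)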
An Effective Software Risk Prediction Management Analysis of Data Using Machine Learning and Data Mining Method
To guarantee higher-quality software development processes, risk
management is essential. Risks are factors that could negatively
impact an organization's operations or a project's progress. The appropriate
prioritisation of software project risks is a crucial factor in ascertaining
the software project's performance features and eventual success. To achieve precise
software risk assessment, the enhanced crow search algorithm (ECSA) is used to
tune the ANFIS settings: solutions that only slightly perturb the local optimum,
and remain within it, are extracted by the ECSA to adjust the ANFIS variables
when utilising the ANFIS technique. An experimental validation with the NASA 93 dataset and 93
software project values was performed. This method's output presents a clear
image of the software risk elements that are essential to achieving project
performance. The results of our experiments show that, when compared to other
current methods, our integrative fuzzy techniques may perform more accurately
and effectively in the evaluation of software project risks.
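As an illustration of the search component only, here is a minimal sketch of a standard crow search loop for tuning model parameters against a validation-error objective; the paper's enhanced ECSA and the ANFIS model itself are not reproduced, and the fitness function below is a toy stand-in.

    import numpy as np

    def crow_search(fitness, dim, n_crows=20, iters=100, fl=2.0, ap=0.1, bounds=(-1.0, 1.0), seed=0):
        """Plain crow search: each crow keeps a memory of its best position and either
        follows another crow's memory or relocates randomly (awareness probability ap)."""
        rng = np.random.default_rng(seed)
        lo, hi = bounds
        pos = rng.uniform(lo, hi, size=(n_crows, dim))   # current positions
        mem = pos.copy()                                  # best position each crow remembers
        mem_fit = np.array([fitness(p) for p in mem])
        for _ in range(iters):
            for i in range(n_crows):
                j = rng.integers(n_crows)                 # crow i picks a random crow j to follow
                if rng.random() > ap:                     # j is unaware: move toward j's memory
                    new = pos[i] + fl * rng.random(dim) * (mem[j] - pos[i])
                else:                                     # j is aware: relocate randomly
                    new = rng.uniform(lo, hi, size=dim)
                new = np.clip(new, lo, hi)
                f = fitness(new)
                pos[i] = new
                if f < mem_fit[i]:                        # update memory on improvement
                    mem[i], mem_fit[i] = new, f
        best = mem_fit.argmin()
        return mem[best], mem_fit[best]

    # Example: tune two hypothetical hyperparameters against a toy error surface.
    best_params, best_err = crow_search(lambda p: np.sum((p - 0.3) ** 2), dim=2)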
Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning
It is of significance for an agent to learn a widely applicable and
general-purpose policy that can achieve diverse goals including images and text
descriptions. Considering such perceptually-specific goals, the frontier of
deep reinforcement learning research is to learn a goal-conditioned policy
without hand-crafted rewards. To learn this kind of policy, recent works
usually take as the reward the non-parametric distance to a given goal in an
explicit embedding space. From a different viewpoint, we propose a novel
unsupervised learning approach named goal-conditioned policy with intrinsic
motivation (GPIM), which jointly learns both an abstract-level policy and a
goal-conditioned policy. The abstract-level policy is conditioned on a latent
variable to optimize a discriminator and discovers diverse states that are
further rendered into perceptually-specific goals for the goal-conditioned
policy. The learned discriminator serves as an intrinsic reward function for
the goal-conditioned policy to imitate the trajectory induced by the
abstract-level policy. Experiments on various robotic tasks demonstrate the
effectiveness and efficiency of our proposed GPIM method which substantially
outperforms prior techniques. Comment: Accepted by AAAI-2
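A minimal sketch of the discriminator-based intrinsic reward described above is given below, assuming a discrete latent variable and a uniform prior over skills; GPIM's rendering of discovered states into perceptually-specific goals and its exact reward shaping are not reproduced.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SkillDiscriminator(nn.Module):
        """Predicts the latent variable z from a state, i.e. q(z | s)."""
        def __init__(self, state_dim, n_skills, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_skills),
            )

        def forward(self, state):
            return self.net(state)

    def intrinsic_reward(disc, state, z_index, n_skills):
        """Reward = log q(z | s) - log p(z): high when the visited state reveals the skill."""
        with torch.no_grad():
            log_q = F.log_softmax(disc(state), dim=-1)
        log_p = -torch.log(torch.tensor(float(n_skills)))  # uniform prior over skills
        return log_q.gather(-1, z_index.unsqueeze(-1)).squeeze(-1) - log_p

    # Toy usage on random states and skill indices.
    disc = SkillDiscriminator(state_dim=6, n_skills=8)
    rewards = intrinsic_reward(disc, torch.randn(32, 6), torch.randint(0, 8, (32,)), n_skills=8)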
CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
Offline reinforcement learning (RL) aims to learn an optimal policy from
pre-collected and labeled datasets, which eliminates the time-consuming data
collection in online RL. However, offline RL still bears a large burden of
specifying/handcrafting extrinsic rewards for each transition in the offline
data. As a remedy for the labor-intensive labeling, we propose to endow offline
RL tasks with a small set of expert data and utilize this limited expert data to drive
intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve
that, we introduce \textbf{C}alibrated \textbf{L}atent
g\textbf{U}idanc\textbf{E} (CLUE), which utilizes a conditional variational
auto-encoder to learn a latent space such that intrinsic rewards can be
quantified directly over the latent space. CLUE's key idea is to align the
intrinsic rewards with the expert intention by enforcing the embeddings of
expert data into a calibrated contextual representation. We
instantiate the expert-driven intrinsic rewards in sparse-reward offline RL
tasks, offline imitation learning (IL) tasks, and unsupervised offline RL
tasks. Empirically, we find that CLUE can effectively improve the sparse-reward
offline RL performance, outperform the state-of-the-art offline IL baselines,
and discover diverse skills from static reward-free offline data.
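As a rough sketch of how an expert-calibrated intrinsic reward in a latent space might look, the code below embeds transitions and rewards closeness to the centroid of the expert embeddings; CLUE's conditional variational auto-encoder, calibration objective, and exact reward form are simplified away, and TransitionEncoder is a hypothetical stand-in for the learned encoder.

    import torch
    import torch.nn as nn

    class TransitionEncoder(nn.Module):
        """Maps a transition (s, a) to a latent embedding (schematically, the encoder mean)."""
        def __init__(self, state_dim, action_dim, latent_dim=16, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, latent_dim),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    def expert_calibrated_reward(encoder, state, action, expert_states, expert_actions):
        """Intrinsic reward: negative distance to the calibrated expert embedding centroid."""
        with torch.no_grad():
            z = encoder(state, action)
            z_expert = encoder(expert_states, expert_actions).mean(dim=0)  # expert context
        return -torch.norm(z - z_expert, dim=-1)

    # Toy usage with random transitions and a small random "expert" batch.
    enc = TransitionEncoder(state_dim=5, action_dim=2)
    r = expert_calibrated_reward(enc, torch.randn(16, 5), torch.randn(16, 2),
                                 torch.randn(100, 5), torch.randn(100, 2))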
Modeling and Analyzing for the Friction Torque of a Sliding Bearing Based on Grey System Theory
Based on the grey system theory, the grey relational analysis method is proposed and used to analyze the influence of various factors on the friction torque of a sliding bearing. On the basis of the grey relational analysis, the multidimensional grey model GM(1,N,D) for the friction torque of a sliding bearing is built. Taking an Al-based alloy sliding bearing as an example, the calculation results show that, compared with other influence factors, the friction coefficient, load, temperature and rotational speed have a more significant influence on the bearing friction torque. Comparing experimental results with the values calculated by the GM(1,N,D) model based on these important influence factors, the maximum relative residual is 9.09%, the average relative residual is 7.9% and the accuracy is 92.1%. This verifies that the GM(1,N,D) model has good accuracy and is applicable for predicting the friction torque of a sliding bearing.
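For reference, a minimal sketch of grey relational analysis for ranking influence factors is shown below; the min-max normalization and the resolution coefficient rho = 0.5 follow common practice rather than the paper's exact setup, and the GM(1,N,D) model itself is not reproduced.

    import numpy as np

    def grey_relational_grades(reference, factors, rho=0.5):
        """reference: (T,) target series (e.g., friction torque); factors: list of (T,) series."""
        # Normalize each series to [0, 1] so different units are comparable.
        def norm(x):
            return (x - x.min()) / (x.max() - x.min() + 1e-12)
        x0 = norm(np.asarray(reference, dtype=float))
        xi = np.array([norm(np.asarray(f, dtype=float)) for f in factors])
        delta = np.abs(xi - x0)                        # absolute differences per factor and time step
        d_min, d_max = delta.min(), delta.max()
        coeff = (d_min + rho * d_max) / (delta + rho * d_max)
        return coeff.mean(axis=1)                      # one grade per factor; larger = more influential

    # Example: rank four toy factor series against a toy torque series.
    t = np.linspace(0, 1, 50)
    torque = 2.0 * t + 0.1 * np.sin(5 * t)
    grades = grey_relational_grades(torque, [2.1 * t, np.cos(3 * t), t ** 2, 0.5 * t])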
Improving Offline-to-Online Reinforcement Learning with Q Conditioned State Entropy Exploration
Studying how to fine-tune offline reinforcement learning (RL) pre-trained
policy is profoundly significant for enhancing the sample efficiency of RL
algorithms. However, directly fine-tuning pre-trained policies often results in
sub-optimal performance. This is primarily due to the distribution shift
between offline pre-training and online fine-tuning stages. Specifically, the
distribution shift limits the acquisition of effective online samples,
ultimately impacting the online fine-tuning performance. In order to narrow
down the distribution shift between the offline and online stages, we propose Q
conditioned state entropy (QCSE) as an intrinsic reward. Specifically, QCSE
maximizes the state entropy of all samples individually, taking their respective
Q values into account. This approach encourages exploration of low-frequency
samples while penalizing high-frequency ones, and implicitly achieves State
Marginal Matching (SMM), thereby ensuring optimal performance and avoiding the
asymptotic sub-optimality of constraint-based approaches. Additionally, QCSE
can seamlessly integrate into various RL algorithms, enhancing online
fine-tuning performance. To validate our claim, we conduct extensive
experiments, and observe significant improvements with QCSE (about 13% for CQL
and 8% for Cal-QL). Furthermore, we extend our experiments to other
algorithms, affirming the generality of QCSE.
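One simplified reading of a Q conditioned state entropy reward is sketched below: states are grouped into Q-value bins and each state is rewarded by a k-nearest-neighbor particle estimate of entropy within its bin. The paper's exact conditioning, normalization, and integration into CQL or Cal-QL are not reproduced.

    import torch

    def qcse_style_reward(states, q_values, k=5, n_bins=10):
        """states: (B, D); q_values: (B,). Reward each state by its k-NN distance within its Q bin."""
        rewards = torch.zeros(states.shape[0])
        boundaries = torch.quantile(q_values, torch.linspace(0, 1, n_bins + 1))[1:-1]
        bins = torch.bucketize(q_values, boundaries)
        for b in bins.unique():
            idx = (bins == b).nonzero(as_tuple=True)[0]
            if idx.numel() <= k:
                continue                                   # too few samples in this Q bin
            dists = torch.cdist(states[idx], states[idx])  # pairwise distances inside the bin
            knn = dists.topk(k + 1, largest=False).values[:, -1]  # k-th neighbor (skip self at 0)
            rewards[idx] = torch.log(knn + 1.0)            # particle-based entropy proxy
        return rewards

    # Toy usage: higher reward for states that are rare among samples with similar Q values.
    r = qcse_style_reward(torch.randn(256, 8), torch.randn(256))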
Mass Transfer Performance of a Water-Sparged Aerocyclone Reactor and Its Application in Wastewater Treatment
Beyond Reward: Offline Preference-guided Policy Optimization
This study focuses on the topic of offline preference-based reinforcement
learning (PbRL), a variant of conventional reinforcement learning that
dispenses with the need for online interaction or specification of reward
functions. Instead, the agent is provided with fixed offline trajectories and
human preferences between pairs of trajectories to extract the dynamics and
task information, respectively. Since the dynamics and task information are
orthogonal, a naive approach would involve using preference-based reward
learning followed by an off-the-shelf offline RL algorithm. However, this
requires the separate learning of a scalar reward function, which is assumed to
be an information bottleneck of the learning process. To address this issue, we
propose the offline preference-guided policy optimization (OPPO) paradigm,
which models offline trajectories and preferences in a one-step process,
eliminating the need for separately learning a reward function. OPPO achieves
this by introducing an offline hindsight information matching objective for
optimizing a contextual policy and a preference modeling objective for finding
the optimal context. OPPO further integrates a well-performing decision policy
by optimizing the two objectives iteratively. Our empirical results demonstrate
that OPPO effectively models offline preferences and outperforms prior
competing baselines, including offline RL algorithms performed over either true
or pseudo reward function specifications. Our code is available on the project
website: https://sites.google.com/view/oppo-icml-2023
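As an illustration of the preference modeling objective only, here is a minimal Bradley-Terry-style sketch that scores trajectory embeddings against a learned optimal context; the hindsight information matching objective and the iterative optimization of the two objectives are not reproduced, and traj_encoder is a hypothetical module.

    import torch
    import torch.nn.functional as F

    def preference_loss(traj_encoder, optimal_context, preferred_traj, rejected_traj):
        """Score each trajectory by similarity of its embedding to the learned optimal context,
        then require the preferred trajectory to score higher (logistic / Bradley-Terry loss)."""
        z_pos = traj_encoder(preferred_traj)             # (B, d) embeddings of preferred trajectories
        z_neg = traj_encoder(rejected_traj)              # (B, d) embeddings of rejected trajectories
        s_pos = (z_pos * optimal_context).sum(dim=-1)    # similarity scores to the optimal context
        s_neg = (z_neg * optimal_context).sum(dim=-1)
        return -F.logsigmoid(s_pos - s_neg).mean()

    # Toy usage with a linear encoder over flattened trajectories and a learnable context vector.
    enc = torch.nn.Linear(20, 8)
    context = torch.randn(8, requires_grad=True)
    loss = preference_loss(enc, context, torch.randn(16, 20), torch.randn(16, 20))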
