FedDisco: Federated Learning with Discrepancy-Aware Collaboration
This work considers the category distribution heterogeneity in federated
learning. This issue is due to biased labeling preferences at multiple clients
and is a typical setting of data heterogeneity. To alleviate this issue, most
previous works consider either regularizing local models or fine-tuning the
global model, while ignoring the adjustment of aggregation weights and
simply assigning weights based on dataset size. However, based on our
empirical observations and theoretical analysis, we find that the dataset size
is not optimal and the discrepancy between local and global category
distributions could be a beneficial and complementary indicator for determining
aggregation weights. We thus propose a novel aggregation method, Federated
Learning with Discrepancy-aware Collaboration (FedDisco), whose aggregation
weights not only involve both the dataset size and the discrepancy value, but
also contribute to a tighter theoretical upper bound of the optimization error.
FedDisco also promotes privacy-preservation, communication and computation
efficiency, as well as modularity. Extensive experiments show that our FedDisco
outperforms several state-of-the-art methods and can be easily incorporated
with many existing methods to further enhance the performance. Our code will be
available at https://github.com/MediaBrain-SJTU/FedDisco.
Comment: Accepted by International Conference on Machine Learning (ICML 2023).
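The weighting idea in the abstract can be sketched in a few lines. The combination rule below (normalized size minus a scaled discrepancy, clipped at zero and renormalized) is an illustrative guess at the mechanism, not the paper's exact formula:

```python
import numpy as np

def disco_weights(client_sizes, client_label_dists, global_label_dist, a=0.5, b=0.1):
    """Hypothetical discrepancy-aware aggregation weights.

    Combines each client's dataset size with the distance between its local
    and the global category distribution; the rule n_k - a*d_k + b is an
    illustrative stand-in for the paper's formula.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    sizes = sizes / sizes.sum()                      # normalized dataset sizes
    dists = np.asarray(client_label_dists, dtype=float)
    # discrepancy: L2 distance between local and global category distributions
    d = np.linalg.norm(dists - np.asarray(global_label_dist), axis=1)
    raw = np.clip(sizes - a * d + b, 0.0, None)      # size down-weighted by discrepancy
    return raw / raw.sum()                           # aggregation weights sum to 1
```

With two equally sized clients, the one whose label distribution matches the global one receives the larger aggregation weight.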
Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images
The accurate estimation of six degrees-of-freedom (6DoF) object poses is
essential for many applications in robotics and augmented reality. However,
existing methods for 6DoF pose estimation often depend on CAD templates or
dense support views, restricting their usefulness in real-world situations. In
this study, we present a new cascade framework named Cas6D for few-shot 6DoF
pose estimation that is generalizable and uses only RGB images. To address the
false positives of target object detection in the extreme few-shot setting, our
framework utilizes a self-supervised pre-trained ViT to learn robust feature
representations. Then, we initialize the nearest top-K pose candidates based on
similarity score and refine the initial poses using feature pyramids to
formulate and update the cascade warped feature volume, which encodes context
at increasingly finer scales. By discretizing the pose search range using
multiple pose bins and progressively narrowing the pose search range in each
stage using predictions from the previous stage, Cas6D can overcome the large
gap between pose candidates and ground truth poses, which is a common failure
mode in sparse-view scenarios. Experimental results on the LINEMOD and GenMOP
datasets demonstrate that Cas6D outperforms state-of-the-art methods, with
accuracy (Proj-5) gains of 9.2% and 3.8% over OnePose++ and Gen6D,
respectively, under the 32-shot setting.
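The coarse-to-fine bin narrowing described above can be illustrated with a toy 1-D search: discretize the range into bins, keep the best bin, and shrink the range around it at each stage. This is a stand-in for the full 6DoF search, and `score_fn` is a hypothetical pose-quality score:

```python
import numpy as np

def cascade_refine(score_fn, lo=0.0, hi=360.0, n_bins=16, stages=3):
    """Toy coarse-to-fine search over a 1-D pose parameter (e.g. an angle).

    Each stage evaluates n_bins candidates, keeps the best one, and
    narrows the search range around it, mirroring the cascade idea.
    """
    for _ in range(stages):
        centers = np.linspace(lo, hi, n_bins)
        best = centers[int(np.argmax([score_fn(c) for c in centers]))]
        half = (hi - lo) / n_bins          # shrink the range around the best bin
        lo, hi = best - half, best + half
    return best
```

Progressively narrowing the range lets a coarse initial grid recover a fine estimate without evaluating a dense grid over the whole range.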
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
Exploring spatial-temporal dependencies from observed motions is one of the
core challenges of human motion prediction. Previous methods mainly focus on
dedicated network structures to model the spatial and temporal dependencies.
This paper considers a new direction by introducing a model learning framework
with auxiliary tasks. In our auxiliary tasks, partial body joints' coordinates
are corrupted by either masking or adding noise, and the goal is to recover the
corrupted coordinates from the remaining ones. To work with auxiliary
tasks, we propose a novel auxiliary-adapted transformer, which can handle
incomplete, corrupted motion data and achieve coordinate recovery via capturing
spatial-temporal dependencies. Through auxiliary tasks, the auxiliary-adapted
transformer is promoted to capture more comprehensive spatial-temporal
dependencies among body joints' coordinates, leading to better feature
learning. Extensive experimental results have shown that our method outperforms
state-of-the-art methods by remarkable margins of 7.2%, 3.7%, and 9.4% in terms
of 3D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap, and
3DPW datasets, respectively. We also demonstrate that our method is more robust
under data missing cases and noisy data cases. Code is available at
https://github.com/MediaBrain-SJTU/AuxFormer.
Comment: Accepted to ICCV 2023.
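The corruption step of the auxiliary tasks is simple to sketch: mask a random subset of joints and add Gaussian noise to the rest. The ratios and noise scale below are illustrative, not the paper's settings:

```python
import numpy as np

def corrupt_joints(motion, mask_ratio=0.2, noise_std=0.05, rng=None):
    """Build an auxiliary-task input by corrupting joint coordinates.

    motion: array of shape (frames, joints, 3)
    returns: (corrupted motion, boolean mask of the zeroed joints)
    """
    rng = np.random.default_rng(rng)
    frames, joints, _ = motion.shape
    mask = rng.random((frames, joints)) < mask_ratio      # True = masked joint
    noisy = motion + rng.normal(0.0, noise_std, motion.shape)
    noisy[mask] = 0.0                                     # masked joints set to zero
    return noisy, mask
```

The recovery target is the original `motion`, so the model must infer the zeroed joints from the surviving spatial-temporal context.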
EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
Learning to predict agent motions with relationship reasoning is important
for many applications. In motion prediction tasks, maintaining motion
equivariance under Euclidean geometric transformations and invariance of agent
interaction is a critical and fundamental principle. However, such equivariance
and invariance properties are overlooked by most existing methods. To fill this
gap, we propose EqMotion, an efficient equivariant motion prediction model with
invariant interaction reasoning. To achieve motion equivariance, we propose an
equivariant geometric feature learning module to learn a Euclidean
transformable feature through dedicated designs of equivariant operations. To
reason about agents' interactions, we propose an invariant interaction reasoning
module that achieves more stable interaction modeling. To further obtain more
comprehensive motion features, we propose an invariant pattern feature learning
module to learn an invariant pattern feature, which cooperates with the
equivariant geometric feature to enhance network expressiveness. We conduct
experiments for the proposed model on four distinct scenarios: particle
dynamics, molecule dynamics, human skeleton motion prediction and pedestrian
trajectory prediction. Experimental results show that our method is not only
generally applicable, but also achieves state-of-the-art prediction
performance on all four tasks, improving by 24.0%, 30.1%, 8.6%, and 9.2%,
respectively. Code is available at https://github.com/MediaBrain-SJTU/EqMotion.
Comment: Accepted to CVPR 2023.
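The equivariance principle the abstract states can be demonstrated with a minimal operation: mix coordinate vectors across points with invariant coefficients, never touching the spatial axis. Because a rotation acts only on the last axis, the mixing commutes with it. This is a simplified illustration of the property, not EqMotion's actual layer:

```python
import numpy as np

def equivariant_mix(x, w):
    """Mix coordinate vectors with invariant coefficients.

    x: (n_points, 3) coordinates; w: (n_out, n_points) coefficients.
    Returns (n_out, 3): linear combinations of the coordinate vectors,
    satisfying equivariant_mix(x @ R, w) == equivariant_mix(x, w) @ R
    for any rotation R.
    """
    return w @ x
```

The identity holds because `w @ (x @ R) == (w @ x) @ R` by associativity of matrix multiplication, which is exactly the Euclidean equivariance the model is designed around.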
Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning
Decentralized and lifelong-adaptive multi-agent collaborative learning aims
to enhance collaboration among multiple agents without a central server, with
each agent solving varied tasks over time. To achieve efficient collaboration,
agents should: i) autonomously identify beneficial collaborative relationships
in a decentralized manner; and ii) adapt to dynamically changing task
observations. In this paper, we propose DeLAMA, a decentralized multi-agent
lifelong collaborative learning algorithm with dynamic collaboration graphs. To
promote autonomous collaboration relationship learning, we propose a
decentralized graph structure learning algorithm, eliminating the need for
external priors. To facilitate adaptation to dynamic tasks, we design a memory
unit to capture the agents' accumulated learning history and knowledge, while
preserving finite storage consumption. To further augment the system's
expressive capabilities and computational efficiency, we apply algorithm
unrolling, leveraging the advantages of both mathematical optimization and
neural networks. This allows the agents to "learn to collaborate" through the
supervision of training tasks. Our theoretical analysis verifies that
inter-agent collaboration is communication efficient under a small number of
communication rounds. The experimental results verify its ability to facilitate
the discovery of collaboration strategies and adaptation to dynamic learning
scenarios, achieving a 98.80% reduction in MSE and a 188.87% improvement in
classification accuracy. We expect our work can serve as a foundational
technique to facilitate future works towards an intelligent, decentralized, and
dynamic multi-agent system. Code is available at
https://github.com/ShuoTang123/DeLAMA.
Comment: 23 pages, 15 figures.
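The two ingredients above, a learned collaboration graph and communication over it, can be sketched with a similarity heuristic in place of the unrolled optimization. The softmax-over-parameter-distance rule is an illustrative stand-in for DeLAMA's graph structure learning, not the algorithm itself:

```python
import numpy as np

def collaboration_weights(params, temperature=1.0):
    """Toy collaboration-graph estimate: each agent weights its peers by
    the similarity of their model parameters (a hypothetical stand-in
    for the learned graph-structure step).

    params: (n_agents, dim) per-agent model parameters
    returns: (n_agents, n_agents) row-stochastic collaboration matrix
    """
    diffs = params[:, None, :] - params[None, :, :]
    sim = -np.linalg.norm(diffs, axis=-1) / temperature   # closer models, higher weight
    sim = sim - sim.max(axis=1, keepdims=True)            # numerical stability
    w = np.exp(sim)
    return w / w.sum(axis=1, keepdims=True)

def gossip_step(params, w):
    """One decentralized collaboration round: each agent averages
    parameters over the collaboration graph, with no central server."""
    return w @ params
```

Each round is one exchange with neighbors, so agents with similar tasks are pulled together while dissimilar agents mostly keep their own parameters.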
Language-Driven Interactive Traffic Trajectory Generation
Realistic trajectory generation with natural language control is pivotal for
advancing autonomous vehicle technology. However, previous methods focus on
individual traffic participant trajectory generation, thus failing to account
for the complexity of interactive traffic dynamics. In this work, we propose
InteractTraj, the first language-driven traffic trajectory generator that can
generate interactive traffic trajectories. InteractTraj interprets abstract
trajectory descriptions into concrete formatted interaction-aware numerical
codes and learns a mapping between these formatted codes and the final
interactive trajectories. To interpret language descriptions, we propose a
language-to-code encoder with a novel interaction-aware encoding strategy. To
produce interactive traffic trajectories, we propose a code-to-trajectory
decoder with interaction-aware feature aggregation that synergizes vehicle
interactions with the environmental map and the vehicle moves. Extensive
experiments show that our method outperforms previous SoTA methods,
offering more realistic generation of interactive traffic
trajectories with high controllability via diverse natural language commands.
Our code is available at https://github.com/X1a-jk/InteractTraj.gi
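The intermediate "interaction-aware numerical code" can be pictured as a structured record between the language description and the trajectory decoder. The field names, integer encodings, and rule-based matcher below are all hypothetical; the real system uses a learned language-to-code encoder:

```python
from dataclasses import dataclass

@dataclass
class InteractionCode:
    actor: int           # index of the acting vehicle (hypothetical field)
    target: int          # index of the interacting vehicle
    relation: int        # e.g. 0 = follow, 1 = yield, 2 = overtake (illustrative)
    rel_distance: float  # desired gap in meters

def encode_description(desc: str):
    """Toy rule-based stand-in for the language-to-code encoder."""
    codes = []
    if "follows" in desc:
        codes.append(InteractionCode(actor=1, target=0, relation=0, rel_distance=10.0))
    if "overtakes" in desc:
        codes.append(InteractionCode(actor=1, target=0, relation=2, rel_distance=5.0))
    return codes
```

The point of such an intermediate format is that the decoder can condition on explicit pairwise relations rather than on free-form text.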
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
Scene simulation in autonomous driving has gained significant attention
because of its huge potential for generating customized data. However, existing
editable scene simulation approaches face limitations in terms of user
interaction efficiency, multi-camera photo-realistic rendering and external
digital assets integration. To address these challenges, this paper introduces
ChatSim, the first system that enables editable photo-realistic 3D driving
scene simulations via natural language commands with external digital assets.
To enable editing with high command flexibility, ChatSim leverages a large
language model (LLM) agent collaboration framework. To generate photo-realistic
outcomes, ChatSim employs a novel multi-camera neural radiance field method.
Furthermore, to unleash the potential of extensive high-quality digital assets,
ChatSim employs a novel multi-camera lighting estimation method to achieve
scene-consistent assets' rendering. Our experiments on Waymo Open Dataset
demonstrate that ChatSim can handle complex language commands and generate
corresponding photo-realistic scene videos.
Comment: CVPR 2024 (Highlight).
Knowledge-Based Approach to Assembly Sequence Planning for Wind-Driven Generator
Assembly sequence planning plays an essential role in the manufacturing industry. However, challenges remain in assembly planning research, one of which is the lack of an effective description of assembly knowledge and information. To reduce the computational burden, this paper presents a novel approach to the assembly sequence planning problem based on engineering assembly knowledge, and provides an appropriate way to express both geometric information and non-geometric knowledge. To increase sequence planning efficiency, the assembly connection graph is built according to knowledge from the engineering, design, and manufacturing fields. The product semantic information model can offer much useful information to help the designer complete the assembly (process) design and make the right decisions in that process. Complex and inefficient computation in the assembly design process can therefore be avoided. Finally, a product assembly planning example is presented to illustrate the effectiveness of the proposed approach. Initial experience with the approach indicates the potential to reduce lead times and can thereby help complete new product launch projects on time.
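Once precedence constraints have been extracted from the assembly connection graph, deriving a feasible sequence reduces to a topological sort. The sketch below assumes the constraints are already given; the knowledge-based contribution of the paper lies in producing them from engineering knowledge, which is not modeled here:

```python
from collections import defaultdict, deque

def assembly_sequence(parts, constraints):
    """Derive a feasible assembly sequence from precedence constraints.

    parts: list of part names
    constraints: list of (before, after) pairs, meaning `before` must be
    assembled before `after`. Any topological order of the resulting DAG
    is a feasible sequence.
    """
    indeg = {p: 0 for p in parts}
    succ = defaultdict(list)
    for before, after in constraints:
        succ[before].append(after)
        indeg[after] += 1
    queue = deque(p for p in parts if indeg[p] == 0)   # parts with no prerequisites
    order = []
    while queue:
        p = queue.popleft()
        order.append(p)
        for q in succ[p]:
            indeg[q] -= 1
            if indeg[q] == 0:
                queue.append(q)
    if len(order) != len(parts):
        raise ValueError("cyclic constraints: no feasible assembly sequence")
    return order
```

For example, with hypothetical generator parts constrained base → shaft → rotor → cover, the function returns that chain as the assembly order.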