206 research outputs found

Three-level performance optimization for heterogeneous systems based on software prefetching under power constraints

Author: Cheng Lianglun
Wang Hao
Wang Zhuowei
Zhao Wuqing
Publication venue: Elsevier
Publication date: 01/01/2018

High power consumption has become one of the critical problems restricting the development of high-performance computers. In recent years, numerous studies have addressed optimizing execution performance while satisfying a power constraint. However, these methods mainly focus on homogeneous systems without considering the power or speed differences of heterogeneous processors, so they are difficult to apply to heterogeneous systems with an accelerator. In this paper, by abstracting the current execution model of a heterogeneous system, we propose a new framework for managing the system power consumption with a three-level power control mechanism. The three levels from top to bottom are: system-level power controller (SPC), group-level power controller (GPC) and unit-level power controller (UPC). In the UPC, the study establishes a power management method for software prefetching that scales the frequency and voltage of programs, selects the optimal prefetch distance, and guides the optimization process to stay within the boundary set by the power constraints. In the GPC, a strategy for dividing power based on key threads is put forward to preferentially allocate power to threads on key paths. In the SPC, a method for evaluating the performance of heterogeneous processing engines is designed for dividing power, in order to improve the overall execution performance of the system while sustaining fairness between concurrent applications. Finally, the proposed framework is verified on a central processing unit (CPU)-graphics processing unit (GPU) heterogeneous system.

submittedVersion. Publisher embargo until September 2020. (c) This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0

Crossref

NTNU Open (Norwegian University of Science and Technology)

Energy optimization of parallel programs in a heterogeneous system by combining processor core-shutdown and dynamic voltage scaling

Author: Cheng Lianglun
Wang Hao
Wang Zhuowei
Zhao Wuqing
Publication venue: Elsevier
Publication date: 01/01/2019

Reducing power consumption and improving efficiency are important aspects of the development of supercomputers into large-scale systems. As a result, heterogeneous systems have become an important development trend in high-performance computing. From the perspective of heterogeneous systems, this study establishes a model for energy optimization of parallel programs (EOPP) and puts forward a method of using it. By considering the energy overheads caused by re-synchronization, voltage switching, and operations in critical sections, the model effectively combines processor core-shutdown and dynamic voltage scaling technologies, and can be applied in a heterogeneous system to guide the optimization process. The results show that the proposed model can effectively reduce the energy consumption of parallel programs. Moreover, increasing the proportion of operations in the critical section enhances the optimal frequency of a processor while decreasing the probability of conflicts in the critical section. It can thus provide optimization space for reducing the frequency of a processor, which ultimately reduces the energy overhead of the system.

acceptedVersion. Publisher embargo until March 2021. (c) This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0

NTNU Open (Norwegian University of Science and Technology)

Stackelberg Game-Based Joint Computing Resource Allocation and Task Offloading Method in Edge Computing

Author: Chai Yuan
Chen Quan
Cheng Lianglun
Zeng Xiaojun
Publication date: 01/06/2025

Edge computing (EC) has emerged as an important technology to support the low-delay requests of massive numbers of devices. Task offloading is an essential part of EC because it can dramatically influence the use of network resources and network performance. Most existing work on task offloading takes only the users' view. To effectively consider the features and objectives of both users and edge nodes from their different perspectives, a Stackelberg game-based joint computing resource allocation and task offloading method is proposed in this paper. Because edge nodes and users play different roles in EC, the problem is formulated as a bi-level optimization model with multiple leaders and multiple followers. The edge nodes can be seen as leaders and the users as followers. When jointly allocating computing resources and offloading tasks, edge nodes and users have different objectives. The objective of edge nodes is to achieve the most revenue and least energy cost, and the objective of users is to obtain short delay, consume little energy and pay less. Further, considering the particular features of EC, unlike existing Stackelberg game-based task offloading research, we focus on computing resource allocation rather than pricing. The edge nodes decide the amount of computing resources to be allocated to each user. The users then react to this allocation by deciding their task offloading strategies. Interference, delay, energy, and payoff are all considered. The evolutionary optimization method BLEAQ-II is applied to solve the designed Stackelberg game-based task offloading model. Numerical results have shown the effectiveness of the proposed method.

The University of Manchester - Institutional Repository

Whole procedure heterogeneous multiprocessors low-power optimization at algorithm-level

Author: Cheng Lianglun
Wang Hao
Wang Zhuowei
Xiong Naixue
Zhao Wuqing
Publication venue: Springer
Publication date: 01/01/2018

Power consumption reduction is the primary problem for the design and implementation of heterogeneous parallel systems. As it is difficult to make progress in low-power optimization in the hardware layer to meet the increasing need for power optimization, more attention has been paid to low-power optimization in the software layer. The relationship between the execution time and dynamic power consumption of programs divided between homogeneous and heterogeneous computing sections is analysed. In addition, the communication power consumption for data transmission and dynamic multi-task allocation are described. Afterwards, this study establishes a power model for the whole procedure of heterogeneous parallel systems. Using this model, three algorithm-level methods are designed separately: a selection algorithm for the optimal processor frequency with optimal power consumption under time constraints, optimal descent-based time allocation algorithms for multiple computing sections, and profiling dynamic analysis-based integral linear programming. Finally, the validity of the power optimization algorithm is ascertained using typical applications.

submittedVersion. This is a pre-print of an article published in Cluster Comput (2018). The final authenticated version is available online at: https://doi.org/10.1007/s10586-018-1920-
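The idea of selecting a processor frequency under a time constraint can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the textbook dynamic power model P = C_eff * V^2 * f, a fixed cycle count, and a discrete set of operating points. The names `OperatingPoint`, `dynamic_energy_j`, `select_frequency` and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """One hypothetical DVFS level: clock frequency and supply voltage."""
    freq_hz: float
    voltage_v: float


def dynamic_energy_j(op: OperatingPoint, cycles: float, c_eff_f: float) -> float:
    """Dynamic energy under the assumed model P = C_eff * V^2 * f."""
    power_w = c_eff_f * op.voltage_v ** 2 * op.freq_hz
    time_s = cycles / op.freq_hz
    return power_w * time_s


def select_frequency(points: Sequence[OperatingPoint], cycles: float,
                     deadline_s: float, c_eff_f: float) -> Optional[OperatingPoint]:
    """Return the feasible operating point with the lowest dynamic energy.

    A point is feasible if the work finishes within the deadline.
    Returns None when no point meets the deadline.
    """
    feasible = [op for op in points if cycles / op.freq_hz <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda op: dynamic_energy_j(op, cycles, c_eff_f))


# Illustrative values only (assumptions, not measurements).
POINTS = [
    OperatingPoint(1.0e9, 0.9),
    OperatingPoint(1.5e9, 1.0),
    OperatingPoint(2.0e9, 1.2),
]
best = select_frequency(POINTS, cycles=3.0e9, deadline_s=2.5, c_eff_f=1.0e-9)
```

Under this simplified model the energy depends only on voltage and cycle count, so the lowest-voltage point that still meets the deadline is chosen. Static power and communication costs, which the abstract also mentions, are left out of the sketch.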

Crossref

NTNU Open (Norwegian University of Science and Technology)

Improving Cognitive Capability of Large Language Model: A Multi-Step Symbolic Reasoning Approach

Author: Chen Chong
Cheng Lianglun
Wang Tao
Wang Zhuowei
Zhai Jinkun
Publication venue: eScholarship, University of California
Publication date: 01/01/2025

The emergence of large language models (LLMs) has promoted research progress in many fields, but they still face challenges in imitating human logical reasoning, especially in the step-by-step reasoning of complex tasks and zero-shot logical cognition. To address these challenges, we propose a multi-step symbolic reasoning strategy that decomposes complex tasks into subtasks and optimizes the decomposition using a subtask verification module. Moreover, we introduce a new zero-shot symbolic module which can help improve the model's reasoning ability on unseen samples with symbolic representation and logical schemes. We evaluated our method on four reasoning datasets: the industrial private dataset Ship Assembly Technology and the public datasets ProntoQA, ProofWriter, and OpenBookQA. Our framework demonstrates substantial improvements in reasoning interpretability and generalization capacity compared to existing prompting paradigms. The proposed method establishes a new pathway for enhancing LLMs' cognitive architectures through symbolic system integration, showing strong potential for efficient knowledge transfer to downstream applications while preserving human-understandable reasoning traces.

eScholarship - University of California

Energy Optimization by Software Prefetching for Task Granularity in GPU-based Embedded Systems

Author: Cheng Lianglun
Song Xiaoyu
Wang Hao
Wang Zhuowei
Zhao Wuqing
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

Energy saving and optimization play an increasingly important role in industrial electronic systems. A heterogeneous embedded system is composed of a general-purpose central processing unit (CPU) with an enhanced module of graphics processing units (GPU). This paper explores effective strategies of task granularity and software prefetching for energy optimization. We propose a novel energy optimization model for GPU-based embedded systems by harnessing a communication-based pipeline spatial and temporal relation. We analyze the characteristics of multi-threaded execution on parallel GPUs. We present an effective algorithm for dynamic power optimization with an adaptively adjusted software prefetching distance. The experimental results show that the dynamic energy consumption can be reduced by 22.1% and 21.8% respectively under two prefetching strategies (register and shared memory) without loss of performance. We demonstrate the effectiveness of the proposed methods for energy saving and consumption reduction of performance-driven computing in industrial scenarios.

acceptedVersion. © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

NTNU Open (Norwegian University of Science and Technology)

Warp-Aware Adaptive Energy Efficiency Calibration for Multi-GPU Systems

Author: Cheng Lianglun
Song Xiaoyu
Wan Hai
Wang Tao
Wang Zhuowei
Zhao Wuqing
Publication venue: PDXScholar
Publication date: 01/01/2022

Massive GPU acceleration processors have been used in high-performance computing systems. The breakdown of Dennard scaling has led to power and thermal constraints limiting the performance of such systems. Both increased performance and energy efficiency are highly desired. This paper presents a multi-layer low-power optimisation method for warp and task parallelism. We present a dynamic frequency regulation scheme for performance parameters in terms of load balance and load imbalance. The method monitors the energy parameters at runtime and adaptively adjusts the voltage level to ensure performance efficiency with energy reduction. The experimental results show that the multi-layer low-power optimisation with dynamic frequency regulation can achieve a 40% energy consumption reduction with only 1.6% performance degradation, thus reducing the maximum energy consumption by 59%. It can further save about 30% energy consumption in comparison with the single-layer energy optimisation.

Crossref

PDXScholar (Portland State University)

Spectrum resource allocation method of maximizing transmission rate in cognitive heterogeneous wireless networks

Author: Gengzhong ZHENG
Lianglun CHENG
Tao WANG
Xiaoqing DONG
Publication venue: Editorial Department of Journal on Communications
Publication date: 01/09/2019

To address the difficulty of efficiently allocating spectrum resources to secondary users in cognitive heterogeneous wireless networks with heterogeneous spectrum attributes, dynamic channel conditions and diverse service requirements, a spectrum resource allocation strategy that maximizes the transmission rate was proposed. Firstly, with the goal of maximizing the total transmission rate and subject to constraints on the limited spectrum resources and user service requirements, a non-linear multi-constrained 0-1 programming model for spectrum resource allocation was constructed. Then a simplification method with polynomial time complexity was designed. According to idle spectrum information, channel conditions, service requirements and allocation decision history, the benefit matrix was constructed and modified to simplify the constraints, and the execution efficiency was improved by improving the coefficient matrix transformation strategy of the traditional Hungarian algorithm. Finally, the performance of the method was compared and analyzed through experiments. Experimental results show that the proposed method achieves a higher transmission rate and execution efficiency.
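The rate-maximizing assignment step can be illustrated with the textbook Hungarian algorithm. This sketch is not the paper's improved variant and ignores its service-requirement constraints: it assumes a square benefit matrix in which `rate[i][j]` is the (hypothetical) transmission rate of user `i` on channel `j`, and assigns one channel per user. The function names and the example matrix are assumptions for illustration.

```python
from typing import List, Tuple


def hungarian_min(cost: List[List[float]]) -> List[int]:
    """Textbook O(n^3) Hungarian algorithm for a square cost matrix.

    Returns `assignment`, where assignment[i] is the column given to row i
    and the total cost is minimal.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)        # row potentials (1-based)
    v = [0.0] * (n + 1)        # column potentials (1-based)
    p = [0] * (n + 1)          # p[j]: row matched to column j, 0 if none
    way = [0] * (n + 1)        # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def allocate_channels(rate: List[List[float]]) -> Tuple[float, List[int]]:
    """Maximize the total rate by minimizing the negated benefit matrix."""
    negated = [[-r for r in row] for row in rate]
    assignment = hungarian_min(negated)
    total = sum(rate[i][assignment[i]] for i in range(len(rate)))
    return total, assignment


# Hypothetical benefit matrix: 3 secondary users x 3 idle channels.
RATE = [[5, 9, 1],
        [10, 3, 2],
        [8, 7, 4]]
total_rate, channels = allocate_channels(RATE)
```

Negating the matrix turns the maximization into the minimization the algorithm solves. A non-square instance (more channels than users, say) would need padding with zero-benefit rows or columns first.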

Directory of Open Access Journals

Model-agnostic meta-learning for fault diagnosis of industrial robots

Author: Chen Chong
Cheng Lianglun
Liu Yuxin
Qin Jian
Wang Tao
Publication venue: IEEE
Publication date: 16/10/2023

The success of deep learning in the field of fault diagnosis depends on a large amount of training data, but achieving fault diagnosis of multi-axis industrial robots in the few-shot case is a challenge. To address this issue, this paper proposes a method based on Model-Agnostic Meta-Learning (MAML) for fault diagnosis of industrial robots. Its goal is to train an effective industrial robot fault classifier using minimal training data. Additionally, it can learn to recognize faults in new scenarios with high accuracy based on the training data. Experimental results on a six-axis industrial robot dataset show that the proposed method is superior to a traditional convolutional neural network (CNN) and to transfer learning, and that its diagnostic results with the same amount of data in few-shot cases are better than those of existing intelligent fault diagnosis methods.
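The MAML training loop named in the abstract can be shown on a toy problem. This is a sketch under strong assumptions, not the paper's fault classifier: the model is a single scalar weight `w` fitting `y = a * x`, each task has its own slope `a`, and the meta-gradient is computed analytically (exact here because the loss is quadratic). All names, data and learning rates are hypothetical.

```python
from typing import List, Sequence, Tuple

# A task is (support_x, support_y, query_x, query_y).
Task = Tuple[List[float], List[float], List[float], List[float]]


def loss(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Derivative of the mean squared error with respect to w."""
    return 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def hessian(xs: Sequence[float]) -> float:
    """Second derivative of the mean squared error (constant in w)."""
    return 2.0 * sum(x * x for x in xs) / len(xs)


def adapt(w: float, xs: Sequence[float], ys: Sequence[float], inner_lr: float) -> float:
    """Inner loop: one gradient step on a task's support set."""
    return w - inner_lr * grad(w, xs, ys)


def maml_train(tasks: Sequence[Task], w0: float, inner_lr: float,
               outer_lr: float, steps: int) -> float:
    """Outer loop: update the initial weight through the adaptation step."""
    w = w0
    for _ in range(steps):
        meta_grad = 0.0
        for xs_s, ys_s, xs_q, ys_q in tasks:
            w_adapted = adapt(w, xs_s, ys_s, inner_lr)
            # Chain rule: d(w_adapted)/dw = 1 - inner_lr * hessian(support).
            meta_grad += grad(w_adapted, xs_q, ys_q) * (1.0 - inner_lr * hessian(xs_s))
        w -= outer_lr * meta_grad / len(tasks)
    return w


def make_task(slope: float) -> Task:
    """Build a noise-free toy task with fixed support and query inputs."""
    xs_s, xs_q = [1.0, 2.0], [1.5, 2.5]
    return xs_s, [slope * x for x in xs_s], xs_q, [slope * x for x in xs_q]


TRAIN_TASKS = [make_task(1.0), make_task(3.0)]
w_meta = maml_train(TRAIN_TASKS, w0=0.0, inner_lr=0.1, outer_lr=0.1, steps=200)

# Few-shot adaptation to an unseen task (slope 2.5) from its support set.
new_xs_s, new_ys_s, new_xs_q, new_ys_q = make_task(2.5)
w_new = adapt(w_meta, new_xs_s, new_ys_s, inner_lr=0.1)
```

The learned `w_meta` is an initialization from which one inner step on a small support set already lowers the query loss of a new task. With a neural network the same structure holds, but the meta-gradient is usually obtained by automatic differentiation rather than by hand.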

CERES Research Repository (Cranfield Univ.)