Transformations of High-Level Synthesis Codes for High-Performance Computing
Specialized hardware architectures promise a major step in performance and
energy efficiency over the traditional load/store devices currently employed in
large-scale computing systems. The adoption of high-level synthesis (HLS) from
languages such as C/C++ and OpenCL has greatly increased programmer
productivity when designing for such platforms. While this has enabled a wider
audience to target specialized hardware, the optimization principles known from
traditional software design are no longer sufficient to implement
high-performance codes. Fast and efficient codes for reconfigurable platforms
are thus still challenging to design. To alleviate this, we present a set of
optimizing transformations for HLS, targeting scalable and efficient
architectures for high-performance computing (HPC) applications. Our work
provides a toolbox for developers, where we systematically identify classes of
transformations, the characteristics of their effect on the HLS code and the
resulting hardware (e.g., increased data reuse or resource consumption), and
the objectives that each transformation can target (e.g., resolving interface
contention or increasing parallelism). We show how these can be used to
efficiently exploit pipelining, on-chip distributed fast memory, and on-chip
streaming dataflow, allowing for massively parallel architectures. To quantify
the effect of our transformations, we use them to optimize a set of
throughput-oriented FPGA kernels, demonstrating that our enhancements are
sufficient to scale up parallelism within the hardware constraints. With the
transformations covered, we hope to establish a common framework for
performance engineers, compiler developers, and hardware developers, to tap
into the performance potential offered by specialized hardware architectures
using HLS.
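To make the flavor of these transformations concrete, below is a minimal sketch of one member of the class: interleaving partial sums so a reduction loop loses its loop-carried dependency and can be pipelined. The pragmas follow Xilinx Vivado/Vitis HLS syntax; the code, the function names, and the interleaving depth are illustrative assumptions, not code from the paper.

```cpp
// Illustrative only: one transformation of the kind the abstract describes,
// shown as a before/after pair. Pragmas use Xilinx Vivado/Vitis HLS syntax.

constexpr int kPartial = 8;  // assumed to cover the latency of the fp adder

// Before: the single accumulator is a loop-carried dependency, so the loop
// cannot reach an initiation interval (II) of 1.
float SumNaive(const float *x, int n) {
  float acc = 0.f;
  for (int i = 0; i < n; ++i) {
    #pragma HLS PIPELINE II=1  // not achievable: acc += ... is loop-carried
    acc += x[i];
  }
  return acc;
}

// After: interleaving partial sums spreads consecutive iterations over
// different registers, so the main loop pipelines at II=1.
float SumInterleaved(const float *x, int n) {
  float partial[kPartial];
  #pragma HLS ARRAY_PARTITION variable=partial complete
  for (int p = 0; p < kPartial; ++p) {
    #pragma HLS UNROLL
    partial[p] = 0.f;
  }
  for (int i = 0; i < n; ++i) {
    #pragma HLS PIPELINE II=1
    partial[i % kPartial] += x[i];  // dependency distance is now kPartial
  }
  float acc = 0.f;
  for (int p = 0; p < kPartial; ++p) {
    #pragma HLS UNROLL
    acc += partial[p];
  }
  return acc;
}
```

Each transformation in the toolbox follows this pattern: the C/C++ is rewritten so the generated hardware gains pipelining, data reuse, or parallelism while computing the same result (here, up to floating-point reordering).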
SDNsec: Forwarding Accountability for the SDN Data Plane
SDN promises to make networks more flexible, programmable, and easier to
manage. Inherent security problems in SDN today, however, pose a threat to the
promised benefits. First, the network operator lacks tools to proactively
ensure that policies will be followed or to reactively inspect the behavior of
the network. Second, the distributed nature of state updates at the data plane
leads to inconsistent network behavior during reconfigurations. Third, the
large flow space makes the data plane susceptible to state exhaustion attacks.
This paper presents SDNsec, an SDN security extension that provides
forwarding accountability for the SDN data plane. Forwarding rules are encoded
in the packet, ensuring consistent network behavior during reconfigurations and
limiting state exhaustion attacks due to table lookups. Symmetric-key
cryptography is used to protect the integrity of the forwarding rules and
enforce them at each switch. A complementary path validation mechanism allows
the controller to reactively examine the actual path taken by the packets.
Furthermore, we present mechanisms for secure link-failure recovery and
multicast/broadcast forwarding.
Comment: 14 pages
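As a rough illustration of packet-carried forwarding rules protected by symmetric-key MACs, the sketch below shows a controller embedding one authenticated entry per hop and a switch verifying its entry before forwarding. The field layout, key handling, and the toy keyed hash are assumptions made for brevity; they do not reflect SDNsec's actual packet format or MAC construction, which would use a proper primitive such as AES-CMAC.

```cpp
// Hypothetical sketch: per-hop forwarding entries carried in the packet and
// authenticated with a key each switch shares with the controller.
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

using Key = uint64_t;  // stand-in for a real 128-bit symmetric key

struct ForwardingEntry {  // one entry per hop, carried in the packet header
  uint16_t egress_port;   // where this switch must forward the packet
  uint64_t mac;           // binds flow ID and egress port to this switch's key
};

// Toy keyed mixer standing in for a real MAC (NOT cryptographically secure).
uint64_t ToyMac(Key key, uint64_t flow_id, uint16_t egress_port) {
  uint64_t h = key ^ 0x9e3779b97f4a7c15ULL;
  h ^= flow_id;     h *= 0xff51afd7ed558ccdULL; h ^= h >> 33;
  h ^= egress_port; h *= 0xc4ceb9fe1a85ec53ULL; h ^= h >> 33;
  return h;
}

// Controller: encode the chosen path as authenticated per-hop entries.
std::vector<ForwardingEntry> EncodePath(
    uint64_t flow_id, const std::vector<std::pair<Key, uint16_t>>& hops) {
  std::vector<ForwardingEntry> entries;
  for (const auto& [switch_key, port] : hops)
    entries.push_back({port, ToyMac(switch_key, flow_id, port)});
  return entries;
}

// Switch: recompute the MAC over its own entry; forward on match, else drop.
bool VerifyAndForward(Key my_key, uint64_t flow_id, const ForwardingEntry& e) {
  if (ToyMac(my_key, flow_id, e.egress_port) != e.mac) return false;  // drop
  std::printf("flow %llu -> port %u\n",
              static_cast<unsigned long long>(flow_id),
              static_cast<unsigned>(e.egress_port));
  return true;
}

int main() {
  const uint64_t flow = 42;
  // (switch key, egress port) for each hop on the controller-chosen path.
  std::vector<std::pair<Key, uint16_t>> hops = {{0xA1, 1}, {0xB2, 3}};
  auto entries = EncodePath(flow, hops);
  VerifyAndForward(0xA1, flow, entries[0]);  // correct key: forwarded
  VerifyAndForward(0xFF, flow, entries[1]);  // wrong key: packet is dropped
}
```

Because a switch only verifies and forwards, the common case needs no per-flow table entry, which is the intuition behind how packet-carried rules limit state exhaustion through table lookups; path validation would add per-hop proofs on top of this, which the sketch omits.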
The Convergence of Sparsified Gradient Methods
Distributed training of massive machine learning models, in particular deep
neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace.
Several families of communication-reduction methods, such as quantization,
large-batch methods, and gradient sparsification, have been proposed. To date,
gradient sparsification methods - where each node sorts gradients by magnitude,
and only communicates a subset of the components, accumulating the rest locally
- are known to yield some of the largest practical gains. Such methods can
reduce the amount of communication per step by up to three orders of magnitude,
while preserving model accuracy. Yet, this family of methods currently has no
theoretical justification.
We address this gap in this paper. We prove that, under analytic
assumptions, sparsifying gradients by magnitude with local error correction
provides convergence guarantees, for both convex and non-convex smooth
objectives, for data-parallel SGD. The main insight is that sparsification
methods implicitly maintain bounds on the maximum impact of stale updates,
thanks to selection by magnitude. Our analysis and empirical validation also
reveal that these methods do require analytical conditions to converge well,
justifying existing heuristics.
Comment: NIPS 2018 - Advances in Neural Information Processing Systems; authors in alphabetical order.
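The scheme analyzed here, magnitude (top-k) selection with local error correction, can be sketched as follows. The function name and the handling of the residual are illustrative choices, but the top-k-with-error-feedback structure is the one the abstract describes.

```cpp
// Sketch of per-worker top-k gradient sparsification with error feedback.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Each worker keeps a residual vector across steps. The fresh gradient plus
// the residual is sparsified: the k largest-magnitude components are sent,
// and everything not sent is folded back into the residual.
std::vector<double> SparsifyTopK(const std::vector<double>& grad,
                                 std::vector<double>& residual,
                                 std::size_t k) {
  const std::size_t n = grad.size();
  k = std::min(k, n);
  std::vector<double> acc(n), sent(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) acc[i] = grad[i] + residual[i];

  // Find the indices of the k largest-magnitude accumulated components.
  std::vector<std::size_t> idx(n);
  for (std::size_t i = 0; i < n; ++i) idx[i] = i;
  std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
                   [&](std::size_t a, std::size_t b) {
                     return std::fabs(acc[a]) > std::fabs(acc[b]);
                   });

  for (std::size_t j = 0; j < k; ++j) sent[idx[j]] = acc[idx[j]];
  for (std::size_t i = 0; i < n; ++i) residual[i] = acc[i] - sent[i];
  return sent;  // only these k components are communicated
}
```

In a data-parallel step, each worker would call this on its local gradient, exchange only the selected components, and apply the aggregated sparse update; the residual carries the unsent mass into later iterations, and bounding the impact of that staleness is the core of the convergence argument.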
