163 research outputs found

    Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code

    iPIC3D is a widely used massively parallel Particle-in-Cell code for the simulation of space plasmas. However, its current implementation does not support execution on multiple GPUs. In this paper, we describe the porting of the iPIC3D particle mover to GPUs and the optimization steps that increase its performance and parallel scaling on multiple GPUs. We analyze the strong scaling of the mover on two GPU clusters and evaluate its performance and acceleration. The optimized GPU version, which uses pinned memory and asynchronous data prefetching, outperforms the corresponding CPU version by 5-10x on two different systems equipped with NVIDIA K80 and V100 GPUs. Comment: Accepted for publication in ICCS 201

    Where should MMS look for electron diffusion regions?

    A great possible achievement for the MMS mission would be crossing electron diffusion regions (EDRs). EDRs are regions in proximity of reconnection sites where electrons decouple from field lines, breaking the frozen-in condition. Decades of research on reconnection have produced a widely shared map of where EDRs are. We expect reconnection to take place around a so-called x-point formed by the intersection of the separatrices dividing inflowing from outflowing plasma. The EDR forms around this x-point as a small electron-scale box nested inside a larger ion diffusion region. But this point of view is based on a 2D mentality. We have recently proposed that once the problem is considered in full 3D, secondary reconnection events can form [Lapenta et al., Nature Physics, 11, 690, 2015] in the outflow regions, even far downstream from the primary reconnection site. We revisit this new idea here, confirming that the conclusion remains true even when using additional indicators of reconnection and considering longer periods and wider distances: secondary reconnection sites form downstream of a reconnection outflow, causing a sort of chain reaction of cascading reconnection sites. If we are right, MMS will have an interesting journey even when not necessarily crossing the primary site. The chances are greatly increased that, even if it misses a primary site during an orbit, MMS could stumble instead on one of these secondary sites. Comment: submitted to the Astronum 2015 Conference Proceeding

    TensorFlow Doing HPC

    TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. Although TensorFlow was initially designed for developing Machine Learning (ML) applications, it aims to support a much broader range of applications outside the ML domain, possibly including HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this gap by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG) solver and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find an approximately 2x, 1.7x and 1.8x performance improvement when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG and FFT applications respectively. All our performance results demonstrate that TensorFlow has high potential to emerge also as an HPC programming framework for heterogeneous supercomputers. Comment: Accepted for publication at The Ninth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES'19)

    Čerenkov emission of quasiparallel whistlers by fast electron phase-space holes during magnetic reconnection

    Kinetic simulations of magnetotail reconnection have revealed electromagnetic whistlers originating near the exhaust boundary and propagating into the inflow region. The whistler production mechanism is not a linear instability, but rather Čerenkov emission of almost parallel whistlers from localized moving clumps of charge (finite-size quasiparticles) associated with nonlinear coherent electron phase-space holes. Whistlers are strongly excited by holes without ever growing exponentially. In the simulation the whistlers are emitted in the source region from holes that accelerate down the magnetic separatrix towards the x line. The phase velocity of the whistlers, v_φ, in the source region is everywhere well matched to the hole velocity, v_H, as required by the Čerenkov condition. The simulation shows that emission is most efficient near the theoretical maximum, v_φ = half the electron Alfvén speed, consistent with the new theoretical prediction that faster holes radiate more efficiently. While transferring energy to whistlers, the holes lose coherence and dissipate over a few local ion inertial lengths. The whistlers, however, propagate to the x line and out over many tens of ion inertial lengths into the inflow region of reconnection. As the whistlers pass near the x line they modulate the rate at which magnetic field lines reconnect.
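
    The "half the electron Alfvén speed" limit quoted above can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the textbook cold-plasma dispersion relation for parallel whistlers at frequencies well below the plasma frequency, ω/Ω_ce = (k d_e)² / (1 + (k d_e)²), which gives v_φ/V_Ae = (k d_e) / (1 + (k d_e)²). The function names and the scan range are hypothetical choices.

```python
def whistler_phase_velocity(k_de: float) -> float:
    """Parallel whistler phase velocity in units of the electron Alfven speed.

    Assumption (not stated in the abstract): cold-plasma dispersion
    omega / Omega_ce = (k d_e)^2 / (1 + (k d_e)^2),
    hence v_phi / V_Ae = (k d_e) / (1 + (k d_e)^2).
    """
    return k_de / (1.0 + k_de * k_de)


def fastest_whistler(k_min: float = 0.01, k_max: float = 10.0,
                     samples: int = 100_000) -> tuple[float, float]:
    """Scan k d_e on a uniform grid and return (k d_e, v_phi / V_Ae) at the maximum."""
    best_k, best_v = k_min, whistler_phase_velocity(k_min)
    for i in range(1, samples + 1):
        k = k_min + (k_max - k_min) * i / samples
        v = whistler_phase_velocity(k)
        if v > best_v:
            best_k, best_v = k, v
    return best_k, best_v


def cerenkov_match_exists(hole_speed: float) -> bool:
    """True if some parallel whistler has v_phi equal to the hole speed.

    hole_speed is in units of V_Ae. Under the assumed dispersion relation,
    v_phi covers the interval (0, 0.5], so matching requires 0 < v_H <= 0.5.
    """
    return 0.0 < hole_speed <= 0.5


if __name__ == "__main__":
    k, v = fastest_whistler()
    print(f"maximum v_phi / V_Ae = {v:.4f} at k d_e = {k:.3f}")
```

    Under this assumed dispersion relation the maximum sits at k d_e = 1, so a hole moving faster than V_Ae/2 has no parallel whistler to resonate with.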

    NVIDIA Tensor Core Programmability, Performance & Precision

    The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called "Tensor Core", that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflop/s in mixed precision. In this paper, we investigate current approaches to programming NVIDIA Tensor Cores, their performance and the precision loss due to computation in mixed precision. Currently, NVIDIA provides three different ways of programming matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply Accumulate (WMMA) API; CUTLASS, a templated library based on WMMA; and cuBLAS GEMM. After experimenting with different approaches, we found that NVIDIA Tensor Cores can deliver up to 83 Tflop/s in mixed precision on a Tesla V100 GPU, seven and three times the performance in single and half precision respectively. A WMMA implementation of batched GEMM reaches a performance of 4 Tflop/s. While precision loss due to matrix multiplication with half precision input might be critical in many HPC applications, it can be considerably reduced at the cost of increased computation. Our results indicate that HPC applications using matrix multiplications can strongly benefit from using NVIDIA Tensor Cores. Comment: This paper has been accepted by the Eighth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 201
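
    The trade-off between precision loss and extra computation can be illustrated on the CPU. The following is a minimal sketch, not the paper's code: it emulates half-precision inputs and single-precision accumulation with Python's `struct` module, and then applies one possible refinement (an assumption here), in which the rounding residual of each input is kept as a second half-precision value and two extra products are added. All function names are hypothetical, and the inputs are Python doubles rather than single-precision values.

```python
import math
import random
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_dot(a, b):
    """Dot product with inputs rounded to half and an accumulator rounded to single."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_single(acc + to_half(x) * to_half(y))
    return acc


def refined_dot(a, b):
    """Mixed-precision dot product plus two residual correction terms.

    Assumed refinement scheme: with a = a_h + r_a and b = b_h + r_b,
    a.b is approximated by a_h.b_h + r_a.b_h + a_h.r_b (the r_a.r_b term is dropped).
    This costs three mixed-precision products instead of one.
    """
    a_res = [x - to_half(x) for x in a]
    b_res = [y - to_half(y) for y in b]
    return mixed_dot(a, b) + mixed_dot(a_res, b) + mixed_dot(a, b_res)


def total_errors(trials: int = 20, n: int = 256, seed: int = 0):
    """Sum of absolute errors of both variants over random vectors in [-1, 1]."""
    rng = random.Random(seed)
    plain_total = refined_total = 0.0
    for _ in range(trials):
        a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        exact = math.fsum(x * y for x, y in zip(a, b))
        plain_total += abs(mixed_dot(a, b) - exact)
        refined_total += abs(refined_dot(a, b) - exact)
    return plain_total, refined_total


if __name__ == "__main__":
    plain, refined = total_errors()
    print(f"plain mixed precision: {plain:.3e}, with refinement: {refined:.3e}")
```

    The same idea extends from dot products to matrix products, since each output element of a GEMM is a dot product of a row and a column.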

    Signatures of Secondary Collisionless Magnetic Reconnection Driven by Kink Instability of a Flux Rope

    The kinetic features of secondary magnetic reconnection in a single flux rope undergoing internal kink instability are studied by means of three-dimensional Particle-in-Cell simulations. Several signatures of secondary magnetic reconnection are identified in the plane perpendicular to the flux rope: a quadrupolar electron and ion density structure and a bipolar Hall magnetic field develop in proximity of the reconnection region. The most intense electric fields form perpendicular to the local magnetic field, and a reconnection electric field is identified in the plane perpendicular to the flux rope. An electron current develops along the reconnection line in the direction opposite to the electron current supporting the flux rope magnetic field structure. Along the reconnection line, several bipolar structures of the electric field parallel to the magnetic field occur, making the magnetic reconnection region turbulent. The reported signatures of secondary magnetic reconnection can help to localize magnetic reconnection events in space, astrophysical and fusion plasmas.

    Nonlinear evolution of the magnetized Kelvin-Helmholtz instability: from fluid to kinetic modeling

    The nonlinear evolution of collisionless plasmas is typically a multi-scale process where the energy is injected at large, fluid scales and dissipated at small, kinetic scales. Accurately modelling the global evolution requires taking into account the main micro-scale physical processes of interest. This is why comparing different plasma models is today an imperative task for understanding cross-scale processes in plasmas. We report here the first comparative study of the evolution of a magnetized shear flow through a variety of plasma models, using magnetohydrodynamic, Hall-MHD, two-fluid, hybrid kinetic and full kinetic codes. Kinetic relaxation effects are discussed to emphasize the need for kinetic equilibria to study the dynamics of collisionless plasmas in nontrivial configurations. Discrepancies between models are studied both in the linear and in the nonlinear regime of the magnetized Kelvin-Helmholtz instability, to highlight the effects of small-scale processes on the nonlinear evolution of collisionless plasmas. We illustrate how the evolution of a magnetized shear flow depends on the relative orientation of the fluid vorticity with respect to the magnetic field direction during the linear evolution when kinetic effects are taken into account. Although we find that small-scale processes differ between the models, we show that the feedback from small, kinetic scales to large, fluid scales is negligible in the nonlinear regime. This study shows that kinetic modeling validates the use of a fluid approach at large scales, which encourages the development and use of fluid codes to study the nonlinear evolution of magnetized fluid flows, even in the collisionless regime.

    A body at the edge of language: writing anorexia, bulimia and recovering

    This practice-led life writing project explores this writer-scholar's experience of her eating disorder through a series of poetic essays developed from material and somatic writing methods including ink-and-paper, found text, and movement. Through these particular methods, and the episodic acts of the writing itself, this PhD discovers a form of somatic life writing that both demonstrates and analyses the lived experience of this psycho-somatic disorder. This research project responds to the challenges of writing anorexia, bulimia and recovering, by developing material writing methods to negotiate self-erasure, narrative authority and embodied memory on the page. The PhD examines the symbiotic relation between writing and (not) eating in ways that are analogous, metaphoric and mutually affective. It draws on a range of writers and feminist materialist scholars to propose that when the tensions of eating disorder are transposed to language and navigated on the page, moments can be found where bodies and writing are constituted and de-constituted. In locating their life-affirming entanglement, this writing practice counteracts the erasure and containment of the condition.

    Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs

    Analytic, first-principles performance modeling of distributed-memory parallel codes is notoriously imprecise. Even for applications with extremely regular and homogeneous compute-communicate phases, simply adding communication time to computation time often does not yield a satisfactory prediction of parallel runtime, due to deviations from the expected simple lockstep pattern caused by system noise, variations in communication time, and inherent load imbalance. In this paper, we highlight the specific cases of provoked and spontaneous desynchronization of memory-bound, bulk-synchronous pure MPI and hybrid MPI+OpenMP programs. Using simple microbenchmarks, we observe that although desynchronization can introduce increased waiting time per process, it does not necessarily cause lower resource utilization but can lead to an increase in available bandwidth per core. In case of significant communication overhead, even natural noise can shove the system into a state of automatic overlap of communication and computation, improving the overall time to solution. The saturation point, i.e., the number of processes per memory domain required to achieve full memory bandwidth, is pivotal in the dynamics of this process and the emerging stable wave pattern. We also demonstrate how hybrid MPI+OpenMP programming can prevent desirable desynchronization by eliminating the bandwidth bottleneck among processes. A Chebyshev filter diagonalization application is used to demonstrate some of the observed effects in a realistic setting. Comment: 18 pages, 8 figures
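
    To make the terms "saturation point" and "available bandwidth per core" concrete, here is a minimal sketch of the naive additive model that the abstract describes as insufficient. It is not the authors' model: it assumes that memory bandwidth scales linearly with the number of active processes until it hits a fixed domain limit, and all function names and numbers are hypothetical.

```python
import math


def saturation_point(core_bw: float, domain_bw: float) -> int:
    """Smallest number of processes whose combined demand reaches the domain bandwidth.

    Assumes a linear-then-flat bandwidth model (a simplification).
    Bandwidths are in GB/s.
    """
    return math.ceil(domain_bw / core_bw)


def per_process_bandwidth(active: int, core_bw: float, domain_bw: float) -> float:
    """Bandwidth available to each of `active` processes that access memory concurrently."""
    return min(core_bw, domain_bw / active)


def lockstep_runtime(volume_gb: float, processes: int, core_bw: float,
                     domain_bw: float, comm_time: float) -> float:
    """Naive prediction: memory-bound compute time plus communication time, no overlap."""
    bw = per_process_bandwidth(processes, core_bw, domain_bw)
    return volume_gb / bw + comm_time


if __name__ == "__main__":
    core_bw, domain_bw = 12.0, 60.0  # hypothetical GB/s figures
    print("saturation point:", saturation_point(core_bw, domain_bw))
    # In lockstep, all 10 processes compete for the memory interface at once.
    print("lockstep, 10 active:", per_process_bandwidth(10, core_bw, domain_bw))
    # If desynchronization leaves only 7 computing while 3 communicate,
    # each computing process sees more bandwidth.
    print("desynchronized, 7 active:", per_process_bandwidth(7, core_bw, domain_bw))
```

    In this toy model the gain from desynchronization only appears when the process count exceeds the saturation point, which is consistent with the role the abstract assigns to that quantity.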

    Kinetic simulations of magnetic reconnection in presence of a background O+ population

    Particle-in-Cell simulations of magnetic reconnection with an H+ current sheet and a mixed background plasma of H+ and O+ ions are completed using physical mass ratios. Four main results are shown. First, the O+ presence slightly decreases the reconnection rate, and the magnetic reconnection evolution depends mainly on the lighter H+ ion species in the presented simulations. Second, the Hall magnetic field is characterized by a two-scale structure in the presence of O+ ions: it reaches sharp peak values in a small area in proximity of the neutral line, and then decreases slowly over a large region. Third, the two background species initially separate in the outflow region because H+ and O+ ions are accelerated by different mechanisms occurring on different time scales and with different strengths. Fourth, the effect of a guide field on the O+ dynamics is studied: the O+ presence does not change the reconnected flux, and all the characteristic features of guide field magnetic reconnection are still present. Moreover, the guide field introduces an O+ circulation pattern between separatrices that enhances high O+ density areas and depletes low O+ density regions in proximity of the reconnection fronts. Finally, the importance and the validity of these results are discussed.