690 research outputs found

    Power efficient job scheduling by predicting the impact of processor manufacturing variability

    Modern CPUs suffer from performance and power-consumption variability caused by the manufacturing process. As a result, systems that ignore this variability suffer performance degradation and waste power. To avoid this, users and system administrators must actively counteract manufacturing variability. In this work we show that parallel systems benefit from accounting for the consequences of manufacturing variability when making scheduling decisions at the job-scheduler level. We also show that the impact of this variability on specific applications can be predicted with variability-aware power prediction models. Based on these power models, we propose two job scheduling policies that consider the effects of manufacturing variability for each application and keep power consumption under a system-wide power budget. We evaluate our policies under different power budgets and traffic scenarios, consisting of both single- and multi-node parallel applications and using up to 4096 cores in total. Compared to contemporary scheduling policies used on production clusters, they reduce job turnaround time by up to 31% while saving up to 5.5% energy. Postprint (author's final draft)
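    The abstract does not spell out the two policies, so the following is only a minimal Python sketch of the general idea under stated assumptions: each node's manufacturing variability is summarized by a single hypothetical `power_factor`, an application's per-node power is predicted by scaling a nominal value by that factor, and a job is admitted only if its predicted power fits in the remaining system-wide budget. The names `Node`, `predict_power`, and `try_schedule` are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    # Hypothetical: measured power relative to a nominal part (1.0 = nominal).
    power_factor: float
    busy: bool = False


def predict_power(nominal_watts: float, node: Node) -> float:
    """Hypothetical variability-aware model: scale the application's nominal
    per-node power by the node's manufacturing-variability factor."""
    return nominal_watts * node.power_factor


def try_schedule(nodes, n_nodes, nominal_watts, used_watts, budget_watts):
    """Greedy variability-aware placement under a system-wide power budget.

    Picks the n_nodes free nodes with the lowest predicted power for this job
    and admits the job only if the total stays within the budget.
    Returns (chosen_nodes, new_used_watts), or (None, used_watts) if the job must wait.
    """
    free = [nd for nd in nodes if not nd.busy]
    if len(free) < n_nodes:
        return None, used_watts  # not enough idle nodes
    # Most power-efficient nodes for this application first.
    free.sort(key=lambda nd: predict_power(nominal_watts, nd))
    chosen = free[:n_nodes]
    job_watts = sum(predict_power(nominal_watts, nd) for nd in chosen)
    if used_watts + job_watts > budget_watts:
        return None, used_watts  # would exceed the power budget: keep the job queued
    for nd in chosen:
        nd.busy = True
    return chosen, used_watts + job_watts


if __name__ == "__main__":
    nodes = [Node("n0", 1.10), Node("n1", 0.92), Node("n2", 1.00), Node("n3", 0.95)]
    chosen, used = try_schedule(nodes, 2, 100.0, 0.0, 250.0)
    print([nd.name for nd in chosen], used)
```

    With a 250 W budget, a 2-node job at 100 W nominal per node lands on the two most efficient nodes (n1 and n3, about 187 W predicted); a second identical job would need n0 and n2 (about 210 W more) and therefore waits. A production policy would also weigh queue order, fairness, and per-application model accuracy, which this sketch ignores.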

    Optimization techniques for task-based parallel programming models

    126 pp. One of the most challenging problems in modern parallel computing systems is exploiting the large number of cores/threads that modern hardware provides, in order to improve the efficiency of applications that execute pieces of code in parallel. Various programming models have been proposed for this purpose in both the literature and industry, among them the task-based programming model. This model aims to simplify parallel programming: the programmer expresses the application's parallelism as tasks that can run in parallel, and the runtime system decides how these tasks are assigned to operating-system threads. The goal of this thesis is to explore and optimize the internals of the Intel TBB library under specific architectural constraints. We first examine the library's task scheduler, focusing on the task-stealing mechanism, in order to identify its basic functions, and we profile it to verify the stealing behavior and to measure the overhead of each basic function. We then attempt to optimize the architecture-agnostic random stealing function by adding architectural information, mainly about the cache hierarchy and the socket (package) configuration. We implement a stealing mechanism that follows two policies: i) stealing from the closest core (in terms of cache/NUMA locality), and ii) stealing from the most loaded core. The first policy aims to maximize the reuse of data shared between cores, to reduce cache pollution from unrelated data (i.e., minimize conflict/coherence misses), and to promote accesses to local NUMA memory nodes. The second policy aims at better load balancing among the cores.
To evaluate these policies, we present experimental results on the performance improvement (speedup) of several applications on a 24-core SMP platform, a 12-core NUMA platform, and a 32-core NUMA platform (with hyperthreading).
Αθανάσιος-Άκανθος Μ. Χασάπη
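    As a rough illustration of the two victim-selection policies, here is a minimal Python sketch. It is not TBB code and does not use TBB's API: each core's task queue is modeled as a `collections.deque`, and the topology is a hypothetical uniform layout in which every `cores_per_l2` consecutive cores share an L2 cache and every `cores_per_socket` consecutive cores share a socket (last-level cache and local NUMA node). The functions `distance`, `victim_closest`, `victim_most_loaded`, and `steal` are illustrative names.

```python
import random
from collections import deque


def distance(thief: int, victim: int, cores_per_l2: int, cores_per_socket: int) -> int:
    """Hypothetical topology distance: smaller means more cache is shared."""
    if thief // cores_per_l2 == victim // cores_per_l2:
        return 1  # siblings sharing an L2 cache
    if thief // cores_per_socket == victim // cores_per_socket:
        return 2  # same socket: shared last-level cache, local NUMA node
    return 3  # different socket: remote NUMA node


def victim_closest(thief, queues, cores_per_l2, cores_per_socket, rng=random):
    """Policy i: steal from the nearest core that has queued work."""
    candidates = [c for c, q in enumerate(queues) if c != thief and q]
    if not candidates:
        return None
    dist = {c: distance(thief, c, cores_per_l2, cores_per_socket) for c in candidates}
    best = min(dist.values())
    # Break ties randomly so idle cores at the same level do not all hit one victim.
    return rng.choice([c for c in candidates if dist[c] == best])


def victim_most_loaded(thief, queues):
    """Policy ii: steal from the core with the most queued tasks."""
    candidates = [c for c, q in enumerate(queues) if c != thief and q]
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(queues[c]))


def steal(queues, victim):
    """Classic work-stealing convention: owners push/pop at the right end (LIFO),
    thieves take the oldest task from the left end."""
    return queues[victim].popleft()


if __name__ == "__main__":
    queues = [deque() for _ in range(8)]  # 8 cores: L2 pairs, 4 cores per socket
    queues[1].append("a")
    queues[5].extend(["b1", "b2", "b3"])
    print(victim_closest(0, queues, 2, 4), victim_most_loaded(0, queues))
```

    On this hypothetical 8-core layout, an idle core 0 picks core 1 under policy i (shared L2) even though remote core 5 holds more tasks, which policy ii would pick instead. Random tie-breaking within a locality level keeps some of the spreading behavior of the original random-stealing scheme.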

    Energy conversion in turbulent weakly-collisional plasmas: Eulerian Hybrid Vlasov-Maxwell simulations

    Kinetic simulations based on the Eulerian Hybrid Vlasov-Maxwell (HVM) formalism permit the examination of plasma turbulence with useful resolution of the proton velocity distribution function (VDF). The HVM model is employed here to study the energy balance, focusing on the conversion channels that lead to proton kinetic effects, including the growth of internal energy and of temperature anisotropies. We show that this Eulerian simulation approach, which is almost noise-free, provides an accurate energy balance for protons. The results demonstrate explicitly that the recovered temperature growth is directly related to the pressure-strain interaction. Furthermore, analysis of local spatial correlations indicates that the pressure-strain interaction is qualitatively associated with strong-current, high-vorticity structures, although other local terms (such as the heat flux) weaken the correlation. These numerical capabilities based on the Eulerian approach will enable deeper study of transfer and conversion channels in weakly collisional Vlasov plasmas. Comment: Accepted for publication in Physics of Plasmas
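    For context, the fluid-moment equation for the proton internal energy (a standard result, stated here as background rather than taken from this abstract) shows how the pressure-strain interaction feeds temperature growth:

```latex
\frac{\partial \mathcal{E}^{th}}{\partial t}
  + \nabla\cdot\left(\mathcal{E}^{th}\,\mathbf{u} + \mathbf{h}\right)
  = -(\mathbf{P}\cdot\nabla)\cdot\mathbf{u}
  = -P_{ij}\,\partial_i u_j ,
\qquad
\mathcal{E}^{th} = \tfrac{1}{2}\,\mathrm{tr}\,\mathbf{P} = \tfrac{3}{2}\,n k_B T .
```

    Here $\mathbf{P}$ is the proton pressure tensor, $\mathbf{u}$ the bulk velocity, $\mathbf{h}$ the heat-flux vector, and $n$, $T$ the density and scalar temperature. In a periodic domain the divergence terms integrate to zero, so only the pressure-strain term changes the total internal energy (and hence the mean temperature), while locally the heat-flux divergence redistributes energy, which is consistent with its weakening of the local correlation.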

    Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes

    This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and by the European HiPEAC Network of Excellence. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). This work was also partially performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-689878). Finally, the authors are grateful to the reviewers for their valuable comments, to the RoMoL team, to Xavier Teruel and Kallia Chronaki from the Programming Models group of BSC, and to the Computation Department of LLNL for their technical support and useful feedback. Peer reviewed. Postprint (published version)

    Antimony arsenide: Chemical ordering in the compound SbAs

    The semimetallic Group V elements display a wealth of correlated electron phenomena due to a small indirect band overlap, which leads to relatively small but equal numbers of high-mobility holes and electrons at the Fermi energy. Their electronic bonding characteristics produce a unique crystal structure, the rhombohedral A7 structure, which accommodates lone pairs on each site. Here we show that the A7 structure can display chemical ordering of Sb and As, which were previously thought to mix randomly. Our structural characterization of the compound SbAs is performed by single-crystal and high-resolution synchrotron x-ray diffraction, and by neutron and x-ray pair distribution function analysis. All least-squares refinements indicate ordering of Sb and As, resulting in a GeTe-type structure without inversion symmetry. High-temperature diffraction studies reveal an ordering transition around 550 K. Transport and infrared reflectivity measurements, along with first-principles calculations, confirm that SbAs is a semimetal, albeit with a larger direct band separation than that of Sb or As. Because even subtle substitutions in the semimetals, notably Bi_{1-x}Sb_x, can open semiconducting energy gaps, further investigation of the interplay between chemical ordering and electronic structure on the A7 lattice is warranted. Comment: 9 pages, 8 figures