Parallel Adaptive Mesh Coarsening for Seismic Tomography
Seismic tomography enables modeling the internal structure of the Earth. To improve the precision of existing models, a huge amount of acquired seismic data must be analyzed. The analysis of such massive data requires considerable computing power, which can only be delivered by parallel computing equipment. Yet parallel computation is not sufficient for the task: we also need algorithms to automatically concentrate the computations on the most relevant parts of the data. The objective of this paper is to present such an algorithm. Starting from an initial regular mesh in which cells carry data of varying relevance, we present a method to aggregate elementary cells so as to homogenize the relevance of the data. The result is an irregular mesh which has the advantage over the initial mesh of having orders of magnitude fewer cells while preserving the geophysical meaning of the data. We present both a sequential and a parallel algorithm to solve this problem under the hypotheses and constraints inherited from the geophysical context.
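The aggregation idea can be illustrated with a minimal sketch: merge blocks of elementary cells whose combined relevance stays below a threshold, and keep high-relevance regions at full resolution. The 2x2-block merging, the `relevance` array, and the `threshold` parameter below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def coarsen(relevance, threshold):
    """Merge 2x2 blocks of cells whose summed relevance stays below
    the threshold; keep high-relevance cells at full resolution.
    Returns a list of (row, col, size) cells of the irregular mesh.
    (Illustrative sketch only, not the paper's method.)"""
    n = relevance.shape[0]
    cells = []
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            block = relevance[i:i+2, j:j+2]
            if block.sum() < threshold:
                cells.append((i, j, 2))           # one aggregated cell
            else:
                cells.extend((i + di, j + dj, 1)  # four elementary cells
                             for di in range(2) for dj in range(2))
    return cells

rel = np.array([[0.1, 0.1, 5.0, 0.2],
                [0.1, 0.1, 4.0, 0.3],
                [0.2, 0.1, 0.1, 0.1],
                [0.1, 0.2, 0.1, 0.1]])
mesh = coarsen(rel, threshold=1.0)
print(len(mesh))  # -> 7, versus 16 cells in the regular mesh
```

Only the high-relevance block stays at full resolution; the three low-relevance blocks each collapse into a single cell, which is how the cell count drops while the interesting data keeps its precision.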
Comparison and tuning of MPI implementation in a grid context
Today, clusters are often interconnected by long-distance networks to compose grids and to provide users with a huge number of available resources. To write parallel applications, developers generally use the standard communication library MPI, which has been optimized for clusters. However, two main features of grids, long-distance networks and technological heterogeneity, raise the question of MPI efficiency in grids. This paper presents an evaluation and tuning of four recent MPI implementations (MPICH2, MPICH-Madeleine, OpenMPI and YAMPII) on a research grid: Grid'5000. The comparison is based on the execution of pingpong tests and the NAS Parallel Benchmarks. We show that these implementations present several performance differences, and that YAMPII performs better than the others. We argue that executing MPI applications on a grid can be beneficial if some specific parameters are well tuned. The paper details, for each implementation, the tuning leading to the best performance.
Improving Simulations of MPI Applications Using A Hybrid Network Model with Topology and Contention Support
Proper modeling of collective communications is essential for understanding the behavior of medium-to-large scale parallel applications, and even minor deviations in implementation can adversely affect the prediction of real-world performance. We propose a hybrid network model extending LogP-based approaches to account for topology and contention in high-speed TCP networks. This model is validated within SMPI, an MPI implementation provided by the SimGrid simulation toolkit. With SMPI, standard MPI applications can be compiled and run in a simulated network environment, and traces can be captured without incurring errors from tracing overheads or poor clock synchronization as in physical experiments. SMPI provides features for simulating applications that require large amounts of time or resources, including selective execution, RAM folding, and off-line replay of execution traces. We validate our model by comparing traces produced by SMPI with those from other simulation platforms, as well as real-world environments.
Comparison and tuning of MPI implementations in a grid context
Today, clusters are often interconnected by long-distance networks within grids to offer a huge number of available resources to a range of users. MPI, the standard communication library used to write parallel applications, has been implemented for clusters. Two main features of grids, long-distance networks and technological heterogeneity, raise the question of MPI efficiency in grids. This report presents an evaluation of four recent MPI implementations (MPICH2, MPICH-Madeleine, OpenMPI and GridMPI) on the French research grid Grid'5000. The comparison is based on the execution of pingpong tests, the NAS Parallel Benchmarks and a real application in geophysics. We show that these implementations present performance differences. Executing MPI applications on the grid can be beneficial if the parameters are well tuned. The report details the tuning required on each implementation to achieve the best performance.
On the efficiency of several VM provisioning strategies for workflows with multi-threaded tasks on clouds
Cloud computing promises the delivery of on-demand, pay-per-use access to unlimited resources. Using these resources requires more than simple access to them, as most clients have certain constraints in terms of cost and time that need to be fulfilled. Therefore, scheduling heuristics have been devised to optimize the placement of client tasks on allocated virtual machines. The applications can be roughly divided into two categories: independent bags-of-tasks and workflows. In this paper we focus on the latter and investigate a less studied problem, namely the effect the virtual machine allocation policy has on the scheduling outcome. For this we look at how workflow structure, execution time and virtual machine instance type affect the efficiency of the provisioning method when cost and makespan are considered. To aid our study we devised a mathematical model for cost and makespan when single or multiple instance types are used. While the model allows us to determine the boundaries for two of our extreme methods, the complexity of workflow applications calls for a more experimental approach to determine the general relation. For this purpose we considered synthetically generated workflows that cover a wide range of possible cases. Results show the need for probabilistic selection methods when execution times are small and heterogeneous, while for large homogeneous ones a single best algorithm clearly emerges. We also draw several other conclusions regarding the efficiency of powerful instance types as compared to weaker ones, and of dynamic methods as compared to static ones.
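The tension between cost and makespan that such a model captures can be sketched in a few lines. The greedy packing, per-started-hour billing, and the specific numbers below are assumptions for illustration, not the paper's formulation.

```python
import math

def cost_and_makespan(task_times, n_vms, price_per_hour):
    """Toy model (an assumption, not the paper's): independent task
    runtimes in seconds packed greedily onto n_vms identical VMs,
    each VM billed per started hour."""
    loads = [0.0] * n_vms
    # longest-processing-time-first greedy assignment
    for t in sorted(task_times, reverse=True):
        loads[loads.index(min(loads))] += t
    makespan = max(loads)
    cost = sum(math.ceil(l / 3600) * price_per_hour for l in loads)
    return cost, makespan

cost, mk = cost_and_makespan([3600, 1800, 1800, 900],
                             n_vms=2, price_per_hour=0.10)
print(mk, cost)  # -> 4500 seconds, $0.30
```

Even this toy version exhibits the trade-off the paper studies: adding a third VM here would shrink the makespan but start another billing hour, so the cheapest allocation and the fastest one differ.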
Client-side resource management on the cloud: survey and future directions
Cloud computing, and how to bridge the gap between various providers, is receiving increasing attention. In this context, efficiently scheduling tasks on heterogeneous resources is extremely important. The state of the art in this field has grown continuously in recent years and has reached a point where a comprehensive overview of current solutions and ongoing challenges is essential for researchers. This paper offers such an analysis from a client-side scheduling perspective, in which the emphasis is not on physical resource selection but on task-to-virtual-machine mappings and virtual machine allocation. It provides a taxonomy of the current state of the art and a unified model of the various metrics and goals used throughout the literature. This model is designed to be sufficiently generic, extensible, and comprehensive to support most future work in the field. Several promising research directions and existing challenges are described.
Experiments in running a scientific MPI application on Grid'5000
Received the 'Intel Best Paper Award'.
Parallelization of the Lattice-Boltzmann schemes using the task-based method
The popularization of graphics processing units (GPUs) has led to their extensive use in high-performance numerical simulations. The Lattice Boltzmann Method (LBM) is a general framework for constructing efficient numerical fluid simulations. In this scheme, the fluid quantities are approximated on a structured grid. At each time step, a shift-relaxation process is applied, where each kinetic value is shifted in the corresponding direction of the lattice. Thanks to its simplicity, the LBM lends itself to many software optimizations. State-of-the-art techniques aim at adapting the LBM scheme to improve computational throughput on modern processors. Currently, most effort is put into optimizing this process on GPUs, as their architecture is highly suited to this type of computation. A bottleneck of GPU implementations is that the data size of the simulation is limited by the GPU memory. This restricts the number of volume elements and, therefore, the degree of precision one can obtain. In this work, we divide the lattice structure into multiple subsets that can be executed individually. This allows the work to be distributed among different processing units at the cost of increased complexity and memory transfers, but the constraint on GPU memory is relaxed, as the subsets can be made as small as needed. Additionally, we use a task-based approach for parallelizing the application, which allows the computation to be efficiently distributed among multiple processing units.
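The "shift" half of the shift-relaxation step can be sketched on a small periodic D2Q9 lattice: each of the nine kinetic populations moves one cell along its lattice direction per time step. This is a plain NumPy illustration of the streaming operation, not the paper's task-based or GPU implementation.

```python
import numpy as np

# The nine discrete velocities of the standard D2Q9 lattice.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]

def stream(f):
    """Shift each kinetic population f[k] one cell along its lattice
    direction (periodic boundaries). f has shape (9, nx, ny)."""
    return np.stack([np.roll(np.roll(f[k], dx, axis=0), dy, axis=1)
                     for k, (dx, dy) in enumerate(VELOCITIES)])

f = np.zeros((9, 4, 4))
f[1, 0, 0] = 1.0       # one unit of density moving in +x
f = stream(f)
print(f[1, 1, 0])      # -> 1.0: the value moved one cell in +x
```

Because each population only reads its own array, the grid can be cut into subsets and streamed independently, provided the cells on subset boundaries are exchanged between steps; that is the decomposition the paper exploits to lift the GPU memory limit.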
Loop-based Modeling of Parallel Communication Traces
This paper describes an algorithm that takes a trace of a distributed program and builds a model of all communications of the program. The model is a set of nested loops representing repeated patterns. Loop bodies collect events representing communication actions performed by the various processes, like sending or receiving messages, and participating in collective operations. The model can be used for compact visualization of full executions, for program understanding and debugging, and also for building statistical analyses of various quantitative aspects of the program's behavior. The construction of the communication model is performed in two phases. First, a local model is built for each process, capturing local regularities; this phase is incremental and fast, and can be done on-line, during the execution. The second phase is a reduction process that collects, aligns, and finally merges all local models into a global, system-wide model. This global model is a compact representation of all communications of the original program, capturing patterns across groups of processes. It can be visualized directly and, because it takes the form of a sequence of loop nests, can be used to replay the original program's communication actions. Because the model is based on communication events only, it completely ignores other quantitative aspects like timestamps or message sizes. Including such data would in most cases break regularities, reducing the usefulness of trace-based modeling. Instead, the paper shows how one can efficiently access quantitative data kept in the original trace(s), by annotating the model and compiling data scanners automatically.
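The local-model phase can be illustrated by folding adjacent repetitions of an event sequence into loop nodes. The greedy single-pass folding below is a much-simplified sketch under that assumption, not the paper's incremental algorithm.

```python
def fold(events):
    """Fold adjacent repeated patterns into (body, count) loop nodes.
    Greedy sketch: at each position, pick the window/repeat pair that
    covers the most events."""
    out, i, n = [], 0, len(events)
    while i < n:
        best = (1, 1)                       # (window length, repeats)
        for w in range(1, (n - i) // 2 + 1):
            r = 1
            while events[i + r*w : i + (r+1)*w] == events[i : i + w]:
                r += 1
            if r > 1 and r * w > best[0] * best[1]:
                best = (w, r)
        w, r = best
        if r > 1:
            out.append((events[i:i+w], r))  # loop node: body repeated r times
            i += w * r
        else:
            out.append(events[i])           # singleton event
            i += 1
    return out

trace = ["send", "recv", "send", "recv", "send", "recv", "barrier"]
print(fold(trace))  # -> [(['send', 'recv'], 3), 'barrier']
```

The folded form is both a compact visualization and a replayable program: iterating each loop node's body `count` times regenerates the original event sequence, which is the property the paper's replay and annotation mechanisms rely on.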
Emergence of a distributed leadership in the construction of a course
During the first year of the Bachelor's degree in Mathematics and Computer Science at the University of Strasbourg, students discover computer science in a class named Algorithmics and Programming. The pedagogic team is large and composed of teachers with various backgrounds. In this paper, we present the collective work the pedagogic team achieved to completely rework the curriculum, switching to a new programming language and also changing the teaching method to a flipped classroom. We analyse the organization of this collective work, built on collaborative decisions, cooperative achievements and a distributed leadership. We discuss the efficiency of this organization and the effects observed over several years.
