1,234 research outputs found

    Near-Memory Address Translation

    Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation, MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations as contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that while limiting the associativity of the virtual-to-physical mapping incurs no penalty, it can break the translate-then-fetch serialization if combined with careful data placement in the MPU's memory, allowing for translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages respectively. Comment: 15 pages, 9 figures
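The idea of limiting the associativity of the virtual-to-physical mapping can be illustrated with a small sketch. This is not DIPTA itself; it is a toy model, assuming a hypothetical memory of 1024 page frames, 4 KB pages, and a 4-way mapping in which the set is chosen by the low bits of the virtual page number. All names (`candidate_frames`, `map_page`, `translate`, `frame_owner`) are invented for illustration.

```python
PAGE_SHIFT = 12                  # 4 KB pages
NUM_FRAMES = 1024                # hypothetical memory size, in page frames
WAYS = 4                         # assumed associativity of the VA-to-PA mapping
NUM_SETS = NUM_FRAMES // WAYS

# Inverted table: one entry per page frame, holding the VPN mapped there.
frame_owner = [None] * NUM_FRAMES


def candidate_frames(vaddr):
    """Frames that may hold vaddr, known from virtual address bits alone."""
    set_index = (vaddr >> PAGE_SHIFT) % NUM_SETS
    first = set_index * WAYS
    return list(range(first, first + WAYS))


def map_page(vaddr):
    """Place the page in the first free frame of its set; return the frame."""
    vpn = vaddr >> PAGE_SHIFT
    for frame in candidate_frames(vaddr):
        if frame_owner[frame] is None:
            frame_owner[frame] = vpn
            return frame
    raise MemoryError("set is full; a real system would evict or fall back")


def translate(vaddr):
    """Check only the WAYS candidate entries instead of walking a page table."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for frame in candidate_frames(vaddr):
        if frame_owner[frame] == vpn:
            return (frame << PAGE_SHIFT) | offset
    return None  # page not mapped
```

Because the candidate frames depend only on the virtual address, a memory-side unit could in principle start fetching from those locations while the small tag check is still in progress, which is the serialization the abstract describes breaking.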

    Selective, accurate, and timely self-invalidation using last-touch prediction

    Communication in cache-coherent distributed shared memory (DSM) often requires invalidating (or writing back) cached copies of a memory block, incurring high overheads. This paper proposes Last-Touch Predictors (LTPs) that learn and predict the “last touch” to a memory block by one processor before the block is accessed and subsequently invalidated by another. By predicting a last-touch and (self-)invalidating the block in advance, an LTP hides the invalidation time, significantly reducing the coherence overhead. The key behind accurate last-touch prediction is trace-based correlation, associating a last-touch with the sequence of instructions (i.e., a trace) touching the block from a coherence miss until the block is invalidated. Correlating instructions enables an LTP to identify a last-touch to a memory block uniquely throughout an application’s execution. In this paper, we use results from running shared-memory applications on a simulated DSM to evaluate LTPs. The results indicate that: (1) our base case LTP design, maintaining trace signatures on a per-block basis, substantially improves prediction accuracy over previous self-invalidation schemes to an average of 79%; (2) our alternative LTP design, maintaining a global trace signature table, reduces storage overhead but only achieves an average accuracy of 58%; (3) last-touch prediction based on a single instruction only achieves an average accuracy of 41% due to instruction reuse within and across computation; and (4) LTP enables selective, accurate, and timely self-invalidation in DSM, speeding up program execution on average by 11%.
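Trace-based correlation on a per-block basis can be sketched roughly as follows. This is a minimal illustration, not the paper's hardware design: the signature encoding (a truncated sum of instruction addresses), the 16-bit width, and the class and method names are all assumptions made for the example.

```python
class LastTouchPredictor:
    """Toy per-block last-touch predictor using trace signatures."""

    def __init__(self, sig_bits=16):
        self.mask = (1 << sig_bits) - 1   # assumed signature width
        self.current = {}                 # block -> running trace signature
        self.learned = {}                 # block -> set of last-touch signatures

    def access(self, block, pc):
        """Fold pc into the block's trace; return True if this looks like a last touch."""
        sig = (self.current.get(block, 0) + pc) & self.mask
        self.current[block] = sig
        return sig in self.learned.get(block, set())

    def invalidate(self, block):
        """On an invalidation, the trace so far ended in a last touch: learn it."""
        sig = self.current.pop(block, None)
        if sig is not None:
            self.learned.setdefault(block, set()).add(sig)


ltp = LastTouchPredictor()
trace = [0x400, 0x404, 0x408]             # hypothetical instruction addresses
first_pass = [ltp.access(0x80, pc) for pc in trace]
ltp.invalidate(0x80)                      # another processor takes the block
second_pass = [ltp.access(0x80, pc) for pc in trace]
```

On the second pass the predictor flags only the final access, since only the full trace reproduces the learned signature. A predictor keyed on a single instruction could not tell that access apart from earlier uses of the same instruction, which is consistent with the lower single-instruction accuracy the abstract reports.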

    Lower cost automotive piston from 2124/SiC/25p metal-matrix composite

    Engineered materials have made a breakthrough in the quest for materials with a combination of custom-made properties to suit particular applications. One such material is 2124/SiC/25p, a high-quality aerospace grade aluminium alloy reinforced with ultrafine particles of silicon carbide, manufactured by a powder metallurgy route. This aluminium matrix composite offers a combination of greater fatigue strength at elevated temperatures, lower thermal expansion and greater wear resistance in comparison with conventionally used piston materials. The microscale particulate reinforcement also offers good formability and machinability. Despite the benefits, the higher manufacturing cost often limits the use of such materials in high-volume industries such as automotive, where they could significantly improve engine performance. This paper presents mechanical and forging data for a lower cost processing route for metal matrix composites. Finite element modelling and analysis were used to examine forging of an automotive piston and die wear. This showed that selection of the forging route is important to maximise die life. Mechanical testing of the forged material showed a minimal reduction in fatigue properties at the piston operating temperature.

    Wear behaviour of laser cladded Ni-based WC composite coating for Inconel hot extrusion: practical challenges and effectiveness

    In forging, tooling costs make up a significant percentage of the total manufacturing cost. To combat tool failure, forging dies can be manufactured using or including layers of high wear-resistant alloys. The present work compares the manufacturing process challenges and wear response of traditional nitriding to laser cladding using Ni-based WC on an H13 substrate for IN718 extrusion. The results have shown that machining of NiCrSiB + WC matrix material is problematic, both with regard to cutting tool wear and achievable surface finish. Assessment of pre- and post-extrusion nitrided H13 and NiCrSiB + 30%WC laser clad dies shows more significant wear features in the case of the additively coated die. Crack formation and surface discontinuities attributed to the effects of material porosity and die heating are also discussed.

    DeSyRe: on-Demand System Reliability

    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-free and fault-free system would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.

    An assessment of the speed of transition from municipal solid waste and packaging waste landfilling to recycling

    In recent years, the transition from landfilling municipal solid waste (MSW) to recycling has emerged as a significant trend in response to the growing volume of waste generated and the pressing need for sustainable waste management practices. The primary objective of this study was to identify countries that have experienced the most rapid transitions and subsequently analyse their performance. The central concept was to investigate the speed of transition by examining actualized transitions in selected countries. Based on this objective, seven European countries (Croatia, Italy, Latvia, Lithuania, Poland, Slovakia, and Slovenia) that had made considerable progress in transitioning from landfilling to recycling MSW were selected as the cases of the study. Key metrics for evaluating the speed of this transition, along with the environmental impact of different waste treatment methods, were also introduced and calculated. The present investigation analyses key metrics to assess the speed of change during the transition period. Based on the total MSW generated, the speed of transition to recycling (STR) and the speed of landfill reduction (SLR) were calculated. STR values ranged from 2.2 to 4 percent per year, whereas SLR values ranged from 2.6 to 7 percent per year. Recycling development (RD) and landfill reduction (LR) during the transition period ranged from 61% to 892% for RD and 10% to 88% for LR. The greenhouse gas (GHG) emissions from MSW and packaging waste (PW) were analyzed for Italy and Poland, since data were available only for these countries. The total percentage of GHG emission changes (EC) was -31.4% and -37.4% for Italy and Poland, respectively, and the speed of GHG emission changes (SEC) was -3.9 and -3.7 %-points/a, respectively. The μ index, which quantifies GHG emissions per kiloton of landfill and recycling of MSW, indicated changes in total emissions during the transition period, with ∆μ values of -0.15 and -0.32 (kt CO2eq/kt (recycling + landfill)) for Italy and Poland, respectively. In the study, recycling is highlighted as a critical part of sustainable waste management and environmentally friendly waste disposal.