    Scale-Out ccNUMA: Exploiting Skew with Strongly Consistent Caching

    Today’s cloud-based online services are underpinned by distributed key-value stores (KVS). Such KVS typically use a scale-out architecture, whereby the dataset is partitioned across a pool of servers, each holding a chunk of the dataset in memory and being responsible for serving queries against that chunk. One important performance bottleneck that a KVS design must address is the load imbalance caused by skewed popularity distributions. Despite recent work on skew mitigation, existing approaches offer only limited benefit for high-throughput in-memory KVS deployments. In this paper, we embrace popularity skew as a performance opportunity. Our insight is that aggressively caching popular items at all nodes of the KVS enables both load balance and high throughput – a combination that has eluded previous approaches. We introduce symmetric caching, wherein every server node is provisioned with a small cache that maintains the most popular objects in the dataset. To ensure consistency across the caches, we use high-throughput, fully-distributed consistency protocols. A key result of this work is that strong consistency guarantees (per-key linearizability) need not compromise on performance. In a 9-node RDMA-based rack and with modest write ratios, our prototype design, dubbed ccKVS, achieves 2.2× the throughput of the state-of-the-art KVS while guaranteeing strong consistency.
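
    To make the symmetric-caching idea concrete, the toy sketch below keeps a small cache of the hottest objects at every node and removes all stale copies before installing a new value. This is purely illustrative and not ccKVS's RDMA-based protocol; all class and method names are hypothetical.

    # Illustrative toy of symmetric caching (hypothetical names; not ccKVS's
    # actual RDMA-based consistency protocol). Every node keeps a small cache
    # of the hottest keys; a write to a hot key first drops the stale copies on
    # all nodes and then installs the new value, one simple way to keep the
    # replicas in step.

    class Node:
        def __init__(self, node_id, cluster, hot_keys):
            self.node_id = node_id
            self.cluster = cluster      # list of all nodes in the rack
            self.store = {}             # partition of the dataset owned by this node
            self.cache = {}             # small cache of globally popular objects
            self.hot_keys = hot_keys    # keys replicated in every node's cache

        def read(self, key):
            if key in self.cache:       # hot object: served locally by any node
                return self.cache[key]
            owner = self.cluster[hash(key) % len(self.cluster)]
            return owner.store.get(key) # cold object: forwarded to its home node

        def write(self, key, value):
            if key in self.hot_keys:
                for node in self.cluster:    # invalidate every cached copy first
                    node.cache.pop(key, None)
                for node in self.cluster:    # then install the new value everywhere
                    node.cache[key] = value
            else:
                owner = self.cluster[hash(key) % len(self.cluster)]
                owner.store[key] = value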

    McVerSi: A Test Generation Framework for Fast Memory Consistency Verification in Simulation

    Adsorptive Removal of As(V) from Aqueous Solution onto Steel Slag Recovered Iron – Chitosan Composite: Response Surface Modeling and Kinetics

    In the present work, iron particles were recovered from waste steel slag by dry magnetic separation and doped with chitosan to prepare an adsorbent, which was characterized and evaluated for the removal of As(V) from aqueous solution. The adsorption of As(V) was optimized by response surface methodology using a Box-Behnken design, which gave a high correlation coefficient (R2 = 0.9175) and a predictive quadratic polynomial model. Analysis of variance and Fisher's F-test were used to identify the parameters that influence the adsorption of As(V). The adsorbent was characterized by FTIR, XRD and SEM. Optimal conditions, including adsorbent dosage, pH, temperature, initial ion concentration and contact time for the removal of As(V), were found to be 0.8 g, pH 4, 308 K, 10 mg L−1 and 3 h, respectively. The Langmuir isotherm model fitted the data better than the Freundlich model, with a maximum adsorption capacity of 11.76 mg g−1, a high regression coefficient of 0.993 and the lowest chi-square value of 0.1959. The process was found to follow monolayer adsorption and pseudo-second-order kinetics. Thermodynamic parameters (∆S, ∆H and ∆G) indicated that the adsorption was feasible, spontaneous and endothermic. Successful regeneration of the adsorbent implies its applicability to the removal of arsenic from real-life wastewater.
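
    For reference, the standard forms of the models named in the abstract are reproduced below in common notation; the symbols (q_e, C_e, K_L, k_2, K_c) are conventional choices, not necessarily the authors' own.

    % Standard model equations referenced in the abstract (conventional symbols):
    % q_e = equilibrium uptake (mg g^-1), C_e = equilibrium concentration (mg L^-1),
    % q_m = monolayer capacity, K_L = Langmuir constant, k_2 = rate constant.
    \[
      q_e = \frac{q_m K_L C_e}{1 + K_L C_e}                 % Langmuir isotherm
    \]
    \[
      \frac{t}{q_t} = \frac{1}{k_2 q_e^{2}} + \frac{t}{q_e} % pseudo-second-order kinetics
    \]
    % Feasible, spontaneous adsorption corresponds to \Delta G < 0; an endothermic
    % process has \Delta H > 0.
    \[
      \Delta G = \Delta H - T\,\Delta S , \qquad
      \ln K_c = \frac{\Delta S}{R} - \frac{\Delta H}{RT}
    \]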

    RC3: Consistency directed cache coherence for x86-64 with RC extensions

    Blasting Through The Front-End Bottleneck With Shotgun

    The front-end bottleneck is a well-established problem in server workloads owing to their deep software stacks and large instruction working sets. Despite years of research into effective L1-I and BTB prefetching, state-of-the-art techniques force a trade-off between performance and metadata storage costs. This work introduces Shotgun, a BTB-directed front-end prefetcher powered by a new BTB organization that maintains a logical map of an application's instruction footprint, which enables high-efficacy prefetching at low storage cost. To map active code regions, Shotgun precisely tracks an application's global control flow (e.g., function and trap routine entry points) and summarizes local control flow within each code region. Because the local control flow enjoys high spatial locality, with most functions comprising a handful of instruction cache blocks, it lends itself to a compact region-based encoding. Meanwhile, the global control flow is naturally captured by the application's unconditional branch working set (calls, returns, traps). Based on these insights, Shotgun devotes the bulk of its BTB capacity to branches responsible for the global control flow and a spatial encoding of their target regions. By effectively capturing a map of the application's instruction footprint in the BTB, Shotgun enables highly effective BTB-directed prefetching. Using a storage budget equivalent to a conventional BTB, Shotgun outperforms the state-of-the-art BTB-directed front-end prefetcher by up to 14% on a set of varied commercial workloads.
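
    A rough sketch of the kind of region-based BTB entry the abstract describes is given below; the field names, block size and region width are illustrative assumptions, not Shotgun's actual encoding.

    # Illustrative sketch of a region-based BTB entry in the spirit described
    # above (all constants and names are assumptions). An entry for an
    # unconditional branch records its target plus a compact footprint of the
    # cache blocks touched around that target, so a predicted-taken branch can
    # drive L1-I prefetches without separate prefetcher metadata.

    BLOCK_SIZE = 64        # bytes per instruction cache block (assumed)
    REGION_BLOCKS = 8      # cache blocks summarized per target region (assumed)

    class RegionBTBEntry:
        def __init__(self, branch_pc, target_pc):
            self.branch_pc = branch_pc
            self.target_pc = target_pc
            self.footprint = 0     # bit i set => block i of the region was executed

        def record_touch(self, fetch_pc):
            offset = (fetch_pc - self.target_pc) // BLOCK_SIZE
            if 0 <= offset < REGION_BLOCKS:
                self.footprint |= 1 << offset

        def prefetch_candidates(self):
            # Block addresses to prefetch when this branch is predicted taken.
            base_block = self.target_pc // BLOCK_SIZE
            return [(base_block + i) * BLOCK_SIZE
                    for i in range(REGION_BLOCKS)
                    if self.footprint & (1 << i)]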

    DCA: a DRAM-Cache-Aware DRAM Controller

    3D-stacking technology has enabled the option of embedding a large DRAM cache onto the processor. Since the DRAM cache can be orders of magnitude larger than a conventional SRAM cache, the size of its cache tags can also be large. Recent works have proposed storing these tags in the stacked DRAM array itself. However, this increases the complexity of a DRAM cache request, which now translates into multiple DRAM cache accesses (tag and data). In this work, we address how to schedule these DRAM cache accesses. We start by exploring whether or not a conventional DRAM controller will work well. We introduce two potential baseline designs and study their limitations. We then derive a set of design principles that a DRAM cache controller must ideally satisfy. Our DRAM-cache-aware (DCA) DRAM controller, based on these principles, consistently improves performance over various DRAM cache organizations.
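
    The abstract does not spell out the design principles, but the toy model below illustrates why a single DRAM cache request expands into several dependent DRAM accesses that the controller must order; the cache geometry and access names are assumptions, not DCA's actual policy.

    # Toy model (not DCA's actual policy) of why one DRAM cache request expands
    # into several dependent DRAM accesses once the tags live in the stacked
    # DRAM itself: the tag must be read before the data access can be issued.
    # The geometry below is an arbitrary assumption.

    BLOCK = 64
    NUM_SETS = 1 << 16                           # assumed DRAM cache geometry

    def expand_request(addr, tag_store):
        """Yield the DRAM cache accesses generated by one request (toy model)."""
        block = addr // BLOCK
        set_idx, tag = block % NUM_SETS, block // NUM_SETS
        yield ("tag_read", set_idx)              # first access: in-DRAM tag lookup
        if tag_store.get(set_idx) == tag:        # hit
            yield ("data_read", set_idx)         # second access: data from the DRAM cache
        else:                                    # miss
            yield ("memory_fetch", block)        # fetch the block from off-chip memory
            yield ("tag_write", set_idx)         # install the new tag
            yield ("data_write", set_idx)        # install the new data

    # A DRAM-cache-aware controller can schedule the dependent accesses of one
    # request together (e.g. keeping its tag and data in the same open row),
    # whereas a conventional controller treats them as unrelated commands.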

    Boomerang: a Metadata-Free Architecture for Control Flow Delivery

    Contemporary server workloads feature massive instruction footprints stemming from deep, layered software stacks. The active instruction working set of the entire stack can easily reach into megabytes, resulting in frequent front-end stalls due to instruction cache misses and pipeline flushes due to branch target buffer (BTB) misses. While a number of techniques have been proposed to address these problems, every one of them requires dedicated metadata structures, translating into significant storage and complexity costs. In this paper, we ask whether it is possible to achieve high-performance control flow delivery without the metadata costs of prior techniques. We revisit a previously proposed approach of branch-predictor-directed prefetching, which leverages just the branch predictor and BTB to discover and prefetch the missing instruction cache blocks by exploring the program control flow ahead of the core front-end. Contrary to conventional wisdom, we find that this approach can be effective in covering instruction cache misses in modern CMPs with long LLC access latencies and multi-MB server binaries. Our first contribution lies in explaining the reasons for the efficacy of branch-predictor-directed prefetching. Our second contribution is Boomerang, a metadata-free architecture for control flow delivery. Boomerang leverages a branch-predictor-directed prefetcher to discover and prefill not only the missing instruction cache blocks but also the missing BTB entries. Crucially, we demonstrate that the additional hardware cost required to identify and fill BTB misses is negligible. Our experimental evaluation shows that Boomerang matches the performance of the state-of-the-art control flow delivery scheme without the latter’s high metadata and complexity overheads.
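
    The sketch below is a toy run-ahead loop in the spirit of branch-predictor-directed prefetching as described above; the predictor, BTB and L1-I interfaces are hypothetical placeholders, not Boomerang's actual microarchitecture.

    # Toy run-ahead loop for branch-predictor-directed prefetching (hypothetical
    # interfaces, not Boomerang's actual microarchitecture). The loop walks the
    # predicted control flow ahead of the fetch unit, prefetching missing
    # I-cache blocks and, on a BTB miss, decoding the prefetched block to
    # prefill the BTB instead of keeping separate prefetcher metadata.

    def run_ahead(pc, predictor, btb, l1i, steps=32):
        for _ in range(steps):
            if not l1i.present(pc):
                l1i.prefetch(pc)              # cover the I-cache miss early
            entry = btb.lookup(pc)            # branch information for this block
            if entry is None:
                # BTB miss: extract branches from the (pre)fetched block and
                # install them into the BTB.
                for branch in l1i.decode_branches(pc):
                    btb.insert(branch)
                entry = btb.lookup(pc)
            if entry is not None and predictor.taken(pc, entry):
                pc = entry.target             # follow the predicted branch
            else:
                pc += l1i.block_size          # fall through to the next block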

    Automatic Parameter Tuning of Motion Planning Algorithms

    Motion planning algorithms attempt to find a good compromise between planning time and quality of solution. Due to their heuristic nature, they are typically configured with several parameters. In this paper we demonstrate that, in many scenarios, the widely used default parameter values are not ideal. However, finding the best parameters to optimise some metric(s) is not trivial because the parameter space can be large. We evaluate and compare the efficiency of four different methods (random sampling, AUC-Bandit, random forest, and Bayesian optimisation) for tuning the parameters of two motion planning algorithms, BKPIECE and RRT-Connect. We present a table-top reaching scenario where the seven-degrees-of-freedom KUKA LWR robotic arm has to move from an initial to a goal pose in the presence of several objects in the environment. We show that the best methods for BKPIECE (AUC-Bandit) and RRT-Connect (random forest) improve performance by 4.5× and 1.26× on average, respectively. Then, we generate a set of random scenarios of increasing complexity, and we observe that optimal parameters found in simple environments perform well in more complex scenarios. Finally, we find that the time required to evaluate parameter configurations can be reduced by more than two-thirds with low error. Overall, our results demonstrate that, for a variety of motion planning problems, it is possible to find solutions that significantly improve performance over default configurations while requiring very reasonable computation times.
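
    As a minimal illustration of the simplest of the four tuning methods (random sampling), the sketch below draws parameter configurations at random and keeps the best one; the planner hook and parameter ranges are hypothetical placeholders, not the actual BKPIECE/RRT-Connect bindings or defaults.

    # Minimal random-search tuner, the simplest of the four methods listed
    # above. The parameter ranges and the planner hook are hypothetical
    # placeholders, not the actual BKPIECE/RRT-Connect bindings or defaults.

    import random

    PARAM_RANGES = {
        "range": (0.01, 5.0),     # tree-extension step size (assumed range)
        "goal_bias": (0.0, 0.5),  # probability of sampling the goal (assumed range)
    }

    def evaluate_planner(params, trials=10):
        """Placeholder: run the planner `trials` times and return the mean cost."""
        raise NotImplementedError("hook up the real planner here")

    def random_search(budget=100):
        best_params, best_cost = None, float("inf")
        for _ in range(budget):
            params = {name: random.uniform(lo, hi)
                      for name, (lo, hi) in PARAM_RANGES.items()}
            cost = evaluate_planner(params)
            if cost < best_cost:
                best_params, best_cost = params, cost
        return best_params, best_cost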