
    GraphStep: A System Architecture for Sparse-Graph Algorithms

    Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this “memory wall,” we introduce a concurrent system architecture for sparse-graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order of magnitude speedup per FPGA compared to a state-of-the-art Pentium processor.
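As a rough illustration of the idea in this abstract, the sketch below runs a toy spreading-activation query over a graph whose nodes are assigned to a small number of processing elements (PEs), and counts how many messages would have to cross the network between PEs. It is a software sketch under stated assumptions, not the paper's architecture: the round-robin placement, the `decay` factor, and the function names `partition_nodes` and `spread_activation` are all hypothetical.

```python
from collections import defaultdict


def partition_nodes(nodes, num_pes):
    """Assign each node to a PE round-robin (an assumed placement policy)."""
    return {node: i % num_pes for i, node in enumerate(sorted(nodes))}


def spread_activation(edges, seeds, num_pes=4, steps=2, decay=0.5):
    """Propagate activation from seed nodes for a fixed number of steps.

    edges: dict mapping node -> list of (neighbor, weight) pairs.
    seeds: dict mapping node -> initial activation.
    Returns (activation per node, number of messages that crossed PEs).
    """
    nodes = set(edges) | {dst for outs in edges.values() for dst, _ in outs}
    owner = partition_nodes(nodes, num_pes)
    activation = {n: 0.0 for n in nodes}
    activation.update(seeds)

    frontier = set(seeds)
    network_messages = 0
    for _ in range(steps):
        # Phase 1: every frontier node sends along its out-edges.
        inbox = defaultdict(float)
        for src in sorted(frontier):
            for dst, weight in edges.get(src, []):
                inbox[dst] += activation[src] * weight * decay
                if owner[src] != owner[dst]:
                    network_messages += 1  # message leaves the local memory
        # Phase 2: receivers fold the incoming values into their state.
        for dst, value in inbox.items():
            activation[dst] += value
        frontier = set(inbox)
    return activation, network_messages


if __name__ == "__main__":
    toy_edges = {"a": [("b", 1.0), ("c", 0.5)], "b": [("c", 1.0)], "c": []}
    print(spread_activation(toy_edges, {"a": 1.0}, num_pes=2))
```

Edges whose endpoints share a PE are served from that PE's local memory in this model; only the remaining edges generate network traffic, which is why placement matters.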

    Packet Switched vs. Time Multiplexed FPGA Overlay Networks

    Dedicated, spatially configured FPGA interconnect is efficient for applications that require high-throughput connections between processing elements (PEs) but with a limited degree of PE interconnectivity (e.g., wiring up gates and datapaths). Applications that virtualize PEs may require a large number of distinct PE-to-PE connections (e.g., using one PE to simulate 100s of operators, each requiring input data from thousands of other operators), but with each connection having low throughput compared with the PE’s operating cycle time. In these highly interconnected conditions, dedicating spatial interconnect resources for all possible connections is costly and inefficient. Alternatively, we can time-share physical network resources by virtualizing interconnect links, either by statically scheduling the sharing of resources prior to runtime or by dynamically negotiating resources at runtime. We explore the tradeoffs (e.g., area, route latency, route quality) between time-multiplexed and packet-switched networks overlaid on top of commodity FPGAs. We demonstrate modular and scalable networks which operate on a Xilinx XC2V6000-4 at 166 MHz. For our applications, time-multiplexed, offline scheduling offers up to a 63% performance increase over online, packet-switched scheduling for equivalent topologies. When applying designs to equivalent area, packet-switching is up to 2× faster for small area designs while time-multiplexing is up to 5× faster for larger area designs. When limited to the capacity of an XC2V6000, if all communication is known, time-multiplexed routing outperforms packet-switching; however, when the active set of links drops below 40% of the potential links, packet-switched routing can outperform time-multiplexing.

    RESURGENCE OF THE ABREU-MILGROM-PEARCE FORMULA

    This paper reexamines the formula by Abreu, Milgrom and Pearce (1991), which characterizes the maximum symmetric pure strategy equilibrium payoff of the symmetric repeated prisoners' dilemma with imperfect monitoring. While the formula itself covers a limited class of equilibria of a limited class of games, we argue that the idea of the formula is richer. We demonstrate how the formula is useful for the analysis of repeated games where the players observe private signals about past actions.

    GRAph Parallel Actor Language: A Programming Language for Parallel Graph Algorithms

    We introduce a domain-specific language, GRAph Parallel Actor Language (GRAPAL), that enables parallel graph algorithms to be written in a natural, high-level form. GRAPAL is based on our GraphStep compute model, which enables a wide range of parallel graph algorithms that are high-level, deterministic, free from race conditions, and free from deadlock. Programs written in GRAPAL are easy for a compiler and runtime to map to efficient parallel field programmable gate array (FPGA) implementations. We show that the GRAPAL compiler can verify that the structure of operations conforms to the GraphStep model. We allocate many small processing elements in each FPGA that take advantage of the high on-chip memory bandwidth (5x the sequential processor) and process one graph edge per clock cycle per processing element. We show how to automatically choose parameters for the logic architecture so the high-level GRAPAL programming model is independent of the target FPGA architecture. We compare our GRAPAL applications mapped to a platform with four 65 nm Virtex-5 SX95T FPGAs to sequential programs run on a single 65 nm Xeon 5160. Our implementation achieves a total mean speedup of 8x with a maximum speedup of 28x. The speedup per chip is 2x with a maximum of 7x. The ratio of energy used by our GRAPAL implementation over the sequential implementation has a mean of 1/10 with a minimum of 1/80.
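The abstract does not spell out the GraphStep model, so the following is only a hedged sketch of one way a step-synchronized, message-passing graph computation can be deterministic and race-free: in each step, active nodes send along their out-edges, each receiver combines its incoming messages with a commutative and associative reduction (`min` here), and state is updated only after all sends have been collected. The example computes single-source shortest paths. It is plain Python, not GRAPAL syntax, and the names `graph_step` and `shortest_paths` are hypothetical.

```python
import math


def graph_step(dist, edges, active):
    """Run one synchronized step and return the set of nodes that changed.

    Sends read only the state from the previous step; updates are applied
    afterwards, so the result does not depend on the order of traversal.
    """
    inbox = {}
    for src in active:
        for dst, weight in edges.get(src, []):
            candidate = dist[src] + weight
            # min is commutative and associative, so arrival order is irrelevant.
            inbox[dst] = min(inbox.get(dst, math.inf), candidate)

    next_active = set()
    for dst, best in inbox.items():
        if best < dist[dst]:
            dist[dst] = best
            next_active.add(dst)
    return next_active


def shortest_paths(edges, source):
    """Iterate graph_step until no node changes (bounded by the node count)."""
    nodes = set(edges) | {dst for outs in edges.values() for dst, _ in outs}
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    active = {source}
    steps = 0
    while active and steps < len(nodes):
        active = graph_step(dist, edges, active)
        steps += 1
    return dist, steps


if __name__ == "__main__":
    toy_edges = {"s": [("a", 4), ("b", 1)], "b": [("a", 2)], "a": [("t", 1)]}
    print(shortest_paths(toy_edges, "s"))
```

Separating the send phase from the update phase is the design choice that removes read/write races in this sketch; a hardware mapping could process the edges of each phase in parallel without changing the result.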


    Structure-Stabilizing RNA Modifications Prevent MBNL Binding to Toxic CUG and CCUG Repeat RNA in Myotonic Dystrophy

    Myotonic dystrophy is a genetic neurodegenerative disease caused by repeat expansion mutations. Myotonic dystrophy type 1 (DM1) is caused by a CTG repeat expansion in the 3’ UTR of the dystrophia myotonica protein kinase (DMPK) gene, while myotonic dystrophy type 2 (DM2) is caused by a CCTG repeat expansion in intron 1 of the zinc finger protein nine (Znf9) gene. When expressed, these genes produce long CUG/CCUG repeat RNAs that bind and sequester a family of RNA-binding proteins known as muscleblind-like 1, 2 and 3 (MBNL1, MBNL2, MBNL3). Sequestration of these proteins plays a prominent role in pathogenicity in myotonic dystrophy. MBNL proteins regulate alternative splicing, and myotonic dystrophy symptoms are a result of mis-spliced transcripts that MBNL proteins regulate. MBNL proteins bind to a consensus sequence YGCY (Y = pyrimidine), which is found in CUG and CCUG repeats and in the cellular RNA substrates that MBNL proteins bind and regulate. CUG and CCUG repeats can form A-form helices; however, it is hypothesized that MBNL proteins bind to the helices when they are open and the YGCY binding site is single-stranded in nature. To evaluate this hypothesis, we used the structure-stabilizing RNA modifications pseudouridine (Ψ) and 2’-O-methylation to determine if stabilization of CUG and CCUG repeat helices affected MBNL1 binding and toxicity. We also used Ψ to determine if the structure-stabilizing modification affected MBNL binding to single-stranded YGCY RNA. CUG repeats modified with Ψ or 2’-O-methyl groups exhibited enhanced structural stability and reduced affinity for MBNL1. Ψ also stabilized the structure of CCUG repeats and rigidified single-stranded YGCY RNA and inhibited MBNL1 binding to both of these RNAs. Binding data from CCUG repeats and single-stranded YGCY RNA suggest that both pyrimidines in the YGCY motif must be modified for significant inhibition.
    Molecular dynamics and X-ray crystallography suggest a potential water-bridging mechanism for Ψ-mediated CUG repeat stabilization. Molecular dynamics simulations suggest that Ψ increases base-stacking interactions, and reducing the flexibility of single-stranded RNA leads to reduced MBNL1 binding. Ψ modification rescued mis-splicing in a cellular DM1 model and prevented CUG repeat toxicity in zebrafish embryos. This dissertation includes previously published and unpublished coauthored material.