GraphStep: A System Architecture for Sparse-Graph Algorithms
Many important applications are organized around
long-lived, irregular sparse graphs (e.g., data and knowledge
bases, CAD optimization, numerical problems, simulations). The
graph structures are large, and the applications need regular
access to a large, data-dependent portion of the graph for each
operation (e.g., the algorithm may need to walk the graph, visiting
all nodes, or propagate changes through many nodes in the
graph). On conventional microprocessors, the graph structures
exceed on-chip cache capacities, making main-memory bandwidth
and latency the key performance limiters. To avoid this
“memory wall,” we introduce a concurrent system architecture
for sparse graph algorithms that places graph nodes in small
distributed memories paired with specialized graph processing
nodes interconnected by a lightweight network. This gives us a
scalable way to map these applications so that they can exploit
the high-bandwidth and low-latency capabilities of embedded
memories (e.g., FPGA Block RAMs). On typical spreading activation
queries on the ConceptNet Knowledge Base, a sample
application, this translates into an order of magnitude speedup
per FPGA compared to a state-of-the-art Pentium processor.
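The abstract above does not spell out the spreading activation procedure, so the following is only a minimal sketch of the general idea: in each step, active nodes send weighted messages along their edges, and receiving nodes accumulate what arrives. The function name and the `decay`, `threshold`, and `max_steps` parameters are illustrative assumptions, not details taken from the GraphStep paper.

```python
from collections import defaultdict


def spreading_activation(edges, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Step-synchronous spreading activation over a sparse graph.

    edges: dict mapping node -> list of (neighbor, weight) pairs
    seeds: dict mapping node -> initial activation
    decay, threshold, max_steps: hypothetical tuning parameters
    """
    activation = dict(seeds)
    frontier = dict(seeds)  # nodes that send messages in the next step
    for _ in range(max_steps):
        # Send phase: every active node pushes a message along each out-edge.
        inbox = defaultdict(float)
        for node, act in frontier.items():
            for neighbor, weight in edges.get(node, []):
                inbox[neighbor] += act * weight * decay
        # Receive phase: nodes add up incoming messages and decide
        # whether they are active enough to send in the next step.
        frontier = {}
        for node, incoming in inbox.items():
            if incoming >= threshold:
                activation[node] = activation.get(node, 0.0) + incoming
                frontier[node] = incoming
        if not frontier:
            break
    return activation


graph = {"a": [("b", 1.0), ("c", 0.5)], "b": [("c", 1.0)], "c": []}
result = spreading_activation(graph, {"a": 1.0})
```

In the architecture the abstract describes, each node's edge list would sit in a small on-chip memory next to the processing element that owns it, so the inner loop over out-edges reads local memory instead of going to main memory.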
Packet Switched vs. Time Multiplexed FPGA Overlay Networks
Dedicated, spatially configured FPGA interconnect
is efficient for applications that require high throughput connections
between processing elements (PEs) but with a limited degree
of PE interconnectivity (e.g. wiring up gates and datapaths).
Applications which virtualize PEs may require a large number
of distinct PE-to-PE connections (e.g. using one PE to simulate
100s of operators, each requiring input data from thousands of
other operators), but with each connection having low throughput
compared with the PE’s operating cycle time. In these highly interconnected
conditions, dedicating spatial interconnect resources
for all possible connections is costly and inefficient. Alternatively,
we can time share physical network resources by virtualizing
interconnect links, either by statically scheduling the sharing
of resources prior to runtime or by dynamically negotiating
resources at runtime. We explore the tradeoffs (e.g. area, route
latency, route quality) between time-multiplexed and packet-switched
networks overlaid on top of commodity FPGAs. We
demonstrate modular and scalable networks which operate on
a Xilinx XC2V6000-4 at 166MHz. For our applications, time-multiplexed,
offline scheduling offers up to a 63% performance
increase over online, packet-switched scheduling for equivalent
topologies. When applying designs to equivalent area, packet-switching
is up to 2× faster for small area designs while time-multiplexing
is up to 5× faster for larger area designs. When
limited to the capacity of an XC2V6000, if all communication is
known, time-multiplexed routing outperforms packet-switching;
however, when the active set of links drops below 40% of the
potential links, packet-switched routing can outperform time-multiplexing.
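To make the idea of statically scheduling shared links before runtime concrete, here is a minimal sketch of an offline, time-multiplexed scheduler. It assumes a unidirectional ring of PEs and a greedy slot assignment; both the topology and the function name `schedule_time_multiplexed` are illustrative assumptions, since the abstract does not state the topologies or scheduling algorithm the authors used.

```python
def schedule_time_multiplexed(messages, num_pes):
    """Greedy offline schedule for messages on a unidirectional ring.

    messages: list of (src, dst) PE indices, all known before runtime
    num_pes:  number of PEs on the ring (link i connects PE i to PE i+1)
    Returns (start_cycles, makespan). No two messages ever occupy the
    same link in the same cycle, so no runtime arbitration is needed.
    """
    busy = set()  # reserved (link, cycle) pairs
    starts = []
    makespan = 0
    for src, dst in messages:
        hops = (dst - src) % num_pes
        start = 0
        # Delay the message until every link on its path is free in the
        # cycle the message would cross it.
        while any(((src + k) % num_pes, start + k) in busy for k in range(hops)):
            start += 1
        for k in range(hops):
            busy.add(((src + k) % num_pes, start + k))
        starts.append(start)
        makespan = max(makespan, start + hops)
    return starts, makespan


starts, makespan = schedule_time_multiplexed([(0, 2), (0, 1), (1, 3)], num_pes=4)
```

A packet-switched network would instead resolve the same contention at runtime, which costs buffering and arbitration hardware but does not require the full communication pattern to be known in advance.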
RESURGENCE OF THE ABREU-MILGROM-PEARCE FORMULA
This paper reexamines the formula by Abreu, Milgrom and Pearce (1991), which characterizes the maximum symmetric pure strategy equilibrium payoff of the symmetric repeated prisoners' dilemma with imperfect monitoring. While the formula itself covers a limited class of equilibria of a limited class of games, we argue that the idea of the formula is richer. We demonstrate how the formula is useful for the analysis of repeated games where the players observe private signals about past actions.
GRAph Parallel Actor Language: A Programming Language for Parallel Graph Algorithms
We introduce a domain-specific language, GRAph Parallel Actor Language, that enables parallel graph algorithms to be written in a natural, high-level form. GRAPAL is based on our GraphStep compute model, which enables a wide range of parallel graph algorithms that are high-level, deterministic, free from race conditions, and free from deadlock. Programs written in GRAPAL are easy for a compiler and runtime to map to efficient parallel field programmable gate array (FPGA) implementations. We show that the GRAPAL compiler can verify that the structure of operations conforms to the GraphStep model. We allocate many small processing elements in each FPGA that take advantage of the high on-chip memory bandwidth (5x the sequential processor) and process one graph edge per clock cycle per processing element. We show how to automatically choose parameters for the logic architecture so the high-level GRAPAL programming model is independent of the target FPGA architecture. We compare our GRAPAL applications mapped to a platform with four 65 nm Virtex-5 SX95T FPGAs to sequential programs run on a single 65 nm Xeon 5160. Our implementation achieves a total mean speedup of 8x with a maximum speedup of 28x. The speedup per chip is 2x with a maximum of 7x. The ratio of energy used by our GRAPAL implementation over the sequential implementation has a mean of 1/10 with a minimum of 1/80.
GRAph Parallel Actor Language — A Programming Language for Parallel
We introduce a domain-specific language, GRAph PArallel Actor Language, that enables parallel graph algorithms to be written in a natural, high-level form. GRAPAL is based on our GraphStep compute model, which enables a wide range of parallel graph algorithms that are high-level, deterministic, free from race conditions, and free from deadlock. Programs written in GRAPAL are easy for a compiler and runtime to map to efficient parallel field programmable gate array (FPGA) implementations. We show that the GRAPAL compiler can verify that the structure of operations conforms to the GraphStep model. We allocate many small processing elements in each FPGA that take advantage of the high on-chip memory bandwidth (5x the sequential processor) and process one graph edge per clock cycle per processing element. We show how to automatically choose parameters for the logic architecture so the high-level GRAPAL programming model is independent of the target FPGA architecture. We compare our GRAPAL applications mapped to a platform with four 65 nm Virtex-5 SX95T FPGAs to sequential programs run on a single 65 n
Flying in the Face of Suspicionless Cell Phone Searches: Fourth Circuit Grants Airline Passengers Heightened Protection From Searches by Customs Officers
Structure-Stabilizing RNA Modifications Prevent MBNL Binding to Toxic CUG and CCUG Repeat RNA in Myotonic Dystrophy
Myotonic dystrophy is a genetic neurodegenerative disease caused by repeat expansion mutations. Myotonic dystrophy type 1 (DM1) is caused by a CTG repeat expansion in the 3’ UTR of the dystrophia myotonica protein kinase (DMPK) gene, while myotonic dystrophy type 2 (DM2) is caused by a CCTG repeat expansion in intron 1 of the zinc finger protein nine (Znf9) gene. When expressed, these genes produce long CUG/CCUG repeat RNAs that bind and sequester a family of RNA-binding proteins known as muscleblind-like 1, 2 and 3 (MBNL1, MBNL2, MBNL3). Sequestration of these proteins plays a prominent role in pathogenicity in myotonic dystrophy. MBNL proteins regulate alternative splicing, and myotonic dystrophy symptoms are a result of mis-spliced transcripts that MBNL proteins regulate. MBNL proteins bind to a consensus sequence YGCY (Y = pyrimidine), which is found in CUG and CCUG repeats and in the cellular RNA substrates that MBNL proteins bind and regulate. CUG and CCUG repeats can form A-form helices; however, it is hypothesized that MBNL proteins bind to the helices when they are open and the YGCY binding site is single-stranded in nature. To evaluate this hypothesis, we used the structure-stabilizing RNA modifications pseudouridine (Ψ) and 2’-O-methylation to determine if stabilization of CUG and CCUG repeat helices affected MBNL1 binding and toxicity. We also used Ψ to determine if the structure-stabilizing modification affected MBNL binding to single-stranded YGCY RNA. CUG repeats modified with Ψ or 2’-O-methyl groups exhibited enhanced structural stability and reduced affinity for MBNL1. Ψ also stabilized the structure of CCUG repeats and rigidified single-stranded YGCY RNA and inhibited MBNL1 binding to both of these RNAs. Binding data from CCUG repeats and single-stranded YGCY RNA suggest that both pyrimidines in the YGCY motif must be modified for significant inhibition.
Molecular dynamics and X-ray crystallography suggest a potential water-bridging mechanism for Ψ-mediated CUG repeat stabilization. Molecular dynamics simulations suggest that Ψ increases base-stacking interactions, and reducing the flexibility of single-stranded RNA leads to reduced MBNL1 binding. Ψ modification rescued mis-splicing in a cellular DM1 model and prevented CUG repeat toxicity in zebrafish embryos.
This dissertation includes previously published and unpublished coauthored material.
