GPU-Accelerated BWT Construction for Large Collection of Short Reads
Advances in DNA sequencing technology have stimulated the development of
algorithms and tools for processing very large collections of short strings
(reads). Short-read alignment and assembly are among the most well-studied
problems. Many state-of-the-art aligners, at their core, have used the
Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome
(a typical example being the NCBI human genome). Recently, BWT has also found its use in
string-graph assembly, for indexing the reads (i.e., raw data from DNA
sequencers). In a typical data set, the volume of reads is tens of times that
of the sequenced genome and can be up to 100 Gigabases. Note that a reference
genome is relatively stable and computing its index is not a frequent task. For
reads, the index has to be computed from scratch for each given input, so
efficient BWT construction becomes a much bigger concern than before. In this
paper, we present a practical method called CX1 for constructing the BWT of
very large string collections. CX1 is the first tool that can take advantage of
the parallelism given by a graphics processing unit (GPU, a relatively cheap
device providing a thousand or more primitive cores), as well as, simultaneously,
the parallelism from a multi-core CPU and, more interestingly, from a cluster of
GPU-enabled nodes. Using CX1, the BWT of a short-read collection of up to 100
Gigabases can be constructed in less than 2 hours using a machine equipped with
a quad-core CPU and a GPU, or in about 43 minutes using a cluster with 4 such
machines (the speedup is almost linear after excluding the first 16 minutes for
loading the reads from the hard disk). The previously fastest tool BRC is
measured to take 12 hours to process 100 Gigabases on one machine; it is
non-trivial to parallelize BRC to take advantage of a cluster of machines, let
alone GPUs.
Comment: 11 pages
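To make the object being constructed concrete, the sketch below builds the BWT of a tiny read collection with a naive suffix sort. It is purely illustrative and is not CX1's GPU algorithm; the sentinel handling and the function name are assumptions for the example.

```python
# Naive, single-threaded sketch of the BWT of a read collection (not CX1's
# GPU algorithm).  Each read gets its own end-of-read sentinel, modelled as
# '\0' so it sorts before A/C/G/T; ties between sentinels of different reads
# are broken by the read id.  Far too slow for real data -- it only shows
# what the output object is.

def collection_bwt(reads):
    suffixes = []                                  # (suffix, read_id, offset)
    for rid, read in enumerate(reads):
        padded = read + "\0"                       # per-read sentinel
        for off in range(len(padded)):
            suffixes.append((padded[off:], rid, off))
    suffixes.sort()                                # lexicographic suffix order

    out = []
    for _, rid, off in suffixes:
        padded = reads[rid] + "\0"
        prev = padded[off - 1] if off > 0 else padded[-1]   # cyclic predecessor
        out.append("$" if prev == "\0" else prev)           # show sentinels as '$'
    return "".join(out)

if __name__ == "__main__":
    print(collection_bwt(["ACGT", "ACGA"]))        # -> 'TAG$$AACCG'
```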
MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
MEGAHIT is an NGS de novo assembler for assembling large and complex
metagenomics data in a time- and cost-efficient manner. It finished assembling
a soil metagenomics dataset of 252 Gbp in 44.1 hours and 99.6 hours on a
single computing node with and without a GPU, respectively. MEGAHIT assembles
the data as a whole, i.e., it avoids pre-processing such as partitioning and
normalization, which might compromise result integrity. MEGAHIT generates an
assembly 3 times larger, with a longer contig N50 and a longer average contig
length, than the previous assembly. 55.8% of the reads were aligned to the
assembly, 4 times higher than for the previous one. The source code of MEGAHIT is freely
available at https://github.com/voutcn/megahit under the GPLv3 license.
Comment: 2 pages, 2 tables, 1 figure; submitted to Oxford Bioinformatics as an
Application Note
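As background for the approach above, the sketch below builds an ordinary dictionary-based de Bruijn graph from reads and walks one unambiguous path into a unitig. MEGAHIT's succinct (BWT-based) representation and its GPU construction are not reproduced here, and the function names and parameters are illustrative.

```python
# Dictionary-based de Bruijn graph sketch (not MEGAHIT's succinct structure):
# nodes are (k-1)-mers, edges are the bases that extend them, and unambiguous
# paths are collapsed into unitigs.

from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the set of bases that extend it."""
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[-1])         # edge: prefix node -> last base
    return edges

def walk_unitig(edges, start):
    """Greedily extend a node while its out-degree is exactly one."""
    contig, node, seen = start, start, {start}
    while len(edges.get(node, ())) == 1:
        base = next(iter(edges[node]))
        node = node[1:] + base
        if node in seen:                           # stop if the path cycles
            break
        seen.add(node)
        contig += base
    return contig

g = de_bruijn_graph(["ACGTACGTG", "GTACGTGCA"], k=5)
print(walk_unitig(g, "CGTA"))                      # -> 'CGTACGT' (stops at the branching node ACGT)
```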
BASE: a practical de novo assembler for large genomes using long NGS reads
Background: De novo genome assembly using NGS data remains a computation-intensive task, especially for large genomes. In practice, efficiency is often a primary concern and favors using a more efficient assembler like SOAPdenovo2. Yet SOAPdenovo2, based on the de Bruijn graph, fails to take full advantage of longer NGS reads (say, 150 bp to 250 bp from Illumina HiSeq and MiSeq). Assemblers based on string graphs (e.g., SGA), though less popular and also very slow, are more favorable for longer reads. Methods: This paper presents a new de novo assembler called BASE. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of the reads sharing the seeds. Such consensus sequences are then extended to contigs. Results: Experiments on two bacterial and four human datasets show the advantage of BASE in both contig quality and speed in dealing with longer reads. In the bacterial experiments, two datasets with read lengths of 100 bp and 250 bp were used. Especially for the 250 bp dataset, BASE gives much better quality than SOAPdenovo2 and SGA and is similar to SPAdes. Regarding speed, BASE is consistently a few times faster than SPAdes and SGA, but still slower than SOAPdenovo2. BASE and SOAPdenovo2 are further compared using human datasets with read lengths of 100 bp, 150 bp and 250 bp. BASE shows a higher N50 for all datasets, and the improvement becomes more significant when the read length reaches 250 bp. Besides, BASE is more memory-efficient than SOAPdenovo2 when the sequencing data contain errors. Conclusions: BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.
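The adaptive-seed idea above can be illustrated with a toy read index: a seed is grown from a fixed-length k-mer until it occurs in few enough reads to be plausibly unique in the genome, and the reads sharing it would then feed the consensus/extension step. The plain dictionary index, the thresholds and the function names below are assumptions for illustration, not BASE's actual data structures.

```python
# Toy illustration of adaptive seeds: extend a k-mer seed until it is rare
# enough (<= max_occ hits) across the read collection.

from collections import defaultdict

def build_kmer_index(reads, k):
    """Map every k-mer to the (read_id, offset) positions where it occurs."""
    index = defaultdict(list)
    for rid, read in enumerate(reads):
        for off in range(len(read) - k + 1):
            index[read[off:off + k]].append((rid, off))
    return index

def adaptive_seed(read, start, reads, index, k, max_occ=3):
    """Grow a seed starting at `start` until it has at most `max_occ` hits."""
    kmer_hits = index.get(read[start:start + k], [])
    end = start + k
    seed, hits = read[start:end], list(kmer_hits)
    while len(hits) > max_occ and end < len(read):   # still too frequent: extend
        end += 1
        seed = read[start:end]
        hits = [(rid, off) for rid, off in kmer_hits
                if reads[rid][off:off + len(seed)] == seed]
    return seed, hits                                # reads sharing the seed

reads = ["ACGTACGTGACCT", "GTACGTGACCTTA", "ACGTACGAGACCT"]
idx = build_kmer_index(reads, k=4)
print(adaptive_seed(reads[0], 0, reads, idx, k=4))   # -> ('ACGTA', [(0, 0), (2, 0)])
```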
SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner
To tackle the exponentially increasing throughput of Next-Generation
Sequencing (NGS), most existing short-read aligners can be configured to favor
speed at the expense of accuracy and sensitivity. SOAP3-dp, through leveraging
the computational power of both CPU and GPU with optimized algorithms, delivers
high speed and sensitivity simultaneously. Compared with widely adopted
aligners including BWA, Bowtie2, SeqAlto, GEM and GPU-based aligners including
BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while
maintaining the highest sensitivity and lowest false discovery rate (FDR) on
Illumina reads with different lengths. Transcending its predecessor SOAP3,
which does not allow gapped alignment, SOAP3-dp by default tolerates alignment
similarity as low as 60 percent. Evaluation on real human genome data
demonstrates SOAP3-dp's power to discover more authentic variants and longer
indels. Fosmid sequencing shows a 9.1 percent FDR on newly
discovered deletions. SOAP3-dp natively supports the BAM file format and
provides the same scoring scheme as BWA, which enables it to be integrated into existing
analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and
Tianhe-1A.
Comment: 21 pages, 6 figures; submitted to PLoS ONE; additional files
available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE".
Comments most welcome
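The gapped alignment mentioned above boils down to dynamic programming over the read and a reference window. Below is a single-threaded Smith-Waterman scoring sketch with illustrative parameters; it is not SOAP3-dp's GPU kernel and not its actual scoring scheme.

```python
# Single-threaded sketch of gapped local-alignment DP (Smith-Waterman with a
# linear gap penalty).  The scoring parameters are illustrative only.

def local_align_score(read, ref, match=2, mismatch=-3, gap=-4):
    """Best local alignment score of `read` against `ref`."""
    rows, cols = len(read) + 1, len(ref) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, rows):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Local alignment: never let a cell go below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(local_align_score("ACGTTAGC", "ACGTAGC"))    # -> 10 (the extra T is absorbed by a gap)
```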
Data-Flow-Based Normalization Generation Algorithm of R1CS for Zero-Knowledge Proof
The communities of blockchains and distributed ledgers have been stirred up
by the introduction of zero-knowledge proofs (ZKPs). Originally designed to
solve privacy issues, ZKPs have now evolved into an effective remedy for
scalability concerns and are applied in Zcash (a cryptocurrency similar to
Bitcoin). To enable ZKPs, Rank-1 Constraint Systems (R1CS) provide a verifiable
representation based on bilinear equations. To represent R1CS accurately and
efficiently, several language tools
like Circom, Noir, and Snarky have been proposed to automate the compilation of
advanced programs into R1CS. However, due to the flexible nature of R1CS
representation, there can be significant differences in the compiled R1CS forms
generated from circuit language programs with the same underlying semantics. To
address this issue, this paper presents a data-flow-based R1CS normalization
algorithm, which produces a standardized format for different R1CS instances
with identical semantics. Using the normalized R1CS circuits reduces the
complexity of circuit verification. In addition, this paper
presents an R1CS normalization algorithm benchmark, and our experimental
evaluation demonstrates the effectiveness and correctness of our methods.
Comment: 10 pages, 8 figures; a shorter version is accepted by PRDC 202
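For readers unfamiliar with R1CS, the sketch below fixes the object being normalized: each constraint is a bilinear equation (A·z)·(B·z) = (C·z) over a prime field, and even a trivial canonicalization (sorting sparse terms, dropping zero coefficients) already removes some surface-level differences between equivalent encodings. The field modulus, helper names and normalization rule here are illustrative assumptions and are not the paper's data-flow-based algorithm.

```python
# Sketch of an R1CS constraint and a trivial term-level canonicalization.
# Not the paper's data-flow-based normalization; illustration only.

P = 2**31 - 1   # illustrative prime modulus (real systems use ~254-bit primes)

def dot(coeffs, z):
    """Evaluate a sparse linear combination {var_index: coeff} at witness z."""
    return sum(c * z[i] for i, c in coeffs.items()) % P

def constraint_holds(a, b, c, z):
    """An R1CS constraint is the bilinear equation (A.z) * (B.z) = (C.z)."""
    return (dot(a, z) * dot(b, z)) % P == dot(c, z)

def normalize(constraint):
    """Canonical form: terms sorted by variable index, zero coefficients dropped."""
    return tuple(tuple(sorted((i, c % P) for i, c in part.items() if c % P))
                 for part in constraint)

# x * x = y, with witness z = (1, x, y); z[0] is the constant wire.
a, b, c = {1: 1}, {1: 1}, {2: 1}
z = (1, 5, 25)
print(constraint_holds(a, b, c, z))                                    # True
print(normalize((a, b, c)) == normalize(({1: 1, 0: 0}, {1: 1}, {2: 1})))  # True
```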
MICA: A fast short-read aligner that takes full advantage of Many Integrated Core Architecture (MIC)
Background: Short-read aligners have recently gained a lot of speed by exploiting the massive parallelism of GPU. An emerging alternative to GPU is the Intel MIC; supercomputers like Tianhe-2, currently at the top of the TOP500 list, are built with 48,000 MIC boards to offer ~55 PFLOPS. The CPU-like architecture of MIC allows CPU-based software to be parallelized easily; however, the performance is often inferior to GPU counterparts, as an MIC card contains only ~60 cores (while a GPU card typically has over a thousand cores). Results: To better utilize MIC-enabled computers for NGS data analysis, we developed a new short-read aligner, MICA, that is optimized in view of MIC's limitations and the extra parallelism inside each MIC core. By utilizing the 512-bit vector units in the MIC and implementing a new seeding strategy, experiments on aligning 150 bp paired-end reads show that MICA using one MIC card is 4.9 times faster than BWA-MEM (using 6 cores of a top-end CPU), and slightly faster than SOAP3-dp (using a GPU). Furthermore, MICA's simplicity allows very efficient scale-up when multiple MIC cards are used in a node (3 cards give a 14.1-fold speedup over BWA-MEM). Summary: MICA can be readily used by MIC-enabled supercomputers for production purposes. We have tested MICA on Tianhe-2 with 90 WGS samples (17.47 Tera-bases), which can be aligned in an hour using 400 nodes. MICA has impressive performance even though MIC is only in its initial stage of development. Availability and implementation: MICA's source code is freely available at http://sourceforge.net/projects/mica-aligner under GPL v3. Supplementary information: Supplementary information is available as "Additional File 1". Datasets are available at www.bio8.cs.hku.hk/dataset/mica.
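The wide-vector idea behind the 512-bit units can be mimicked in NumPy: a single vectorized comparison scores a seed against every reference window at once instead of looping base by base. This is only an analogy; the function, parameters and data below are illustrative and do not reflect MICA's MIC intrinsics or its actual seeding strategy.

```python
# NumPy analogy for wide-vector seed matching: score one seed against all
# reference windows in a single vectorized operation.

import numpy as np

def seed_hits(ref, seed, max_mismatches=2):
    """Return start positions where `seed` matches `ref` with few mismatches."""
    ref_arr = np.frombuffer(ref.encode(), dtype=np.uint8)
    seed_arr = np.frombuffer(seed.encode(), dtype=np.uint8)
    k = len(seed_arr)
    # All length-k windows of the reference as a strided 2-D view (no copy).
    windows = np.lib.stride_tricks.sliding_window_view(ref_arr, k)
    # One vectorized comparison of the seed against every window.
    mismatches = (windows != seed_arr).sum(axis=1)
    return np.flatnonzero(mismatches <= max_mismatches)

ref = "ACGTACGTGACCTACGTT"
print(seed_hits(ref, "ACGTG"))   # matches at positions 0, 4, 9 and 13 for this toy reference
```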
ZK-ProVer: Proving Programming Verification in Non-Interactive Zero-Knowledge Proofs
Program verification ensures software correctness through formal methods but incurs substantial computational overhead. It typically encodes program execution into formulas that are verified using a SAT solver and its extensions. However, this process exposes sensitive program details and requires redundant computations when multiple parties need to verify correctness. To overcome these limitations, zero-knowledge proofs (ZKPs) generate compact, reusable proofs with fast verification times, while provably hiding the program’s internal logic. We propose a two-phase zero-knowledge protocol that hides program implementation details throughout verification. Phase I uses a zero-knowledge virtual machine (zkVM) to encode programs into SAT formulas without
revealing their semantics. Phase II encodes resolution proofs for UNSAT instances and satisfying-assignment checks for SAT instances as PLONKish circuits. Evaluation on the Boolector benchmark demonstrates that our method achieves efficient verification time that is independent of clause width for UNSAT instances and of formula size for SAT instances. The resulting ZKPs enable efficient verification of program properties while providing strong end-to-end privacy guarantees.
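For the UNSAT side, the statement that Phase II's circuits attest to is that a claimed resolution proof really derives the empty clause (for SAT instances it is simply that an assignment satisfies the formula). The plain-Python checker below states that claim outside any circuit; the clause encoding and function names are assumptions for the example, and the PLONKish arithmetization itself is not reproduced.

```python
# Plain-Python statement of the UNSAT check: a resolution proof over clauses
# (sets of signed literals) must end in the empty clause.  The circuit
# encoding used by the paper is not reproduced here.

def resolve(c1, c2, pivot):
    """Resolve two clauses on `pivot`, which must occur with opposite signs."""
    assert pivot in c1 and -pivot in c2, "invalid pivot"
    return (c1 - {pivot}) | (c2 - {-pivot})

def check_resolution_proof(clauses, steps):
    """Each step is (i, j, pivot); the proof is valid if it derives {}."""
    derived = [frozenset(c) for c in clauses]
    for i, j, pivot in steps:
        derived.append(frozenset(resolve(set(derived[i]), set(derived[j]), pivot)))
    return len(derived[-1]) == 0

# (x1 or x2), (not x1 or x2), (not x2) is unsatisfiable:
cnf = [{1, 2}, {-1, 2}, {-2}]
proof = [(0, 1, 1),    # resolve on x1 -> {2}
         (3, 2, 2)]    # resolve on x2 -> {} (empty clause)
print(check_resolution_proof(cnf, proof))   # True
```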
Can Language Models Pretend Solvers? Logic Code Simulation with LLMs
Transformer-based large language models (LLMs) have demonstrated significant
potential in addressing logic problems. Capitalizing on the great capabilities
of LLMs for code-related activities, several frameworks leveraging logical
solvers for logic reasoning have been proposed recently. While existing
research predominantly focuses on viewing LLMs as natural language logic
solvers or translators, their roles as logic code interpreters and executors
have received limited attention. This study delves into a novel aspect, namely
logic code simulation, which forces LLMs to emulate logical solvers in
predicting the results of logical programs. To further investigate this novel
task, we formulate three research questions: Can LLMs efficiently simulate
the outputs of logic code? What strengths arise from logic code simulation?
And what are its pitfalls? To address these inquiries, we curate three
novel datasets tailored for the logic code simulation task and undertake
thorough experiments to establish the baseline performance of LLMs in code
simulation. Subsequently, we introduce a pioneering LLM-based code simulation
technique, Dual Chains of Logic (DCoL). This technique advocates a dual-path
thinking approach for LLMs, which has demonstrated state-of-the-art performance
compared to other LLM prompt strategies, achieving a notable improvement in
accuracy of 7.06% with GPT-4-Turbo.
Comment: 12 pages, 8 figures
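A dual-path prompting harness in the spirit described above might look like the skeleton below. The prompt wording, the `query_llm` placeholder and the reconciliation rule are all hypothetical; the actual DCoL prompt design and aggregation are defined in the paper.

```python
# Hypothetical skeleton of a dual-path ("dual chains of logic") prompting
# harness -- prompt wording, model client and reconciliation rule are
# placeholders, not the paper's exact DCoL design.

def query_llm(prompt: str) -> str:
    """Placeholder for a real model client; should return 'SAT' or 'UNSAT'."""
    raise NotImplementedError("plug in an actual LLM API call here")

def dual_path_simulate(logic_code: str) -> str:
    """Ask the model to reason along two complementary paths and reconcile."""
    path_a = ("Act as a logic solver. Trace the program below step by step and "
              "decide whether it is satisfiable.\n\n" + logic_code +
              "\n\nAnswer with SAT or UNSAT only.")
    path_b = ("Assume the program below is satisfiable and try to construct a "
              "concrete model; if no model can be found, conclude UNSAT.\n\n" +
              logic_code + "\n\nAnswer with SAT or UNSAT only.")
    answers = [query_llm(path_a).strip(), query_llm(path_b).strip()]
    # Reconciliation: accept agreement, otherwise flag the case for review.
    return answers[0] if answers[0] == answers[1] else "UNKNOWN"
```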
The oyster genome reveals stress adaptation and complexity of shell formation
The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster's adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster genome sequence fills a void in our understanding of the Lophotrochozoa.
