Genome-wide association study identifies 25 known breast cancer susceptibility loci as risk factors for triple-negative breast cancer
Triple-negative (TN) breast cancer is an aggressive subtype of breast cancer associated with a unique set of epidemiologic and genetic risk factors. We conducted a two-stage genome-wide association study of TN breast cancer (stage 1: 1529 TN cases, 3399 controls; stage 2: 2148 cases, 1309 controls) to identify loci that influence TN breast cancer risk. Variants in the 19p13.1 and PTHLH loci showed genome-wide significant associations (P < 5 × 10⁻⁸) in stages 1 and 2 combined. Results also suggested a substantial enrichment of significantly associated variants among the single nucleotide polymorphisms (SNPs) analyzed in stage 2. Variants from 25 of 74 known breast cancer susceptibility loci were also associated with risk of TN breast cancer (P < 0.05). Associations with TN breast cancer were confirmed for 10 loci (LGR6, MDM4, CASP8, 2q35, 2p24.1, TERT-rs10069690, ESR1, TOX3, 19p13.1, RALY), and we identified associations with TN breast cancer for 15 additional breast cancer loci (P < 0.05: PEX14, 2q24.1, 2q31.1, ADAM29, EBF1, TCF7L2, 11q13.1, 11q24.3, 12p13.1, PTHLH, NTN4, 12q24, BRCA2, RAD51L1-rs2588809, MKL1). Further, two SNPs independent of previously reported signals in ESR1 (rs12525163: odds ratio (OR) = 1.15, P = 4.9 × 10⁻⁴) and 19p13.1 (rs1864112: OR = 0.84, P = 1.8 × 10⁻⁹) were associated with TN breast cancer. A polygenic risk score (PRS) for TN breast cancer based on known breast cancer risk variants showed a 4-fold difference in risk between the highest and lowest PRS quintiles (OR = 4.03, 95% confidence interval 3.46–4.70, P = 4.8 × 10⁻⁶⁹). This translates to an absolute risk for TN breast cancer ranging from 0.8% to 3.4%, suggesting that genetic variation may be used for TN breast cancer risk prediction.
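The abstract above does not spell out how the PRS was computed. A minimal sketch of the commonly used additive form follows, assuming each variant contributes its effect-allele count weighted by the natural log of its per-allele odds ratio. The function names are hypothetical, and the two SNPs are used only because their odds ratios are quoted in the abstract; the study's score drew on many more known variants.

```python
import math

# Per-allele odds ratios quoted in the abstract for two SNPs. Restricting the
# score to these two is purely illustrative, not the study's variant set.
ODDS_RATIOS = {"rs12525163": 1.15, "rs1864112": 0.84}


def polygenic_risk_score(dosages, odds_ratios=ODDS_RATIOS):
    """Additive PRS: effect-allele counts (0, 1 or 2) weighted by ln(OR)."""
    return sum(math.log(odds_ratios[snp]) * count
               for snp, count in dosages.items())


def quintile_labels(scores):
    """Assign each score a quintile from 1 (lowest) to 5 (highest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, idx in enumerate(order):
        labels[idx] = rank * 5 // len(scores) + 1
    return labels


# Hypothetical genotypes for three individuals (effect-allele counts per SNP).
people = [
    {"rs12525163": 2, "rs1864112": 0},
    {"rs12525163": 1, "rs1864112": 1},
    {"rs12525163": 0, "rs1864112": 2},
]
scores = [polygenic_risk_score(p) for p in people]
```

Comparing case and control counts in the top and bottom quintiles of such a score is one way to arrive at a quintile odds ratio like the one reported.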
Indel sensitive and comprehensive variant/mutation detection from RNA sequencing data for precision medicine
Background: RNA-seq is the most commonly used sequencing application. Not only does it measure gene expression, but it is also an excellent medium for detecting important variants such as single nucleotide variants (SNVs), insertions/deletions (indels) and fusion transcripts. However, detecting these variants from RNA-seq is challenging and complex. Here we describe a sensitive and accurate analytical pipeline that detects various mutations at once for translational precision medicine. Methods: The pipeline incorporates the most sensitive aligners for indels in RNA-seq, best practices for data preprocessing and variant calling, and STAR-Fusion for chimeric transcripts. Variants/mutations are annotated, and key genes can be extracted for further investigation and clinical action. Three datasets were used to evaluate the performance of the pipeline for SNVs, indels and fusion transcripts. Results: For the well-defined variants from NA12878 by the GIAB project, sensitivities of about 95% and 80% were obtained for SNVs and indels, respectively, in the matching RNA-seq data. Comparison with other variant-specific tools showed good performance of the pipeline. For the lung cancer dataset with 41 known oncogenic mutations, 39 were detected by the pipeline with the STAR aligner and all 41 with the GSNAP aligner. An actionable EML4-ALK fusion was also detected in one of the tumors, which also showed outlier ALK expression. For 9 fusions spiked into RNA-seq libraries at different concentrations, the pipeline detected all of them in unfiltered results, although some at very low concentrations may be missed when filtering is applied. Conclusions: The new RNA-seq workflow is an accurate and comprehensive mutation profiler. Key or actionable mutations are reliably detected from RNA-seq, which makes it a practical alternative data source for personalized medicine.
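The sensitivity figures in this abstract are the fraction of known variants that the pipeline recovers. A minimal sketch of that calculation follows, assuming variants are compared as exact (chromosome, position, ref, alt) keys. The key format and the placeholder variants are assumptions for illustration and do not come from the paper.

```python
def sensitivity(truth, called):
    """Fraction of known (truth) variants that also appear in the call set."""
    truth = set(truth)
    if not truth:
        raise ValueError("truth set is empty")
    return len(truth & set(called)) / len(truth)


# Hypothetical truth set of 41 variants; the coordinates are placeholders.
truth_set = [("chr1", 1000 + i, "A", "G") for i in range(41)]

# A call set missing two of them mirrors the reported 39 of 41 with STAR.
star_calls = truth_set[:39]

star_sensitivity = sensitivity(truth_set, star_calls)  # 39 / 41, about 0.951
```

Exact-key matching tends to understate indel sensitivity, because the same indel can be written at different positions; real comparisons usually normalize variant representation first.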
Author Correction: UClncR: Ultrafast and comprehensive long non-coding RNA detection from RNA-seq
A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has not been fixed in the paper
Robust hierarchical density estimation and regression for re-stained histological whole slide image co-registration.
For many disease conditions, tissue samples are colored with multiple dyes and stains to add contrast and location information for specific proteins, so that disease can be accurately identified and diagnosed. This presents a computational challenge for digital pathology, as whole-slide images (WSIs) need to be properly overlaid (i.e. registered) to identify co-localized features. Traditional image registration methods sometimes fail due to the high variation of cell density and insufficient texture information in WSIs, particularly at high magnifications. In this paper, we propose a robust image registration strategy to align re-stained WSIs precisely and efficiently. This method is applied to 30 pairs of immunohistochemical (IHC) stains and their hematoxylin and eosin (H&E) counterparts. Our approach advances the existing methods in three key ways. First, we introduce refinements to existing image registration methods. Second, we present an effective weighting strategy using kernel density estimation to mitigate registration errors. Third, we account for the linear relationship across WSI levels to improve accuracy. Our experiments show significant decreases in registration errors when matching IHC and H&E pairs, enabling subcellular-level analysis on stained and re-stained histological images. We also provide a tool to allow users to develop their own registration benchmarking experiments.
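The abstract does not describe how the kernel density weighting is defined. One plausible reading, sketched below as an assumption rather than the authors' method, is that several local offset estimates are each weighted by a leave-one-out Gaussian kernel density, so that estimates which agree with the others dominate and isolated outliers contribute almost nothing. The function name, the bandwidth and the sample offsets are all hypothetical.

```python
import math


def kde_weighted_offset(offsets, bandwidth=1.0):
    """Combine (dx, dy) offset estimates, weighting each by a Gaussian
    kernel density computed from all the other estimates."""
    weights = []
    for i, (xi, yi) in enumerate(offsets):
        density = 0.0
        for j, (xj, yj) in enumerate(offsets):
            if i == j:
                continue  # leave-one-out: a point does not vote for itself
            sq_dist = (xi - xj) ** 2 + (yi - yj) ** 2
            density += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        weights.append(density)

    total = sum(weights)
    if total == 0.0:
        # Degenerate case (e.g. a single estimate): use the plain mean.
        weights = [1.0] * len(offsets)
        total = float(len(offsets))

    dx = sum(w * x for w, (x, _) in zip(weights, offsets)) / total
    dy = sum(w * y for w, (_, y) in zip(weights, offsets)) / total
    return dx, dy


# Three consistent patch-level offsets and one gross outlier (in pixels).
estimates = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (10.0, 10.0)]
combined = kde_weighted_offset(estimates)
```

With these inputs the plain mean would be pulled to (3.25, 3.25) by the outlier, while the density-weighted estimate stays close to (1, 1).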
UClncR: Ultrafast and comprehensive long non-coding RNA detection from RNA-seq
Long non-coding RNA (lncRNA) is a large class of gene transcripts with regulatory functions discovered in recent years. Many more are expected to be revealed as RNA-seq data accumulate from diverse types of normal and diseased tissues. However, discovering novel lncRNAs and accurately quantifying known lncRNAs from massive RNA-seq data are not trivial. Herein we describe UClncR, an Ultrafast and Comprehensive lncRNA detection pipeline, to tackle the challenge. UClncR takes a standard RNA-seq alignment file, performs transcript assembly, predicts lncRNA candidates, quantifies and annotates both known and novel lncRNA candidates, and generates a convenient report for downstream analysis. The pipeline accommodates both un-stranded and stranded RNA-seq so that lncRNAs overlapping with other genes can be predicted and quantified. UClncR is fully parallelized in a cluster environment yet allows users to run samples sequentially without a cluster. The pipeline can process a typical RNA-seq sample in a matter of minutes and complete hundreds of samples in a matter of hours. Analysis of predicted lncRNAs from two test datasets demonstrated UClncR's accuracy and the relevance of the predicted lncRNAs to sample clinical phenotypes. UClncR should significantly facilitate researchers' discovery of novel lncRNAs and is publicly available at http://bioinformaticstools.mayo.edu/research/UClncR
