A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding p-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm.
Results: We present a hybrid approach to obtain the p-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data.
Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the p-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
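As an illustration of why the full permutation test described in this abstract is costly, the sketch below computes a maximal t-type statistic over all arcs of a sequence and a permutation p-value for it. It is not the DNAcopy implementation and does not include the hybrid method or the stopping rule. The function names, the scaling by the overall standard deviation, and the default number of permutations are assumptions made for the example.

```python
import math
import random


def max_circular_t(x):
    """Return (statistic, arc) maximising a t-type contrast over arcs (i, j].

    Each arc x[i:j] is compared with the rest of the sequence. Scanning all
    pairs i < j costs O(N^2) operations, the cost noted in the abstract.
    Assumption: the contrast is scaled by the overall sample SD.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    mean = total / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    if sd == 0.0:
        return 0.0, (0, n)  # constant data: no evidence of a change
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)  # prefix sums give O(1) arc sums
    best, best_arc = 0.0, (0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the arc must leave a non-empty complement
            arc_sum = prefix[j] - prefix[i]
            diff = arc_sum / k - (total - arc_sum) / (n - k)
            t = abs(diff) / (sd * math.sqrt(1.0 / k + 1.0 / (n - k)))
            if t > best:
                best, best_arc = t, (i, j)
    return best, best_arc


def permutation_p_value(x, n_perm=200, seed=0):
    """Full permutation p-value for the maximal statistic (hypothetical helper)."""
    rng = random.Random(seed)
    observed, arc = max_circular_t(x)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_circular_t(shuffled)[0] >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1), arc
```

Each permutation repeats the quadratic scan, so the total cost grows as the number of permutations times N^2, which is the burden the hybrid approach is meant to avoid.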
GENOME WIDE DNA METHYLATION PROFILING IS PREDICTIVE OF OUTCOME IN JUVENILE MYELOMONOCYTIC LEUKEMIA
Statistical Evaluation of Evidence for Clonal Allelic Alterations in array-CGH Experiments
In recent years numerous investigators have conducted genetic studies of pairs of tumor specimens from the same patient to determine whether the tumors share a clonal origin. These studies have the potential to be of considerable clinical significance, especially in clinical settings where the distinction between a new primary cancer and metastatic spread of a previous cancer would lead to radically different indications for treatment. Studies of clonality have typically involved comparison of the patterns of somatic mutations in the tumors at candidate genetic loci to see if the patterns are sufficiently similar to indicate a clonal origin. More recently, some investigators have explored the use of array CGH for this purpose. Standard clustering approaches have been used to analyze the data, but these existing statistical methods are not suited to this problem due to the paired nature of the data, and the fact that there exists no “gold standard” diagnosis to provide a definitive determination of which pairs are clonal and which pairs are of independent origin. In this article we propose a new statistical method that focuses on the individual allelic gains or losses that have been identified in both tumors, and a statistical test is developed that assesses the degree of matching of the locations of the markers that indicate the endpoints of the allelic change. The validity and statistical power of the test are evaluated, and it is shown to be a promising approach for establishing clonality in tumor samples.
A Metastasis or a Second Independent Cancer? Evaluating the Clonal Origin of Tumors Using Array-CGH Data
When a cancer patient develops a new tumor it is necessary to determine if this is a recurrence (metastasis) of the original cancer, or an entirely new occurrence of the disease. This is accomplished by assessing the histo-pathology of the lesions, and it is frequently relatively straightforward. However, there are many clinical scenarios in which this pathological diagnosis is difficult. Since each tumor is characterized by a genetic fingerprint of somatic mutations, a more definitive diagnosis is possible in principle in these difficult clinical scenarios by comparing the fingerprints. In this article we develop and evaluate a statistical strategy for this comparison when the data are derived from array comparative genomic hybridization, a technique designed to identify all of the somatic allelic gains and losses across the genome. Our method involves several stages. First, a segmentation algorithm is used to estimate the regions of allelic gain and loss. Then the broad correlation in these patterns between the two tumors is assessed, leading to an initial likelihood ratio for the two diagnoses. This is then further refined by comparing in detail each plausibly clonal mutation within individual chromosome arms, and the results are aggregated to determine a final likelihood ratio. The method is employed to diagnose patients from several clinical scenarios, and the results show that in many cases a strong clonal signal emerges, occasionally contradicting the clinical diagnosis. The “quality” of the arrays can be summarized by a parameter that characterizes the clarity with which allelic changes are detected. Sensitivity analyses show that most of the diagnoses are robust when the data are of high quality.
Five blood pressure loci identified by an updated genome-wide linkage scan: meta-analysis of the Family Blood Pressure Program.
BACKGROUND: A preliminary genome-wide linkage analysis of blood pressure in the Family Blood Pressure Program (FBPP) was reported previously. We harnessed the power and ethnic diversity of the final pooled FBPP dataset to identify novel loci for blood pressure thereby enhancing localization of genes containing less common variants with large effects on blood pressure levels and hypertension.
METHODS: We performed one overall and 4 race-specific meta-analyses of genome-wide blood pressure linkage scans using data on 4,226 African-American, 2,154 Asian, 4,229 Caucasian, and 2,435 Mexican-American participants (total N = 13,044). Variance components models were fit to measured (raw) blood pressure levels and two types of antihypertensive medication adjusted blood pressure phenotypes within each of 10 subgroups defined by race and network. A modified Fisher's method was used to combine the P values for each linkage marker across the 10 subgroups.
RESULTS: Five quantitative trait loci (QTLs) were detected on chromosomes 6p22.3, 8q23.1, 20q13.12, 21q21.1, and 21q21.3 based on significant linkage evidence (defined by logarithm of odds (lod) score ≥3) in at least one meta-analysis and lod scores ≥1 in at least 2 subgroups defined by network and race. The chromosome 8q23.1 locus was supported by Asian-, Caucasian-, and Mexican-American-specific meta-analyses.
CONCLUSIONS: The new QTLs reported justify new candidate gene studies. They may help support results from genome-wide association studies (GWAS) that fall in these QTL regions but fail to achieve genome-wide significance.
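The METHODS paragraph of this abstract combines P values across subgroups with a modified Fisher's method, but the abstract does not describe the modification. The sketch below shows only the classical, unmodified Fisher combination, as a point of reference. The function name is hypothetical, and the closed-form chi-square tail used here holds because the degrees of freedom (2k for k P values) are always even.

```python
import math


def fisher_combined_p(p_values):
    """Classical Fisher combination of independent P values.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    half = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total
```

With a single P value the function returns that value unchanged, which is a quick sanity check on the tail formula.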
Recurrent epimutations activate gene body promoters in primary glioblastoma
Aberrant DNA hypomethylation may play an important role in the growth rate of glioblastoma (GBM), but the functional impact on transcription remains poorly understood. We assayed the GBM methylome with MeDIP-seq and MRE-seq, adjusting for copy number differences, in a small set of non-glioma CpG island methylator phenotype (non-G-CIMP) primary tumors. Recurrent hypomethylated loci were enriched within a region of chromosome 5p15 that is specified as a cancer amplicon and also encompasses TERT, encoding telomerase reverse transcriptase, which plays a critical role in tumorigenesis. Overall, 76 gene body promoters were recurrently hypomethylated, including TERT and the oncogenes GLI3 and TP73. Recurring hypomethylation also affected previously unannotated alternative promoters, and luciferase reporter assays for three of four of these promoters confirmed strong promoter activity in GBM cells. Histone H3 lysine 4 trimethylation (H3K4me3) ChIP-seq on tissue from the GBMs uncovered peaks that coincide precisely with tumor-specific decrease of DNA methylation at 200 loci, 133 of which are in gene bodies. Detailed investigation of TP73 and TERT gene body hypomethylation demonstrated increased expression of corresponding alternate transcripts, which in TP73 encodes a truncated p73 protein with oncogenic function and in TERT encodes a putative reverse transcriptase-null protein. Our findings suggest that recurring gene body promoter hypomethylation events, along with histone H3K4 trimethylation, alter the transcriptional landscape of GBM through the activation of a limited number of normally silenced promoters within gene bodies, in at least one case leading to expression of an oncogenic protein.
Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications.
Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of the CpGs and 99% of the non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme sequencing (MRE-seq) for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
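The concordance criterion in this abstract (two methylation levels agree when their difference does not exceed a threshold) can be sketched in a few lines. The abstract does not state the threshold, so the default of 0.25 below is purely an assumption, as are the function name and the representation of methylation levels as fractions between 0 and 1.

```python
def concordance(levels_a, levels_b, threshold=0.25):
    """Fraction of sites whose two methylation levels differ by at most `threshold`.

    `levels_a` and `levels_b` hold per-site methylation fractions from two
    methods, in the same site order. The threshold value is an assumption.
    """
    if len(levels_a) != len(levels_b) or not levels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(1 for a, b in zip(levels_a, levels_b) if abs(a - b) <= threshold)
    return agree / len(levels_a)
```

For binary methylation calls, the same function applies with levels coded as 0 or 1 and any threshold below 1, so that only identical calls count as concordant.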
Representational oligonucleotide microarray analysis: A high-resolution method to detect genome copy number variation
We have developed a methodology we call ROMA (representational oligonucleotide microarray analysis), for the detection of the genomic aberrations in cancer and normal humans. By arraying oligonucleotide probes designed from the human genome sequence, and hybridizing with "representations" from cancer and normal cells, we detect regions of the genome with altered "copy number." We achieve an average resolution of 30 kb throughout the genome, and resolutions as high as a probe every 15 kb are practical. We illustrate the characteristics of probes on the array and accuracy of measurements obtained using ROMA. Using this methodology, we identify variation between cancer and normal genomes, as well as between normal human genomes. In cancer genomes, we readily detect amplifications and large and small homozygous and hemizygous deletions. Between normal human genomes, we frequently detect large (100 kb to 1 Mb) deletions or duplications. Many of these changes encompass known genes. ROMA will assist in the discovery of genes and markers important in cancer, and the discovery of loci that may be important in inherited predispositions to disease.
