550 research outputs found

A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

Author: The ENCODE Project Consortium
Publication date: 01/01/2011

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome

Carolina Digital Repository

Genome-wide associations of gene expression variation in humans

Author: Andrew G Clark
Barbara E Stranger
Brenda Kahl
David Allison
Emmanouil T Dermitzakis
ENCODE Project Consortium
Mark J Minichiello
Matthew S Forrest
Panagiotis Deloukas
Robert Lyle
Samuel Deutsch
Sarah Hunt
Simon Tavaré
Stylianos E Antonarakis
The International HapMap Consortium
Publication venue: Public Library of Science
Publication date: 01/01/2005

The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level
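
As an illustration only, two of the multiple-test corrections named in this abstract (Bonferroni and false discovery rate) can be sketched in plain Python. The function names and the p-values are hypothetical, the false discovery rate is controlled here with the Benjamini-Hochberg step-up procedure (an assumption, since the abstract does not say which procedure was used), and the permutation approach is not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests whose p-value passes the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per SNP-expression test.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonferroni_hits = bonferroni(example_p)
fdr_hits = benjamini_hochberg(example_p)
```

With these made-up values the Bonferroni threshold (0.005) keeps only the smallest p-value, while the step-up procedure keeps the two smallest, which shows why the choice of correction changes how many associations are reported.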

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

The Francis Crick Institute

A Geometric Framework for Evaluating Rare Variant Tests of Association

Author: 1000 Genomes Project Consortium
Asimit
Bansal
Basu
Cooper
Dai
Dering
Feng
Gibson
Han
Ionita-Laza
Ladouceur
Li
Li
Li
Lin
Luedtke
Madsen
Mayer-Jochimsen
Morgenthaler
Morris
Neale
Nelson
Pan
Powers
Price
Quintana
Rivas
Sul
Sun
Tennessen
The ENCODE Project Consortium
Tintle
Torgerson
Wu
Yi
Zawistowski
Zhang
Publication venue: Wiley
Publication date: 01/05/2013

The wave of next-generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/97460/1/gepi21722.pd
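
To make the geometric picture in this abstract concrete, the following is a minimal sketch, assuming rare allele counts per variant site are stored as plain Python lists. It computes only the two quantities the abstract names (vector length and the angle between the case and control vectors). The function names and counts are hypothetical, and the sketch does not implement any of the association tests discussed in the paper.

```python
import math


def vector_length(counts):
    """Euclidean length of a vector of per-site rare allele counts."""
    return math.sqrt(sum(c * c for c in counts))


def vector_angle(case_counts, control_counts):
    """Angle in radians between the case and control count vectors."""
    len_case = vector_length(case_counts)
    len_control = vector_length(control_counts)
    if len_case == 0 or len_control == 0:
        raise ValueError("angle is undefined for an all-zero vector")
    dot = sum(a * b for a, b in zip(case_counts, control_counts))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (len_case * len_control)))
    return math.acos(cosine)


# Hypothetical counts at three rare variant sites.
cases = [3, 0, 4]
controls = [0, 5, 0]
length_difference = vector_length(cases) - vector_length(controls)
angle = vector_angle(cases, controls)
```

In this made-up example the two vectors have the same length but are orthogonal, the kind of pattern that a comparison of lengths alone would not register but a comparison that also uses the angle would.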

Mapping the Shh long-range regulatory domain

Author: Amano
Belloni
Bickmore
Chuong
Davis
Dixon
Echelard
Epstein
Hecksher-Sorensen
Jeong
Jeong
Klopocki
Kokubu
Lettice
Lettice
Lettice
Lettice
Lettice
Lettice
Liu
Marinić
Mates
Montavon
Nagy
Niedermaier
Osoegawa
Paek
Riddle
Ruf
Sagai
Sagai
Sagai
Sharpe
Sharpe
Shen
Smallwood
Spitz
Sun
Symmons
Symmons
The ENCODE Consortium Project
Tsukiji
Publication venue: The Company of Biologists
Publication date: 01/10/2014

Coordinated gene expression controlled by long-distance enhancers is orchestrated by DNA regulatory sequences involving transcription factors and layers of control mechanisms. The Shh gene and well-established regulators are an example of genomic composition in which enhancers reside in a large desert extending into neighbouring genes to control the spatiotemporal pattern of expression. Exploiting the local hopping activity of the Sleeping Beauty transposon, the lacZ reporter gene was dispersed throughout the Shh region to systematically map the genomic features responsible for expression activity. We found that enhancer activities are retained inside a genomic region that corresponds to the topological associated domain (TAD) defined by Hi-C. This domain of approximately 900 kb is in an open conformation over its length and is generally susceptible to all Shh enhancers. Similar to the distal enhancers, an enhancer residing within the Shh second intron activates the reporter gene located at distances of hundreds of kilobases away, suggesting that both proximal and distal enhancers have the capacity to survey the Shh topological domain to recognise potential promoters. The widely expressed Rnf32 gene lying within the Shh domain evades enhancer activities by a process that may be common among other housekeeping genes that reside in large regulatory domains. Finally, the boundaries of the Shh TAD do not represent the absolute expression limits of enhancer activity, as expression activity is lost stepwise at a number of genomic positions at the verges of these domains

Crossref

PubMed Central

Edinburgh Research Explorer

Hardy-Weinberg Equilibrium Testing of Biological Ascertainment for Mendelian Randomization Studies

Author: Cupples
Davey-Smith
Gu
Guarnieri
Hardy
Hedrick
Hingorani
Ian N. M. Day
Kavvoura
Santiago Rodriguez
The ENCODE Project Consortium
The International HapMap Consortium
The Wellcome Trust Case Control Consortium
Tom R. Gaunt
Weinberg
Publication venue: Oxford University Press
Publication date: 15/02/2009

Mendelian randomization (MR) permits causal inference between exposures and a disease. It can be compared with randomized controlled trials. Whereas in a randomized controlled trial the randomization occurs at entry into the trial, in MR the randomization occurs during gamete formation and conception. Several factors, including time since conception and sampling variation, are relevant to the interpretation of an MR test. Particularly important is consideration of the “missingness” of genotypes that can be originated by chance, genotyping errors, or clinical ascertainment. Testing for Hardy-Weinberg equilibrium (HWE) is a genetic approach that permits evaluation of missingness. In this paper, the authors demonstrate evidence of nonconformity with HWE in real data. They also perform simulations to characterize the sensitivity of HWE tests to missingness. Unresolved missingness could lead to a false rejection of causality in an MR investigation of trait-disease association. These results indicate that large-scale studies, very high quality genotyping data, and detailed knowledge of the life-course genetics of the alleles/genotypes studied will largely mitigate this risk. The authors also present a Web program (http://www.oege.org/software/hwe-mr-calc.shtml) for estimating possible missingness and an approach to evaluating missingness under different genetic models
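
For readers unfamiliar with HWE testing, a minimal sketch of the standard one-degree-of-freedom chi-square test for a biallelic locus is given here. It is not the authors' Web program. The function name and genotype counts are hypothetical, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of genotype counts against HWE proportions.

    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequencies estimated from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical sample in exact HWE, then the same sample with
# 20 heterozygotes missing.
chi2_full, p_full = hwe_chi_square(25, 50, 25)
chi2_missing, p_missing = hwe_chi_square(25, 30, 25)
```

In this made-up example the complete sample gives a statistic of 0, while removing 20 heterozygotes raises it to 5.0, illustrating how genotype-dependent missingness can surface as departure from HWE.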

Crossref

PubMed Central

Explore Bristol Research

Modeling associations between genetic markers using Bayesian networks

Author: Altshuler
Browning
C. D. Maciel
E. Villanueva
Liu
Mueller
Nothnagel
Pritchard
Scheet
The ENCODE Project Consortium
Thomas
Thomas
Tishkoff
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients were proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understanding the process that generated such structure. GMs based in coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging
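
The abstract does not list the LD coefficients it refers to. As a hedged illustration, three commonly used ones (D, |D'| and r²) can be computed from two-locus haplotype counts as sketched here. The function name and counts are hypothetical, and this sketch has no connection to the Bayesian network model proposed in the paper.

```python
def ld_coefficients(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two biallelic loci from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic")
    # D: deviation of the AB haplotype frequency from independence.
    d = p_AB - p_A * p_B
    r_squared = d * d / denom
    # D': D scaled by its maximum possible magnitude given allele frequencies.
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d != 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical counts: complete association, then complete independence.
complete = ld_coefficients(50, 0, 0, 50)
independent = ld_coefficients(25, 25, 25, 25)
```

Each coefficient is a single summary of the current haplotype frequencies, which is the sense in which the abstract calls such measures a static view of LD structure.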

Crossref

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Repositório da Produção USP (Univ. de São Paulo)

GIVE: portable genome browsers for personal websites.

Author: Alvin Zheng
B Sridhar
B Sridhar
C Tyner
D Barrios
D Comer
E Lieberman-Aiden
E Sharma
F Ozsolak
F Yue
FH Biase
JD Buenrostro
JG Aw
JT Robinson
LD Stein
ME Skinner
MJ Fullwood
Qiuyang Wu
R Bayer
R Li
R Mourad
S Carrere
Sheng Zhong
TC Nguyen
The ENCODE Project Consortium
VW Zhou
WJ Kent
X Li
X Zhou
Xiaoyi Cao
Z Lu
Zhangming Yan
Publication venue: eScholarship, University of California
Publication date: 01/07/2018

Growing popularity and diversity of genomic data demand portable and versatile genome browsers. Here, we present an open source programming library called GIVE that facilitates the creation of personalized genome browsers without requiring a system administrator. By inserting HTML tags, one can add to a personal webpage interactive visualization of multiple types of genomics data, including genome annotation, "linear" quantitative data, and genome interaction data. GIVE includes a graphical interface called HUG (HTML Universal Generator) that automatically generates HTML code for displaying user chosen data, which can be copy-pasted into user's personal website or saved and shared with collaborators. GIVE is available at: https://www.givengine.org/

Crossref

Directory of Open Access Journals

eScholarship - University of California

A genomic data viewer for iPad

Author: Douglass Turner
H Li
H Thorvaldsdóttir
Helga Thorvaldsdóttir
James T Robinson
Jill P Mesirov
JT Robinson
MN Cabili
The 1000 Genomes Project Consortium
The ENCODE Project Consortium
WJ Kent
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

The Integrative Genomics Viewer (IGV) for iPad, based on the popular IGV application for desktop and laptop computers, supports researchers who wish to take advantage of the mobility of today’s tablet computers to view genomic data and present findings to colleagues

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Revealing mammalian evolutionary relationships by comparative analysis of gene clusters

Author: Abi-Rached
Akahoshi
Bailey
Benjamin Dickins
Birney
Cadavid
Cathy Riemer
Chen
Chih-Hao Hsu
Chiu
Colobran
Datta
Degenhardt
Dewey
Dufayard
Edwards
Eric D. Green
Fitch
Fitch
Fitch
Giltae Song
Gish
Gonzalez
Goodstadt
Graef
Guethlein
Guethlein
Han
Hardies
Hardison
Hardison
Hardison
Harris
Hie Lim Kim
Hoffmann
Hou
Hou
Hsu
Hsu
Hu
Huerta-Cepas
Jensen
Johnson
Kim
Kristensen
Lee
Levy
Li
Li
Lopez-Vazquez
Louxin Zhang
Margulies
Martin
Matsuya
Mi
Miyata
Muller
Murphy
NISC Comparative Sequencing Program
Opazo
Opazo
Ostlund
Ouzounis
Parham
Pianezza
Rajalingam
Ross C. Hardison
Sambrook
Shilling
Siepel
Smit
Song
Song
Song
Sonnhammer
Su
Tatusov
The ENCODE Project Consortium
Uchiyama
van der Heijden
Vilella
Wang
Wapinski
Waterhouse
Webb Miller
Wilson
Wilson
Woelk
Yu Zhang
Zhang
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events

Crossref

Nottingham Trent Institutional Repository (IRep)

PubMed Central

ScholarBank@NUS