
149 research outputs found

Haplotype frequencies in a sub-region of chromosome 19q13.3, related to risk and prognosis of cancer, differ dramatically between ethnic groups

Author: A Helgason
A Vangsted
Anne Tjønneland
BA Nexo
Bjørn A Nexø
CD Bustamante
CF Skjelbred
DJ Park
E Rockenbauer
G Gibson
GV Kryukov
Heng Li
J Novembre
J Yin
JC Barrett
Jun Wang
KA Frazer
KA Olaussen
Lars Bolund
M Dybdahl
MI McCarthy
Mikkel H Schierup
MJ Laska
P Scheet
P Sulem
PL Balaresque
R Blekhman
SB Gabriel
T Mailund
Thomas Mailund
U Vogel
Ulla Vogel
V Moreno
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: A small region of about 70 kb on human chromosome 19q13.3 encompasses four genes, of which three, ERCC1, ERCC2, and PPP1R13L (aka RAI), are related to DNA repair and cell survival, and one, CD3EAP (aka ASE1), may be related to cell proliferation. The whole region seems related to the cellular response to external damaging agents, and markers in it are associated with risk of several cancers. Methods: We downloaded the genotypes of all markers typed in the 19q13.3 region in the HapMap populations of European, Asian and African descent and inferred haplotypes. We combined the European HapMap individuals with a Danish breast cancer case-control data set and inferred the association between HapMap haplotypes and disease risk. Results: We found that the susceptibility haplotype in our European sample had increased from 2 to 50 percent very recently in the European population, and to almost the same extent in the Asian population. The cause of this increase is unknown. The maximal proportion of overall genetic variation due to differences between groups for Europeans versus Africans and Europeans versus Asians (the Fst value) closely matched the putative location of the susceptibility variant as judged from haplotype-based association mapping. Conclusion: The combined observation of a common haplotype causing an increased risk of cancer in Europeans and a high differentiation between human populations is highly unusual and suggests a causal relationship, with a recent increase in Europeans caused either by genetic drift overruling selection against the susceptibility variant or by positive selection for the same haplotype. The data do not allow us to distinguish between these two scenarios. The analysis suggests that the region is not involved in cancer risk in Africans and that the susceptibility variants may be more finely mapped in Asian populations.

Crossref

Roskilde Universitet

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Syddansk Universitets Forskerportal

Online Research Database In Technology

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
Wayne Delport
William Fletcher
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 23/11/2010

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ON

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

The Francis Crick Institute

SNPFile – A software library and file format for large scale association mapping and population genetics studies

Author: D Arking
D Smyth
DF Easton
J Gudmundsson
JC Barrett
Jesper Nielsen
LT Amundadottir
R Saxena
S Purcell
T Mailund
Thomas Mailund
Publication venue: BioMed Central
Publication date: 01/12/2008

Background: High-throughput genotyping technology has enabled cost-effective typing of thousands of individuals in hundreds of thousands of markers for use in genome-wide studies. This vast improvement in data acquisition technology makes it an informatics challenge to efficiently store and manipulate the data. While spreadsheets and flat text files were adequate solutions earlier, the increased data size mandates more efficient solutions. Results: We describe a new binary file format for SNP data, together with a software library for file manipulation. The file format stores genotype data together with any kind of additional data, using a flexible serialisation mechanism. The format is designed to be IO efficient for the access patterns of most multi-locus analysis methods. Conclusion: The new file format has been very useful for our own studies, where it has significantly reduced the informatics burden of keeping track of various secondary data, and where the memory and IO efficiency has greatly simplified analysis runs. A main limitation of the file format is that it is only supported by the very limited set of analysis tools developed in our own lab. This is somewhat alleviated by a scripting interface that makes it easy to write converters to and from the format.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Extreme genomic erosion after recurrent demographic bottlenecks in the highly endangered Iberian lynx

Author: Abascal F
Alioto T
Andres-Nieto M
Arango JR
Blanc J
Camara F
Capella-Gutierrez S
Casas-Marce M
Cheng JY
Corvelo A
Cozzuto L
Cruz F
Derdak S
Erb I
Frias L
Gabaldon T
Galan B
Garcia F
Garcia JL
Godoy JA
Guigo R
Gut I
Gut M
Li G
Lopez-Otin C
Lorente-Galdos B
Lowy E
Mailund T
Mar Alba M
Marcet-Houben M
Marques-Bonet T
Martinez-Cruz B
Murphy WJ
Notredame C
Prieto P
Quesada V
Quilez J
Reverter F
Ribeca P
Rodriguez JM
Rodriguez-Ales JL
Roma G
Rubio-Camarillo M
Ruiz-Herrera A
Ruiz-Orera J
Soriano L
Tress ML
Valencia A
Villanueva-Canas JL
Vlasova A
Publication venue: BioMed Central
Publication date: 01/01/2016

Background: Genomic studies of endangered species provide insights into their evolution and demographic history, reveal patterns of genomic erosion that might limit their viability, and offer tools for their effective conservation. The Iberian lynx (Lynx pardinus) is the most endangered felid and a unique example of a species on the brink of extinction. Results: We generate the first annotated draft of the Iberian lynx genome and carry out genome-based analyses of lynx demography, evolution, and population genetics. We identify a series of severe population bottlenecks in the history of the Iberian lynx that predate its known demographic decline during the 20th century and have greatly impacted its genome evolution. We observe drastically reduced rates of weak-to-strong substitutions associated with GC-biased gene conversion and increased rates of fixation of transposable elements. We also find multiple signatures of genetic erosion in the two remnant Iberian lynx populations, including a high frequency of potentially deleterious variants and substitutions, as well as the lowest genome-wide genetic diversity reported so far in any species. Conclusions: The genomic features observed in the Iberian lynx genome may hamper short- and long-term viability through reduced fitness and adaptive potential. The knowledge and resources developed in this study will boost the research on felid evolution and conservation genomics and will benefit the ongoing conservation and management of this emblematic species

LJMU Research Online (Liverpool John Moores University)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

ZENODO

Repositorio Institucional de la Universidad de Oviedo

REPISALUD (Instituto de Salud Carlos III)

UPF Digital Repository

Digital.CSIC

Diposit Digital de Documents de la UAB

Crossref

Springer - Publisher Connector

OAKTrust Digital Repository (Texas A&M Univ)

PubMed Central

Local Genealogies in a Linear Mixed Model for Genome-Wide Association Mapping in Complex Pedigreed Populations

Author: Bernt Guldbrandtsen
Goutam Sahana
J Akey
JM Yu
JPA Ioannidis
L Crooks
Mogens Sandø Lund
P Scheet
PIW de Bakker
S Besenbacher
T Mailund
Thomas Mailund
Y Liu
ZH Ding
Zhongming Zhao
Publication venue: Public Library of Science
Publication date: 02/11/2011

INTRODUCTION: The state of the art for dealing with multiple levels of relationship among the samples in genome-wide association studies (GWAS) is unified mixed model analysis (MMA). This approach is very flexible, can be applied to both family-based and population-based samples, and can be extended to incorporate other effects in a straightforward and rigorous fashion. Here, we present a complementary approach, called GENMIX (genealogy-based mixed model), which combines advantages from two powerful GWAS methods: genealogy-based haplotype grouping and MMA. SUBJECTS AND METHODS: We validated GENMIX using genotyping data of Danish Jersey cattle and simulated phenotypes, and compared it to the MMA. We simulated scenarios for three levels of heritability (0.21, 0.34, and 0.64), seven levels of MAF (0.05, 0.10, 0.15, 0.20, 0.25, 0.35, and 0.45) and five levels of QTL effect (0.1, 0.2, 0.5, 0.7 and 1.0 in phenotypic standard deviation units). Each of these 105 possible combinations of scenarios (3 h² × 7 MAF × 5 effects) was replicated 25 times. RESULTS: GENMIX provides a better ranking of markers close to the location of the causative locus. GENMIX outperformed MMA when the QTL effect was small and the MAF at the QTL was low. In scenarios where the MAF was high or the QTL affecting the trait had a large effect, GENMIX and MMA performed similarly. CONCLUSION: In discovery studies, where high-ranking markers are identified and later examined in validation studies, we therefore expect GENMIX to enrich candidates brought to follow-up studies with true positives over false positives more than the MMA would.
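Editorial sketch: the simulation design quoted in the abstract can be checked with a short enumeration. This is illustrative only and is not the authors' simulation code. The variable names are hypothetical, and only the parameter levels and the replicate count are taken from the abstract.

```python
from itertools import product

# Parameter levels as listed in the abstract.
heritabilities = [0.21, 0.34, 0.64]
mafs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.35, 0.45]
qtl_effects = [0.1, 0.2, 0.5, 0.7, 1.0]  # in phenotypic standard deviation units
replicates = 25

# Full factorial design: one scenario per (h2, MAF, effect) combination.
scenarios = list(product(heritabilities, mafs, qtl_effects))
total_runs = len(scenarios) * replicates

print(len(scenarios))  # 105 scenarios
print(total_runs)      # 2625 simulated data sets in total
```

The total of 2625 runs follows from 105 scenarios times 25 replicates. The abstract itself states only the two factors, not the product.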

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

Author: A Helgason
A Kong
A Telenti
AD Børglum
Ali Syed
Anders D. Børglum
Anders E. Halager
Anders Krogh
Bent Petersen
BJ Stucky
Chen Ye
Christian N. S. Pedersen
Christian Theil Have
Christina M. Hultman
David Westergaard
DF Gudbjartsson
Esben Flindt
Francesco Lescai
G Lunter
GA Van der Auwera
GD Poznik
GM Cooper
H Cao
H Eiberg
H Kupfermann
H Li
H Li
H Li
Hans Eiberg
Hongzhi Cao
J Huddleston
Jacob Malte Jensen
Jakob Grove
Jette Bork-Jensen
Jihua Sun
Johan van Beusekom
Jonas Andreas Sibbesen
Jose M. G. Izarzugaza
JS Seo
JT Simpson
Jun Wang
Junhua Rao
K Katoh
K Tamura
Karsten Kristiansen
Kirstine Belling
KM Steinberg
L Paternoster
Lars Bolund
Lasse Maretty
Laurits Skov
LC Francioli
M Lek
M Nothnagel
M Oven
M Pendleton
MA Eberle
Maria Luisa Matey-Hernandez
Marie Grosjean
MC Frith
Mikkel Heide Schierup
MR Hoehe
Ning Li
Ole Lund
Ole Mors
Oluf Pedersen
P Rice
Palle Villesen
Patrick Sullivan
Peter Løngren
PH Sudmant
PL Auer
R Hubley
R Luo
Rachita Yadav
Ramneek Gupta
Ruiqi Xu
Rune M. Friborg
S Besenbacher
S Deorowicz
S Gnerre
S Liu
S Ripke
SF Altschul
Shengting Li
Shujia Huang
Simon Rasmussen
Siyang Liu
SM Kiełbasa
Stephanie Le Hellard
Søren Besenbacher
Søren Brunak
T Espeseth
T Magoč
Thomas D. Als
Thomas Espeseth
Thomas Mailund
Thomas Sicheritz-Pontén
Thorkild I. A. Sørensen
Torben Hansen
VA Schneider
Weijian Ye
WP Kloosterman
WS Wong
Xiaosen Guo
Xun Xu
Yuqi Chang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark

Crossref

Copenhagen University Research Information System

Carolina Digital Repository

Online Research Database In Technology

Extreme selective sweeps independently targeted the X chromosomes of the great apes

Author: Dutheil J.
Great Ape Genome Diversity Project
Hammer M.
Hobolth A.
Mailund T.
Munch K.
Nam K.
Schierup M.
Veeramah K.
Woerner A.
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 19/05/2015

The unique inheritance pattern of the X chromosome exposes it to natural selection in a way that is different from that of the autosomes, potentially resulting in accelerated evolution. We perform a comparative analysis of X chromosome polymorphism in 10 great ape species, including humans. In most species, we identify striking megabase-wide regions, where nucleotide diversity is less than 20% of the chromosomal average. Such regions are found exclusively on the X chromosome. The regions overlap partially among species, suggesting that the underlying targets are partly shared among species. The regions have higher proportions of singleton SNPs, higher levels of population differentiation, and a higher nonsynonymous-to-synonymous substitution ratio than the rest of the X chromosome. We show that the extent to which diversity is reduced is incompatible with direct selection or the action of background selection and soft selective sweeps alone, and therefore, we suggest that very strong selective sweeps have independently targeted these specific regions in several species. The only genomic feature that we can identify as strongly associated with loss of diversity is the location of testis-expressed ampliconic genes, which also have reduced diversity around them. We hypothesize that these genes may be responsible for selective sweeps in the form of meiotic drive caused by an intragenomic conflict in male meiosis

MPG.PuRe

Whole genome association mapping by incompatibilities and local perfect phylogenies

Author: A Raftery
AD Skol
AG Clark
AP Morris
AR Templeton
B Kerem
B Rannala
C Bardel
C Durrant
D Arking
D Gusfield
D Smyth
D Thomas
DE Reich
EJ Hannan
ERB Waldron
F Larribe
G Schwarz
H Akaike
H Matsuzaki
HT Toivonen
I Pe'er
International HapMap Consortium
J Hein
J Li
J Marchini
J Molitor
JC Barrett
JS Liu
LK Hosking
LT Amundadottir
M Kimura
Mikkel H Schierup
P Scheet
P Sevon
RC Griffiths
RR Hudson
S Zöllner
Søren Besenbacher
T Mailund
T Rafnar
Thomas Mailund
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: With current technology, vast amounts of data can be cheaply and efficiently produced in association studies, and to prevent data analysis from becoming the bottleneck of studies, fast and efficient analysis methods that scale to such data set sizes must be developed. RESULTS: We present a fast method for accurate localisation of disease-causing variants in high-density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status, and scored by its accuracy as a decision tree. The rationale for this is that the perfect phylogeny near a disease-affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, due to e.g. large marker spacing, the algorithm can allow the inclusion of incompatible markers in order to enlarge the regions prior to estimating their phylogeny. Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets used for testing other methods, in order to compare with these. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease-causing mutation and a constant recombination rate. However, in more complex scenarios of mutation heterogeneity and more complex haplotype structure, such as found in the HapMap data, our method outperforms SMA as well as other fast data mining approaches such as HapMiner and Haplotype Pattern Mining (HPM), despite being significantly faster. For unphased genotype data, an initial step of estimating the phase only slightly decreases the power of the method. The method was also found to accurately localise the known susceptibility variant in an empirical data set (the ΔF508 mutation for cystic fibrosis) and to find significant signals for association between the CYP2D6 gene and poor drug metabolism, although for this data set the highest association score is about 60 kb from the CYP2D6 gene. CONCLUSION: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
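Editorial sketch: the abstract does not say how compatibility with a single tree is tested. A standard criterion for binary markers is the four-gamete test, and a set of binary markers admits a perfect phylogeny exactly when every pair passes it. The code below assumes that criterion and grows a region greedily around a marker. The function names are hypothetical, the greedy expansion is a simplification that need not find the largest possible region, and Blossoc's actual implementation may differ.

```python
def compatible(haplotypes, i, j):
    """Four-gamete test: markers i and j fit on one tree if fewer than
    four of the gametes 00, 01, 10, 11 are observed across haplotypes."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4


def compatible_region(haplotypes, centre):
    """Greedily extend [left, right] around `centre` while every marker
    in the region stays pairwise compatible with each newly added marker."""
    n_markers = len(haplotypes[0])
    left = right = centre
    grew = True
    while grew:
        grew = False
        if left > 0 and all(
            compatible(haplotypes, left - 1, k) for k in range(left, right + 1)
        ):
            left -= 1
            grew = True
        if right < n_markers - 1 and all(
            compatible(haplotypes, right + 1, k) for k in range(left, right + 1)
        ):
            right += 1
            grew = True
    return left, right


# Toy data (made up for illustration): four haplotypes over four binary markers.
haps = ["0000", "1001", "1100", "0011"]
print(compatible_region(haps, 1))  # (0, 2): markers 0 and 3 show all four gametes
```

In this toy example markers 0 and 3 are incompatible, so the region around marker 1 stops at marker 2, while the region around marker 3 extends left only to marker 1.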

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A fast algorithm for genome-wide haplotype pattern mining

Author: AP Morris
Christian NS Pedersen
DE Arking
DJ Smyth
F Larribe
HT Toivonen
HTT Toivonen
I Pe'er
J Gudmundsson
J Li
J Molitor
JS Liu
LT Amundadottir
MJ Minichiello
PIW de Bakker
R Saxena
S Zöllner
SR Browning
Søren Besenbacher
T Mailund
Thomas Mailund
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost-effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individuals, and analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach that does exactly this. Results: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals typed in 491208 markers using default parameters, and by more if the pattern length is increased. Conclusion: The new algorithm speeds up the HPM method, and we show that it is feasible to apply HPM to whole-genome association mapping with thousands of individuals and hundreds of thousands of markers.
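Editorial sketch: the observation the algorithm relies on, that a local window contains far fewer observed haplotypes than are theoretically possible, can be illustrated by counting distinct haplotypes in a window. This is not the HPM algorithm itself. The function name and the toy data are hypothetical and chosen only for illustration.

```python
def distinct_haplotypes(haplotypes, start, length):
    """Return the set of distinct haplotype strings seen in the window
    covering markers start .. start + length - 1."""
    return {h[start:start + length] for h in haplotypes}


# Toy phased data (made up): five chromosomes, five biallelic markers.
haps = ["00000", "00011", "11100", "11100", "00011"]

window_length = 5
observed = distinct_haplotypes(haps, 0, window_length)
possible = 2 ** window_length  # every 0/1 combination over the window

print(len(observed), possible)  # 3 observed versus 32 possible
```

A pattern-matching method can then work with the observed haplotypes only, instead of treating each chromosome or each possible pattern separately.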

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

phenosim - A software to simulate phenotypes for testing in genome-wide association studies

Author: A Platt
B Peng
BE Stranger
BW Lambert
G Ewing
G Hellenthal
G van Rossum
GR Abecasis
HM Kang
Inka Gawenda
Karl J Schmid
L Liang
La Hindorff
M Chadeau-Hyam
M Nordborg
PJ Bradbury
RR Hudson
S Atwell
S Besenbacher
S Kim
S Neuenschwander
S Purcell
T Mailund
Torsten Günther
WYS Wang
Y Li
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: There is great interest in understanding the genetic architecture of complex traits in natural populations. Genome-wide association studies (GWAS) are becoming routine in human, animal and plant genetics to understand the connection between naturally occurring genotypic and phenotypic variation. Coalescent simulations are commonly used in population genetics to simulate genotypes under different parameters and demographic models. Results: Here, we present phenosim, a software tool that adds a phenotype to genotypes generated in time-efficient coalescent simulations. Both qualitative and quantitative phenotypes can be generated, and it is possible to partition phenotypic variation between additive effects and epistatic interactions between causal variants. The output formats of phenosim are directly usable as input for different GWAS tools. The applicability of phenosim is shown by simulating a genome-wide association study in Arabidopsis thaliana. Conclusions: By using the coalescent approach to generate genotypes and phenosim to add phenotypes, the data sets can be used to assess the influence of various factors such as demography, genetic architecture or selection on the statistical power of association methods to detect causal genetic variants under a wide variety of population genetic scenarios. phenosim is freely available from the authors' website, http://evoplant.uni-hohenheim.de

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central