736 research outputs found

    HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data

    As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. Funding: National Science Foundation (U.S.) (NSF/NIH BIGDATA Grant R01GM108348-01); National Science Foundation (U.S.) (Graduate Research Fellowship); Simons Foundation
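
The two quality measures named in this abstract, minimum error correction (MEC) and switch errors, can be illustrated for the simplest (diploid) case. This is a minimal sketch and not HapTree's implementation: it assumes each read is a mapping from SNP index to observed allele (0 or 1), and the names `mismatches`, `mec_score` and `switch_errors` are hypothetical.

```python
from typing import Dict, List, Sequence

Read = Dict[int, int]  # SNP index -> observed allele (0 or 1); assumed layout


def mismatches(read: Read, haplotype: Sequence[int]) -> int:
    """Count covered positions where the read disagrees with the haplotype."""
    return sum(1 for pos, allele in read.items() if haplotype[pos] != allele)


def mec_score(reads: List[Read], hap: Sequence[int]) -> int:
    """Diploid MEC: assign each read to the closer of the haplotype and its
    complement, then total the remaining mismatches."""
    complement = [1 - a for a in hap]
    return sum(min(mismatches(r, hap), mismatches(r, complement)) for r in reads)


def switch_errors(inferred: Sequence[int], truth: Sequence[int]) -> int:
    """Count positions where agreement with the truth flips between
    consecutive heterozygous SNPs (a globally flipped phasing scores 0)."""
    agree = [i == t for i, t in zip(inferred, truth)]
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


reads = [{0: 0, 1: 1, 2: 0}, {1: 0, 2: 1, 3: 1}, {2: 0, 3: 1}]
hap = [0, 1, 0, 0]
print("MEC:", mec_score(reads, hap))
print("switch errors:", switch_errors([0, 1, 1, 0], [0, 1, 0, 1]))
```

A lower MEC means fewer read alleles must be corrected to make the reads consistent with the phasing; switch errors instead compare a phasing against a known ground truth.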

    Use of partial least squares regression to impute SNP genotypes in Italian Cattle breeds

    Background: The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low density single nucleotide polymorphism (SNP) panels (i.e. 3K or 7K) to a high density panel with 50K SNP. No pedigree information was used. Methods: Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. Results: In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed analysis resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Conclusions: Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.
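
The abstract reports an imputation accuracy and a correlation between genetic merit estimates without defining either measure here. As a rough illustration only, assuming that accuracy means the proportion of genotypes imputed correctly (genotypes coded 0/1/2) and that the correlation is Pearson's r, both can be computed as below. The function names are hypothetical and this is not the authors' code.

```python
import math
from typing import Sequence


def concordance(imputed: Sequence[int], actual: Sequence[int]) -> float:
    """Proportion of genotypes (coded 0/1/2) that were imputed correctly."""
    if len(imputed) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(i == a for i, a in zip(imputed, actual)) / len(actual)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between breeding values estimated from
    imputed genotypes and from actual genotypes."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


print(concordance([0, 1, 2, 2], [0, 1, 1, 2]))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```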

    Reconstructing Druze population history

    The Druze are an aggregate of communities in the Levant and Near East living almost exclusively in the mountains of Syria, Lebanon and Israel whose ~1000-year-old religion formally opposes mixed marriages and conversions. Despite increasing interest in genetics of the population structure of the Druze, their population history remains unknown. We investigated the genetic relationships between Israeli Druze and both modern and ancient populations. We evaluated our findings in light of three hypotheses purporting to explain Druze history that posit Arabian, Persian or mixed Near Eastern-Levantine roots. The biogeographical analysis localised proto-Druze to the mountainous regions of southeastern Turkey, northern Iraq and southeast Syria and their descendants clustered along a trajectory between these two regions. The mixed Near Eastern-Middle Eastern localisation of the Druze, shown using both modern and ancient DNA data, is distinct from that of neighbouring Syrians, Palestinians and most of the Lebanese, who exhibit a high affinity to the Levant. Druze biogeographic affinity, migration patterns, time of emergence and genetic similarity to Near Eastern populations are highly suggestive of Armenian-Turkish ancestries for the proto-Druze.

    Genome-wide linkage analysis of 972 bipolar pedigrees using single-nucleotide polymorphisms.

    Because of the high costs associated with ascertainment of families, most linkage studies of Bipolar I disorder (BPI) have used relatively small samples. Moreover, the genetic information content reported in most studies has been less than 0.6. Although microsatellite markers spaced every 10 cM typically extract most of the genetic information content for larger multiplex families, they can be less informative for smaller pedigrees, especially for affected sib pair kindreds. For these reasons we collaborated to pool family resources and carried out higher density genotyping. Approximately 1100 pedigrees of European ancestry were initially selected for study and were genotyped by the Center for Inherited Disease Research using the Illumina Linkage Panel 12 set of 6090 single-nucleotide polymorphisms. Of the ~1100 families, 972 were informative for further analyses, and mean information content was 0.86 after pruning for linkage disequilibrium. The 972 kindreds include 2284 cases of BPI disorder, 498 individuals with bipolar II disorder (BPII) and 702 subjects with recurrent major depression. Three affection status models (ASMs) were considered: ASM1 (BPI and schizoaffective disorder, BP cases (SABP) only), ASM2 (ASM1 cases plus BPII) and ASM3 (ASM2 cases plus recurrent major depression). Both parametric and non-parametric linkage methods were carried out. The strongest findings occurred at 6q21 (non-parametric pairs logarithm of odds (LOD) 3.4 for rs1046943 at 119 cM) and 9q21 (non-parametric pairs LOD 3.4 for rs722642 at 78 cM) using only BPI and schizoaffective (SA), BP cases. Both results met genome-wide significance criteria, although neither was significant after correction for multiple analyses. We also inspected parametric scores for the larger multiplex families to identify possible rare susceptibility loci. In this analysis, we observed 59 parametric LODs of 2 or greater, many of which are likely to be close to maximum possible scores. Although some linkage findings may be false positives, the results could help prioritize the search for rare variants using whole exome or genome sequencing.
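
For readers unfamiliar with LOD scores, the textbook two-point case can be written out directly. This sketch is not the multipoint parametric or non-parametric analysis used in the study: it assumes phase-known, fully informative meioses, where LOD(θ) = log10[θ^r (1 − θ)^(n − r) / 0.5^n] for r recombinants among n meioses. The names `lod_score` and `max_lod` are hypothetical.

```python
import math
from typing import Tuple


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Complete linkage is impossible once any recombinant is observed.
        if recombinants > 0:
            return -math.inf
        log_likelihood = 0.0
    else:
        log_likelihood = (recombinants * math.log10(theta)
                          + non_recombinants * math.log10(1.0 - theta))
    # Null hypothesis: free recombination, theta = 0.5.
    return log_likelihood - meioses * math.log10(0.5)


def max_lod(recombinants: int, meioses: int) -> Tuple[float, float]:
    """Grid search over theta = 0.00, 0.01, ..., 0.50; returns (theta, LOD)."""
    grid = [i / 100 for i in range(51)]
    best = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


theta_hat, lod = max_lod(recombinants=1, meioses=10)
print(f"theta = {theta_hat:.2f}, LOD = {lod:.2f}")
```

Under these assumptions, ten meioses with no recombinants give a LOD of 10·log10(2), roughly 3.0 at θ = 0.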

    The geography of recent genetic ancestry across Europe

    The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world. Comment: Full size figures available from http://www.eve.ucdavis.edu/~plralph/research.html; or html version at http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm

    Breeding histories and selection criteria for oilseed rape in Europe and China identified by genome wide pedigree dissection

    Selection breeding has played a key role in the improvement of seed yield and quality in oilseed rape (Brassica napus L.). We genotyped Tapidor (European), Ningyou7 (Chinese) and their progenitors with the Brassica 60 K Illumina Infinium SNP array and mapped a total of 29,347 SNP markers onto the reference genome of Darmor-bzh. Identity by descent (IBD) refers to a haplotype segment of a chromosome inherited from a shared common ancestor. IBDs identified on the C subgenome were larger than those on the A subgenome within both the Tapidor and Ningyou7 pedigrees. IBD number and length were greater in the Ningyou7 pedigree than in the Tapidor pedigree. Seventy-nine QTLs for flowering time, seed quality and root morphology traits were identified in the IBDs of Tapidor and Ningyou7. Many more candidate genes had been selected within the Ningyou7 pedigree than within the Tapidor pedigree. These results highlight differences in the transfer of favorable gene clusters controlling key traits during selection breeding in Europe and China.

    Genetic and environmental transactions underlying the association between physical fitness/physical exercise and body composition

    We examined mean effects and variance moderating effects of measures of physical activity and fitness on six measures of adiposity and their reciprocal effects in a subsample of the population-representative Danish Twin Registry. Consistent with prior studies, higher levels of physical activity suppressed variance in adiposity, but this study provided further insight. Variance suppression appeared to have both genetic and environmental pathways. Some mean effects appeared due to reciprocal influences of environmental circumstances differing among families but not between co-twins, suggesting these reciprocal effects are uniform. Some variance moderating effects also appeared due to biases in individual measures of adiposity, as well as to differences and inaccuracies in measures of physical activity. This suggests a need to avoid reliance on single measures of both physical activity and adiposity in attempting to understand the pathways involved in their linkages, and constraint in interpreting results if only single measures are available. Future research indications include identifying which physical activity-related environmental circumstances have relatively uniform effects on adiposity in everyone, and which should be individually tailored to maximize motivation to continue involvement.

    Reconstructing Roma History from Genome-Wide Data

    The Roma people, living throughout Europe and West Asia, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1,000–1,500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry, derived from a combination of European and South Asian sources, and that the date of admixture of South Asian and European ancestry was about 850 years before present. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which appears to have been followed by a major demographic expansion after the arrival in Europe. Funding: Országos Tudományos Kutatási Alapprogramok (OTKA K 103983); Országos Tudományos Kutatási Alapprogramok (OTKA 73430); National Science Foundation (U.S.) (HOMINID grant 1032255); National Institutes of Health (U.S.) (grant GM100233)

    Rapid haplotype inference for nuclear families

    Hapi is a new dynamic programming algorithm that ignores uninformative states and state transitions in order to efficiently compute minimum-recombinant and maximum likelihood haplotypes. When applied to a dataset containing 103 families, Hapi performs 3.8 and 320 times faster than state-of-the-art algorithms. Because Hapi infers both minimum-recombinant and maximum likelihood haplotypes and applies to related individuals, the haplotypes it infers are highly accurate over extended genomic distances. Funding: National Institutes of Health (U.S.) (NIH grant 5-T90-DK070069); National Institutes of Health (U.S.) (Grant 5-P01-NS055923); National Science Foundation (U.S.) (Graduate Research Fellowship)