The importance of identity-by-state information for the accuracy of genomic selection
Background
It is commonly assumed that prediction of genome-wide breeding values in genomic selection is achieved by capitalizing on linkage disequilibrium between markers and QTL but also on genetic relationships. Here, we investigated the reliability of predicting genome-wide breeding values based on population-wide linkage disequilibrium information, based on identity-by-descent relationships within the known pedigree, and to what extent linkage disequilibrium information improves predictions based on identity-by-descent genomic relationship information.
Methods
The study was performed on milk, fat, and protein yield, using genotype data on 35 706 SNP and deregressed proofs of 1086 Italian Brown Swiss bulls. Genome-wide breeding values were predicted using a genomic identity-by-state relationship matrix and a genomic identity-by-descent relationship matrix (averaged over all marker loci). The identity-by-descent matrix was calculated by linkage analysis using one to five generations of pedigree data.
Results
We showed that genome-wide breeding values prediction based only on identity-by-descent genomic relationships within the known pedigree was as or more reliable than that based on identity-by-state, which implicitly also accounts for genomic relationships that occurred before the known pedigree. Furthermore, combining the two matrices did not improve the prediction compared to using identity-by-descent alone. Including different numbers of generations in the pedigree showed that most of the information in genome-wide breeding values prediction comes from animals with known common ancestors less than four generations back in the pedigree.
Conclusions
Our results show that, in pedigreed breeding populations, the accuracy of genome-wide breeding values obtained by identity-by-descent relationships was not improved by identity-by-state information. Although, in principle, genomic selection based on identity-by-state does not require pedigree data, it does use the available pedigree structure. Our findings may explain why the prediction equations derived for one breed may not predict accurate genome-wide breeding values when applied to other breeds, since family structures differ among breeds.
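The identity-by-state relationship matrix mentioned in the abstract above can be illustrated with a minimal sketch. It assumes genotypes coded as 0, 1 or 2 copies of one allele and uses a simple allele-sharing definition of identity-by-state averaged over loci; the abstract does not state which definition the authors used, and the function name `ibs_matrix` is hypothetical.

```python
import numpy as np


def ibs_matrix(genotypes):
    """Average identity-by-state similarity between all pairs of individuals.

    genotypes: array of shape (n_individuals, n_loci), coded 0/1/2
    (an assumed coding, not taken from the abstract).
    Per locus, the similarity is 1 - |g_i - g_j| / 2, so identical
    genotypes score 1 and opposite homozygotes score 0.
    """
    g = np.asarray(genotypes, dtype=float)
    # Pairwise absolute genotype differences, shape (n, n, n_loci).
    diff = np.abs(g[:, None, :] - g[None, :, :])
    # Average over loci, as the abstract describes for the relationship matrices.
    return 1.0 - diff.mean(axis=2) / 2.0


if __name__ == "__main__":
    toy = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
    print(ibs_matrix(toy))
```

The broadcasted difference array needs memory proportional to n × n × loci, so for tens of thousands of SNP a loop over blocks of loci would be the more practical choice.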
Comparison of analyses of the QTLMAS XII common dataset. I: Genomic selection
A dataset was simulated and distributed to participants of the QTLMAS XII workshop, who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals, for which they only knew the genotypes. The organisers used these genomic predictions to perform the final validation by comparison to the true breeding values, which were known only to the organisers. Methods used by the 5 groups fell into 3 classes: 1) fixed effects models, 2) BLUP models, and 3) Bayesian MCMC-based models. The Bayesian analyses gave the highest accuracies, followed by the BLUP models, while the fixed effects models generally had low accuracies and large error variance. The best BLUP models as well as the best Bayesian models gave unbiased predictions. The BLUP models are clearly sensitive to the assumed SNP variance, because they do not estimate SNP variance, but take the specified variance as the true variance. The current comparison suggests that Bayesian analyses on haplotypes or SNPs are the most promising approach for genomic selection, although the BLUP models may provide a computationally attractive alternative with little loss of efficiency. On the other hand, fixed effect type models are unlikely to provide any gain over traditional pedigree indexes for selection.
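The sensitivity of BLUP models to the assumed SNP variance, noted in the abstract above, can be seen in a minimal SNP-BLUP (ridge regression) sketch. This is an illustration under assumptions, not the method of any workshop participant: the function name `snp_blup_effects`, the centring of genotypes and phenotypes, and the variance values in the usage example are all hypothetical.

```python
import numpy as np


def snp_blup_effects(Z, y, sigma2_e, sigma2_snp):
    """Solve the ridge-type equations (Z'Z + lambda * I) b = Z'y.

    lambda = sigma2_e / sigma2_snp is fixed by the user-specified variances;
    nothing is estimated from the data, which is why the result depends on
    the assumed SNP variance.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    Zc = Z - Z.mean(axis=0)  # centre genotypes (assumption of this sketch)
    yc = y - y.mean()        # centre phenotypes instead of fitting a mean
    lam = sigma2_e / sigma2_snp
    lhs = Zc.T @ Zc + lam * np.eye(Z.shape[1])
    return np.linalg.solve(lhs, Zc.T @ yc)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = rng.integers(0, 3, size=(100, 30)).astype(float)
    y = Z @ rng.normal(scale=0.3, size=30) + rng.normal(size=100)
    for s2 in (0.001, 0.1, 10.0):
        b = snp_blup_effects(Z, y, sigma2_e=1.0, sigma2_snp=s2)
        print(s2, float(np.linalg.norm(b)))
```

A smaller assumed SNP variance gives a larger penalty and shrinks the estimated effects more strongly, so a misspecified variance changes the predictions directly.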
Extension of the Bayesian alphabet for genomic selection
Background
Two Bayesian methods, BayesCπ and BayesDπ, were developed for genomic prediction to address the drawback of BayesA and BayesB regarding the impact of prior hyperparameters and treat the prior probability π that a SNP has zero effect as unknown. The methods were compared in terms of inference of the number of QTL and accuracy of genomic estimated breeding values (GEBVs), using simulated scenarios and real data from North American Holstein bulls.
Results
Estimates of π from BayesCπ, in contrast to BayesDπ, were sensitive to the number of simulated QTL and training data size, and provide information about genetic architecture. Milk yield and fat yield have QTL with larger effects than protein yield and somatic cell score. The drawback of BayesA and BayesB did not impair the accuracy of GEBVs. Accuracies of alternative Bayesian methods were similar. BayesA was a good choice for GEBV with the real data. Computing time was shorter for BayesCπ than for BayesDπ, and longest for our implementation of BayesA.
Conclusions
Collectively, accounting for computing effort, uncertainty as to the number of QTL (which affects the GEBV accuracy of alternative methods), and fundamental interest in the number of QTL underlying quantitative traits, we believe that BayesCπ has merit for routine applications.
Strategies for implementing genomic selection in family-based aquaculture breeding schemes: double haploid sib test populations
Background
Simulation studies have shown that accuracy and genetic gain are increased in genomic selection schemes compared to traditional aquaculture sib-based schemes. In genomic selection, accuracy of selection can be maximized by increasing the precision of the estimation of SNP effects and by maximizing the relationships between test sibs and candidate sibs. Another means of increasing the accuracy of the estimation of SNP effects is to create individuals in the test population with extreme genotypes. The latter approach was studied here with creation of double haploids and use of non-random mating designs.
Methods
Six alternative breeding schemes were simulated in which the design of the test population was varied: test sibs inherited maternal (Mat), paternal (Pat) or a mixture of maternal and paternal (MatPat) double haploid genomes, or test sibs were obtained by maximum coancestry mating (MaxC), minimum coancestry mating (MinC), or random (RAND) mating. Three thousand test sibs and 3000 candidate sibs were genotyped. The test sibs were recorded for a trait that could not be measured on the candidates and were used to estimate SNP effects. Selection was done by truncation on genome-wide estimated breeding values and 100 individuals were selected as parents each generation, equally divided between both sexes.
Results
Results showed a 7 to 19% increase in selection accuracy and a 6 to 22% increase in genetic gain in the MatPat scheme compared to the RAND scheme. These increases were greater with lower heritabilities. Among all other scenarios, i.e. Mat, Pat, MaxC, and MinC, no substantial differences in selection accuracy and genetic gain were observed.
Conclusions
In conclusion, a test population designed with a mixture of paternal and maternal double haploids, i.e. the MatPat scheme, substantially increases the accuracy of selection and genetic gain. This will be particularly interesting for traits that cannot be recorded on the selection candidates and require the use of sib tests, such as disease resistance and meat quality.
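The selection step described in the Methods above (truncation on genome-wide estimated breeding values, with parents equally divided between the sexes) can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name `truncation_select` and the sex labels "M" and "F" are assumptions.

```python
import numpy as np


def truncation_select(gebv, sex, n_parents=100):
    """Pick the top-ranked candidates on GEBV, half from each sex.

    gebv: estimated breeding values of the candidates.
    sex:  array of "M"/"F" labels (hypothetical coding).
    Returns a dict mapping each sex to the indices of selected candidates,
    ordered from highest to lowest GEBV.
    """
    gebv = np.asarray(gebv, dtype=float)
    sex = np.asarray(sex)
    per_sex = n_parents // 2
    selected = {}
    for label in ("M", "F"):
        idx = np.flatnonzero(sex == label)
        # Sort candidates of this sex by descending GEBV and truncate.
        ranked = idx[np.argsort(gebv[idx])[::-1]]
        selected[label] = ranked[:per_sex]
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gebv = rng.normal(size=3000)            # 3000 candidates, as in the abstract
    sex = rng.choice(["M", "F"], size=3000)
    parents = truncation_select(gebv, sex, n_parents=100)
    print(len(parents["M"]), len(parents["F"]))
```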
Genomic prediction in CIMMYT maize and wheat breeding programs
Genomic selection (GS) has been implemented in animal and plant species, and is regarded as a useful tool for accelerating genetic gains. Varying levels of genomic prediction accuracy have been obtained in plants, depending on the prediction problem assessed and on several other factors, such as trait heritability, the relationship between the individuals to be predicted and those used to train the models for prediction, number of markers, sample size and genotype × environment interaction (GE). The main objective of this article is to describe the results of genomic prediction in International Maize and Wheat Improvement Center's (CIMMYT's) maize and wheat breeding programs, from the initial assessment of the predictive ability of different models using pedigree and marker information to the present, when methods for implementing GS in practical global maize and wheat breeding programs are being studied and investigated. Results show that pedigree (population structure) accounts for a sizeable proportion of the prediction accuracy when a global population is the prediction problem to be assessed. However, when the prediction uses unrelated populations to train the prediction equations, prediction accuracy becomes negligible. When genomic prediction includes modeling GE, an increase in prediction accuracy can be achieved by borrowing information from correlated environments. Several questions on how to incorporate GS into CIMMYT's maize and wheat programs remain unanswered and subject to further investigation, for example, prediction within and between related bi-parental crosses. Further research on the quantification of breeding value components for GS in plant breeding populations is required.
J Crossa, P Pérez, J Hickey, J Burgueño, L Ornella, J Cerón-Rojas, X Zhang, S Dreisigacker, R Babu, Y Li, D Bonnett and K Mathew
Evaluation of methods and marker systems in genomic selection of oil palm (Elaeis guineensis Jacq.)
Background
Genomic selection (GS) uses genome-wide markers as an attempt to accelerate genetic gain in breeding programs of both animals and plants. This approach is particularly useful for perennial crops such as oil palm, which have long breeding cycles, and for which the optimal method for GS is still under debate. In this study, we evaluated the effect of different marker systems and modeling methods for implementing GS in an introgressed dura family derived from a Deli dura x Nigerian dura (Deli x Nigerian) cross with 112 individuals. This family is an important breeding source for developing new mother palms for superior oil yield and bunch characters. The traits of interest selected for this study were fruit-to-bunch (F/B), shell-to-fruit (S/F), kernel-to-fruit (K/F), mesocarp-to-fruit (M/F), oil per palm (O/P) and oil-to-dry mesocarp (O/DM). The marker systems evaluated were simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). RR-BLUP, Bayesian A, B, Cπ, LASSO, Ridge Regression and two machine learning methods (SVM and Random Forest) were used to evaluate GS accuracy of the traits.
Results
The kinship coefficient between individuals in this family ranged from 0.35 to 0.62. S/F and O/DM had the highest genomic heritability, whereas F/B and O/P had the lowest. The accuracies using 135 SSRs were low, with accuracies of the traits around 0.20. The average accuracy of machine learning methods was 0.24, as compared to 0.20 achieved by other methods. The trait with the highest mean accuracy was F/B (0.28), while the lowest were both M/F and O/P (0.18). By using whole-genome SNPs, the accuracies for all traits, especially for O/DM (0.43), S/F (0.39) and M/F (0.30), were improved. The average accuracy of machine learning methods was 0.32, compared to 0.31 achieved by other methods.
Conclusion
Due to high genomic resolution, the use of whole-genome SNPs improved the efficiency of GS dramatically for oil palm and is recommended for dura breeding programs. Machine learning slightly outperformed other methods, but required parameter optimization for GS implementation.
Persistence of accuracy of genomic estimated breeding values over generations in layer chickens
Background
The predictive ability of genomic estimated breeding values (GEBV) originates both from associations between high-density markers and QTL (Quantitative Trait Loci) and from pedigree information. Thus, GEBV are expected to provide more persistent accuracy over successive generations than breeding values estimated using pedigree-based methods. The objective of this study was to evaluate the accuracy of GEBV in a closed population of layer chickens and to quantify their persistence over five successive generations using marker or pedigree information.
Methods
The training data consisted of 16 traits and 777 genotyped animals from two generations of a brown-egg layer breeding line, 295 of which had individual phenotype records, while others had phenotypes on 2,738 non-genotyped relatives, or similar data accumulated over up to five generations. Validation data included phenotyped and genotyped birds from five subsequent generations (on average 306 birds/generation). Birds were genotyped for 23,356 segregating SNP. Animal models using genomic or pedigree relationship matrices and Bayesian model averaging methods were used for training analyses. Accuracy was evaluated as the correlation between EBV and phenotype in validation divided by the square root of trait heritability.
Results
Pedigree relationships in outbred populations are reduced by 50% at each meiosis, therefore accuracy is expected to decrease by the square root of 0.5 every generation, as observed for pedigree-based EBV (Estimated Breeding Values). In contrast the GEBV accuracy was more persistent, although the drop in accuracy was substantial in the first generation. Traits that were considered to be influenced by fewer QTL and to have a higher heritability maintained a higher GEBV accuracy over generations. In conclusion, GEBV capture information beyond pedigree relationships, but retraining every generation is recommended for genomic selection in closed breeding populations.
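The two quantities in the abstract above, validation accuracy as a correlation divided by the square root of heritability and the expected decay of pedigree-based accuracy by a factor of the square root of 0.5 per generation, can be written out as a short sketch. The function names and the starting accuracy in the usage example are hypothetical; only the two formulas come from the abstract.

```python
import math

import numpy as np


def validation_accuracy(ebv, phenotype, h2):
    """Correlation between EBV and phenotype, divided by sqrt(heritability)."""
    r = np.corrcoef(ebv, phenotype)[0, 1]
    return r / math.sqrt(h2)


def expected_pedigree_accuracy(initial_accuracy, generations):
    """Expected pedigree-based accuracy after a number of generations.

    Each generation multiplies the accuracy by sqrt(0.5), following the
    halving of pedigree relationships at each meiosis.
    """
    return initial_accuracy * math.sqrt(0.5) ** generations


if __name__ == "__main__":
    # Illustrative starting value only; not a result from the study.
    for t in range(6):
        print(t, round(expected_pedigree_accuracy(0.6, t), 3))
```

Under this expectation the accuracy halves every two generations, which is the baseline against which the more persistent GEBV accuracy is compared.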
A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes
Background
Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data.
Methods
A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information.
Results
The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computation efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, and SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available.
Conclusions
The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.
Breeding value prediction for production traits in layer chickens using pedigree or genomic relationships in a reduced animal model
Background
Genomic selection involves breeding value estimation of selection candidates based on high-density SNP genotypes. To quantify the potential benefit of genomic selection, accuracies of estimated breeding values (EBV) obtained with different methods using pedigree or high-density SNP genotypes were evaluated and compared in a commercial layer chicken breeding line.
Methods
The following traits were analyzed: egg production, egg weight, egg color, shell strength, age at sexual maturity, body weight, albumen height, and yolk weight. Predictions appropriate for early or late selection were compared. A total of 2,708 birds were genotyped for 23,356 segregating SNP, including 1,563 females with records. Phenotypes on relatives without genotypes were incorporated in the analysis (in total 13,049 production records).
The data were analyzed with a Reduced Animal Model using a relationship matrix based on pedigree data or on marker genotypes and with a Bayesian method using model averaging. Using a validation set that consisted of individuals from the generation following training, these methods were compared by correlating EBV with phenotypes corrected for fixed effects, selecting the top 30 individuals based on EBV and evaluating their mean phenotype, and by regressing phenotypes on EBV.
Results
Using high-density SNP genotypes increased accuracies of EBV up to two-fold for selection at an early age and by up to 88% for selection at a later age. Accuracy increases at an early age can be mostly attributed to improved estimates of parental EBV for shell quality and egg production, while for other egg quality traits it is mostly due to improved estimates of Mendelian sampling effects. A relatively small number of markers was sufficient to explain most of the genetic variation for egg weight and body weight.
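The three validation criteria named in the Methods above (correlation of EBV with corrected phenotypes, mean phenotype of the top 30 individuals ranked on EBV, and regression of phenotypes on EBV) can be computed with a small helper. This is a sketch under assumptions: the function name `validation_summary` is hypothetical, and the phenotypes passed in are assumed to be already corrected for fixed effects.

```python
import numpy as np


def validation_summary(ebv, phenotype, n_top=30):
    """Return the three validation statistics for one method and trait."""
    ebv = np.asarray(ebv, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    # 1) Correlation between EBV and (pre-corrected) phenotype.
    correlation = np.corrcoef(ebv, phenotype)[0, 1]
    # 2) Mean phenotype of the n_top individuals with the highest EBV.
    top = np.argsort(ebv)[::-1][:n_top]
    top_mean = phenotype[top].mean()
    # 3) Slope of the regression of phenotype on EBV.
    slope = np.cov(ebv, phenotype)[0, 1] / np.var(ebv, ddof=1)
    return {"correlation": correlation, "top_mean": top_mean, "slope": slope}


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    ebv = rng.normal(size=300)                # simulated values for illustration
    phenotype = ebv + rng.normal(size=300)
    print(validation_summary(ebv, phenotype))
```

A regression slope close to 1 is the usual indication that the EBV are neither inflated nor deflated relative to the phenotypes.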
The National Lung Matrix Trial: translating the biology of stratification in advanced non-small-cell lung cancer
© The Author 2015. Background: The management of NSCLC has been transformed by stratified medicine. The National Lung Matrix Trial (NLMT) is a UK-wide study exploring the activity of rationally selected biomarker/targeted therapy combinations. Patients and methods: The Cancer Research UK (CRUK) Stratified Medicine Programme 2 is undertaking the large volume national molecular pre-screening which integrates with the NLMT. At study initiation, there are eight drugs being used to target 18 molecular cohorts. The aim is to determine whether there is sufficient signal of activity in any drug-biomarker combination to warrant further investigation. A Bayesian adaptive design that gives a more realistic approach to decision making and flexibility to make conclusions without fixing the sample size was chosen. The screening platform is an adaptable 28-gene Nextera next-generation sequencing platform designed by Illumina, covering the range of molecular abnormalities being targeted. The adaptive design allows new biomarker-drug combination cohorts to be incorporated by substantial amendment. The pre-clinical justification for each biomarker-drug combination has been rigorously assessed creating molecular exclusion rules and a trumping strategy in patients harbouring concomitant actionable genetic abnormalities. Discrete routes of pathway activation or inactivation determined by cancer genome aberrations are treated as separate cohorts. Key translational analyses include the deep genomic analysis of pre- and post-treatment biopsies, the establishment of patient-derived xenograft models and longitudinal ctDNA collection, in order to define predictive biomarkers, mechanisms of resistance and early markers of response and relapse. Conclusion: The SMP2 platform will provide large scale genetic screening to inform entry into the NLMT, a trial explicitly aimed at discovering novel actionable cohorts in NSCLC.
