18 research outputs found

    Statistical tools for genome-wide studies

    The aim of genomic selection (GS) in livestock is to detect linkage disequilibrium between SNPs and quantitative trait loci (QTL) across the whole genome, to improve the accuracy of the genomic estimated breeding value (GEBV) in genetic improvement programs. Two main issues affect GS: the imbalance between the number of SNPs and the number of animals involved, and the high genotyping costs. In this thesis, principal component analysis (PCA) is proposed as a method to reduce the dimensionality of the SNP data. In particular, the study evaluated the effect of the rank of the variance-covariance matrix on the accuracy of GEBV when PCA was applied. In addition, a new approach is proposed to reduce the dimensionality of the data. First, this new method was used in a genome-wide association study to detect associations between markers and the traits under study. Then the results obtained were used to reduce the number of SNPs used to estimate the GEBV. The results show that the accuracy of GEBV when only the SNPs selected with the new method were used was, on average, nearly equal to, and sometimes greater than, the accuracy obtained when all SNPs were used. This thesis also proposes partial least squares regression (PLSR) to impute markers not present on low-cost chips and so avoid a reduction in the accuracy of GEBV estimation. The study demonstrated that the PLSR imputation method can efficiently impute missing genotypes from low-density panels to high-density panels (HDP).
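As a rough illustration of the dimensionality reduction described in this abstract, the sketch below replaces a SNP matrix with a small number of principal component scores. It is a minimal example on random toy genotypes, not the thesis's implementation; the function name `pca_scores`, the matrix sizes, and the choice of 10 components are assumptions made for illustration only.

```python
import numpy as np


def pca_scores(genotypes, n_components):
    """Project centred genotypes onto the leading principal components.

    Returns the score matrix (animals x components) and the share of
    total variance explained by each retained component.
    """
    X = np.asarray(genotypes, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each SNP column
    # SVD of the centred data: rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Toy data (hypothetical): 50 animals, 200 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(50, 200))

scores, explained = pca_scores(genotypes, n_components=10)
print(scores.shape)      # 200 SNP columns replaced by 10 score columns
print(explained.sum())   # share of variance kept by the 10 components
```

The scores could then stand in for the original SNP columns as predictors in a GEBV model, which is where the number of retained components becomes the main tuning choice.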

    Use of partial least squares regression to impute SNP genotypes in Italian Cattle breeds

    Background: The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low-density single nucleotide polymorphism (SNP) panels (i.e. 3K or 7K) to a high-density panel with 50K SNPs. No pedigree information was used. Methods: Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. Results: In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, the computing time required by the partial least squares regression method was on average around 10 times lower than that required by Beagle. Using the partial least squares regression method in the multi-breed approach resulted in lower imputation accuracies than using single-breed data. The impact of SNP genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Conclusions: The results of the present work suggest that the partial least squares regression imputation method could be useful for imputing SNP genotypes when pedigree information is not available.
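The idea of imputing a missing marker from neighbouring markers with partial least squares can be sketched as follows. This is a minimal NumPy illustration under several assumptions that are not taken from the paper: one single-response PLS model (NIPALS) is fitted per missing SNP, predictions are rounded to the nearest genotype code 0/1/2, accuracy is measured as the share of matching genotypes, and the data are simulated. The names `pls1_fit` and `impute_snp` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS; returns (x_mean, y_mean, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:           # no covariance left to explain
            break
        w = w / norm
        t = Xk @ w                 # latent score
        tt = t @ t
        p = Xk.T @ t / tt          # X loading
        c = (yk @ t) / tt          # y loading
        Xk = Xk - np.outer(t, p)   # deflate X
        yk = yk - c * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def impute_snp(train_low, train_target, test_low, n_components=5):
    """Predict one missing SNP from low-density markers, rounded to 0/1/2."""
    x_mean, y_mean, coef = pls1_fit(train_low, train_target, n_components)
    pred = y_mean + (np.asarray(test_low, dtype=float) - x_mean) @ coef
    return np.clip(np.rint(pred), 0, 2).astype(int)


# Simulated genotypes (hypothetical): adjacent SNPs are correlated via a random walk.
rng = np.random.default_rng(1)
latent = np.cumsum(rng.normal(size=(300, 21)), axis=1)
latent = (latent - latent.mean(axis=0)) / latent.std(axis=0)
geno = np.digitize(latent, [-0.5, 0.5])          # genotype codes 0, 1, 2

target = 10                                      # SNP absent from the low-density panel
low_cols = [c for c in range(geno.shape[1]) if c != target]
train, test = geno[:200], geno[200:]

imputed = impute_snp(train[:, low_cols], train[:, target], test[:, low_cols])
concordance = float(np.mean(imputed == test[:, target]))
print(f"share of correctly imputed genotypes: {concordance:.2f}")
```

No pedigree enters the model: only the training animals' genotypes on both panels are needed, which matches the setting the abstract describes.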

    Prediction of Milk Coagulation Properties and Individual Cheese Yield in Sheep Using Partial Least Squares Regression

    The objectives of this study were (i) the prediction of sheep milk coagulation properties (MCP) and individual laboratory cheese yield (ILCY) from mid-infrared (MIR) spectra by using partial least squares (PLS) regression, and (ii) the comparison of the effect of different data pre-treatments on prediction accuracy. Individual milk samples of 970 Sarda breed ewes were analyzed for rennet coagulation time (RCT), curd-firming time (k20), and curd firmness (a30) using the Formagraph instrument; ILCY was measured by micro-manufacturing assays. A Fourier-transform infrared (FTIR) milk analyzer was used to estimate the milk gross composition and to record the MIR spectrum. The dataset (n = 859, after the exclusion of 111 noncoagulating samples) was divided into two sub-datasets: the data of 700 ewes were used to estimate prediction model parameters, and the data of 159 ewes were used to validate the model. Four prediction scenarios were compared in the validation, differing in the use of the whole or reduced MIR spectrum and in the use of raw or corrected data (locally weighted scatterplot smoothing). PLS prediction statistics were moderate. The use of the reduced MIR spectrum yielded the best results for the considered traits, whereas the data correction improved the prediction ability only when the whole MIR spectrum was used. In conclusion, PLS achieves good accuracy of prediction, in particular for ILCY and RCT, and it may enable increasing the number of traits to be included in breeding programs for dairy sheep without additional costs and logistics.
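The sample accounting and the kind of validation statistics such a study reports can be sketched in a few lines. The counts below come from the abstract; the choice of RMSE and R² as the statistics is an assumption, since the abstract does not name the statistics used, and `validation_stats` is a hypothetical helper.

```python
import math


def validation_stats(observed, predicted):
    """Return (RMSE, R^2) of predictions against observed validation values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


# Sample accounting stated in the abstract.
n_total, n_noncoagulating = 970, 111
n_usable = n_total - n_noncoagulating      # samples kept for modelling
n_calibration = 700                        # used to estimate model parameters
n_validation = n_usable - n_calibration    # held out for validation

print(n_usable, n_validation)

# Toy values (hypothetical) standing in for a trait such as RCT.
observed = [12.0, 15.5, 9.8, 14.1]
predicted = [11.4, 16.0, 10.5, 13.2]
print(validation_stats(observed, predicted))
```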

    The Impact of the rank of marker variance–covariance matrix in principal component evaluation for genomic selection applications

    In genomic selection (GS) programmes, direct genomic values (DGV) are evaluated using information provided by high-density SNP chips. Because DGV accuracy depends strictly on SNP density, an increase in the number of markers per chip is likely to have severe computational consequences. The aim of the present work was to test the effectiveness of principal component analysis (PCA) carried out by chromosome in reducing the marker dimensionality for GS purposes. A simulated data set of 5700 individuals with an equal number of SNPs distributed over six chromosomes was used. PCs were extracted both genome-wide (ALL) and separately by chromosome (CHR) and used to predict DGVs. In the ALL scenario, the SNP variance–covariance matrix (S) was singular and positive semi-definite, and it contained null information that introduces ‘spuriousness’ in the derived results. By contrast, the S matrix for each chromosome (CHR scenario) had full rank. The DGV accuracies obtained were always better for CHR than for ALL. Moreover, in the ALL scenario, DGV accuracies soon became unstable as the number of animals decreased, whereas in CHR they remained stable down to 900–1000 individuals. In real applications where a 54k SNP chip is used, the largest number of markers per chromosome is approximately 2500. Thus, around 3000 genotyped animals could lead to reliable results when the original SNP variables are replaced by a reduced number of PCs.
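The rank argument can be checked numerically on a scaled-down example. The sketch below assumes that 'an equal number of SNPs' means as many SNPs as individuals (an interpretation of the abstract, not a confirmed detail) and uses random toy genotypes rather than the simulated data set of the study. A sample covariance matrix computed from n animals has rank at most n - 1, so with as many SNPs as animals the genome-wide matrix must be singular, while each per-chromosome block can reach full rank.

```python
import numpy as np

rng = np.random.default_rng(42)

# Scaled-down toy design (hypothetical sizes): 60 animals, 6 chromosomes, 10 SNPs each.
n_animals, n_chrom, snps_per_chrom = 60, 6, 10
n_snps = n_chrom * snps_per_chrom          # equal to the number of animals
genotypes = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)

# ALL scenario: one covariance matrix over every SNP.
S_all = np.cov(genotypes, rowvar=False)
rank_all = int(np.linalg.matrix_rank(S_all))

# CHR scenario: one covariance matrix per chromosome block.
rank_chr = []
for c in range(n_chrom):
    block = genotypes[:, c * snps_per_chrom:(c + 1) * snps_per_chrom]
    S_chr = np.cov(block, rowvar=False)
    rank_chr.append(int(np.linalg.matrix_rank(S_chr)))

print(f"ALL: rank {rank_all} of {n_snps} (singular: {rank_all < n_snps})")
print(f"CHR: ranks {rank_chr}, each out of {snps_per_chrom}")
```

The same bound suggests why the abstract's closing figures are plausible: with roughly 2500 markers on the largest chromosome, around 3000 animals keeps every per-chromosome matrix within reach of full rank.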

    Dissection of genomic correlation matrices of US Holsteins using multivariate factor analysis

    The aim of this study was to compare correlation matrices between direct genomic predictions for 31 traits at the genomic and chromosomal levels in US Holstein bulls. Multivariate factor analysis carried out at the genome level identified seven factors associated with conformation, longevity, yield, feet and legs, fat and protein content traits. Some differences were found at the chromosome level; variations in covariance structure on BTA 6, 14, 18 and 20 were interpreted as evidence of segregating QTL for different groups of traits. For example, milk yield and composition tended to join in a single factor on BTA 14, which is known to harbour the DGAT1 locus that affects these traits. Another example was on BTA 18, where a factor strongly correlated with sire calving ease and conformation traits was identified. A QTL influencing these traits is known to segregate on BTA 18 in US Holsteins. Moreover, a possible candidate gene for daughter pregnancy rate was suggested for BTA 28. The methodology proposed in this study could be used to identify individual chromosomes that have covariance structures differing from the overall (whole-genome) covariance structure. Such differences can be difficult to detect when a large number of traits are evaluated, and covariances may be affected by QTL that do not have large allele substitution effects.