    Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods.

    Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single- and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.

    An Evaluation of Design-based Properties of Different Composite Estimators

    For the last several decades, the US Census Bureau has been using the AK composite estimation method to produce statistics on employment from the Current Population Survey (CPS) data. The CPS uses a rotating design, and AK estimators are linear combinations of monthly survey-weighted averages (called month-in-sample estimates) in each rotation group. Denoting by $X$ the vector of month-in-sample estimates and by $\Sigma$ its design-based variance, the coefficients of the linear combination were optimized by the Census Bureau after substituting $\Sigma$ by an estimate and under unrealistic stationarity assumptions. To show the limits of this approach, we compared the AK estimator with different competitors using three different synthetic populations that mimic the CPS data and a simplified sample design that mimics the CPS design. In our simulation setup, empirically best estimators have larger mean square error than simple averages. In the real data analysis, the AK estimates are constantly below the survey-weighted estimates, indicating potential bias. Any attempt to improve on the estimated optimal estimator in either class would require a thorough investigation of the highly non-trivial problem of estimating $\Sigma$ in a complex setting like the CPS (we did not entertain this problem in this paper). A different approach is to use a variant of the regression composite estimator used by Statistics Canada. The regression composite estimator does not require estimation of $\Sigma$ and is less sensitive to the rotation group bias in our simulations. Our study demonstrates that there is great potential for improving the estimation of levels and month-to-month changes in the unemployment rates by using the regression composite estimator.

    Quasiparticle Interference on the Surface of the Topological Insulator Bi$_2$Te$_3$

    Quasiparticle interference in spectroscopic-imaging scanning tunneling microscopy has been investigated for the surface states of the large-gap topological insulator Bi$_2$Te$_3$ through the T-matrix formalism. Both scalar potential scattering and spin-orbit scattering on the warped hexagonal isoenergy contour are considered. While backscattering is forbidden by time-reversal symmetry, other scattering processes are allowed and exhibit a strong dependence on the spin configurations of the eigenfunctions at k points over the isoenergy contour. The characteristic scattering wavevectors found in our analysis agree well with recent experimental results. Comment: 5 pages, 2 figures; some typos are corrected.

    A complete classification of which $(n,k)$-star graphs are Cayley graphs

    The $(n,k)$-star graphs are an important class of interconnection networks that generalize star graphs, which are superior to hypercubes. In this paper, we continue the work begun by Cheng et al. (Graphs and Combinatorics 2017) and complete the classification of all the $(n,k)$-star graphs that are Cayley. Comment: We have proved the conjecture in the first version, thus completing the classification of which $(n,k)$-star graphs are Cayley.