Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods.
Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single- and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method, which we call CV2*, that validates model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
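To make the bias concrete, here is a minimal toy sketch, not the authors' simulation. It assumes a bivariate model in which focal trait 1 and secondary trait 2 have heritabilities h1 and h2, genetic correlation rg, and residual correlation re (all hypothetical parameters, with phenotypic variances fixed at 1). The "prediction" for a test individual is simply its own secondary phenotype y2. Naive cross-validation scores that prediction against the observed focal phenotype y1, scaled by sqrt(h1), whereas the quantity of interest is the correlation with the focal genetic value g1. When re is positive, residual noise shared by y1 and y2 inflates the naive score.

```python
import numpy as np


def analytic_accuracies(h1, h2, rg, re):
    """Return (true, naive) accuracy when the test individual's y2 predicts trait 1.

    Toy model (an assumption): y_j = g_j + e_j with Var(y_j) = 1, Var(g_j) = h_j.
    """
    true_acc = rg * np.sqrt(h2)  # cor(y2, g1)
    cov_y = rg * np.sqrt(h1 * h2) + re * np.sqrt((1 - h1) * (1 - h2))
    naive_acc = cov_y / np.sqrt(h1)  # cor(y2, y1) / sqrt(h1); phenotypic variances are 1
    return true_acc, naive_acc


def simulate_accuracies(h1, h2, rg, re, n=200_000, seed=1):
    """Monte Carlo check of analytic_accuracies under the same toy model."""
    rng = np.random.default_rng(seed)
    cg = rg * np.sqrt(h1 * h2)
    ce = re * np.sqrt((1 - h1) * (1 - h2))
    g = rng.multivariate_normal([0.0, 0.0], [[h1, cg], [cg, h2]], size=n)
    e = rng.multivariate_normal([0.0, 0.0], [[1 - h1, ce], [ce, 1 - h2]], size=n)
    y = g + e
    pred = y[:, 1]  # secondary phenotype measured on the test individual
    true_acc = np.corrcoef(pred, g[:, 0])[0, 1]
    naive_acc = np.corrcoef(pred, y[:, 0])[0, 1] / np.sqrt(h1)
    return true_acc, naive_acc
```

With h1 = h2 = 0.5, rg = 0.2 and re = 0.6, the analytic values are about 0.14 against the genetic value versus about 0.57 for the naive estimate, so naive cross-validation would strongly overstate the benefit of using the secondary trait in this toy setting.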
Lower bounds for the stable marriage problem and its variants
In an instance of the stable marriage problem of size n, n men and n women each rank the members of the opposite sex in order of preference. A stable marriage is a complete matching M = {(m_1, w_{i_1}), (m_2, w_{i_2}), ..., (m_n, w_{i_n})} such that no man and woman who are not matched to each other prefer each other to their partners in M. A pair (m_i, w_j) is stable if it is contained in some stable marriage. In this paper, we prove that determining whether an arbitrary pair is stable requires Ω(n^2) time in the worst case. We show, by an adversary argument, that there exist instances of the stable marriage problem in which at least one pair exhibits the Ω(n^2) lower bound. As corollaries of our results, the lower bound of Ω(n^2) is established for several problems related to stable marriage. Knuth, in his treatise on stable marriage, asks whether there is an algorithm that finds a stable marriage in less than Θ(n^2) time. Our results show that such an algorithm does not exist.
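For context on the matching upper bound: the classic Gale-Shapley deferred-acceptance algorithm (background, not the subject of this paper) finds a stable marriage with at most n^2 proposals, so an Ω(n^2) lower bound means it is asymptotically optimal. The sketch below assumes men and women are numbered 0 to n-1 and that each preference list is a list of indices, most preferred first; the function names are illustrative.

```python
from collections import deque
from typing import List


def gale_shapley(men_prefs: List[List[int]], women_prefs: List[List[int]]) -> List[int]:
    """Men-proposing deferred acceptance; returns wife[m] for each man m."""
    n = len(men_prefs)
    # rank[w][m] = position of man m in woman w's list (lower is better).
    rank = [[0] * n for _ in range(n)]
    for w, prefs in enumerate(women_prefs):
        for pos, m in enumerate(prefs):
            rank[w][m] = pos
    next_choice = [0] * n  # index of the next woman each man will propose to
    wife = [-1] * n
    husband = [-1] * n
    free = deque(range(n))
    while free:  # each man proposes to each woman at most once: <= n^2 proposals
        m = free.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if husband[w] == -1:
            husband[w], wife[m] = m, w
        elif rank[w][m] < rank[w][husband[w]]:
            jilted = husband[w]
            wife[jilted] = -1
            free.append(jilted)
            husband[w], wife[m] = m, w
        else:
            free.append(m)  # rejected; he will propose to his next choice
    return wife


def is_stable(wife: List[int], men_prefs: List[List[int]], women_prefs: List[List[int]]) -> bool:
    """True if no man and woman prefer each other to their partners (O(n^3) check)."""
    husband = [0] * len(wife)
    for m, w in enumerate(wife):
        husband[w] = m
    for m, prefs in enumerate(men_prefs):
        for w in prefs:
            if w == wife[m]:
                break  # every remaining woman is ranked below his wife
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False  # (m, w) is a blocking pair
    return True
```

For example, with men_prefs = [[0, 1, 2], [1, 0, 2], [0, 1, 2]] and women_prefs = [[1, 0, 2], [0, 1, 2], [0, 1, 2]], gale_shapley returns [0, 1, 2], which is_stable accepts.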
An Evaluation of Design-based Properties of Different Composite Estimators
For the last several decades, the US Census Bureau has been using the AK
composite estimation method to produce statistics on employment from the
Current Population Survey (CPS) data. The CPS uses a rotating design, and AK
estimators are linear combinations of monthly survey-weighted averages (called
month-in-sample estimates) in each rotation group. The coefficients of the
linear combination were optimized by the Census Bureau after substituting an
estimate for the design-based variance of the vector of month-in-sample
estimates, and under unrealistic stationarity assumptions. To show the limits
of this approach, we compared the AK estimator with different competitors
using three different synthetic populations that mimic the CPS data and a
simplified sample design that mimics the CPS design. In our simulation setup,
empirically best estimators have larger mean square error than simple
averages. In the real data analysis, the AK estimates are consistently below
the survey-weighted estimates, indicating potential bias. Any attempt to
improve on the estimated optimal estimator in either class would require a
thorough investigation of the highly non-trivial problem of estimating this
variance for a complex setting like the CPS (we did not entertain this problem
in this paper). A different approach is to use a variant of the regression
composite estimator used by Statistics Canada. The regression composite
estimator does not require estimating this variance and is less sensitive to
the rotation group bias in our simulations. Our study demonstrates that there
is great potential for improving the estimation of levels and month-to-month
changes in the unemployment rates by using the regression composite estimator.
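The abstract describes AK estimators only in words. As a rough sketch, assuming the textbook form of the AK recursion, Y'_t = (1 - K) ŷ_t + K (Y'_{t-1} + Δ_t) + A β_t, where ŷ_t is the simple average of the eight month-in-sample estimates, Δ_t is the average month-to-month change over the six rotation groups that were also in sample the previous month, and β_t is a contrast between incoming and continuing groups, the code below computes a composite series. The 4-8-4 rotation indexing, the scaling of β_t, and the parameters a and k are assumptions for illustration, not the Census Bureau's production specification or coefficients.

```python
from typing import List, Sequence

# Assumed 4-8-4 rotation: month-in-sample (MIS) groups 2,3,4,6,7,8 (0-based
# 1,2,3,5,6,7) were in sample last month as MIS 1,2,3,5,6,7; MIS 1 and 5 are incoming.
CONTINUING = [1, 2, 3, 5, 6, 7]
INCOMING = [0, 4]


def ak_composite(mis: Sequence[Sequence[float]], a: float, k: float) -> List[float]:
    """mis[t][i] is the level estimate from MIS group i + 1 in month t (hypothetical layout)."""
    if not mis:
        return []
    simple = [sum(row) / len(row) for row in mis]  # plain month-in-sample average
    out = [simple[0]]  # no previous composite: start from the simple average
    for t in range(1, len(mis)):
        cur, prev = mis[t], mis[t - 1]
        # Mean change over groups observed in both months (MIS i+1 was MIS i last month).
        delta = sum(cur[i] - prev[i - 1] for i in CONTINUING) / len(CONTINUING)
        # Incoming-minus-continuing contrast; this scaling is an assumption.
        beta = (sum(cur[i] for i in INCOMING) / len(INCOMING)
                - sum(cur[i] for i in CONTINUING) / len(CONTINUING))
        out.append((1 - k) * simple[t] + k * (out[-1] + delta) + a * beta)
    return out
```

Setting a = k = 0 reduces the recursion to the simple averages, which is the competitor the abstract reports as having smaller mean square error in the authors' simulations.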
Quasiparticle Interference on the Surface of the Topological Insulator Bi2Te3
Quasiparticle interference in spectroscopic-imaging scanning tunneling
microscopy has been investigated for the surface states of the large-gap
topological insulator Bi2Te3 through the T-matrix formalism. Both scalar
potential scattering and spin-orbit scattering on the warped hexagonal
isoenergy contour are considered. While backscattering is forbidden by
time-reversal symmetry, other scattering processes are allowed and exhibit a
strong dependence on the spin configurations of the eigenfunctions at k points
over the isoenergy contour. The characteristic scattering wavevectors found in
our analysis agree well with recent experimental results.
Comment: 5 pages, 2 figures, some typos are corrected
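The statement that backscattering is forbidden can be checked numerically on a minimal model. The sketch below assumes the widely used k·p surface Hamiltonian with hexagonal warping, H(k) = v (k_x σ_y - k_y σ_x) + (λ/2)(k_+^3 + k_-^3) σ_z, with hypothetical parameters v and lam in arbitrary units; it is not the paper's T-matrix calculation. For a scalar (spin-independent) impurity, the scattering amplitude between two surface states is proportional to their spinor overlap, so the function returns |<k'|k>|^2. For simplicity the two points lie on a circle of radius k rather than on the warped isoenergy contour.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def surface_hamiltonian(kx, ky, v=1.0, lam=0.5):
    """Dirac term plus warping; (lam/2)(k_+^3 + k_-^3) = lam * Re[(kx + i ky)^3]."""
    warping = lam * (kx**3 - 3 * kx * ky**2)
    return v * (kx * SY - ky * SX) + warping * SZ


def upper_band_state(kx, ky, v=1.0, lam=0.5):
    """Normalized eigenvector of the upper surface band at (kx, ky)."""
    vals, vecs = np.linalg.eigh(surface_hamiltonian(kx, ky, v, lam))
    return vecs[:, np.argmax(vals)]


def scalar_scattering_weight(theta, phi, k=1.0, v=1.0, lam=0.5):
    """|<k'|k>|^2 between upper-band states at angles theta and theta + phi (same |k|)."""
    psi_k = upper_band_state(k * np.cos(theta), k * np.sin(theta), v, lam)
    psi_kp = upper_band_state(k * np.cos(theta + phi), k * np.sin(theta + phi), v, lam)
    return abs(np.vdot(psi_kp, psi_k)) ** 2
```

Because every term in this H(k) is odd in k, H(-k) = -H(k), so the upper-band state at -k is the lower-band state at k and is orthogonal to the upper-band state at k: the weight vanishes for phi = pi (backscattering) while other angles give nonzero weight.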
A complete classification of which (n,k)-star graphs are Cayley graphs
The (n,k)-star graphs are an important class of interconnection networks
that generalize star graphs, which are superior to hypercubes. In this paper,
we continue the work begun by Cheng et al. (Graphs and Combinatorics 2017) and
complete the classification of all the (n,k)-star graphs that are Cayley.
Comment: We have proved the conjecture in the first version, thus completing
the classification of which (n,k)-star graphs are Cayley
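For readers unfamiliar with the family, here is a minimal sketch that builds an (n,k)-star graph under the standard definition from the interconnection-network literature (assumed here; the cited papers may use an equivalent labeling). Vertices are the arrangements of k distinct symbols from {1, ..., n}; a vertex is joined to the arrangements obtained by swapping its first symbol with the symbol in position i (2 ≤ i ≤ k), and to those obtained by replacing its first symbol with a symbol it does not contain. Every vertex therefore has degree (k - 1) + (n - k) = n - 1. The code only constructs the graph; it does not test the Cayley property.

```python
from itertools import permutations
from typing import Dict, Set, Tuple

Vertex = Tuple[int, ...]


def nk_star_graph(n: int, k: int) -> Dict[Vertex, Set[Vertex]]:
    """Adjacency sets of the (n,k)-star graph on k-arrangements of {1, ..., n}."""
    symbols = range(1, n + 1)
    adj: Dict[Vertex, Set[Vertex]] = {v: set() for v in permutations(symbols, k)}
    for v in adj:
        # Swap edges: exchange the first symbol with the one in position i (2 <= i <= k).
        for i in range(1, k):
            w = list(v)
            w[0], w[i] = w[i], w[0]
            adj[v].add(tuple(w))
        # Replacement edges: put an unused symbol in the first position.
        for s in symbols:
            if s not in v:
                adj[v].add((s,) + v[1:])
    return adj
```

For instance, nk_star_graph(4, 2) has 4 · 3 = 12 vertices of degree 3, and nk_star_graph(n, 1) is the complete graph on n vertices.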
Complexity of the stable marriage and stable roommate problems in three dimensions
The stable marriage problem is a matching problem that pairs members of two sets. The objective is to achieve a matching that satisfies all participants based on their preferences. The stable roommate problem is a variant involving only one set, which is partitioned into pairs with a similar objective. There exist asymptotically optimal algorithms that solve both problems. In this paper, we investigate the complexity of three-dimensional extensions of these problems. This is one of twelve research directions suggested by Knuth in his book on the stable marriage problem. We show that these problems are NP-complete, and hence it is unlikely that there exist efficient algorithms for their solutions. Applying the polynomial transformation developed in this paper, we extend the NP-completeness result to include the problem of matching couples, who are both medical school graduates, to pairs of hospital resident positions. This problem is important in practice and is dealt with annually by the NRMP, the centralized program that matches all medical school graduates in the United States to available resident positions.
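The stability condition for the two-dimensional roommate problem can be illustrated with a brute-force sketch. It is exponential and meant only for tiny instances; it is not the paper's method, and efficient algorithms such as Irving's O(n^2) algorithm exist for this case. The code enumerates all perfect matchings of an even-sized set and keeps those with no blocking pair, that is, no two people who prefer each other to their assigned partners. Unlike stable marriage, a roommate instance may have no stable matching at all. Names are illustrative.

```python
from typing import Dict, Iterator, List

Matching = Dict[str, str]


def perfect_matchings(people: List[str]) -> Iterator[Matching]:
    """Yield every partition of `people` (even length) into pairs."""
    if not people:
        yield {}
        return
    first, rest = people[0], people[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in perfect_matchings(remaining):
            m = dict(sub)
            m[first], m[partner] = partner, first
            yield m


def is_stable(matching: Matching, prefs: Dict[str, List[str]]) -> bool:
    """True if no two people prefer each other to their current partners."""
    def prefers(a: str, b: str, c: str) -> bool:  # does a rank b above c?
        return prefs[a].index(b) < prefs[a].index(c)

    people = list(prefs)
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            if matching[a] != b and prefers(a, b, matching[a]) and prefers(b, a, matching[b]):
                return False
    return True


def stable_roommate_matchings(prefs: Dict[str, List[str]]) -> List[Matching]:
    return [m for m in perfect_matchings(sorted(prefs)) if is_stable(m, prefs)]
```

For preferences A: B > C > D, B: C > A > D, C: A > B > D (D's list does not matter), each of the three possible matchings has a blocking pair, so stable_roommate_matchings returns an empty list.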
- …
