5,243 research outputs found
A nested mixture model for protein identification using mass spectrometry
Mass spectrometry provides a high-throughput way to identify proteins in
biological samples. In a typical experiment, proteins in a sample are first
broken into their constituent peptides. The resulting mixture of peptides is
then subjected to mass spectrometry, which generates thousands of spectra, each
characteristic of its generating peptide. Here we consider the problem of
inferring, from these spectra, which proteins and peptides are present in the
sample. We develop a statistical approach to the problem, based on a nested
mixture model. In contrast to commonly used two-stage approaches, this model
provides a one-stage solution that simultaneously identifies which proteins are
present, and which peptides are correctly identified. In this way our model
incorporates the evidence feedback between proteins and their constituent
peptides. Using simulated data and a yeast data set, we compare and contrast
our method with existing widely used approaches (PeptideProphet/ProteinProphet)
and with a recently published new approach, HSM. For peptide identification,
our single-stage approach yields consistently more accurate results. For
protein identification the methods have similar accuracy in most settings,
although we exhibit some scenarios in which the existing methods perform
poorly.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS316 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
An investigation into the numerical prediction of boundary layer transition using the K.Y. Chien turbulence model
Assessments were made of the simulation capabilities of transition models developed at the University of Minnesota, as applied to the Launder-Sharma and Lam-Bremhorst two-equation turbulence models, and at The University of Texas at Austin, as applied to the K. Y. Chien two-equation turbulence model. A major shortcoming in the use of the basic K. Y. Chien turbulence model for low-Reynolds-number flows was identified. The problem with the Chien model involved a premature start of natural transition and a damped response as the simulation moved to fully turbulent flow at the end of transition. This is in contrast to the other two-equation turbulence models at comparable freestream turbulence conditions. The damping of the transition response of the Chien turbulence model leads to an inaccurate estimate of the start and end of transition for freestream turbulence levels greater than 1.0 percent, and to difficulty in calculating proper model constants for the transition model.
Goodness of fit tests for the skew-Laplace distribution
The skew-Laplace distribution is frequently used to fit the logarithm of particle sizes, and it is also used in economics, engineering, finance and biology. We present the Anderson-Darling and Cramér-von Mises goodness-of-fit tests for this distribution.
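The kind of test the abstract describes can be sketched in Python. This is an illustration, not the authors' procedure: it uses SciPy's `laplace_asymmetric` as a stand-in parameterization of the skew-Laplace, and note that when parameters are estimated from the data the standard Cramér-von Mises p-value is only approximate (handling estimated parameters correctly is precisely what dedicated goodness-of-fit tests address).

```python
# Sketch: Cramér-von Mises GOF check for a skew-Laplace fit.
# Assumption: scipy's laplace_asymmetric stands in for the
# skew-Laplace parameterization used in the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated "log particle sizes" from an asymmetric Laplace.
data = stats.laplace_asymmetric.rvs(kappa=1.5, loc=0.0, scale=1.0,
                                    size=500, random_state=rng)

# Fit by maximum likelihood (SciPy's generic fitter).
kappa, loc, scale = stats.laplace_asymmetric.fit(data)

# Cramér-von Mises statistic against the fitted CDF.
# Caveat: the reported p-value assumes a fully specified null,
# so with fitted parameters it is only a rough guide.
res = stats.cramervonmises(data, stats.laplace_asymmetric.cdf,
                           args=(kappa, loc, scale))
print(res.statistic, res.pvalue)
```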
User’s Guide to For2R: A Module of Fortran 95 Output Routines Compatible with the R Statistics Language
For2R is a collection of Fortran routines for saving complex data structures into a file that can be read in the R statistics environment with a single command. For2R provides both the means to transfer data structures significantly more complex than simple tables, and an
archive mechanism to store data for future reference.
We developed this software because we write and run computationally intensive numerical models in Fortran, C++, and AD Model Builder. We then analyse results with R. We desired to automate data transfer to speed diagnostics during working-group meetings.
We thus developed the For2R interface to write an R data object (of type list) to a plain-text file. The master list can contain any number of matrices, values, dataframes, vectors or lists,
all of which can be read into R with a single call to the dget function. This allows easy transfer of structured data from compiled models to R.
Having the capacity to transfer model data, metadata, and results has sharply reduced the time spent on diagnostics, and at the same time, our diagnostic capabilities have improved tremendously. The simplicity of this interface and the capabilities of R have enabled us to automate graph and table creation for formal reports. Finally, the persistent storage in files makes it easier to treat model results in analyses or meta-analyses devised months—or even
years—later.
We offer For2R to others in the hope that they will find it useful. (PDF contains 31 pages.)
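The idea behind For2R's interface can be sketched briefly: serialize a nested structure as a plain-text R expression that R reads back with a single `dget` call. This Python sketch is illustrative only; the helper name is hypothetical and For2R's actual output format may differ in detail.

```python
# Sketch of the For2R idea: write a nested structure as a plain-text
# R expression, readable in R via dget("results.txt").
# The helper name to_r and the exact formatting are illustrative.
def to_r(obj):
    """Render a Python scalar, list/tuple, or dict as an R expression."""
    if isinstance(obj, dict):              # named R list
        items = ", ".join(f"{k} = {to_r(v)}" for k, v in obj.items())
        return f"list({items})"
    if isinstance(obj, (list, tuple)):     # R numeric vector
        return "c(" + ", ".join(to_r(v) for v in obj) + ")"
    return repr(obj)                       # numeric scalar

results = {"npar": 3, "loglik": -123.4, "est": [1.2, 3.4, 5.6]}
with open("results.txt", "w") as f:
    f.write(to_r(results))
# In R:  x <- dget("results.txt");  x$est[2]  is 3.4
```

The point of the design is that one `dget` call on the R side recovers the whole nested list, so no parsing code is needed in the analysis scripts.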
Projecting a High-School Quarterback’s Performance at the Collegiate Level: A Comparison of the Rivals, 247 Sports, and ESPN Recruiting Ratings
We examine recruiting ratings for high-school quarterbacks over the period 2006-2012 from Rivals, 247 Sports, and ESPN. Using Lee & Preacher’s (2013) test of the difference between two dependent correlations with one variable in common and ordinary least squares regression, we determine that the Rivals ratings have the strongest correlation with quarterback performance over the time period examined. The 247 Sports ratings follow closely behind the Rivals ratings; however, the ESPN ratings correlate more weakly with a quarterback’s career performance in college.
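One standard statistic for comparing two dependent correlations that share a variable (here, college performance correlated with two different rating services) is Williams' t, one of the tests covered by Lee & Preacher's utility. The sketch below is an illustration with hypothetical numbers, not the authors' code or data.

```python
# Sketch: Williams' t for the difference between two dependent
# correlations sharing one variable. Illustration only; the input
# correlations below are hypothetical, not the paper's estimates.
from math import sqrt
from scipy import stats

def williams_t(r12, r13, r23, n):
    """r12, r13: correlations of the common variable with each of the
    other two; r23: correlation between those two; n: sample size."""
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    rbar = (r12 + r13) / 2
    t = (r12 - r13) * sqrt(((n - 1) * (1 + r23)) /
                           (2 * ((n - 1) / (n - 3)) * det
                            + rbar**2 * (1 - r23)**3))
    p = 2 * stats.t.sf(abs(t), df=n - 3)   # two-sided, df = n - 3
    return t, p

# Hypothetical: performance correlates .45 with service A's rating and
# .38 with service B's; the two ratings correlate .80; n = 200.
t, p = williams_t(0.45, 0.38, 0.80, 200)
print(round(t, 3), round(p, 3))
```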
Photonic architecture for scalable quantum information processing in NV-diamond
Physics and information are intimately connected, and the ultimate
information processing devices will be those that harness the principles of
quantum mechanics. Many physical systems have been identified as candidates for
quantum information processing, but none of them are immune from errors. The
challenge remains to find a path from the experiments of today to a reliable
and scalable quantum computer. Here, we develop an architecture based on a
simple module comprising an optical cavity containing a single
negatively-charged nitrogen vacancy centre in diamond. Modules are connected by
photons propagating in a fiber-optical network and collectively used to
generate a topological cluster state, a robust substrate for quantum
information processing. In principle, all processes in the architecture can be
deterministic, but current limitations lead to processes that are probabilistic
but heralded. We find that the architecture enables large-scale quantum
information processing with existing technology.
Comment: 24 pages, 14 figures. Comments welcome.
User’s Guide to C2R: A Set of C Language Output Routines Compatible with the R Statistics Language
C2R is a collection of C routines for saving complex data structures into a file that can be read in the R statistics environment with a single command. C2R provides both the means to transfer data structures significantly more complex than simple tables, and an archive mechanism
to store data for future reference.
We developed this software because we write and run computationally intensive numerical models in Fortran, C++, and AD Model Builder. We then analyse results with R. We desired to automate data transfer to speed diagnostics during working-group meetings.
We thus developed the C2R interface to write an R data object (of type list) to a plain-text file. The master list can contain any number of matrices, values, dataframes, vectors or lists, all of which can be read into R with a single call to the dget function. This allows easy transfer
of structured data from compiled models to R.
Having the capacity to transfer model data, metadata, and results has sharply reduced the time spent on diagnostics, and at the same time, our diagnostic capabilities have improved tremendously. The simplicity of this interface and the capabilities of R have enabled us to automate graph and table creation for formal reports. Finally, the persistent storage in files makes it easier to treat model results in analyses or meta-analyses devised months—or even years—later.
We offer C2R to others in the hope that they will find it useful. (PDF contains 27 pages.)
