42 research outputs found

    The analysis and advanced extensions of canonical correlation analysis

    Drug discovery is the process of identifying compounds with potentially meaningful biological activity. A problem that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making exhaustive experimental testing intractable. For this reason, computational methods are employed to filter out those compounds that do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be experimentally tested. In this dissertation I provide an approach to the problem of virtual screening based on Canonical Correlation Analysis (CCA), along with several extensions that use kernel and spectral learning ideas. Specifically, these methods are applied to the protein-ligand matching problem. Additionally, theoretical results analyzing the behavior of CCA in the High Dimension Low Sample Size (HDLSS) setting are provided.
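    The classical CCA that these extensions build on fits in a few lines. The following is a minimal illustrative sketch (not the dissertation's code): it adds a small ridge term to the covariance blocks, a common stand-in for the heavier regularization that the HDLSS regime typically demands.

    ```python
    import numpy as np

    def cca(X, Y, reg=1e-6):
        """Classical CCA: singular values of the whitened cross-covariance
        are the canonical correlations. `reg` is a small ridge term that
        keeps the per-block covariances invertible."""
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        n = X.shape[0]
        Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
        Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
        Cxy = X.T @ Y / n
        # Whiten each block via Cholesky factors, then take singular values.
        Lx = np.linalg.cholesky(Cxx)
        Ly = np.linalg.cholesky(Cyy)
        M = np.linalg.inv(Lx) @ Cxy @ np.linalg.inv(Ly).T
        return np.linalg.svd(M, compute_uv=False)

    # Two views sharing one latent signal: the leading canonical correlation
    # should be large, the remaining ones small.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(200, 1))
    X = np.hstack([z + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 2))])
    Y = np.hstack([z + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 2))])
    corrs = cca(X, Y)
    ```

    The kernel extensions mentioned in the abstract replace the linear covariances with Gram matrices; the whitening-then-SVD structure stays the same.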

    Local kernel canonical correlation analysis with application to virtual drug screening

    Drug discovery is the process of identifying compounds with potentially meaningful biological activity. A major challenge that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making exhaustive experimental testing intractable. For this reason, computational methods are employed to filter out those compounds that do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be experimentally tested.

    On-the-fly Autonomous Control of Neutron Diffraction via Physics-Informed Bayesian Active Learning

    Full text link
    Neutron scattering is a unique and versatile characterization technique for probing the magnetic structure and dynamics of materials. However, the number of neutron scattering facilities in the world is limited, and instruments at such facilities are perennially oversubscribed. We demonstrate a significant reduction in the experimental time required for neutron diffraction experiments by implementing autonomous navigation of the measurement parameter space through machine learning. Prior scientific knowledge and Bayesian active learning are used to dynamically steer the sequence of measurements. We developed the autonomous neutron diffraction explorer (ANDiE) and used it to determine the magnetic order of MnO and Fe1.09Te. ANDiE can determine the Néel temperature of the materials with a 5-fold enhancement in efficiency and correctly identify the transition dynamics via physics-informed Bayesian inference. ANDiE's active learning approach is broadly applicable to a variety of neutron-based experiments and can open the door for neutron scattering as a tool of accelerated materials discovery.
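    The physics-informed active-learning loop can be sketched generically. The toy below is our own illustration, not ANDiE's implementation: it assumes a mean-field order parameter for the magnetic Bragg intensity, Gaussian measurement noise, and a grid posterior over the Néel temperature, and at each step it measures where the candidate models disagree most.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    TRUE_TN, BETA, NOISE = 120.0, 0.35, 0.03  # hypothetical material

    def order_param(T, TN):
        # Mean-field intensity: (1 - T/TN)^(2*beta) below TN, zero above.
        return np.clip(1.0 - np.asarray(T) / TN, 0.0, None) ** (2 * BETA)

    def measure(T):
        # Simulated noisy diffraction measurement at temperature T.
        return order_param(T, TRUE_TN) + NOISE * rng.normal()

    TN_grid = np.linspace(50.0, 200.0, 601)  # candidate Neel temperatures
    log_post = np.zeros_like(TN_grid)        # flat prior over the grid
    T_cand = np.linspace(50.0, 200.0, 151)   # temperatures we could measure

    for _ in range(12):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # Predictive variance across the posterior: high where candidate
        # transition temperatures disagree about the intensity.
        pred = order_param(T_cand[:, None], TN_grid[None, :])
        mean = pred @ post
        var = (pred ** 2) @ post - mean ** 2
        T_next = T_cand[np.argmax(var)]
        # Gaussian-likelihood Bayesian update of the posterior over TN.
        y = measure(T_next)
        log_post += -0.5 * ((y - order_param(T_next, TN_grid)) / NOISE) ** 2

    TN_hat = TN_grid[np.argmax(log_post)]  # MAP estimate of the Neel temperature
    ```

    The physics enters through the parametric form of `order_param`; swapping in a different transition model changes what the loop considers informative.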

    PEPR: pipelines for evaluating prokaryotic references

    Get PDF

    The Fast RODEO for Local Polynomial Regression

    Full text link
    An open challenge in nonparametric regression is finding fast, computationally efficient approaches to estimating local bandwidths for large data sets, particularly in two or more dimensions. In the work presented here we introduce a novel local bandwidth estimation procedure for local polynomial regression which combines the greedy search of the RODEO algorithm with linear binning. The result is a fast, computationally efficient algorithm we refer to as the fast RODEO. We motivate the development of our algorithm by using a novel scale-space approach to derive the RODEO. We conclude with a toy example and a real-world example using data from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite validation study, where we show the fast RODEO's improvement in accuracy and computational speed over two other standard approaches.
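    The linear-binning step that makes the fast RODEO fast can be illustrated on its own. This is a generic one-dimensional sketch (function and variable names are ours, not the paper's): each observation's weight is split between its two nearest grid points in proportion to proximity, so downstream smoothing operates on the binned summaries rather than on all n raw points.

    ```python
    import numpy as np

    def linear_binning(x, y, grid):
        """Split each observation between its two nearest equally spaced
        grid points, returning binned counts and binned response sums."""
        delta = grid[1] - grid[0]
        pos = (x - grid[0]) / delta                       # fractional grid position
        lo = np.clip(np.floor(pos).astype(int), 0, grid.size - 2)
        frac = np.clip(pos - lo, 0.0, 1.0)                # weight to the upper neighbor
        counts = np.zeros(grid.size)
        sums = np.zeros(grid.size)
        np.add.at(counts, lo, 1.0 - frac)                 # unbuffered accumulation
        np.add.at(counts, lo + 1, frac)
        np.add.at(sums, lo, (1.0 - frac) * y)
        np.add.at(sums, lo + 1, frac * y)
        return counts, sums

    # Bin 10,000 noisy observations onto a 101-point grid.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=10_000)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)
    grid = np.linspace(0.0, 1.0, 101)
    counts, sums = linear_binning(x, y, grid)
    # Binning conserves mass: counts sum to n, sums sum to the total of y.
    ```

    Because the split weights for each point always add to one, the binned summaries preserve totals exactly, which is what lets the greedy RODEO search run over the grid without revisiting the raw data.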

    Limit of detection determination for censored samples

    Full text link

    Using Replicates in Information Retrieval Evaluation

    Full text link
    This article explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, thus increasing the sensitivity of system comparisons. Randomly partitioning the test document collection allows for multiple tests of a given system and topic (replicates). Bootstrap ANOVA can use these replicates to extract system-topic interactions (something not possible without replicates), yielding a more precise value for the system effect and a narrower confidence interval around that value. Experiments using multiple TREC collections demonstrate that removing the topic-system interactions substantially reduces the confidence intervals around the system effect and increases the number of significant pairwise differences found. Further, the method is robust against small changes in the number of partitions used, against variability in the documents that constitute the partitions, and against the measure used to quantify system effectiveness.
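    The role the replicates play can be seen in a small synthetic sketch (all numbers below are made up for illustration, and this is a plain two-way decomposition, not the article's bootstrap ANOVA): with several partition-level scores per system-topic cell, the cell means differ from pure noise, so the system-topic interaction term becomes estimable.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    # Synthetic replicate scores: 3 systems x 10 topics x 5 partitions,
    # built from additive system and topic effects plus partition noise.
    sys_eff = np.array([0.05, 0.00, -0.05])
    topic_eff = rng.normal(0.0, 0.10, size=10)
    scores = (0.5
              + sys_eff[:, None, None]
              + topic_eff[None, :, None]
              + 0.02 * rng.normal(size=(3, 10, 5)))

    grand = scores.mean()
    sys_mean = scores.mean(axis=(1, 2))    # per-system mean over topics/replicates
    topic_mean = scores.mean(axis=(0, 2))  # per-topic mean over systems/replicates
    cell_mean = scores.mean(axis=2)        # replicates make this estimable

    # What remains in each cell after removing the additive system and
    # topic effects is the system-topic interaction; without replicates
    # it would be confounded with measurement noise.
    interaction = cell_mean - sys_mean[:, None] - topic_mean[None, :] + grand
    system_effect = sys_mean - grand
    ```

    In this simulated layout the true interaction is zero, so the estimated `interaction` stays near zero while `system_effect` recovers the planted system differences; removing the interaction term from the error is what narrows the confidence interval on the system effect.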