The analysis and advanced extensions of canonical correlation analysis
Drug discovery is the process of identifying compounds with potentially meaningful biological activity. A problem that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making exhaustive experimental testing intractable. For this reason, computational methods are employed to filter out those compounds that do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be tested experimentally. In this dissertation I provide an approach to the virtual screening problem based on Canonical Correlation Analysis (CCA), along with several extensions that use kernel and spectral learning ideas. Specifically, these methods are applied to the protein-ligand matching problem. Additionally, theoretical results analyzing the behavior of CCA in the High Dimension Low Sample Size (HDLSS) setting are provided.
Local kernel canonical correlation analysis with application to virtual drug screening
Drug discovery is the process of identifying compounds with potentially meaningful biological activity. A major challenge is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making exhaustive experimental testing intractable. For this reason, computational methods are employed to filter out those compounds that do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be tested experimentally.
On-the-fly Autonomous Control of Neutron Diffraction via Physics-Informed Bayesian Active Learning
Neutron scattering is a unique and versatile characterization technique for probing the magnetic structure and dynamics of materials. However, the number of instruments at neutron scattering facilities worldwide is limited, and those facilities are perennially oversubscribed. We demonstrate a significant reduction in the experimental time required for neutron diffraction experiments by implementing autonomous navigation of the measurement parameter space through machine learning. Prior scientific knowledge and Bayesian active learning are used to dynamically steer the sequence of measurements. We developed the autonomous neutron diffraction explorer (ANDiE) and used it to determine the magnetic order of MnO and Fe1.09Te. ANDiE can determine the Néel temperature of these materials with a 5-fold enhancement in efficiency and correctly identify the transition dynamics via physics-informed Bayesian inference. ANDiE's active learning approach is broadly applicable to a variety of neutron-based experiments and can open the door to neutron scattering as a tool for accelerated materials discovery.
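The measurement-steering loop described above can be sketched generically: fit a surrogate model to the measurements taken so far, then choose the next measurement where the model is least certain. This is a minimal illustration with a Gaussian process surrogate and a hypothetical order-parameter curve; the true ANDiE system additionally encodes physical priors on the transition, which are omitted here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

def intensity(T, T_N=120.0):
    # Hypothetical ground truth: magnetic Bragg peak intensity
    # vanishing at an assumed transition temperature T_N.
    return np.clip(1.0 - T / T_N, 0.0, None) ** 0.7

T_grid = np.linspace(5.0, 200.0, 400)      # candidate temperatures (K)
measured_T = list(rng.uniform(5.0, 200.0, size=3))  # initial measurements

for _ in range(10):
    X = np.array(measured_T).reshape(-1, 1)
    y = intensity(X.ravel()) + 0.01 * rng.normal(size=len(measured_T))
    # Fixed kernel (optimizer=None) keeps the sketch transparent.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=30.0),
                                  alpha=1e-4, optimizer=None)
    gp.fit(X, y)
    mu, sigma = gp.predict(T_grid.reshape(-1, 1), return_std=True)
    # Acquisition rule: measure next where predictive uncertainty is largest.
    measured_T.append(T_grid[np.argmax(sigma)])
```

Because uncertainty concentrates where the curve changes fastest, this kind of loop tends to cluster measurements near the transition, which is the source of the reported efficiency gain.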
The Fast RODEO for Local Polynomial Regression
An open challenge in nonparametric regression is finding fast, computationally efficient approaches to estimating local bandwidths for large data sets, particularly in two or more dimensions. In the work presented here we introduce a novel local bandwidth estimation procedure for local polynomial regression which combines the greedy search of the RODEO algorithm with linear binning. The result is a fast, computationally efficient algorithm we refer to as the fast RODEO. We motivate the development of our algorithm by using a novel scale-space approach to derive the RODEO. We conclude with a toy example and a real-world example using data from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite validation study, where we show the fast RODEO's improvement in accuracy and computational speed over two other standard approaches.
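Linear binning, the speed-up ingredient named above, replaces each data point with fractional mass on its two flanking grid points, so subsequent kernel computations scale with the grid size rather than the sample size. A minimal one-dimensional sketch (the function name and grid parameters are illustrative, not from the paper):

```python
import numpy as np

def linear_binning(x, grid_min, grid_max, n_bins):
    """Standard linear binning: each point's unit mass is split between
    its two flanking grid points in proportion to proximity."""
    grid = np.linspace(grid_min, grid_max, n_bins)
    delta = grid[1] - grid[0]
    pos = (np.clip(x, grid_min, grid_max) - grid_min) / delta
    lo = np.minimum(pos.astype(int), n_bins - 2)   # left neighbor index
    w_hi = pos - lo                                # weight for right neighbor
    counts = np.zeros(n_bins)
    np.add.at(counts, lo, 1.0 - w_hi)
    np.add.at(counts, lo + 1, w_hi)
    return grid, counts

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=10_000)
grid, counts = linear_binning(x, -4.0, 4.0, 81)
```

The RODEO's greedy bandwidth search can then be run over these binned counts instead of the raw observations, which is where the "fast" in fast RODEO comes from.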
Using Replicates in Information Retrieval Evaluation
This article explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, thus increasing the sensitivity of system comparisons. Randomly partitioning the test document collection allows for multiple tests of a given system and topic (replicates). Bootstrap ANOVA can use these replicates to extract system-topic interactions, something not possible without replicates, yielding a more precise value for the system effect and a narrower confidence interval around that value. Experiments using multiple TREC collections demonstrate that removing the topic-system interactions substantially reduces the confidence intervals around the system effect and increases the number of significant pairwise differences found. Further, the method is robust against small changes in the number of partitions used, against variability in the documents that constitute the partitions, and against the choice of effectiveness measure used to quantify system performance.
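The key point, that replicates make the system-topic interaction estimable, can be shown with a small simulation. The sizes and effect magnitudes below are hypothetical, not taken from the TREC experiments; the sketch only demonstrates why a replicated two-way layout lets the interaction be separated from noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sys, n_topic, n_rep = 4, 25, 5  # hypothetical: systems, topics, partitions

# Simulated effectiveness scores: grand mean + system effect + topic effect
# + system-topic interaction + per-replicate noise.
sys_eff = rng.normal(0.0, 0.05, n_sys)
topic_eff = rng.normal(0.0, 0.15, n_topic)
interact = rng.normal(0.0, 0.05, (n_sys, n_topic))
noise = rng.normal(0.0, 0.03, (n_sys, n_topic, n_rep))
scores = (0.5 + sys_eff[:, None, None] + topic_eff[None, :, None]
          + interact[:, :, None] + noise)

# With replicates, the interaction is estimable as the double-centered
# cell means; without replicates it is confounded with the error term.
grand = scores.mean()
sys_hat = scores.mean(axis=(1, 2)) - grand
topic_hat = scores.mean(axis=(0, 2)) - grand
cell = scores.mean(axis=2)
interact_hat = cell - grand - sys_hat[:, None] - topic_hat[None, :]
```

Removing the estimated interaction from the error term is what shrinks the confidence interval on the system effect in the article's ANOVA.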
