Multi-fidelity classification using Gaussian processes: accelerating the prediction of large-scale computational models
Machine learning techniques typically rely on large datasets to create
accurate classifiers. However, there are situations when data is scarce and
expensive to acquire. This is the case of studies that rely on state-of-the-art
computational models which typically take days to run, thus hindering the
potential of machine learning tools. In this work, we present a novel
classifier that takes advantage of lower fidelity models and inexpensive
approximations to predict the binary output of expensive computer simulations.
We postulate an autoregressive model between the different levels of fidelity
with Gaussian process priors. We adopt a fully Bayesian treatment for the
hyper-parameters and use Markov chain Monte Carlo samplers. We take advantage of
the probabilistic nature of the classifier to implement active learning
strategies. We also introduce a sparse approximation to enhance the ability of
the multi-fidelity classifier to handle large datasets. We test these
multi-fidelity classifiers against their single-fidelity counterparts with
synthetic data, showing a median computational cost reduction of 23% for a
target accuracy of 90%. In an application to cardiac electrophysiology, the
multi-fidelity classifier achieves an F1 score, the harmonic mean of precision
and recall, of 99.6%, compared to 74.1% for a single-fidelity classifier when
both are trained with 50 samples. In general, our results show that the
multi-fidelity classifiers outperform their single-fidelity counterparts in
terms of accuracy in all cases. We envision that this new tool will enable
researchers to study classification problems that would otherwise be
prohibitively expensive. Source code is available at
https://github.com/fsahli/MFclass
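The autoregressive link between fidelity levels described in the abstract can be sketched in a few lines. The test functions, kernel settings, and data sizes below are illustrative assumptions, not the paper's cardiac benchmark, and the classifier is reduced to posterior-mean GP regression plus a sign threshold rather than the paper's fully Bayesian treatment:

```python
import numpy as np

def rbf(a, b, ls=0.1):
    # squared-exponential kernel on 1-D inputs
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_fit(X, y, ls=0.1, noise=1e-4):
    # returns the GP posterior-mean predictor for training data (X, y)
    alpha = np.linalg.solve(rbf(X, X, ls) + noise * np.eye(len(X)), y)
    return lambda Xs: rbf(Xs, X, ls) @ alpha

# hypothetical fidelity pair: the class label is the sign of f_high
f_low  = lambda x: np.sin(8 * x)          # cheap approximation
f_high = lambda x: np.sin(8 * x) - 0.1    # expensive model

X_lo = np.linspace(0.0, 1.0, 50)          # many cheap runs
X_hi = np.linspace(0.0, 1.0, 8)           # few expensive runs

m_low = gp_fit(X_lo, f_low(X_lo))

# autoregressive model: f_high(x) ~ rho * f_low(x) + delta(x)
low_at_hi = m_low(X_hi)
y_hi = f_high(X_hi)
rho = (low_at_hi @ y_hi) / (low_at_hi @ low_at_hi)   # least-squares scale
m_delta = gp_fit(X_hi, y_hi - rho * low_at_hi, ls=0.2)  # GP on the discrepancy

predict = lambda x: rho * m_low(x) + m_delta(x)
classify = lambda x: np.sign(predict(x))
```

Only eight expensive evaluations are used; the many cheap evaluations pin down the shape of the decision function, and the discrepancy GP corrects the remaining offset, which is the mechanism behind the cost reductions reported above.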
Multifidelity Information Fusion Algorithms for High-Dimensional Systems and Massive Data Sets
We develop a framework for multifidelity information fusion and predictive inference in high-dimensional input spaces and in the presence of massive data sets. Hence, we tackle simultaneously the “big N” problem for big data and the curse of dimensionality in multivariate parametric problems. The proposed methodology establishes a new paradigm for constructing response surfaces of high-dimensional stochastic dynamical systems, simultaneously accounting for multifidelity in physical models as well as multifidelity in probability space. Scaling to high dimensions is achieved by data-driven dimensionality reduction techniques based on hierarchical functional decompositions and a graph-theoretic approach for encoding custom autocorrelation structure in Gaussian process priors. Multifidelity information fusion is facilitated through stochastic autoregressive schemes and frequency-domain machine learning algorithms that scale linearly with the data. Taken together, these new developments lead to linear-complexity algorithms, as demonstrated in benchmark problems involving deterministic and stochastic fields in up to 10⁵ input dimensions and 10⁵ training points on a standard desktop computer.
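The hierarchical functional decomposition that underlies the scaling to high dimensions can be illustrated with a minimal additive sketch: one cheap 1-D Gaussian-process smoother per input coordinate, combined by backfitting. This is an assumption-laden toy, not the authors' graph-theoretic or frequency-domain algorithms; the 20-dimensional target with three active coordinates and all kernel settings below are made up for illustration:

```python
import numpy as np

def gp_smooth(x, r, ls=0.3, noise=0.5):
    # 1-D GP smoother: in-sample posterior-mean fit of residual r against x
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / ls) ** 2)
    return K @ np.linalg.solve(K + noise * np.eye(len(x)), r)

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.uniform(-1.0, 1.0, size=(n, d))
# synthetic additive target: only the first three of the 20 inputs are active
y = np.sin(np.pi * X[:, 0]) + np.sin(np.pi * X[:, 1]) + np.sin(np.pi * X[:, 2])

mu = y.mean()
comp = np.zeros((n, d))        # one additive component per input dimension
for _ in range(5):             # backfitting sweeps over the coordinates
    for j in range(d):
        # partial residual: everything except coordinate j's current component
        partial = y - mu - comp.sum(axis=1) + comp[:, j]
        comp[:, j] = gp_smooth(X[:, j], partial)

rmse = np.sqrt(np.mean((y - mu - comp.sum(axis=1)) ** 2))
```

Because each sweep only ever solves d small 1-D problems rather than one n-by-d kernel problem, the cost grows linearly in the number of dimensions, which is the flavor of complexity reduction the abstract refers to.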
