
    Probabilistic performance estimators for computational chemistry methods: Systematic Improvement Probability and Ranking Probability Matrix. I. Theory

    The comparison of benchmark error sets is an essential tool for the evaluation of theories in computational chemistry. The standard ranking of methods by their Mean Unsigned Error is unsatisfactory for several reasons linked to the non-normality of the error distributions and the presence of underlying trends. Complementary statistics have recently been proposed to palliate such deficiencies, such as quantiles of the absolute error distribution or the mean prediction uncertainty. We introduce here a new score, the systematic improvement probability (SIP), based on the direct system-wise comparison of absolute errors. Independently of the chosen scoring rule, the uncertainty of the statistics due to the incompleteness of the benchmark data sets is also generally overlooked. However, this uncertainty is essential to appreciate the robustness of rankings. In the present article, we develop two indicators based on robust statistics to address this problem: P_inv, the inversion probability between two values of a statistic, and P_r, the ranking probability matrix. We also demonstrate the essential contribution of the correlations between error sets to these score comparisons.
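
    The SIP described above lends itself to a very small illustration. The sketch below assumes the convention that SIP(A→B) is the fraction of systems for which method B yields a strictly smaller absolute error than method A; the treatment of ties and the uncertainty estimation developed in the paper are not reproduced here, and the toy data are purely illustrative.

```python
import numpy as np

def sip(err_a, err_b):
    """Fraction of systems for which method B has a strictly smaller
    absolute error than method A, i.e. an estimate of the probability
    that switching from A to B improves a given prediction."""
    abs_a = np.abs(np.asarray(err_a, dtype=float))
    abs_b = np.abs(np.asarray(err_b, dtype=float))
    return float(np.mean(abs_b < abs_a))

# toy example with two correlated error sets
rng = np.random.default_rng(0)
e_a = rng.normal(0.0, 1.0, size=200)
e_b = 0.7 * e_a + rng.normal(0.0, 0.5, size=200)
print(f"SIP(A -> B) = {sip(e_a, e_b):.2f}")
```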

    Probabilistic performance estimators for computational chemistry methods: Systematic Improvement Probability and Ranking Probability Matrix. II. Applications

    In the first part of this study (Paper I), we introduced the systematic improvement probability (SIP) as a tool to assess the level of improvement on absolute errors to be expected when switching between two computational chemistry methods. We also developed two indicators based on robust statistics to address the uncertainty of rankings in computational chemistry benchmarks: P_inv, the inversion probability between two values of a statistic, and P_r, the ranking probability matrix. In this second part, these indicators are applied to nine data sets extracted from the recent benchmarking literature. We also illustrate how the correlation between the error sets might contain useful information on the quality of the benchmark dataset, notably when experimental data are used as reference.
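
    One simple way to approximate an inversion probability, and to see why the correlation between error sets matters, is a paired bootstrap over the benchmark systems, which keeps that correlation intact. The sketch below is an illustrative assumption, not the estimator used in the papers; the statistic defaults to the MUE but any scoring rule can be plugged in.

```python
import numpy as np

def p_inv(err_a, err_b, stat=lambda e: np.mean(np.abs(e)),
          n_boot=10_000, seed=0):
    """Paired-bootstrap estimate of the probability that the ranking of
    two methods by a statistic (MUE by default) is inverted when the
    benchmark set is resampled; resampling whole systems preserves the
    correlation between the two error sets."""
    rng = np.random.default_rng(seed)
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    n = err_a.size
    ref_sign = np.sign(stat(err_a) - stat(err_b))
    inversions = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # same systems drawn for both methods
        if np.sign(stat(err_a[idx]) - stat(err_b[idx])) != ref_sign:
            inversions += 1
    return inversions / n_boot

# toy example with two correlated error sets
rng = np.random.default_rng(1)
e_a = rng.normal(0.0, 1.0, size=100)
e_b = 0.8 * e_a + rng.normal(0.0, 0.4, size=100)
print(f"P_inv = {p_inv(e_a, e_b):.3f}")
```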

    Probabilistic performance estimators for computational chemistry methods: the empirical cumulative distribution function of absolute errors

    Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end-users of the expected amplitude of the prediction errors attached to these methods. We show that, as the distributions of model errors are neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate for the use of more informative statistics, based on the empirical cumulative distribution function of unsigned errors, namely (1) the probability for a new calculation to have an absolute error below a chosen threshold, and (2) the maximal amplitude of errors one can expect with a chosen high confidence level. These statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error on all benchmarking statistics depends on the size of the reference dataset, and systematic publication of these standard errors would be very helpful to assess the statistical reliability of benchmarking conclusions. Supplementary material: https://github.com/ppernot/ECDF
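
    A minimal sketch of the two ECDF-based statistics described above, assuming the common conventions C(eta) = P(|error| <= eta) and Q(p) the p-th quantile of absolute errors; the quantile interpolation scheme and the standard-error estimates discussed in the abstract are left out, and the toy error set is illustrative only.

```python
import numpy as np

def ecdf_stats(errors, eta=1.0, p=0.95):
    """Two ECDF-based statistics on a set of model errors:
    - C(eta): empirical probability that a new calculation has an
      absolute error below the threshold eta;
    - Q(p)  : the p-th quantile of absolute errors, i.e. the error
      amplitude not exceeded with probability p."""
    abs_err = np.abs(np.asarray(errors, dtype=float))
    c_eta = float(np.mean(abs_err <= eta))
    q_p = float(np.quantile(abs_err, p))
    return c_eta, q_p

# example: C(1) and Q95 for a toy, non-centered error set
errors = np.random.default_rng(1).normal(0.3, 1.2, size=500)
c1, q95 = ecdf_stats(errors, eta=1.0, p=0.95)
print(f"P(|error| < 1) ~ {c1:.2f},  Q95 ~ {q95:.2f}")
```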

    Stratification of uncertainties recalibrated by isotonic regression and its impact on calibration error statistics

    Post hoc recalibration of the prediction uncertainties of machine learning regression problems by isotonic regression might present a problem for bin-based calibration error statistics (e.g. the ENCE). Isotonic regression often produces stratified uncertainties, i.e. subsets of uncertainties with identical numerical values. Partitioning of the resulting data into equal-sized bins introduces an aleatoric component to the estimation of bin-based calibration statistics. The partitioning of stratified data into bins depends on the order of the data, which is typically an uncontrolled property of calibration test/validation sets. The tie-breaking method of the ordering algorithm used for binning might also introduce an aleatoric component. I show on an example how this can significantly affect the calibration diagnostics.
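
    The order dependence can be reproduced with a toy example, using one common definition of the ENCE (mean over equal-count bins of |RMV - RMSE| / RMV). The binning, the stable-sort tie handling and the stratified toy data below are assumptions for illustration, not the exact setup of the paper.

```python
import numpy as np

def ence(uncertainties, errors, n_bins=10):
    """Expected Normalized Calibration Error with equal-count bins of
    the uncertainty-sorted data: mean over bins of |RMV - RMSE| / RMV."""
    order = np.argsort(uncertainties, kind="stable")   # ties keep input order
    u, e = uncertainties[order], errors[order]
    score = 0.0
    for b in np.array_split(np.arange(u.size), n_bins):
        rmv = np.sqrt(np.mean(u[b] ** 2))
        rmse = np.sqrt(np.mean(e[b] ** 2))
        score += abs(rmv - rmse) / rmv
    return score / n_bins

# stratified uncertainties (many ties), as produced by isotonic regression
rng = np.random.default_rng(2)
u = np.repeat([0.5, 1.0, 2.0], 100)
e = rng.normal(0.0, u)

# reordering the (tied) test set changes which points share a bin,
# hence the ENCE value: the spread illustrates the aleatoric component
vals = []
for _ in range(5):
    perm = rng.permutation(u.size)
    vals.append(ence(u[perm], e[perm]))
print(np.round(vals, 3))
```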

    Can bin-wise scaling improve consistency and adaptivity of prediction uncertainty for machine learning regression?

    Binwise Variance Scaling (BVS) has recently been proposed as a post hoc recalibration method for the prediction uncertainties of machine learning regression problems, capable of more efficient corrections than uniform variance (or temperature) scaling. The original version of BVS uses uncertainty-based binning, which aims to improve calibration conditionally on uncertainty, i.e. consistency. I explore here several adaptations of BVS, in particular with alternative loss functions and a binning scheme based on an input feature (X), in order to improve adaptivity, i.e. calibration conditional on X. The performances of BVS and its proposed variants are tested on a benchmark dataset for the prediction of atomization energies and compared to the results of isotonic regression. Note: this version corrects an error in the estimation of the Sx scores for the test set, affecting Fig. 2 and Tables I-III of the initial version; the main points of the discussion and the conclusions are unchanged.
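
    A minimal sketch of the variance-based flavour of bin-wise scaling, assuming equal-count uncertainty bins and a per-bin factor that normalizes the mean squared z-score e/u. The alternative loss functions and the feature-based (X) binning explored in the paper are not shown, and the function names and toy data are illustrative.

```python
import numpy as np

def bvs_factors(u_cal, e_cal, n_bins=10):
    """Bin-wise variance scaling on a calibration set: split the data
    into equal-count bins of increasing uncertainty and, in each bin,
    choose the factor s_b that makes the mean squared z-score
    e / (s_b * u) equal to 1."""
    order = np.argsort(u_cal)
    u, e = u_cal[order], e_cal[order]
    edges, factors = [], []
    for b in np.array_split(np.arange(u.size), n_bins):
        factors.append(np.sqrt(np.mean((e[b] / u[b]) ** 2)))
        edges.append(u[b][-1])                 # upper uncertainty bound of the bin
    return np.array(edges), np.array(factors)

def bvs_apply(u_new, edges, factors):
    """Rescale new uncertainties with the factor of the bin they fall in."""
    idx = np.clip(np.searchsorted(edges, u_new), 0, factors.size - 1)
    return factors[idx] * u_new

# toy calibration set with uncertainties overestimated by a factor ~1/0.6
rng = np.random.default_rng(5)
u_cal = rng.uniform(0.5, 2.0, size=400)
e_cal = rng.normal(0.0, 0.6 * u_cal)
edges, factors = bvs_factors(u_cal, e_cal)
print(np.round(factors, 2))                    # factors should cluster around 0.6
print(np.round(bvs_apply(rng.uniform(0.5, 2.0, size=5), edges, factors), 2))
```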

    Validation of ML-UQ calibration statistics using simulated reference values: a sensitivity analysis

    Some popular Machine Learning Uncertainty Quantification (ML-UQ) calibration statistics do not have predefined reference values and are mostly used in comparative studies. In consequence, calibration is almost never validated and the diagnostic is left to the appreciation of the reader. Simulated reference values, based on synthetic calibrated datasets derived from actual uncertainties, have been proposed to palliate this problem. As the generative probability distribution for the simulation of synthetic errors is often not constrained, the sensitivity of simulated reference values to the choice of generative distribution might be problematic, casting doubt on the calibration diagnostic. This study explores various facets of this problem and shows that some statistics are excessively sensitive to the choice of generative distribution used for validation when that distribution is unknown. This is the case, for instance, of the correlation coefficient between absolute errors and uncertainties (CC) and of the expected normalized calibration error (ENCE). A robust validation workflow to deal with simulated reference values is proposed.
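
    The sensitivity issue can be illustrated with a small simulation: synthetic errors are drawn from the actual uncertainties with two different unit-variance generative distributions, and the resulting ENCE reference values are compared. The setup below (normal vs. rescaled Student's t, uniform toy uncertainties) is an illustrative assumption, not the validation workflow proposed in the paper.

```python
import numpy as np

def ence(u, e, n_bins=10):
    """ENCE with equal-count uncertainty bins: mean of |RMV - RMSE| / RMV."""
    order = np.argsort(u)
    u, e = u[order], e[order]
    terms = [abs(np.sqrt(np.mean(u[b]**2)) - np.sqrt(np.mean(e[b]**2)))
             / np.sqrt(np.mean(u[b]**2))
             for b in np.array_split(np.arange(u.size), n_bins)]
    return float(np.mean(terms))

def simulated_reference(u, generator, n_mc=1000, seed=3):
    """Reference distribution of the ENCE for a perfectly calibrated set:
    synthetic errors with standard deviation u[i] are drawn from the
    chosen generative distribution and the statistic is recomputed."""
    rng = np.random.default_rng(seed)
    return np.array([ence(u, generator(rng, u)) for _ in range(n_mc)])

u = np.random.default_rng(4).uniform(0.2, 2.0, size=500)

# two unit-variance generative distributions, scaled by the uncertainties
normal_gen = lambda rng, u: rng.normal(0.0, u)
nu = 4
student_gen = lambda rng, u: u * rng.standard_t(nu, size=u.size) * np.sqrt((nu - 2) / nu)

# the gap between the two 95th-percentile references illustrates the sensitivity
print(np.percentile(simulated_reference(u, normal_gen), 95))
print(np.percentile(simulated_reference(u, student_gen), 95))
```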