Making Neural Networks Confidence-Calibrated and Practical
Neural networks (NNs) have become powerful tools due to their predictive accuracy. However, NNs' real-world applicability depends not only on accuracy but also on the alignment between confidence and accuracy, known as confidence calibration. Bayesian NNs (BNNs) and NN ensembles achieve good confidence calibration but are computationally expensive. In contrast, pointwise NNs are computationally efficient but poorly calibrated. Addressing these issues, this thesis proposes methods to enhance confidence calibration while maintaining or improving computational efficiency. For users preferring pointwise NNs, we propose a methodology for regularising NN training with single or multiple artificial noise sources, improving confidence calibration and accuracy by up to 12% relative to standard training, without additional operations at runtime. For users able to modify the NN architecture, we propose the Single Architecture Ensemble (SAE) framework, which generalises multi-input and multi-exit architectures to embed multiple predictors into a single NN, emulating an ensemble while maintaining or improving confidence calibration and accuracy and reducing the number of compute operations or parameters by 1.5 to 3.7 times. For users who have already trained an NN ensemble, we propose knowledge distillation to transfer the ensemble's predictive distribution to a single NN, marginally improving confidence calibration and accuracy while halving the number of parameters or compute operations. We propose uniform quantisation for BNNs and benchmark its impact on the confidence calibration of pointwise NNs and BNNs, showing that, for example, 8-bit quantisation does not harm confidence calibration while reducing the memory footprint fourfold compared with 32-bit floating-point precision. Lastly, we propose an optimisation framework and a Dropout block to enable BNNs on existing field-programmable gate array-based accelerators, improving their inference latency or energy efficiency by 2 to 100 times and their algorithmic performance across tasks. In summary, this thesis presents methods to reduce NNs' computational costs while maintaining or improving their algorithmic performance, making confidence-calibrated NNs practical in real-world applications.
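The multi-exit idea behind SAE can be illustrated with a toy model: several predictors share one backbone, and their predictive distributions are averaged at inference. The sketch below is a minimal PyTorch illustration under assumed shapes and layer choices, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class MultiExitNet(nn.Module):
    """Toy multi-exit network: each exit is a separate predictor whose
    softmax output is averaged at inference, emulating an ensemble."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.exit1 = nn.Linear(64, num_classes)
        self.exit2 = nn.Linear(64, num_classes)

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return [self.exit1(h1), self.exit2(h2)]  # one logit set per exit

def ensemble_predict(model, x):
    # Average the per-exit predictive distributions (the "ensemble").
    probs = [torch.softmax(logits, dim=-1) for logits in model(x)]
    return torch.stack(probs).mean(dim=0)

model = MultiExitNet()
x = torch.randn(8, 32)
print(ensemble_predict(model, x).shape)  # torch.Size([8, 10])
```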
Simple Regularisation for Uncertainty-Aware Knowledge Distillation
Accounting for the uncertainty estimates of modern neural networks (NNs) is one of the most important steps towards deploying machine learning systems in meaningful real-world applications such as medicine, finance or autonomous systems. At the moment, ensembles of different NNs constitute the state of the art in both accuracy and uncertainty estimation across tasks. However, NN ensembles are impractical under real-world constraints, since their computation and memory consumption scale linearly with the ensemble size, increasing their latency and deployment cost. In this work, we examine a simple regularisation approach for distribution-free knowledge distillation of an ensemble of machine learning models into a single NN. The aim of the regularisation is to preserve the diversity, accuracy and uncertainty-estimation characteristics of the original ensemble without any intricacies such as fine-tuning. We demonstrate the generality of the approach on combinations of toy data, SVHN/CIFAR-10, simple to complex NN architectures and different tasks.
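As a rough illustration of the core distillation step, the sketch below matches a student's softmax output to the ensemble's mean predictive distribution via KL divergence. The paper's specific regulariser is not reproduced here; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def ensemble_mean_probs(ensemble, x):
    # Average the members' predictive distributions (the distillation target).
    with torch.no_grad():
        probs = [F.softmax(member(x), dim=-1) for member in ensemble]
    return torch.stack(probs).mean(dim=0)

def distillation_loss(student_logits, teacher_probs):
    # KL(teacher || student) between the two predictive distributions.
    log_q = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_q, teacher_probs, reduction="batchmean")

# Usage inside a training loop (ensemble frozen, student trainable):
#   target = ensemble_mean_probs(ensemble, x)
#   loss = distillation_loss(student(x), target)
```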
ComBiNet: Compact Convolutional Bayesian Neural Network for Image Segmentation
Fully convolutional U-shaped neural networks have largely been the dominant approach for pixel-wise image segmentation. In this work, we tackle two defects that hinder their deployment in real-world applications: 1) predictions lack uncertainty quantification that may be crucial to many decision-making systems; 2) large memory storage and computational consumption demand extensive hardware resources. To address these issues and improve practicality, we demonstrate a compact, few-parameter Bayesian convolutional architecture that achieves a marginal improvement in accuracy over related work while using significantly fewer parameters and compute operations. The architecture combines parameter-efficient operations, such as separable convolutions, bilinear interpolation and multi-scale feature propagation, with Bayesian inference for per-pixel uncertainty quantification through Monte Carlo Dropout. The best-performing configurations require fewer than 2.5 million parameters on diverse, challenging datasets with few observations. (Accepted for publication at ICANN 2021. Code at: https://github.com/martinferianc/ComBiNe)
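ComBiNet's per-pixel uncertainty comes from Monte Carlo Dropout: dropout stays active at test time and several stochastic forward passes are averaged. A generic sketch of that estimator (not ComBiNet itself), assuming a segmentation model with (batch, classes, H, W) outputs:

```python
import torch
import torch.nn as nn

def mc_dropout_segment(model: nn.Module, image: torch.Tensor, samples: int = 20):
    """Monte Carlo Dropout for segmentation: average several stochastic
    forward passes for a per-pixel predictive mean and uncertainty."""
    model.train()  # keeps nn.Dropout stochastic (in practice, enable only Dropout)
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(image), dim=1) for _ in range(samples)
        ])                                    # (samples, B, C, H, W)
    mean = probs.mean(dim=0)                  # per-pixel predictive mean
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=1)  # (B, H, W)
    return mean, entropy
```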
An Online Learning Method for Microgrid Energy Management Control
We propose a novel Model Predictive Control (MPC) scheme based on online learning (OL) for microgrid energy management, where the control optimisation is embedded as the last layer of a neural network. The proposed MPC scheme deals with uncertainty in the load and renewable generation power profiles and in electricity prices by employing the predictions provided by an online-trained neural network in the optimisation problem. To adapt to possible changes in the environment, the neural network is trained online on continuously received data. The network hyperparameters are selected through hyperparameter optimisation on a pretraining dataset before the controller is deployed. We show the effectiveness of the proposed method for microgrid energy management through extensive experiments on real microgrid datasets. Moreover, we show that the proposed algorithm has good transfer learning (TL) capabilities across different microgrids.
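The paper embeds the control optimisation in the network's last layer; as a much simpler, hypothetical stand-in, the sketch below solves one receding-horizon microgrid cost minimisation with cvxpy, taking the NN's forecasts of price, load and renewables as inputs. The battery model, bounds and all names are illustrative assumptions, not the paper's formulation.

```python
import cvxpy as cp
import numpy as np

def solve_microgrid_mpc(price, load, renewables, horizon=24,
                        soc0=0.5, capacity=10.0, p_max=3.0):
    """Toy MPC step: buy grid power to cover net load, using a battery,
    minimising forecast electricity cost over the horizon."""
    grid = cp.Variable(horizon)        # power bought from the grid
    batt = cp.Variable(horizon)        # battery power (+ = discharge)
    soc = cp.Variable(horizon + 1)     # battery state of charge
    constraints = [soc[0] == soc0 * capacity,
                   soc >= 0, soc <= capacity,
                   cp.abs(batt) <= p_max,
                   grid >= 0]
    for t in range(horizon):
        constraints += [soc[t + 1] == soc[t] - batt[t],
                        grid[t] + renewables[t] + batt[t] >= load[t]]
    cost = cp.sum(cp.multiply(price, grid))
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return grid.value[0]  # apply only the first action (receding horizon)

# Forecasts would come from the online-trained NN; random stand-ins here:
h = 24
print(solve_microgrid_mpc(np.random.rand(h), np.random.rand(h) * 2,
                          np.random.rand(h)))
```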
Navigating Noise: A Study of How Noise Influences Generalisation and Calibration of Neural Networks
Enhancing the generalisation abilities of neural networks (NNs) by integrating noise, such as MixUp or Dropout, during training has emerged as a powerful and adaptable technique. Despite the proven efficacy of noise in NN training, there is no consensus on which noise sources, types and placements yield maximal benefits in generalisation and confidence calibration. This study thoroughly explores diverse noise modalities to evaluate their impact on NNs' generalisation and calibration under in-distribution and out-of-distribution settings, paired with experiments investigating the metric landscapes of the learnt representations across a spectrum of NN architectures, tasks and datasets. Our study shows that AugMix and weak augmentation exhibit cross-task effectiveness in computer vision, emphasising the need to tailor noise to specific domains. Our findings emphasise the efficacy of combining noises and of hyperparameter transfer within a single domain, but also the difficulty of transferring the benefits to other domains. Furthermore, the study underscores the complexity of simultaneously optimising for both generalisation and calibration, emphasising the need for practitioners to carefully consider noise combinations and hyperparameter tuning for optimal performance in specific tasks and datasets. (Accepted at Transactions on Machine Learning Research, April 2024. Martin and Ondrej contributed equally.)
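MixUp, one of the noise modalities studied, trains on convex combinations of example pairs and their labels. A minimal sketch, assuming one-hot targets:

```python
import torch

def mixup(inputs, targets_onehot, alpha: float = 0.2):
    """MixUp: blend each example with a randomly paired one, mixing
    inputs and labels with the same Beta-distributed coefficient."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(inputs.size(0))
    mixed_x = lam * inputs + (1 - lam) * inputs[perm]
    mixed_y = lam * targets_onehot + (1 - lam) * targets_onehot[perm]
    return mixed_x, mixed_y
```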
Online End-to-End Learning-Based Predictive Control for Microgrid Energy Management
This article proposes an innovative Online Learning (OL) algorithm designed for efficient microgrid energy management, integrating Recurrent Neural Networks (RNNs) and Model Predictive Control (MPC) in an End-to-End (E2E) learning-based control architecture. The algorithm leverages the RNN's ability to predict uncertain and possibly evolving profiles of electricity price, load demand and renewable generation. These predictions are then exploited in an integrated MPC optimisation problem to minimise the overall microgrid electricity consumption cost while guaranteeing operational constraints. The proposed methodology incorporates a specifically designed online version of Stochastic Weight Averaging (O-SWA) and Experience Replay (ER) to enhance OL capabilities, ensuring more robust and adaptive learning in real-time scenarios. In addition, to address the challenge of model uncertainty, a task-based loss approach is proposed by integrating the MPC optimisation as a differentiable optimisation layer within the Neural Network (NN), allowing the OL architecture to jointly optimise prediction and control performance. The performance of the proposed methodology is evaluated through extensive simulation results, showcasing its Transfer Learning (TL) capabilities across different microgrid sites, which are crucial for deployment in real microgrids. Finally, we show that our OL algorithm can be used to estimate the prediction uncertainty of the unknown profiles.
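The online weight-averaging and experience-replay components can be sketched generically. Below is a hypothetical, minimal version of both (weight averaging as an incremental mean over iterates, replay as a bounded buffer of recent observations), not the paper's exact O-SWA/ER formulation.

```python
import random
from collections import deque

import torch

class OnlineSWA:
    """Running average of floating-point model weights; an assumed,
    simplified analogue of online Stochastic Weight Averaging."""
    def __init__(self, model):
        self.n = 0
        self.avg = {k: v.detach().clone().float()
                    for k, v in model.state_dict().items()
                    if v.dtype.is_floating_point}

    def update(self, model):
        # Incremental mean over the weight iterates seen so far.
        self.n += 1
        for k in self.avg:
            self.avg[k] += (model.state_dict()[k].detach() - self.avg[k]) / self.n

class ReplayBuffer:
    """Fixed-size experience replay over recent (input, target) pairs."""
    def __init__(self, capacity: int = 1000):
        self.buffer = deque(maxlen=capacity)

    def push(self, x, y):
        self.buffer.append((x, y))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)
```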
On the effects of quantisation on model uncertainty in Bayesian neural networks
Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation. Being able to quantify uncertainty while making decisions is essential for understanding when a model is over- or under-confident, and hence BNNs are attracting interest in safety-critical applications such as autonomous driving, healthcare and robotics. Nevertheless, BNNs have not been as widely used in industrial practice, mainly because of their increased memory and compute costs. In this work, we investigate the quantisation of BNNs by compressing 32-bit floating-point weights and activations into their integer counterparts, an approach that has already been successful in reducing the compute demand of standard pointwise neural networks. We study three types of quantised BNNs, evaluate them under a wide range of settings, and empirically demonstrate that a uniform quantisation scheme applied to BNNs does not substantially decrease their quality of uncertainty estimation.
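A minimal sketch of the uniform quantise-dequantise step, applicable to any weight tensor (for example, a sample drawn from a BNN's posterior); the max-abs scaling rule here is an assumed simple variant, not necessarily the scheme evaluated in the paper.

```python
import torch

def uniform_quantise(w: torch.Tensor, bits: int = 8):
    """Uniform symmetric quantisation to signed integers and back;
    dequantising immediately simulates the precision loss."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), qmin, qmax)
    return q * scale  # weights at the reduced precision

w = torch.randn(100)        # e.g. one posterior weight sample
w_q = uniform_quantise(w, bits=8)
print((w - w_q).abs().max())  # worst-case quantisation error
```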
Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions
The deployment of large language models (LLMs) raises concerns regarding their cultural misalignment and its potential ramifications for individuals from various cultural backgrounds. Existing work has investigated political and social biases and public opinions rather than cultural values. To address this limitation, the proposed Cultural Alignment Test (CAT) quantifies cultural alignment using Hofstede's cultural dimensions framework, which offers an explanatory cross-cultural comparison through latent variable analysis. We apply our approach to assess the cultural values embedded in state-of-the-art LLMs, such as ChatGPT and Bard, across the diverse cultures of four countries: the United States (US), Saudi Arabia, China and Slovakia, using different prompting styles and hyperparameter settings. Our results not only quantify the cultural alignment of LLMs with certain countries but also reveal differences between LLMs along explanatory cultural dimensions. While none of the LLMs provided satisfactory results in understanding cultural values, GPT-4 exhibited the highest CAT score for the cultural values of the US.
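The paper's CAT relies on latent variable analysis, which is not reproduced here. Purely as an illustration of comparing an LLM's elicited Hofstede dimension scores against reference survey scores for a country, a toy correlation-based metric might look like this (the dimension labels are Hofstede's standard abbreviations; the metric itself is an assumption):

```python
import numpy as np

# Hofstede's six cultural dimensions (standard abbreviations).
DIMENSIONS = ["PDI", "IDV", "MAS", "UAI", "LTO", "IVR"]

def alignment_score(llm_scores: dict, reference_scores: dict) -> float:
    """Toy alignment metric: Pearson correlation between an LLM's estimated
    dimension scores for a country and the reference survey scores."""
    a = np.array([llm_scores[d] for d in DIMENSIONS], dtype=float)
    b = np.array([reference_scores[d] for d in DIMENSIONS], dtype=float)
    return float(np.corrcoef(a, b)[0, 1])
```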