
    Making Neural Networks Confidence-Calibrated and Practical

    Neural networks (NNs) have become powerful tools due to their predictive accuracy. However, NNs' real-world applicability depends not only on accuracy but also on the alignment between confidence and accuracy, known as confidence calibration. Bayesian NNs (BNNs) and NN ensembles achieve good confidence calibration but are computationally expensive. In contrast, pointwise NNs are computationally efficient but poorly calibrated. Addressing these issues, this thesis proposes methods to enhance confidence calibration while maintaining or improving computational efficiency. For users preferring pointwise NNs, we propose a methodology for regularising NN training with single or multiple artificial noise sources, improving confidence calibration and accuracy by up to 12% relative to standard training without additional operations at runtime. For users able to modify the NN architecture, we propose the Single Architecture Ensemble (SAE) framework, which generalises multi-input and multi-exit architectures to embed multiple predictors into a single NN, emulating an ensemble and maintaining or improving confidence calibration and accuracy while reducing the number of compute operations or parameters by 1.5 to 3.7 times. For users who have already trained an NN ensemble, we propose knowledge distillation to transfer the ensemble's predictive distribution to a single NN, marginally improving confidence calibration and accuracy while halving the number of parameters or compute operations. We also propose uniform quantisation for BNNs and benchmark its impact on the confidence calibration of pointwise NNs and BNNs, showing that, for example, 8-bit quantisation does not harm confidence calibration while reducing the memory footprint by 4 times compared to 32-bit floating-point precision. Lastly, we propose an optimisation framework and a Dropout block to enable BNNs on existing field-programmable gate array-based accelerators, improving their inference latency or energy efficiency by 2 to 100 times and their algorithmic performance across tasks. In summary, this thesis presents methods to reduce NNs' computational costs while maintaining or improving their algorithmic performance, making confidence-calibrated NNs practical in real-world applications.
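
    Below is a minimal sketch, in Python/PyTorch, of the idea behind the Single Architecture Ensemble framework mentioned above: a single backbone with several exit heads produces multiple predictions in one forward pass, and their averaged softmax is used like an ensemble prediction. The layer sizes, number of exits and class count are illustrative assumptions, not the thesis's actual architecture.

        import torch
        import torch.nn as nn

        class MultiExitNet(nn.Module):
            """One backbone with several exit heads; averaging the exits emulates an ensemble."""
            def __init__(self, in_dim=32, hidden=64, classes=10, exits=3):
                super().__init__()
                self.blocks = nn.ModuleList(
                    [nn.Sequential(nn.Linear(in_dim if i == 0 else hidden, hidden), nn.ReLU())
                     for i in range(exits)])
                self.heads = nn.ModuleList([nn.Linear(hidden, classes) for _ in range(exits)])

            def forward(self, x):
                logits = []
                for block, head in zip(self.blocks, self.heads):
                    x = block(x)                     # shared computation up to this exit
                    logits.append(head(x))           # one prediction per exit
                return logits

        model = MultiExitNet()
        x = torch.randn(4, 32)                       # dummy batch
        # average the per-exit softmaxes as the ensemble-like predictive distribution
        probs = torch.stack([torch.softmax(l, dim=-1) for l in model(x)]).mean(dim=0)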

    Simple Regularisation for Uncertainty-Aware Knowledge Distillation

    Accounting for the uncertainty estimates of modern neural networks (NNs) is one of the most important steps towards deploying machine learning systems in meaningful real-world applications such as medicine, finance or autonomous systems. At the moment, ensembles of different NNs constitute the state of the art in both accuracy and uncertainty estimation across different tasks. However, ensembles of NNs are impractical under real-world constraints, since their computation and memory consumption scale linearly with the size of the ensemble, which increases their latency and deployment cost. In this work, we examine a simple regularisation approach for distribution-free knowledge distillation of an ensemble of machine learning models into a single NN. The aim of the regularisation is to preserve the diversity, accuracy and uncertainty estimation characteristics of the original ensemble without any intricacies, such as fine-tuning. We demonstrate the generality of the approach on combinations of toy data, SVHN/CIFAR-10, simple to complex NN architectures and different tasks.
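
    A minimal sketch, in PyTorch, of the general setting this paper addresses: distilling an ensemble's averaged predictive distribution into a single student network. The specific regularisation term proposed in the paper is not reproduced here; the KL/cross-entropy mix and the weighting factor alpha are illustrative assumptions.

        import torch
        import torch.nn.functional as F

        def ensemble_predictive(models, x):
            """Average the softmax outputs of all ensemble members (the distillation target)."""
            with torch.no_grad():
                probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])
            return probs.mean(dim=0)

        def distillation_loss(student_logits, teacher_probs, targets, alpha=0.5):
            # KL(teacher || student) pulls the student towards the ensemble's predictive
            # distribution; cross-entropy on the labels keeps it accurate.
            kl = F.kl_div(F.log_softmax(student_logits, dim=-1), teacher_probs,
                          reduction="batchmean")
            ce = F.cross_entropy(student_logits, targets)
            return alpha * kl + (1.0 - alpha) * ce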

    An Online Learning Method for Microgrid Energy Management Control*

    We propose a novel Model Predictive Control (MPC) scheme based on online learning (OL) for microgrid energy management, where the control optimisation is embedded as the last layer of the neural network. The proposed MPC scheme deals with uncertainty in the load and renewable generation power profiles and in electricity prices by employing the predictions provided by an online-trained neural network in the optimisation problem. In order to adapt to possible changes in the environment, the neural network is trained online on continuously received data. The network hyperparameters are selected by performing a hyperparameter optimisation before the deployment of the controller, using a pretraining dataset. We show the effectiveness of the proposed method for microgrid energy management through extensive experiments on real microgrid datasets. Moreover, we show that the proposed algorithm has good transfer learning (TL) capabilities among different microgrids.
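
    A minimal sketch, in PyTorch, of the online-learning MPC loop described above: a forecasting network is updated on each newly received measurement window, and its predictions are passed to the control optimisation. The feed-forward forecaster, horizon length and the solve_mpc stub are illustrative assumptions and stand in for the paper's embedded optimisation layer.

        import torch
        import torch.nn as nn

        HORIZON = 24                                   # assumed prediction horizon (steps)
        forecaster = nn.Sequential(nn.Linear(HORIZON, 64), nn.ReLU(), nn.Linear(64, HORIZON))
        optimiser = torch.optim.Adam(forecaster.parameters(), lr=1e-3)

        def solve_mpc(load_forecast):
            """Placeholder for the control optimisation embedded as the last layer."""
            return -load_forecast.clamp(min=0)         # dummy control action

        def online_step(past_window, realised_future):
            # 1) one online gradient step on the most recently observed (input, target) pair
            optimiser.zero_grad()
            loss = nn.functional.mse_loss(forecaster(past_window), realised_future)
            loss.backward()
            optimiser.step()
            # 2) forecast the next horizon and hand it to the controller
            with torch.no_grad():
                forecast = forecaster(realised_future)
            return solve_mpc(forecast)

        action = online_step(torch.randn(HORIZON), torch.randn(HORIZON))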

    Navigating Noise: A Study of How Noise Influences Generalisation and Calibration of Neural Networks

    Enhancing the generalisation abilities of neural networks (NNs) by integrating noise such as MixUp or Dropout during training has emerged as a powerful and adaptable technique. Despite the proven efficacy of noise in NN training, there is no consensus regarding which noise sources, types and placements yield maximal benefits in generalisation and confidence calibration. This study thoroughly explores diverse noise modalities to evaluate their impact on NNs' generalisation and calibration under in-distribution and out-of-distribution settings, paired with experiments investigating the metric landscapes of the learnt representations across a spectrum of NN architectures, tasks, and datasets. Our study shows that AugMix and weak augmentation exhibit cross-task effectiveness in computer vision, emphasising the need to tailor noise to specific domains. Our findings highlight the efficacy of combining noise types and of transferring hyperparameters within a single domain, but also the difficulty of transferring these benefits to other domains. Furthermore, the study underscores the complexity of simultaneously optimising for both generalisation and calibration, emphasising the need for practitioners to carefully consider noise combinations and hyperparameter tuning for optimal performance in specific tasks and datasets. Comment: Accepted at Transactions on Machine Learning Research (April 2024). Martin and Ondrej contributed equally.
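
    A minimal sketch, in PyTorch, of one of the noise sources studied here, MixUp: inputs and one-hot targets are convexly combined using a Beta-sampled coefficient. The alpha value and helper names are illustrative assumptions.

        import torch

        def mixup_batch(x, y_onehot, alpha=0.2):
            lam = torch.distributions.Beta(alpha, alpha).sample()   # mixing coefficient
            perm = torch.randperm(x.size(0))                        # random pairing within the batch
            x_mixed = lam * x + (1.0 - lam) * x[perm]
            y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
            return x_mixed, y_mixed

        x = torch.randn(16, 3, 32, 32)                              # dummy image batch
        y = torch.nn.functional.one_hot(torch.randint(0, 10, (16,)), 10).float()
        x_mix, y_mix = mixup_batch(x, y)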

    ComBiNet: Compact Convolutional Bayesian Neural Network for Image Segmentation

    Fully convolutional U-shaped neural networks have largely been the dominant approach for pixel-wise image segmentation. In this work, we tackle two defects that hinder their deployment in real-world applications: 1) predictions lack the uncertainty quantification that may be crucial to many decision-making systems; 2) large memory and computational demands require extensive hardware resources. To address these issues and improve their practicality, we demonstrate a compact, few-parameter Bayesian convolutional architecture that achieves a marginal improvement in accuracy over related work while using significantly fewer parameters and compute operations. The architecture combines parameter-efficient operations, such as separable convolutions, bilinear interpolation and multi-scale feature propagation, with Bayesian inference for per-pixel uncertainty quantification through Monte Carlo Dropout. The best-performing configurations required fewer than 2.5 million parameters on diverse and challenging datasets with few observations. Comment: Accepted for publication at ICANN 2021. Code at: https://github.com/martinferianc/ComBiNe
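
    A minimal sketch, in PyTorch, of Monte Carlo Dropout inference for per-pixel uncertainty, the mechanism ComBiNet relies on: Dropout is kept stochastic at test time, several forward passes are averaged, and the per-pixel predictive entropy serves as the uncertainty map. The toy network, number of samples and entropy measure are illustrative assumptions.

        import torch
        import torch.nn as nn

        def mc_dropout_segment(model, image, num_samples=20):
            model.train()   # keep Dropout stochastic at test time (in practice only Dropout modules)
            with torch.no_grad():
                probs = torch.stack([torch.softmax(model(image), dim=1)
                                     for _ in range(num_samples)])
            mean_probs = probs.mean(dim=0)                            # (B, classes, H, W)
            entropy = -(mean_probs * torch.log(mean_probs + 1e-8)).sum(dim=1)   # per-pixel uncertainty
            return mean_probs.argmax(dim=1), entropy

        # toy stand-in for a segmentation network with a Dropout layer
        net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                            nn.Dropout2d(0.25), nn.Conv2d(8, 5, 1))
        seg, uncertainty = mc_dropout_segment(net, torch.randn(1, 3, 64, 64))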

    Online End-to-End Learning-Based Predictive Control for Microgrid Energy Management

    This article proposes an innovative Online Learning (OL) algorithm designed for efficient microgrid energy management, integrating Recurrent Neural Networks (RNNs) and Model Predictive Control (MPC) in an End-to-End (E2E) learning-based control architecture. The algorithm leverages the capabilities of RNNs to predict uncertain and possibly evolving profiles of electricity price, load demand, and renewable generation. These predictions are then exploited in an integrated MPC optimization problem to minimize the overall microgrid electricity consumption cost while guaranteeing satisfaction of the operating constraints. The proposed methodology incorporates a specifically designed online version of Stochastic Weight Averaging (O-SWA) and Experience Replay (ER) to enhance OL capabilities, ensuring more robust and adaptive learning in real-time scenarios. In addition, to address the challenge of model uncertainty, a task-based loss approach is proposed by integrating the MPC optimization as a differentiable optimization layer within the Neural Network (NN), allowing the OL architecture to jointly optimize prediction and control performance. The performance of the proposed methodology is evaluated through extensive simulation results, showcasing its Transfer Learning (TL) capabilities across different microgrid sites, which are crucial for deployment in real microgrids. Finally, we show that our OL algorithm can be used to estimate the prediction uncertainty of the unknown profiles.
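
    A minimal sketch, in PyTorch, of the two online-learning ingredients named above, experience replay and an online form of stochastic weight averaging (a running average of model weights). The buffer capacity, averaging rule and model are illustrative assumptions rather than the article's O-SWA/ER design.

        import copy
        import random
        import torch
        import torch.nn as nn

        class ReplayBuffer:
            """Fixed-capacity store of past (input, target) pairs for online retraining."""
            def __init__(self, capacity=1000):
                self.capacity, self.data = capacity, []
            def push(self, sample):
                self.data.append(sample)
                if len(self.data) > self.capacity:
                    self.data.pop(0)                     # drop the oldest sample
            def sample(self, batch_size):
                return random.sample(self.data, min(batch_size, len(self.data)))

        def swa_update(swa_model, model, n_averaged):
            """Running average of weights: swa <- (n * swa + w) / (n + 1)."""
            for p_swa, p in zip(swa_model.parameters(), model.parameters()):
                p_swa.data.mul_(n_averaged / (n_averaged + 1)).add_(p.data / (n_averaged + 1))
            return n_averaged + 1

        model = nn.Linear(8, 1)
        swa_model = copy.deepcopy(model)                 # holds the first averaged snapshot
        buffer = ReplayBuffer()
        buffer.push((torch.randn(8), torch.randn(1)))
        n_avg = swa_update(swa_model, model, n_averaged=1)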

    On the effects of quantisation on model uncertainty in Bayesian neural networks

    Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation. Being able to quantify uncertainty while making decisions is essential for understanding when a model is over- or under-confident, and hence BNNs are attracting interest in safety-critical applications such as autonomous driving, healthcare, and robotics. Nevertheless, BNNs have not been widely adopted in industrial practice, mainly because of their increased memory and compute costs. In this work, we investigate quantisation of BNNs by compressing 32-bit floating-point weights and activations into their integer counterparts, an approach that has already been successful in reducing the compute demand of standard pointwise neural networks. We study three types of quantised BNNs, evaluate them under a wide range of settings, and empirically demonstrate that a uniform quantisation scheme applied to BNNs does not substantially decrease their quality of uncertainty estimation.
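
    A minimal sketch, in PyTorch, of the uniform quantise-dequantise step investigated in this work, applied to a weight tensor; for a BNN the same operation would be applied to the (sampled) weights and activations. The symmetric per-tensor scheme and 8-bit width are illustrative assumptions.

        import torch

        def uniform_quantise(w, num_bits=8):
            qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for signed 8-bit integers
            scale = w.abs().max() / qmax                 # per-tensor scale factor
            w_int = torch.clamp(torch.round(w / scale), -qmax, qmax)   # integer representation
            return w_int * scale                         # dequantised values used in compute

        w = torch.randn(256, 256)
        w_q = uniform_quantise(w, num_bits=8)
        print("max quantisation error:", (w - w_q).abs().max().item())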

    Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions

    The deployment of large language models (LLMs) raises concerns regarding their cultural misalignment and its potential ramifications for individuals from various cultural backgrounds. Existing work has investigated political and social biases and public opinions rather than cultural values. To address this limitation, the proposed Cultural Alignment Test (CAT) quantifies cultural alignment using Hofstede's cultural dimensions framework, which offers an explanatory cross-cultural comparison through latent variable analysis. We apply our approach to assess the cultural values embedded in state-of-the-art LLMs, such as ChatGPT and Bard, across the diverse cultures of four countries: the United States (US), Saudi Arabia, China, and Slovakia, using different prompting styles and hyperparameter settings. Our results not only quantify the cultural alignment of LLMs with certain countries but also reveal differences between LLMs along the explanatory cultural dimensions. While none of the LLMs provided satisfactory results in understanding cultural values, GPT-4 exhibited the highest CAT score for the cultural values of the US. Comment: 31 pages.
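
    A minimal sketch, in Python, of the comparison that underlies a cultural alignment score of this kind: dimension scores derived from an LLM's survey answers are compared with Hofstede's published scores for a country. All numbers below are hypothetical placeholders, and the Pearson correlation stands in for the paper's CAT metric, which is not reproduced here.

        import numpy as np

        dimensions = ["power_distance", "individualism", "masculinity",
                      "uncertainty_avoidance", "long_term_orientation", "indulgence"]
        country_reference = np.array([40.0, 90.0, 60.0, 45.0, 25.0, 70.0])   # placeholder reference scores
        llm_derived = np.array([55.0, 80.0, 60.0, 50.0, 35.0, 60.0])         # placeholder LLM-derived scores

        # Pearson correlation as a simple stand-in for an alignment measure
        alignment = np.corrcoef(country_reference, llm_derived)[0, 1]
        print(f"alignment (Pearson r) = {alignment:.2f}")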