
    Multivariate kernel density estimation applied to sensitive geo-referenced administrative data protected via measurement error

    Modern systems of official statistics require the timely estimation of area-specific densities of sub-populations. Ideally, estimates should be based on precise geo-coded information, but this is not available due to confidentiality constraints. One approach for ensuring confidentiality is to round the geo-coordinates. We propose multivariate non-parametric kernel density estimation that reverses the rounding process by using a Bayesian measurement error model. The methodology is applied to the Berlin register of residents to derive density estimates of ethnic minorities and older people. The estimates are used to identify areas with a need for new advisory centres for migrants and infrastructure for older people.
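    The core idea above (density estimation from coordinates coarsened for confidentiality) can be illustrated with a deliberately simplified sketch. The snippet below is not the paper's Bayesian measurement error model: it merely imputes the rounding error uniformly within each rounding cell and averages kernel density estimates over imputations, and all data, grid limits, and the cell width are hypothetical.

```python
# Simplified sketch only: KDE on coordinates rounded for confidentiality,
# with the rounding "reversed" by uniform imputation within each rounding
# cell (a crude stand-in for the paper's Bayesian measurement error model).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical true (longitude, latitude) coordinates of a sub-population.
true_xy = rng.normal(loc=[13.40, 52.52], scale=[0.05, 0.03], size=(2000, 2))

cell = 0.05                                    # assumed rounding grid width
rounded_xy = np.round(true_xy / cell) * cell   # coordinates as released

# Evaluation grid for the density estimate.
grid = np.mgrid[13.2:13.6:80j, 52.4:52.65:80j].reshape(2, -1)

densities = []
for _ in range(20):                            # average over imputations
    jitter = rng.uniform(-cell / 2, cell / 2, size=rounded_xy.shape)
    kde = gaussian_kde((rounded_xy + jitter).T)
    densities.append(kde(grid))

density = np.mean(densities, axis=0)           # imputation-averaged density
print(density.max())
```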

    Amortised likelihood-free inference for expensive time-series simulators with signatured ratio estimation

    Simulation models of complex dynamics in the natural and social sciences commonly lack a tractable likelihood function, rendering traditional likelihood-based statistical inference impossible. Recent advances in machine learning have introduced novel algorithms for estimating otherwise intractable likelihood functions using a likelihood ratio trick based on binary classifiers. Consequently, efficient likelihood approximations can be obtained whenever good probabilistic classifiers can be constructed. We propose a kernel classifier for sequential data using path signatures, based on the recently introduced signature kernel. We demonstrate that the representative power of signatures yields a highly performant classifier, even in the crucially important case where sample numbers are low. In such scenarios, our approach can outperform sophisticated neural networks for common posterior inference tasks.
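    A minimal, hypothetical sketch of the likelihood ratio trick with signature features follows. It is not the paper's signature-kernel classifier: it uses hand-rolled level-2 signatures and plain logistic regression, and the drift simulator, sample sizes, and feature choices are illustrative assumptions only.

```python
# Sketch of classifier-based likelihood-ratio estimation for a toy
# time-series simulator, with truncated path signatures as features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def simulate(theta, T=50):
    """Toy simulator: random walk with drift theta (stand-in for an
    expensive time-series simulator)."""
    return np.cumsum(theta + rng.normal(0, 1, size=(T, 1)), axis=0)

def signature_level2(path):
    """Exact level-2 signature of a piecewise-linear path of shape (T, d)."""
    dx = np.diff(path, axis=0)
    s1 = dx.sum(axis=0)
    prefix = np.cumsum(dx, axis=0) - dx            # increments before step k
    s2 = prefix.T @ dx + 0.5 * np.einsum("ti,tj->ij", dx, dx)
    return np.concatenate([s1, s2.ravel()])

# Joint pairs (theta, sig(x)) labelled 1; shuffled (marginal) pairs labelled 0.
thetas = rng.uniform(-1, 1, size=500)
sigs = np.array([signature_level2(simulate(t)) for t in thetas])
joint = np.column_stack([thetas, sigs])
marginal = np.column_stack([rng.permutation(thetas), sigs])
X = np.vstack([joint, marginal])
y = np.concatenate([np.ones(500), np.zeros(500)])

clf = LogisticRegression(max_iter=1000).fit(X, y)

# The classifier's log-odds approximate log p(x | theta) - log p(x),
# which is enough for MCMC or posterior reweighting over theta.
x_obs = simulate(0.3)
theta_grid = np.linspace(-1, 1, 5)
features = np.column_stack([theta_grid, np.tile(signature_level2(x_obs), (5, 1))])
print(clf.decision_function(features))
```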

    Approximate Bayesian computation with path signatures

    Simulation models often lack tractable likelihood functions, making likelihood-free inference methods indispensable. Approximate Bayesian computation generates likelihood-free posterior samples by comparing simulated and observed data through some distance measure, but existing approaches are often poorly suited to time series simulators, for example due to an independent and identically distributed data assumption. In this paper, we propose to use path signatures in approximate Bayesian computation to handle the sequential nature of time series. We provide theoretical guarantees on the resultant posteriors and demonstrate competitive Bayesian parameter inference for simulators generating univariate, multivariate, and irregularly spaced sequences of non-i.i.d. data.
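    To make the ABC-with-signatures idea concrete, here is a toy rejection-ABC sketch. It uses a truncated (level-2) signature and a plain Euclidean distance rather than the full signature construction analysed in the paper, and the AR(1)-style simulator, prior, and acceptance quantile are illustrative assumptions.

```python
# Toy rejection-ABC sketch with a signature-based distance (assumptions
# throughout; not the construction or guarantees from the paper).
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, T=100):
    """Toy non-iid simulator: AR(1)-style series with coefficient theta."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = theta * x[t - 1] + rng.normal()
    return x

def sig2_1d(x):
    """Level-1 and level-2 signature terms of a one-dimensional path."""
    dx = np.diff(x)
    prefix = np.cumsum(dx) - dx                 # increments before each step
    s1 = dx.sum()
    s2 = (prefix * dx).sum() + 0.5 * (dx * dx).sum()
    return np.array([s1, s2])

x_obs = simulate(0.7)                           # pretend observed series
sig_obs = sig2_1d(x_obs)

# Rejection ABC: keep the prior draws whose simulated signature is closest
# to the observed one.
theta_prior = rng.uniform(-1, 1, size=5000)
dist = np.array([np.linalg.norm(sig2_1d(simulate(t)) - sig_obs) for t in theta_prior])
eps = np.quantile(dist, 0.02)                   # accept the closest 2%
posterior_samples = theta_prior[dist <= eps]
print(posterior_samples.mean(), posterior_samples.std())
```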

    Black-box Bayesian inference for agent-based models

    Simulation models, in particular agent-based models, are gaining popularity in economics and the social sciences. The considerable flexibility they offer, as well as their capacity to reproduce a variety of empirically observed behaviours of complex systems, give them broad appeal, and the increasing availability of cheap computing power has made their use feasible. Yet widespread adoption in real-world modelling and decision-making scenarios has been hindered by the difficulty of performing parameter estimation for such models. In general, simulation models lack a tractable likelihood function, which precludes a straightforward application of standard statistical inference techniques. A number of recent works have sought to address this problem through the application of likelihood-free inference techniques, in which parameter estimates are determined by performing some form of comparison between the observed data and simulation output. However, these approaches are (a) founded on restrictive assumptions, and/or (b) typically require many hundreds of thousands of simulations. These limitations make them unsuitable for large-scale simulations in economics and the social sciences, and can cast doubt on the validity of these inference methods in such scenarios. In this paper, we investigate the efficacy of two classes of simulation-efficient black-box approximate Bayesian inference methods that have recently drawn significant attention within the probabilistic machine learning community: neural posterior estimation and neural density ratio estimation. We present a number of benchmarking experiments in which we demonstrate that neural network-based black-box methods provide state-of-the-art parameter inference for economic simulation models, and crucially are compatible with generic multivariate or even non-Euclidean time-series data. In addition, we suggest appropriate assessment criteria for use in future benchmarking of approximate Bayesian inference procedures for simulation models in economics and the social sciences.
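    As an illustration of neural posterior estimation for a simulator with an intractable likelihood, the sketch below assumes the third-party `sbi` package (its SNPE flexible interface as found in recent releases) and a hypothetical two-parameter toy simulator; it is not the economic agent-based models benchmarked in the paper.

```python
# Minimal neural posterior estimation sketch, assuming the `sbi` package.
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

def simulator(theta):
    """Hypothetical toy simulator: noisy summaries of a 2-parameter model."""
    mean, scale = theta[0], theta[1]
    x = mean + scale * torch.randn(10)
    return torch.stack([x.mean(), x.std(), x.max()])

prior = BoxUniform(low=torch.tensor([-2.0, 0.1]), high=torch.tensor([2.0, 2.0]))

theta = prior.sample((2000,))                       # prior draws
x = torch.stack([simulator(t) for t in theta])      # one simulation per draw

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

x_obs = simulator(torch.tensor([0.5, 1.0]))         # pretend observed data
samples = posterior.sample((1000,), x=x_obs)
print(samples.mean(dim=0))
```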

    Generalized Posteriors in Approximate Bayesian Computation

    Complex simulators have become a ubiquitous tool in many scientific disciplines, providing high-fidelity, implicit probabilistic models of natural and social phenomena. Unfortunately, they typically lack the tractability required for conventional statistical analysis. Approximate Bayesian computation (ABC) has emerged as a key method in simulation-based inference, wherein the true model likelihood and posterior are approximated using samples from the simulator. In this paper, we draw connections between ABC and generalized Bayesian inference (GBI). First, we re-interpret the accept/reject step in ABC as an implicitly defined error model. We then argue that these implicit error models will invariably be misspecified. While ABC posteriors are often treated as a necessary evil for approximating the standard Bayesian posterior, this perspective allows us to re-interpret ABC as a potential robustification strategy. This leads us to suggest the use of GBI within ABC, a use case we explore empirically. Comment: Accepted at Advances in Approximate Bayesian Inference, AABI 202
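    The suggestion to use generalized Bayesian inference within ABC can be sketched by replacing the hard accept/reject kernel with a loss-based (Gibbs) weighting of prior draws. The toy simulator, summary-statistic loss, and learning rate below are illustrative assumptions, not choices made in the paper.

```python
# Minimal generalized-Bayes ABC sketch: prior draws are weighted by a Gibbs
# kernel exp(-w * loss) instead of a hard accept/reject rule.
import numpy as np

rng = np.random.default_rng(3)

def simulator(theta, n=50):
    """Toy simulator with an assumed Gaussian observation model."""
    return rng.normal(theta, 1.0, size=n)

x_obs = simulator(1.5)                       # pretend observed data

def loss(x_sim, x_obs):
    """Squared distance between summary statistics (mean and std)."""
    summary = lambda x: np.array([x.mean(), x.std()])
    return np.sum((summary(x_sim) - summary(x_obs)) ** 2)

n_draws, w = 5000, 20.0                      # w: generalized-Bayes learning rate
theta = rng.uniform(-5, 5, size=n_draws)     # prior draws
log_w = np.array([-w * loss(simulator(t), x_obs) for t in theta])
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()

# Weighted (generalized) posterior mean over the prior draws.
print(np.sum(weights * theta))
```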

    Large Sample Asymptotics of the Pseudo-Marginal Method

    The pseudo-marginal algorithm is a variant of the Metropolis–Hastings algorithm which samples asymptotically from a probability distribution when it is only possible to estimate, without bias, an unnormalized version of its density. Practically, one has to trade off the computational resources used to obtain this estimator against the asymptotic variances of the ergodic averages obtained by the pseudo-marginal algorithm. Recent works optimizing this trade-off rely on some strong assumptions, which can cast doubt on their practical relevance. In particular, they all assume that the distribution of the difference between the log-density and its estimate is independent of the parameter value at which it is evaluated. Under regularity conditions we show here that, as the number of data points tends to infinity, a space-rescaled version of the pseudo-marginal chain converges weakly towards another pseudo-marginal chain for which this assumption indeed holds. A study of this limiting chain allows us to provide parameter dimension-dependent guidelines on how to optimally scale a normal random walk proposal and the number of Monte Carlo samples for the pseudo-marginal method in the large-sample regime. This complements and validates currently available results.
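    For reference, a minimal pseudo-marginal Metropolis–Hastings sketch is given below: the intractable likelihood is replaced by an unbiased Monte Carlo estimate, and the estimate attached to the current state is stored and reused, as the algorithm requires. The latent-variable model, proposal scale, and number of Monte Carlo samples are illustrative assumptions.

```python
# Minimal pseudo-marginal Metropolis-Hastings sketch (illustrative model and
# tuning choices).  The likelihood p(y | theta) = E_z[ p(y | z, theta) ] is
# replaced by an unbiased Monte Carlo estimate.
import numpy as np

rng = np.random.default_rng(4)

# Toy latent-variable model: y_i ~ Normal(theta + z_i, 1) with z_i ~ Normal(0, 1).
y = rng.normal(1.0, np.sqrt(2.0), size=200)

def log_lik_hat(theta, N=30):
    """Log of an unbiased likelihood estimate: independent latent draws per
    observation keep the product of per-observation estimates unbiased."""
    z = rng.normal(0.0, 1.0, size=(y.size, N))
    dens = np.exp(-0.5 * (y[:, None] - theta - z) ** 2) / np.sqrt(2 * np.pi)
    return np.log(dens.mean(axis=1)).sum()

def log_prior(theta):
    return -0.5 * theta ** 2                    # standard normal prior

theta, ll_hat = 0.0, log_lik_hat(0.0)
chain = []
for _ in range(5000):
    prop = theta + rng.normal(0.0, 0.2)         # random-walk proposal
    ll_prop = log_lik_hat(prop)
    log_accept = ll_prop + log_prior(prop) - ll_hat - log_prior(theta)
    if np.log(rng.uniform()) < log_accept:
        theta, ll_hat = prop, ll_prop           # carry the estimate forward
    chain.append(theta)

print(np.mean(chain[1000:]))
```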