Multivariate kernel density estimation applied to sensitive geo-referenced administrative data protected via measurement error
Modern systems of official statistics require the timely estimation of area-specific densities of sub-populations. Ideally, such estimates would be based on precise geo-coded information, but this is typically unavailable due to confidentiality constraints. One approach to ensuring confidentiality is to round the geo-coordinates. We propose a multivariate non-parametric kernel density estimator that reverses the rounding process by means of a Bayesian measurement error model. The methodology is applied to the Berlin register of residents to derive density estimates for ethnic minorities and older people. The estimates are used to identify areas in need of new advisory centres for migrants and new infrastructure for older people.
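A minimal sketch of the general idea, not the paper's implementation: coordinates rounded to a grid of width h are treated as interval-censored, latent locations are imputed uniformly within each rounding cell (a crude stand-in for the Bayesian measurement error model), and Gaussian kernel density estimates are averaged over the imputations. The grid width, sample sizes, and uniform imputation model are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
h = 1.0                                    # rounding grid width (assumed)
true_xy = rng.normal(loc=[5.0, 5.0], scale=1.5, size=(500, 2))
rounded_xy = np.round(true_xy / h) * h     # what the protected register holds

def imputed_kde(rounded_xy, h, n_imputations=20):
    """Average Gaussian KDEs over uniform imputations within each cell."""
    kdes = []
    for _ in range(n_imputations):
        jitter = rng.uniform(-h / 2, h / 2, size=rounded_xy.shape)
        kdes.append(gaussian_kde((rounded_xy + jitter).T))
    return lambda pts: np.mean([k(pts.T) for k in kdes], axis=0)

density = imputed_kde(rounded_xy, h)
print(density(np.array([[5.0, 5.0], [9.0, 9.0]])))  # density at two points
```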
Amortised likelihood-free inference for expensive time-series simulators with signatured ratio estimation
Simulation models of complex dynamics in the natural and social sciences commonly lack a tractable likelihood function, rendering traditional likelihood-based statistical inference impossible. Recent advances in machine learning have introduced novel algorithms for estimating otherwise intractable likelihood functions using a likelihood ratio trick based on binary classifiers. Consequently, efficient likelihood approximations can be obtained whenever good probabilistic classifiers can be constructed. We propose a kernel classifier for sequential data using path signatures, based on the recently introduced signature kernel. We demonstrate that the representational power of signatures yields a highly performant classifier, even in the crucially important case where sample numbers are low. In such scenarios, our approach can outperform sophisticated neural networks for common posterior inference tasks.
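A minimal sketch of the classifier-based likelihood-ratio trick, with a hand-rolled level-2 truncated path signature standing in for the signature kernel classifier of the paper; the AR(1) simulator, logistic-regression classifier, and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def simulate(theta, T=50):
    """Toy AR(1) series, time-augmented so its signature is informative."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = theta * x[t - 1] + rng.normal()
    return np.column_stack([np.linspace(0, 1, T), x])

def sig_level2(path):
    """Truncated (level-2) path signature via left Riemann sums."""
    inc = np.diff(path, axis=0)                # path increments
    start = np.cumsum(inc, axis=0) - inc       # X_t - X_0 before each step
    return np.concatenate([inc.sum(0), (start.T @ inc).ravel()])

# Class 1: simulations at theta_1; class 0: simulations at theta_0.
theta_0, theta_1, n = 0.2, 0.8, 200
X = np.array([sig_level2(simulate(th)) for th in [theta_0] * n + [theta_1] * n])
y = np.repeat([0, 1], n)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The fitted log-odds estimate log p(x | theta_1) / p(x | theta_0).
x_obs = sig_level2(simulate(theta_1))
print(f"estimated log ratio: {clf.decision_function(x_obs[None])[0]:.2f}")
```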
Approximate Bayesian computation with path signatures
Simulation models often lack tractable likelihood functions, making likelihood-free inference methods indispensable. Approximate Bayesian computation generates likelihood-free posterior samples by comparing simulated and observed data through some distance measure, but existing approaches are often poorly suited to time-series simulators, for example because they assume independent and identically distributed data. In this paper, we propose to use path signatures in approximate Bayesian computation to handle the sequential nature of time series. We provide theoretical guarantees on the resultant posteriors and demonstrate competitive Bayesian parameter inference for simulators generating univariate, multivariate, and irregularly spaced sequences of non-i.i.d. data.
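A minimal rejection-ABC sketch along these lines, comparing simulated and observed series through truncated (level-2) path signatures; the toy simulator and signature are restated from the previous sketch for self-containment, and the uniform prior and quantile-based tolerance are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, T=50):
    """Toy AR(1) series, time-augmented as in the previous sketch."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = theta * x[t - 1] + rng.normal()
    return np.column_stack([np.linspace(0, 1, T), x])

def sig_level2(path):
    """Truncated (level-2) path signature via left Riemann sums."""
    inc = np.diff(path, axis=0)
    start = np.cumsum(inc, axis=0) - inc
    return np.concatenate([inc.sum(0), (start.T @ inc).ravel()])

s_obs = sig_level2(simulate(0.6))               # pseudo-observed summary

thetas = rng.uniform(0.0, 1.0, 5000)            # draws from a uniform prior
dists = np.array([np.linalg.norm(sig_level2(simulate(th)) - s_obs)
                  for th in thetas])
accepted = thetas[dists <= np.quantile(dists, 0.01)]  # keep the closest 1%
print(f"ABC posterior mean: {accepted.mean():.2f} (true value 0.6)")
```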
Art and Society. The Work of Art as a Faithful Répétiteur of Prevailing Conditions or as an Emancipatory Enlightener?
Black-box Bayesian inference for agent-based models
Simulation models, in particular agent-based models, are gaining popularity in economics and the social sciences. The considerable flexibility they offer, as well as their capacity to reproduce a variety of empirically observed behaviours of complex systems, give them broad appeal, and the increasing availability of cheap computing power has made their use feasible. Yet a widespread adoption in real-world modelling and decision-making scenarios has been hindered by the difficulty of performing parameter estimation for such models. In general, simulation models lack a tractable likelihood function, which precludes a straightforward application of standard statistical inference techniques. A number of recent works have sought to address this problem through the application of likelihood-free inference techniques, in which parameter estimates are determined by performing some form of comparison between the observed data and simulation output. However, these approaches are (a) founded on restrictive assumptions, and/or (b) typically require many hundreds of thousands of simulations. These qualities make them unsuitable for large-scale simulations in economics and the social sciences, and can cast doubt on the validity of these inference methods in such scenarios. In this paper, we investigate the efficacy of two classes of simulation-efficient black-box approximate Bayesian inference methods that have recently drawn significant attention within the probabilistic machine learning community: neural posterior estimation and neural density ratio estimation. We present a number of benchmarking experiments in which we demonstrate that neural network-based black-box methods provide state-of-the-art parameter inference for economic simulation models, and crucially are compatible with generic multivariate or even non-Euclidean time-series data. In addition, we suggest appropriate assessment criteria for use in future benchmarking of approximate Bayesian inference procedures for simulation models in economics and the social sciences.
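A minimal sketch of one of the two method classes, neural density-ratio estimation, with scikit-learn's MLP as a stand-in for the neural architectures used in practice: a classifier learns to distinguish dependent (theta, x) pairs from shuffled ones, and its log-odds estimate log p(x|theta)/p(x), which under a flat prior is the unnormalised log-posterior. The toy simulator, summaries, and network settings are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)

def simulate(theta):
    """Toy stand-in for an agent-based model: summaries of a random walk."""
    x = np.cumsum(rng.normal(theta, 1.0, 100))
    return np.array([x.mean(), x.std(), x[-1]])

n = 2000
thetas = rng.uniform(-2, 2, n)                     # draws from a flat prior
xs = np.array([simulate(th) for th in thetas])

joint = np.column_stack([thetas, xs])              # class 1: dependent pairs
marginal = np.column_stack([rng.permutation(thetas), xs])  # class 0: shuffled
X, y = np.vstack([joint, marginal]), np.repeat([1, 0], n)

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)

# Log-odds estimate log p(x|theta)/p(x); with a flat prior this is the
# unnormalised log-posterior over a theta grid for one observation.
x_obs = simulate(0.5)
grid = np.linspace(-2, 2, 201)
feats = np.column_stack([grid, np.tile(x_obs, (grid.size, 1))])
p = np.clip(clf.predict_proba(feats)[:, 1], 1e-6, 1 - 1e-6)
log_ratio = np.log(p) - np.log1p(-p)
print(f"posterior mode near {grid[np.argmax(log_ratio)]:.2f} (true 0.5)")
```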
Generalized Posteriors in Approximate Bayesian Computation
Complex simulators have become a ubiquitous tool in many scientific disciplines, providing high-fidelity, implicit probabilistic models of natural and social phenomena. Unfortunately, they typically lack the tractability required for conventional statistical analysis. Approximate Bayesian computation (ABC) has emerged as a key method in simulation-based inference, wherein the true model likelihood and posterior are approximated using samples from the simulator. In this paper, we draw connections between ABC and generalized Bayesian inference (GBI). First, we re-interpret the accept/reject step in ABC as an implicitly defined error model. We then argue that these implicit error models will invariably be misspecified. While ABC posteriors are often treated as a necessary evil for approximating the standard Bayesian posterior, this allows us to re-interpret ABC as a potential robustification strategy. This leads us to suggest the use of GBI within ABC, a use case we explore empirically.
Comment: Accepted at Advances in Approximate Bayesian Inference, AABI 202
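A minimal sketch of one reading of "GBI within ABC", under illustrative assumptions about the simulator, loss, and tolerance: the hard accept/reject kernel 1{loss <= eps} is replaced by an explicit exponential loss weight, turning prior draws into a weighted generalized-posterior sample.

```python
import numpy as np

rng = np.random.default_rng(4)
x_obs = rng.normal(1.0, 1.0, 50)                 # pseudo-observed data

def loss(theta):
    """Discrepancy between simulated and observed summary statistics."""
    return (rng.normal(theta, 1.0, 50).mean() - x_obs.mean()) ** 2

thetas = rng.uniform(-3, 3, 5000)                # draws from the prior
losses = np.array([loss(th) for th in thetas])

# Classical ABC applies the hard kernel 1{loss <= eps}; the GBI reading
# replaces it with an explicit, smooth exponential error model.
eps = 0.05                                       # tolerance / learning rate
weights = np.exp(-losses / eps)
weights /= weights.sum()

print(f"generalized-ABC posterior mean: {(weights * thetas).sum():.2f} "
      f"(true value 1.0)")
```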
Investigations into the Implementation of a Controlled Procedure for Selective Dry-Off in Bavarian Dairy Farms
Large Sample Asymptotics of the Pseudo-Marginal Method
The pseudo-marginal algorithm is a variant of the Metropolis–Hastings algorithm which samples asymptotically from a probability distribution when it is only possible to estimate unbiasedly an unnormalized version of its density. Practically, one has to trade off the computational resources used to obtain this estimator against the asymptotic variances of the ergodic averages obtained by the pseudo-marginal algorithm. Recent works optimizing this trade-off rely on some strong assumptions which can cast doubts over their practical relevance. In particular, they all assume that the distribution of the difference between the log-density and its estimate is independent of the parameter value at which it is evaluated. Under regularity conditions we show here that, as the number of data points tends to infinity, a space-rescaled version of the pseudo-marginal chain converges weakly towards another pseudo-marginal chain for which this assumption indeed holds. A study of this limiting chain allows us to provide parameter dimension-dependent guidelines on how to optimally scale a normal random walk proposal and the number of Monte Carlo samples for the pseudo-marginal method in the large-sample regime. This complements and validates currently available results.
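A minimal pseudo-marginal Metropolis–Hastings sketch on a toy latent-variable model, assuming a flat prior and an importance-sampling likelihood estimator; the essential mechanics are that the likelihood estimate is unbiased and that the estimate at the current state is recycled rather than refreshed. The model, the number of Monte Carlo samples N, and the step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(1.0, np.sqrt(2.0), 40)   # data from the marginal x_i ~ N(1, 2)

def log_lik_hat(theta, N=30):
    """Unbiased importance-sampling estimate of p(x | theta) under the
    latent-variable model z_i ~ N(theta, 1), x_i | z_i ~ N(z_i, 1)."""
    z = rng.normal(theta, 1.0, (N, x.size))                # N latents per obs
    log_p = -0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi)  # log p(x_i | z)
    return (np.logaddexp.reduce(log_p, axis=0) - np.log(N)).sum()

def pseudo_marginal(n_iter=5000, step=0.5):
    theta, ll = 0.0, log_lik_hat(0.0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        ll_prop = log_lik_hat(prop)       # fresh estimate at the proposal only
        if np.log(rng.uniform()) < ll_prop - ll:    # flat prior assumed
            theta, ll = prop, ll_prop     # current-state estimate is recycled
        chain[i] = theta
    return chain

chain = pseudo_marginal()
print(f"posterior mean: {chain[1000:].mean():.2f} (true value 1.0)")
```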
