Geoadditive hazard regression for interval censored survival times
The Cox proportional hazards model is the most commonly used method for analyzing the impact of covariates on continuous survival times. In its classical form, the Cox model was introduced in the setting of right-censored observations. In practice, however, other sampling schemes are frequently encountered, so extensions allowing for interval and left censoring or left truncation are clearly desirable. Furthermore, many applications require more flexible modeling of covariate information than the usual linear predictor. For example, effects of continuous covariates are likely to be nonlinear, or spatial information has to be included appropriately. Further extensions should allow for time-varying effects of covariates or for covariates that are themselves time-varying. Such models relax the assumption of proportional hazards. We propose a regression model for the hazard rate that combines and extends the above-mentioned features on the basis of a unifying Bayesian model formulation. Nonlinear and time-varying effects as well as the baseline hazard rate are modeled by penalized splines. Spatial effects can be included based on either Markov random fields or stationary Gaussian random fields. The model allows for arbitrary combinations of left, right and interval censoring as well as left truncation. Estimation is based on a reparameterisation of the model as a variance components mixed model. The variance parameters, which correspond to inverse smoothing parameters, can then be estimated with an approximate marginal likelihood approach. As an application we present an analysis of childhood mortality in Nigeria, where the interval censoring framework also allows us to deal with the problem of heaped survival times caused by memory effects. In a simulation study we investigate the effect of ignoring the interval-censored nature of observations.
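To make the censoring schemes above concrete, the following is a minimal sketch of how a single observation contributes to the log-likelihood under exact observation, right, left and interval censoring, and left truncation. It assumes a hypothetical Weibull baseline (shape `k`, scale `lam`) and no covariates purely to keep the example short; the paper itself models the baseline hazard with penalized splines, and the function names here are illustrative, not taken from any software.

```python
import math

# Hypothetical parametric baseline: Weibull hazard h(t) = (k/lam) * (t/lam)**(k-1).
# The paper uses a penalized-spline baseline; Weibull only keeps the sketch short.

def survival(t, k, lam):
    """S(t) = exp(-(t/lam)**k); evaluates to 1.0 at t = 0 and 0.0 at t = math.inf."""
    return math.exp(-((t / lam) ** k))

def density(t, k, lam):
    """f(t) = h(t) * S(t)."""
    hazard = (k / lam) * (t / lam) ** (k - 1)
    return hazard * survival(t, k, lam)

def log_lik_contribution(left, right, entry, k, lam):
    """Log-likelihood contribution of one subject whose event time lies in [left, right].

    left == right      -> exactly observed event time
    right == math.inf  -> right-censored at `left`
    left == 0.0        -> left-censored at `right`
    otherwise          -> interval-censored in (left, right]
    entry > 0          -> left-truncated: condition on survival past `entry`
    """
    if left == right:
        contrib = math.log(density(left, k, lam))
    else:
        # S(left) - S(right) covers right censoring (S(inf) = 0) and left censoring (S(0) = 1)
        contrib = math.log(survival(left, k, lam) - survival(right, k, lam))
    # Left truncation divides by S(entry); entry = 0 contributes log(1) = 0.
    return contrib - math.log(survival(entry, k, lam))

# Hypothetical heaped report "about 12 months", entered as the interval (9, 15]:
print(log_lik_contribution(9.0, 15.0, 0.0, k=1.2, lam=20.0))
```

Treating a heaped value as an interval rather than an exact time is one way the interval-censoring framework can absorb reporting imprecision; the interval width used here is an assumption for illustration only.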
Connecting Cluster Substructure in Galaxy Cluster Cores at z=0.2 With Cluster Assembly Histories
We use semi-analytic models of structure formation to interpret gravitational
lensing measurements of substructure in galaxy cluster cores (R<=250kpc/h) at
z=0.2. The dynamic range of the lensing-based substructure fraction
measurements is well matched to the theoretical predictions, both spanning
f_sub~0.05-0.65. The structure formation model predicts that f_sub is
correlated with cluster assembly history. We use simple fitting formulae to
parameterize the predicted correlations: Delta_90 = tau_90 + alpha_90 *
log(f_sub) and Delta_50 = tau_50 + alpha_50 * log(f_sub), where Delta_90 and
Delta_50 are the predicted lookback times from z=0.2 to when each theoretical
cluster had acquired 90% and 50% respectively of the mass it had at z=0.2. The
best-fit parameter values are: alpha_90 = (-1.34+/-0.79)Gyr, tau_90 =
(0.31+/-0.56)Gyr and alpha_50 = (-2.77+/-1.66)Gyr, tau_50 = (0.99+/-1.18)Gyr.
Therefore (i) observed clusters with f_sub<~0.1 (e.g. A383, A1835) are
interpreted, on average, to have formed at z>~0.8 and to have suffered <=10%
mass growth since z~0.4; and (ii) observed clusters with f_sub>~0.4 (e.g. A68,
A773) are interpreted as, on average, forming since z~0.4 and suffering >10%
mass growth in the ~500Myr preceding z=0.2, i.e. since z=0.25. In summary,
observational measurements of f_sub can be combined with structure formation
models to estimate the age and assembly history of observed clusters. The
ability to approximately "age-date" clusters in this way has numerous
applications to the large cluster samples that are becoming available.
Comment: Accepted by ApJL, 4 pages, 2 figures
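The fitting formulae can be evaluated directly to see how a measured f_sub maps to predicted lookback times. The sketch below uses the central best-fit values quoted above and drops their (large) uncertainties. It assumes that "log" denotes the base-10 logarithm; the abstract does not state the base, although the base-10 reading appears consistent with the interpretations (i) and (ii).

```python
import math

# Central best-fit values quoted in the abstract, in Gyr (uncertainties dropped).
ALPHA_90, TAU_90 = -1.34, 0.31
ALPHA_50, TAU_50 = -2.77, 0.99

def lookback_delta(f_sub, alpha, tau):
    """Delta = tau + alpha * log(f_sub), assuming log means log10."""
    if not 0.0 < f_sub <= 1.0:
        raise ValueError("f_sub must be a fraction in (0, 1]")
    return tau + alpha * math.log10(f_sub)

for f_sub in (0.1, 0.4):
    d90 = lookback_delta(f_sub, ALPHA_90, TAU_90)
    d50 = lookback_delta(f_sub, ALPHA_50, TAU_50)
    print(f"f_sub={f_sub}: Delta_90={d90:.2f} Gyr, Delta_50={d50:.2f} Gyr")
```

Under this reading, f_sub = 0.1 gives Delta_90 of about 1.65 Gyr and Delta_50 of about 3.76 Gyr. Because both alpha values are negative, a larger substructure fraction yields shorter lookback times, i.e. more recent assembly.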
Bayesian Semiparametric Multi-State Models
Multi-state models provide a unified framework for describing the evolution of discrete phenomena in continuous time. One particular example is Markov processes, which can be characterised by a set of time-constant transition intensities between the states. In this paper, we will extend such parametric approaches to semiparametric models with flexible transition intensities based on Bayesian versions of penalised splines. The transition intensities will be modelled as smooth functions of time and can further be related to parametric as well as nonparametric covariate effects. Covariates with time-varying effects and frailty terms can be included in addition. Inference will be conducted either fully Bayesian, using Markov chain Monte Carlo simulation techniques, or empirically Bayesian, based on a mixed model representation. A counting process representation of semiparametric multi-state models provides the likelihood formula and also forms the basis for model validation via martingale residual processes. As an application, we will consider human sleep data with a discrete set of sleep states such as REM and Non-REM phases. In this case, simple parametric approaches are inappropriate since the dynamics underlying human sleep vary strongly throughout the night, and individual-specific variation has to be accounted for using covariate information and frailty terms.
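For a Markov process with time-constant intensities collected in a generator matrix Q (off-diagonal entries are intensities, rows sum to zero), the transition probabilities over a period t are P(t) = exp(Qt). The sketch below computes this with a small stdlib matrix exponential and, for time-varying intensities, approximates P(0, t) by holding Q constant on short sub-intervals. This piecewise-constant approximation is an illustrative choice, not the paper's counting-process likelihood, and the two-state intensity functions are hypothetical.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(q, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(q)
    norm = max(sum(abs(x) for x in row) for row in q)
    s = 0
    while norm / 2 ** s > 0.5:  # scale so the series converges quickly
        s += 1
    scaled = [[x / 2 ** s for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]  # scaled**k / k!
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(q) = exp(q / 2**s) ** (2**s)
        result = mat_mul(result, result)
    return result

def transition_probabilities(intensities, t_end, steps=200):
    """Approximate P(0, t_end); intensities(t) returns the generator matrix Q(t).

    Q is held constant (at the midpoint) on each of `steps` sub-intervals, and
    P(0, t_end) is the ordered product of the matrix exponentials exp(Q * dt).
    """
    dt = t_end / steps
    n = len(intensities(0.0))
    p = [[float(i == j) for j in range(n)] for i in range(n)]
    for m in range(steps):
        q = intensities((m + 0.5) * dt)
        p = mat_mul(p, mat_exp([[x * dt for x in row] for row in q]))
    return p

# Hypothetical two-state example with constant intensities 0.3 (1 -> 2) and 0.7 (2 -> 1).
def constant_q(t):
    return [[-0.3, 0.3], [0.7, -0.7]]

# Hypothetical smoothly time-varying intensities.
def smooth_q(t):
    a = 0.2 + 0.1 * math.sin(t)
    b = 0.5 + 0.2 * math.cos(t)
    return [[-a, a], [b, -b]]

print(transition_probabilities(constant_q, 2.0))
print(transition_probabilities(smooth_q, 8.0))
```

For constant intensities the product of short-step exponentials reproduces exp(Qt) exactly; for the two-state case this can be checked against the closed form P11(t) = 0.7 + 0.3 exp(-t).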
High-dimensional Structured Additive Regression Models: Bayesian Regularisation, Smoothing and Predictive Performance
Data structures in modern applications frequently combine the necessity of flexible regression techniques such as nonlinear and spatial effects with high-dimensional covariate vectors. While estimation of the former is typically achieved by supplementing the likelihood with a suitable smoothness penalty, the latter are usually assigned shrinkage penalties that enforce sparse models.
In this paper, we consider a unifying Bayesian perspective, where conditionally Gaussian priors can be assigned to all types of regression effects. Suitable hyperprior assumptions on the variances of the Gaussian distributions then induce the desired smoothness or sparseness properties. As a major advantage, general Markov chain Monte Carlo simulation algorithms can be developed that allow for the joint estimation of smooth and spatial effects and regularised coefficient vectors. Two applications demonstrate the usefulness of the proposed procedure: a geoadditive regression model for data from the Munich rental guide and an additive probit model for the prediction of consumer credit defaults. In both cases, high-dimensional vectors of categorical covariates will be included in the regression models. The predictive ability of the resulting high-dimensional structured additive regression models compared to expert models will be of particular relevance and will be evaluated on cross-validation test data.
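One well-known example of how a hyperprior on the variance of a conditionally Gaussian prior induces sparsity is the Bayesian lasso: if beta | v ~ N(0, v) and v follows an exponential distribution with rate lambda^2 / 2, the marginal prior of beta is the Laplace density (lambda / 2) exp(-lambda |beta|). The sketch below checks this numerically; it illustrates the general mechanism only and is not necessarily one of the hyperpriors used in the paper.

```python
import math

def normal_pdf(x, var):
    """N(0, var) density evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def marginal_prior(beta, lam, n=100_000):
    """Marginal density of beta for beta | v ~ N(0, v), v ~ Exponential(rate = lam**2 / 2).

    The integral over the variance v is approximated by the midpoint rule on (0, v_max].
    Intended for beta != 0 (at beta = 0 the integrand has an integrable singularity).
    """
    rate = lam ** 2 / 2.0
    v_max = 60.0 / rate  # truncated tail has weight exp(-60), negligible
    h = v_max / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h
        total += normal_pdf(beta, v) * rate * math.exp(-rate * v)
    return total * h

def laplace_pdf(beta, lam):
    """Laplace density with rate lam, centred at zero."""
    return 0.5 * lam * math.exp(-lam * abs(beta))

print(marginal_prior(1.0, 1.0), laplace_pdf(1.0, 1.0))
```

The Laplace marginal places more mass near zero and in the tails than a Gaussian with the same variance, which is what pushes small coefficients towards zero while leaving large ones relatively unshrunk.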
A General Approach for the Analysis of Habitat Selection
Investigating habitat selection of animals aims at detecting preferred and avoided habitat types as well as identifying covariates that influence the choice of certain habitat types. The ultimate goal of such analyses is to improve the conservation of animals. Usually, habitat selection by larger animals is assessed by radio-tracking or visual observation studies, where the chosen habitat is determined for a number of animals at a set of time points. Hence the resulting data often have the following structure: a categorical variable indicating the habitat type selected by an animal at a specific time point is repeatedly observed and is to be explained by covariates. These may describe properties of the habitat types currently available and/or properties of the animal. In this paper, we present a general approach for the analysis of such data in a categorical regression setup. The proposed model generalises and improves upon several of the approaches previously discussed in the literature and, in particular, allows one to account for changing habitat availability due to the movement of animals within the observation area. It incorporates both habitat- and animal-specific covariates, and includes individual-specific random effects in order to account for correlations introduced by the repeated measurements on single animals. The methodology is implemented in a freely available software package. We demonstrate the general applicability and the capabilities of the proposed approach in two case studies: an analysis of a songbird in South America and a study on brown bears in Central Europe.
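The key feature of changing availability can be illustrated with a multinomial-logit choice model in which only the habitat types currently reachable by the animal compete for probability mass. The sketch below is a generic conditional-logit form under stated assumptions: the linear predictor (type-specific intercept, one habitat covariate with coefficient `beta`, and an animal-specific random effect per type) is hypothetical and not necessarily the exact specification of the paper or its software package.

```python
import math

def selection_probabilities(eta, available):
    """Multinomial-logit probabilities over the habitat types available at one time point.

    eta[j]       : linear predictor for habitat type j
    available[j] : True if habitat type j is currently reachable by the animal
    Unavailable types get probability 0; available types share the mass via a softmax.
    """
    idx = [j for j, ok in enumerate(available) if ok]
    if not idx:
        raise ValueError("at least one habitat type must be available")
    m = max(eta[j] for j in idx)  # subtract the maximum for numerical stability
    w = {j: math.exp(eta[j] - m) for j in idx}
    total = sum(w.values())
    return [w[j] / total if j in w else 0.0 for j in range(len(eta))]

def linear_predictor(intercepts, beta, habitat_cov, animal_effect):
    """Hypothetical predictor: type-specific intercept + beta * habitat covariate + random effect."""
    return [a + beta * x + b for a, x, b in zip(intercepts, habitat_cov, animal_effect)]

# Hypothetical values for three habitat types, of which type 1 is out of reach.
eta = linear_predictor([0.0, 0.5, -0.2], 0.8, [1.2, 0.3, 0.9], [0.1, -0.1, 0.0])
print(selection_probabilities(eta, [True, False, True]))
```

Restricting the normalising sum to the available set is what lets the same coefficients be used at every time point even though the choice set changes as the animal moves.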
BayesX: Analysing Bayesian structured additive regression models
There has been much recent interest in Bayesian inference for generalized additive and related models. The increasing popularity of Bayesian methods for these and other model classes is mainly due to the introduction of Markov chain Monte Carlo (MCMC) simulation techniques, which allow the estimation of very complex and realistic models. This paper describes the capabilities of the public domain software BayesX for estimating complex regression models with a structured additive predictor. The program extends the capabilities of existing software for semiparametric regression. Many model classes well known from the literature are special cases of the models supported by BayesX. Examples are Generalized Additive (Mixed) Models, Dynamic Models, Varying Coefficient Models, Geoadditive Models, Geographically Weighted Regression and models for space-time regression. BayesX supports the most common distributions for the response variable. For univariate responses these are Gaussian, Binomial, Poisson, Gamma and negative Binomial. For multicategorical responses, both multinomial logit and probit models for unordered categories of the response and cumulative threshold models for ordered categories may be estimated. Moreover, BayesX allows the estimation of complex continuous-time survival and hazard rate models.
Penalized additive regression for space-time data: a Bayesian perspective
We propose extensions of penalized spline generalized additive models for analysing space-time regression data and study them from a Bayesian perspective. Non-linear effects of continuous covariates and time trends are modelled through Bayesian versions of penalized splines, while correlated spatial effects follow a Markov random field prior. This allows all functions and effects to be treated within a unified general framework by assigning appropriate priors with different forms and degrees of smoothness. Inference can be performed either with full Bayes (FB) or empirical Bayes (EB) posterior analysis. FB inference using MCMC techniques is a slight extension of our own previous work. For EB inference, a computationally efficient solution is developed on the basis of a generalized linear mixed model representation. The second approach can be viewed as posterior mode estimation and is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to smoothing parameters, are then estimated by marginal likelihood. We carefully compare both inferential procedures in simulation studies and illustrate them through real data applications. The methodology is available in the public domain statistical package BayesX and as an S-PLUS/R function.
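The "unified general framework" can be read as giving every effect a Gaussian prior of the form p(beta | tau^2) proportional to exp(-beta' K beta / (2 tau^2)), where only the structure matrix K differs between effect types. The sketch below builds two common choices: a second-order random-walk (difference) penalty for P-spline coefficients, and the intrinsic Markov random field penalty from a neighbourhood list. The second-order difference order and the three-region neighbourhood are illustrative assumptions.

```python
def rw2_penalty(k):
    """K = D'D, with D the (k-2) x k matrix of second-order differences of P-spline coefficients."""
    d = [[0.0] * k for _ in range(k - 2)]
    for i in range(k - 2):
        d[i][i], d[i][i + 1], d[i][i + 2] = 1.0, -2.0, 1.0
    return [[sum(d[r][i] * d[r][j] for r in range(k - 2)) for j in range(k)] for i in range(k)]

def mrf_penalty(neighbours):
    """Intrinsic Gaussian MRF: K[s][s] = number of neighbours of s, K[s][r] = -1 if s and r are neighbours."""
    n = len(neighbours)
    K = [[0.0] * n for _ in range(n)]
    for s, nb in enumerate(neighbours):
        K[s][s] = float(len(nb))
        for r in nb:
            K[s][r] = -1.0
    return K

def log_prior_kernel(beta, K, tau2):
    """Unnormalised Gaussian log prior -beta' K beta / (2 tau2), shared by both effect types."""
    n = len(beta)
    quad = sum(beta[i] * K[i][j] * beta[j] for i in range(n) for j in range(n))
    return -quad / (2.0 * tau2)

# Linear spline coefficients are not penalised by the second-order difference prior.
print(log_prior_kernel([0.0, 1.0, 2.0, 3.0], rw2_penalty(4), tau2=1.0))
# Hypothetical map of three regions in a chain: 0 - 1 - 2.
print(log_prior_kernel([0.0, 1.0, 0.0], mrf_penalty([[1], [0, 2], [1]]), tau2=1.0))
```

Both matrices are rank-deficient (linear trends and constants, respectively, are unpenalised), which is why such priors are called intrinsic; the variance tau^2 plays the role of an inverse smoothing parameter.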
Variable Selection and Model Choice in Geoadditive Regression Models
Model choice and variable selection are issues of major concern in practical regression analyses. We propose a boosting procedure that facilitates both tasks in a class of complex geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, random effects, and varying coefficient terms. The major modelling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a remaining smooth component with one degree of freedom to obtain a fair comparison between all model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that implements automatic model choice and variable selection. We demonstrate the versatility of our approach with two examples: a geoadditive Poisson regression model for species counts in habitat suitability analyses and a geoadditive logit model for the analysis of forest health.
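The way boosting performs variable selection can be seen in a stripped-down componentwise L2 boosting loop: in each iteration every base learner is fitted to the current residuals, only the best-fitting one is updated by a small step `nu`, and covariates whose base learner is never chosen drop out of the model. The sketch below simplifies the paper's setting by using one linear least-squares base learner per covariate instead of penalized-spline base learners with one degree of freedom; the data are hypothetical.

```python
def componentwise_l2_boost(columns, y, n_iter=100, nu=0.1):
    """Componentwise L2 boosting with one linear least-squares base learner per covariate.

    columns : list of covariate vectors (assumed centred)
    Returns (offset, coefficients, selection counts per covariate).
    """
    n = len(y)
    offset = sum(y) / n
    fit = [offset] * n
    coef = [0.0] * len(columns)
    counts = [0] * len(columns)
    for _ in range(n_iter):
        resid = [yi - fi for yi, fi in zip(y, fit)]
        best = None
        for j, x in enumerate(columns):
            # least-squares fit of the residuals on covariate j alone
            b = sum(xi * ri for xi, ri in zip(x, resid)) / sum(xi * xi for xi in x)
            rss = sum((ri - b * xi) ** 2 for xi, ri in zip(x, resid))
            if best is None or rss < best[0]:
                best = (rss, j, b)
        _, j, b = best
        coef[j] += nu * b  # shrunken update of the selected component only
        counts[j] += 1
        fit = [fi + nu * b * xi for fi, xi in zip(fit, columns[j])]
    return offset, coef, counts

# Hypothetical data: y depends on x1 only; x2 is an irrelevant (orthogonal) covariate.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -1.0, 0.0, -1.0, 1.0]
y = [3.0 + 2.0 * v for v in x1]
print(componentwise_l2_boost([x1, x2], y))
```

The number of iterations acts as the main tuning parameter (in practice chosen by, e.g., cross-validation); stopping early leaves coefficients shrunken towards zero.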
Gradient boosting in Markov-switching generalized additive models for location, scale and shape
We propose a novel class of flexible latent-state time series regression
models which we call Markov-switching generalized additive models for location,
scale and shape. In contrast to conventional Markov-switching regression
models, the presented methodology allows us to model different state-dependent
parameters of the response distribution - not only the mean, but also variance,
skewness and kurtosis parameters - as potentially smooth functions of a given
set of explanatory variables. In addition, the set of possible distributions
that can be specified for the response is not limited to the exponential family
but additionally includes, for instance, a variety of Box-Cox-transformed,
zero-inflated and mixture distributions. We propose an estimation approach
based on the EM algorithm, where we use the gradient boosting framework to
prevent overfitting while simultaneously performing variable selection. The
feasibility of the suggested approach is assessed in simulation experiments and
illustrated in a real-data setting, where we model the conditional distribution
of the daily average price of energy in Spain over time.
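EM estimation for Markov-switching models typically rests on forward-backward recursions. The sketch below shows only the scaled forward pass, which yields the log-likelihood that such an algorithm increases. It assumes normal state-dependent distributions whose location and scale are supplied per time point and state; in a Markov-switching GAMLSS these parameters would come from (boosted) smooth predictors, and other response distributions could replace the normal one.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma**2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def hmm_log_likelihood(y, delta, gamma, mus, sigmas):
    """Scaled forward algorithm for an N-state Markov-switching model with normal responses.

    delta       : initial state distribution
    gamma       : transition matrix, gamma[i][j] = P(S_t = j | S_{t-1} = i)
    mus, sigmas : location and scale for each time point t and state j, e.g. mus[t][j]
    """
    n = len(delta)
    alpha = list(delta)
    log_lik = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            # propagate the normalised forward probabilities through the Markov chain
            alpha = [sum(alpha[i] * gamma[i][j] for i in range(n)) for j in range(n)]
        alpha = [a * math.exp(normal_logpdf(yt, mus[t][j], sigmas[t][j])) for j, a in enumerate(alpha)]
        c = sum(alpha)  # scaling constant = P(y_t | y_1, ..., y_{t-1})
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]
    return log_lik

# Hypothetical two-state example with a low-price and a high-price regime.
y = [40.0, 42.5, 61.0, 58.0, 43.0]
mus = [[42.0, 60.0]] * len(y)
sigmas = [[3.0, 4.0]] * len(y)
print(hmm_log_likelihood(y, [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], mus, sigmas))
```

Rescaling the forward probabilities at each step avoids numerical underflow for long series, and the sum of the log scaling constants equals the log-likelihood.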
