240 research outputs found

    A Parametric Framework for the Comparison of Methods of Very Robust Regression

    There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers in the data. Differences in the behaviour of these algorithms depend on the distance between the regression data and the outliers. We introduce a parameter λ that defines a parametric path in the space of models and enables us to study, in a systematic way, the properties of estimators as the groups of data move from being far apart to close together. We examine, as a function of λ, the variance and squared bias of five estimators and we also consider their power when used in the detection of outliers. This systematic approach provides tools for gaining knowledge and better understanding of the properties of robust estimators. Comment: Published at http://dx.doi.org/10.1214/13-STS437 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Finding an unknown number of multivariate outliers

    We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests using other robust Mahalanobis distances show the good size and high power of our procedure. We also provide a unification of results on correction factors for estimation from truncated samples.

    Volatility in the Italian Stock Market: An Empirical Study

    We study the volatility of the MIB30-stock-index high-frequency data from November 28, 1994 through September 15, 1995. Our aim is to empirically characterize the volatility random walk in the framework of continuous-time finance. To this end, we compute the index volatility by means of the log-return standard deviation. We choose an hourly time window in order to investigate intraday properties of volatility. A periodic component is found for the hourly time window, in agreement with previous observations. Fluctuations are studied by means of detrended fluctuation analysis, and we detect long-range correlations. Volatility values are log-stable distributed. We discuss the implications of these results for stochastic volatility modelling. Keywords: volatility; stochastic processes; random walk; statistical finance
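
    The volatility measure described in this abstract (the standard deviation of log-returns within a fixed time window) can be illustrated with a minimal sketch. The function names and the assumption that every window holds the same number of ticks are illustrative choices, not details taken from the paper.

```python
import math
import statistics


def log_returns(prices):
    """Log-returns r_t = log(p_t / p_{t-1}) of a price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def window_volatility(prices, ticks_per_window):
    """Sample standard deviation of log-returns in consecutive,
    non-overlapping windows.

    Assumption: each window (e.g. one trading hour) contains exactly
    `ticks_per_window` returns, with ticks_per_window >= 2. Returns
    left over at the end of the series are ignored.
    """
    returns = log_returns(prices)
    vols = []
    last_start = len(returns) - ticks_per_window
    for start in range(0, last_start + 1, ticks_per_window):
        window = returns[start:start + ticks_per_window]
        vols.append(statistics.stdev(window))
    return vols
```

    Called on a tick series with, say, 60 one-minute returns per hour, `window_volatility(prices, 60)` would give one volatility value per hour, which is the kind of series the abstract goes on to analyse for periodicity and long-range correlations.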

    The Forward Search for Very Large Datasets

    The identification of atypical observations and the immunization of data analysis against both outliers and failures of modeling are important aspects of modern statistics. The forward search is a graphics-rich approach that leads to the formal detection of outliers and to the detection of model inadequacy combined with suggestions for model enhancement. The key idea is to monitor quantities of interest, such as parameter estimates and test statistics, as the model is fitted to data subsets of increasing size. In this paper we propose some computational improvements of the forward search algorithm and we provide a recursive implementation of the procedure which exploits the information of the previous step. The output is a set of efficient routines for fast updating of the model parameter estimates, which do not require any data sorting, and fast computation of likelihood contributions, which do not require matrix inversion or QR decomposition. It is shown that the new algorithms enable a reduction of the computation time by more than 80%. Furthermore, the running time now increases almost linearly with the sample size. All the routines described in this paper are included in the FSDA toolbox for MATLAB which is freely downloadable from the internet. JRC.G.2-Global security and crisis managemen
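
    The key idea stated in this abstract, monitoring quantities as the model is fitted to subsets of increasing size, can be sketched for simple linear regression. This is a deliberately naive illustration and not the FSDA implementation or the recursive algorithm the paper proposes: it refits and re-sorts at every step, takes the initial subset from the caller (a real search would start from a robust fit), and monitors the raw minimum absolute residual outside the subset rather than a scaled deletion residual. All names are hypothetical.

```python
def ols(xs, ys):
    """Least-squares intercept and slope for simple regression.
    Assumes the x values are not all identical."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def forward_search(xs, ys, initial):
    """Grow the fitting subset one observation at a time.

    Returns a list of (subset_size, intercept, slope, min_outside),
    where min_outside is the smallest absolute residual among the
    observations not in the current subset (None once all are in).
    """
    n = len(xs)
    subset = set(initial)
    trace = []
    while True:
        idx = sorted(subset)
        a, b = ols([xs[i] for i in idx], [ys[i] for i in idx])
        resid = [abs(ys[i] - a - b * xs[i]) for i in range(n)]
        outside = [i for i in range(n) if i not in subset]
        min_outside = min(resid[i] for i in outside) if outside else None
        trace.append((len(subset), a, b, min_outside))
        if not outside:
            return trace
        # Next subset: the observations with the smallest residuals
        # under the current fit, one more than before.
        ranked = sorted(range(n), key=lambda i: resid[i])
        subset = set(ranked[:len(subset) + 1])
```

    In a plot of `min_outside` against subset size, a sharp jump near the end of the search indicates that the remaining observations disagree with the model fitted to the rest, which is the kind of signal the forward search uses to flag outliers.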

    Robust Bayesian regression with the forward search: theory and data analysis

    The frequentist forward search yields a flexible and informative form of robust regression. The device of fictitious observations provides a natural way to include prior information in the search. However, this extension is not straightforward, requiring weighted regression. Bayesian versions of forward plots are used to exhibit the presence of multiple outliers in a data set from banking with 1903 observations and nine explanatory variables, showing, in this case, the clear advantages of including prior information in the forward search. Use of observation weights from frequentist robust regression is shown to provide a simple general method for robust Bayesian regression.

    Volatility in the Italian Stock Market: an Empirical Study

    We study the volatility of the MIB30-stock-index high-frequency data from November 28, 1994 through September 15, 1995. Our aim is to empirically characterize the volatility random walk in the framework of continuous-time finance. To this end, we compute the index volatility by means of the log-return standard deviation. We choose an hourly time window in order to investigate intraday properties of volatility. A periodic component is found for the hourly time window, in agreement with previous observations. Fluctuations are studied by means of detrended fluctuation analysis, and we detect long-range correlations. Volatility values are log-stable distributed. We discuss the implications of these results for stochastic volatility modelling. Comment: 9 pages, 4 figures, LaTeX2e, to be published in Physica

    The analysis of transformations for profit-and-loss data

    We analyse data on the performance of investment funds, 99 out of 309 of which report a loss, and on the profitability of 1405 firms, 407 of which report losses. The problem in both cases is to use regression to predict performance from sets of explanatory variables. In one case, it is clear from scatter plots of the data that the negative responses have a lower variance than the positive responses and a different relationship with the explanatory variables. Because the data include negative responses, the Box–Cox transformation cannot be used. We develop a robust version of an extension to the Yeo–Johnson transformation which allows different transformations for positive and negative responses. Tests and graphical methods from our robust analysis enable the detection of outliers, the assessment of values of the two transformation parameters and the building of simple regression models. Performance comparisons are made with non-parametric transformations.

    The Box-Cox transformation: review and extensions

    The Box-Cox power transformation family for non-negative responses in linear models has a long and interesting history in both statistical practice and theory, which we summarize. The relationship between generalized linear models and log-transformed data is illustrated. Extensions investigated include the transform-both-sides model and the Yeo-Johnson transformation for observations that can be positive or negative. The paper also describes an extended Yeo-Johnson transformation that allows positive and negative responses to have different power transformations. Analyses of data show this to be necessary. Robustness enters in the fan plot for which the forward search provides an ordering of the data. Plausible transformations are checked with an extended fan plot. These procedures are used to compare parametric power transformations with nonparametric transformations produced by smoothing.
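
    For orientation, the Yeo-Johnson transformation mentioned in this abstract, and the idea of giving positive and negative responses different powers, can be sketched as below. The standard Yeo-Johnson formula is well established; treating the extended version as "one parameter per sign" is an assumption made here for illustration, and the parameter names `lam_pos` and `lam_neg` are hypothetical. The normalisation used in the paper for likelihood comparisons is not shown.

```python
import math


def yeo_johnson(y, lam):
    """Standard (unnormalised) Yeo-Johnson transformation of one value."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)              # limit as lam -> 0
        return ((y + 1) ** lam - 1) / lam
    if abs(lam - 2) < 1e-12:
        return -math.log1p(-y)                # limit as lam -> 2
    return -((1 - y) ** (2 - lam) - 1) / (2 - lam)


def extended_yeo_johnson(y, lam_pos, lam_neg):
    """Sketch of an extended transformation: non-negative responses use
    lam_pos, negative responses use lam_neg (assumed form)."""
    return yeo_johnson(y, lam_pos if y >= 0 else lam_neg)
```

    With `lam_pos == lam_neg` the extended function reduces to the standard transformation, and a value of 1 for a parameter leaves responses of that sign unchanged.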