ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data
There are many different ways in which change point analysis can be
performed, from purely parametric methods to those that are distribution free.
The ecp package is designed to perform multiple change point analysis while
making as few assumptions as possible. While many other change point methods
are applicable only for univariate data, this R package is suitable for both
univariate and multivariate observations. Estimation can be based upon either a
hierarchical divisive or agglomerative algorithm. Divisive estimation
sequentially identifies change points via a bisection algorithm. The
agglomerative algorithm estimates change point locations by determining an
optimal segmentation. Both approaches are able to detect any type of
distributional change within the data. This provides an advantage over many
existing change point algorithms, which are only able to detect changes within
the marginal distributions.
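The bisection step of a divisive estimator can be sketched in a few lines. The Python below is not the ecp package's code or API; it is a minimal illustration, assuming a scaled energy-distance style statistic with exponent `alpha` and a hypothetical `min_size` parameter for the smallest allowed segment. The function names are made up for this sketch. It finds a single split; a full divisive procedure would recurse on the resulting segments and would normally test each candidate split for significance (for example with a permutation test) before accepting it.

```python
import math
import random


def mean_between(xs, ys, alpha):
    """Average of |x - y|^alpha over all pairs with x in xs and y in ys."""
    total = sum(math.dist(x, y) ** alpha for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mean_within(xs, alpha):
    """Average of |x_i - x_j|^alpha over all distinct pairs i < j in xs."""
    n = len(xs)
    total = sum(
        math.dist(xs[i], xs[j]) ** alpha
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def energy_statistic(xs, ys, alpha=1.0):
    """Scaled sample divergence between two segments (illustrative form)."""
    n, m = len(xs), len(ys)
    divergence = (
        2.0 * mean_between(xs, ys, alpha)
        - mean_within(xs, alpha)
        - mean_within(ys, alpha)
    )
    return n * m / (n + m) * divergence


def best_split(data, min_size=5, alpha=1.0):
    """Return (tau, statistic) for the split data[:tau] | data[tau:]
    that maximizes the statistic. min_size must be at least 2."""
    best_tau, best_q = None, -math.inf
    for tau in range(min_size, len(data) - min_size + 1):
        q = energy_statistic(data[:tau], data[tau:], alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q


# Synthetic bivariate series (made up for illustration) whose mean
# shifts after the 30th observation.
rng = random.Random(0)
series = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
series += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(30)]

tau, q = best_split(series)
print("estimated change point index:", tau)
```

Because the statistic is built only from pairwise distances between observations, the same code runs unchanged on univariate or multivariate points. This naive version recomputes every distance for every candidate split; a practical implementation would cache the distance matrix.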
Sparse Identification and Estimation of Large-Scale Vector AutoRegressive Moving Averages
The Vector AutoRegressive Moving Average (VARMA) model is fundamental to the
theory of multivariate time series; however, in practice, identifiability
issues have led many authors to abandon VARMA modeling in favor of the simpler
Vector AutoRegressive (VAR) model. Such a practice is unfortunate since even
very simple VARMA models can have quite complicated VAR representations. We
narrow this gap with a new optimization-based approach to VARMA identification
that is built upon the principle of parsimony. Among all equivalent
data-generating models, we seek the parameterization that is "simplest" in a
certain sense. A user-specified strongly convex penalty is used to measure
model simplicity, and that same penalty is then used to define an estimator
that can be efficiently computed. We show that our estimator converges to a
parsimonious element in the set of all equivalent data-generating models, in a
double asymptotic regime where the number of component time series is allowed
to grow with sample size. Further, we derive non-asymptotic upper bounds on the
estimation error of our method relative to our specially identified target.
Novel theoretical machinery includes non-asymptotic analysis of infinite-order
VAR, elastic net estimation under a singular covariance structure of
regressors, and new concentration inequalities for quadratic forms of random
variables from Gaussian time series. We illustrate the competitive performance
of our methods in simulation and several application domains, including
macro-economic forecasting, demand forecasting, and volatility forecasting.
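The remark that simple VARMA models can have complicated VAR representations can be made concrete with a small calculation. The sketch below is not the paper's estimator. It assumes the sign convention y_t = Phi y_{t-1} + a_t + Theta a_{t-1} for a VARMA(1,1) model and an invertible moving-average part, under which the implied infinite-order VAR has coefficients Pi_j = (-Theta)^(j-1) (Phi + Theta) for j = 1, 2, and so on. The helper names and the example matrices are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matadd(a, b):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def negate(a):
    """Elementwise negation of a matrix."""
    return [[-x for x in row] for row in a]


def identity(d):
    """d x d identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def var_coefficients(phi, theta, order):
    """First `order` coefficients of the VAR(infinity) form of a VARMA(1,1):
    Pi_j = (-Theta)^(j-1) (Phi + Theta), assuming the convention
    y_t = Phi y_{t-1} + a_t + Theta a_{t-1}."""
    base = matadd(phi, theta)
    minus_theta = negate(theta)
    power = identity(len(phi))  # holds (-Theta)^(j-1)
    coefficients = []
    for _ in range(order):
        coefficients.append(matmul(power, base))
        power = matmul(power, minus_theta)
    return coefficients


# A sparse VARMA(1,1): each of Phi and Theta has only two nonzero entries.
phi = [[0.5, 0.0], [0.0, 0.3]]
theta = [[0.0, 0.6], [0.4, 0.0]]

for lag, pi in enumerate(var_coefficients(phi, theta, 4), start=1):
    print("Pi_%d =" % lag, pi)
```

Although `phi` and `theta` are sparse here, the implied VAR coefficients are nonzero at every lag and only decay geometrically, so a finite-order VAR fit has to approximate many parameters that the VARMA form encodes with a handful.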
Mixed Data and Classification of Transit Stops
An analysis of the characteristics and behavior of individual bus stops can
reveal clusters of similar stops, which can be of use in making routing and
scheduling decisions, as well as determining what facilities to provide at each
stop. This paper provides an exploratory analysis, including several possible
clustering results, of a dataset provided by the Regional Transit Service of
Rochester, NY. The dataset describes ridership on public buses, recording the
time, location, and number of entering and exiting passengers each time a bus
stops. A description of the overall behavior of bus ridership is followed by a
stop-level analysis. We compare multiple measures of stop similarity, based on
location, route information, and ridership volume over time.