A Unifying Review of Linear Gaussian Models
Factor analysis, principal component analysis, mixtures of Gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and by introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of Gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
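As the abstract indicates, all of these models are special cases of a single linear Gaussian state-space model; a sketch of the common form (standard notation, not necessarily the paper's) is

    \begin{aligned}
    \mathbf{x}_{t+1} &= A\,\mathbf{x}_t + \mathbf{w}_t, &\quad \mathbf{w}_t &\sim \mathcal{N}(\mathbf{0}, Q),\\
    \mathbf{y}_t     &= C\,\mathbf{x}_t + \mathbf{v}_t, &\quad \mathbf{v}_t &\sim \mathcal{N}(\mathbf{0}, R).
    \end{aligned}

The static models (factor analysis, PCA, mixtures) correspond to dropping the temporal dynamics, the Kalman filter keeps a continuous state with dynamics, and the discrete-state models (vector quantization, hidden Markov models) arise when the state is pushed through a winner-take-all style nonlinearity.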
On the Reduction of Errors in DNA Computation
In this paper, we discuss techniques for reducing errors in DNA computation. We investigate several methods for achieving acceptable overall error rates for a computation built from basic operations that are error prone. We analyze a single essential biotechnology, sequence-specific separation, and show that separation errors can theoretically be reduced to tolerable levels by invoking a tradeoff between time, space, and error rates at the level of algorithm design. These tradeoffs do not depend upon improvement of the underlying biotechnology that implements the separation step. We outline several specific ways in which error reduction can be done and present numerical calculations of their performance.
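As a toy illustration of the kind of time/error tradeoff described above (a hypothetical model, not the paper's specific protocol), suppose a single sequence-specific separation retains a target strand with probability 1 - e_fn and wrongly retains a non-target strand with probability e_fp; chaining k separations in series drives leakage down geometrically at the cost of k-fold time and some loss of targets:

    # Toy model: k error-prone separations applied in series to the retained tube.
    def serial_separation(e_fn, e_fp, k):
        target_retention = (1 - e_fn) ** k   # targets surviving all k passes
        nontarget_leakage = e_fp ** k        # non-targets surviving all k passes
        return target_retention, nontarget_leakage

    for k in (1, 2, 4, 8):
        kept, leaked = serial_separation(e_fn=0.05, e_fp=0.05, k=k)
        print(f"k={k}: retention {kept:.3f}, leakage {leaked:.2e}")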
Astrometry.net: Blind astrometric calibration of arbitrary astronomical images
We have built a reliable and robust system that takes as input an
astronomical image, and returns as output the pointing, scale, and orientation
of that image (the astrometric calibration or WCS information). The system
requires no first guess, and works with the information in the image pixels
alone; that is, the problem is a generalization of the "lost in space" problem
in which nothing--not even the image scale--is known. After robust source
detection is performed in the input image, asterisms (sets of four or five
stars) are geometrically hashed and compared to pre-indexed hashes to generate
hypotheses about the astrometric calibration. A hypothesis is only accepted as
true if it passes a Bayesian decision theory test against a background
hypothesis. With indices built from the USNO-B Catalog and designed for
uniformity of coverage and redundancy, the success rate is 99.9% for
contemporary near-ultraviolet and visual imaging survey data, with no false
positives. The failure rate is consistent with the incompleteness of the USNO-B
Catalog; augmentation with indices built from the 2MASS Catalog brings the
completeness to 100% with no false positives. We are using this system to
generate consistent and standards-compliant meta-data for digital and digitized
imaging from plate repositories, automated observatories, individual scientific
investigators, and hobbyists. This is the first step in a program of making it
possible to trust calibration meta-data for astronomical data of arbitrary
provenance.
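The geometric hashing step can be sketched as follows: the two most widely separated stars of a four-star asterism define a translation-, rotation-, and scale-invariant frame, and the positions of the remaining two stars in that frame are the hash code. The Python sketch below is a simplified version that ignores the paper's symmetry-breaking conventions and assumes star positions are given as 2-D pixel coordinates.

    import numpy as np

    def quad_hash(stars):
        """Continuous hash code for a 4-star asterism (simplified sketch)."""
        pts = np.asarray(stars, dtype=float)
        # Pick the most widely separated pair (A, B).
        pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
        a, b = max(pairs, key=lambda ij: np.linalg.norm(pts[ij[0]] - pts[ij[1]]))
        rest = [k for k in range(4) if k not in (a, b)]
        # Similarity transform sending A -> (0, 0) and B -> (1, 1), via complex arithmetic.
        za = pts[a][0] + 1j * pts[a][1]
        zb = pts[b][0] + 1j * pts[b][1]
        def to_frame(p):
            w = (p[0] + 1j * p[1] - za) / (zb - za) * (1 + 1j)
            return w.real, w.imag
        (xc, yc), (xd, yd) = to_frame(pts[rest[0]]), to_frame(pts[rest[1]])
        return (xc, yc, xd, yd)

Matching then amounts to looking up nearby hash codes in a pre-built index and verifying the resulting calibration hypothesis against the background hypothesis.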
Bi-stochastic kernels via asymmetric affinity functions
In this short letter we present the construction of a bi-stochastic kernel p
for an arbitrary data set X that is derived from an asymmetric affinity
function α. The affinity function α measures the similarity
between points in X and some reference set Y. Unlike other methods that
construct bi-stochastic kernels via some convergent iteration process or
through solving an optimization problem, the construction presented here is
quite simple. Furthermore, it can be viewed through the lens of out of sample
extensions, making it useful for massive data sets.
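One simple construction consistent with this description (a sketch assuming strictly positive row sums, not necessarily the letter's exact formula) row-normalizes the |X| x |Y| affinity matrix and contracts over the reference set with inverse column weights; the result is symmetric with unit row and column sums, and a new point needs only its affinities to Y, which is what makes the out-of-sample view natural:

    import numpy as np

    def bistochastic_kernel(alpha):
        """Symmetric bi-stochastic kernel on X from an asymmetric affinity matrix.

        alpha: nonnegative (n_X, n_Y) array; alpha[i, j] is the affinity between
        point i of X and reference point j of Y.
        """
        A = np.asarray(alpha, dtype=float)
        W = A / A.sum(axis=1, keepdims=True)   # row-stochastic: rows sum to 1
        omega = W.sum(axis=0)                  # reference-point weights
        return (W / omega) @ W.T               # symmetric, rows and columns sum to 1

    rng = np.random.default_rng(0)
    P = bistochastic_kernel(rng.random((100, 20)))
    print(np.allclose(P.sum(0), 1), np.allclose(P.sum(1), 1), np.allclose(P, P.T))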
Inference with Constrained Hidden Markov Models in PRISM
A Hidden Markov Model (HMM) is a common statistical model which is widely
used for analysis of biological sequence data and other sequential phenomena.
In the present paper we show how HMMs can be extended with side-constraints and
present constraint solving techniques for efficient inference. Defining HMMs
with side-constraints in Constraint Logic Programming has advantages in terms
of more compact expression and pruning opportunities during inference.
We present a PRISM-based framework for extending HMMs with side-constraints
and show how well-known constraints such as cardinality and all-different are
integrated. We experimentally validate our approach on the biologically
motivated problem of global pairwise alignment.
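To illustrate how a side-constraint interacts with inference (a plain Python sketch, not the paper's PRISM/CLP framework), the decoder below runs Viterbi subject to a cardinality constraint on one designated state by augmenting the dynamic-programming state with a visit counter; extensions that violate the constraint are pruned exactly where a constraint solver would prune them:

    import math

    def viterbi_cardinality(obs, states, start_p, trans_p, emit_p,
                            counted_state, max_count):
        """Best state path with the side-constraint: counted_state occurs <= max_count times.

        Probabilities are assumed strictly positive; raises ValueError if the
        constraint makes every path infeasible.
        """
        count = lambda s: 1 if s == counted_state else 0
        # V[(state, visits)] = (log-probability of best path, path)
        V = {(s, count(s)): (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), [s])
             for s in states if count(s) <= max_count}
        for o in obs[1:]:
            new_V = {}
            for (s, c), (logp, path) in V.items():
                for s2 in states:
                    c2 = c + count(s2)
                    if c2 > max_count:
                        continue  # constraint violated: prune this extension
                    cand = logp + math.log(trans_p[s][s2]) + math.log(emit_p[s2][o])
                    if (s2, c2) not in new_V or cand > new_V[(s2, c2)][0]:
                        new_V[(s2, c2)] = (cand, path + [s2])
            V = new_V
        best_logp, best_path = max(V.values(), key=lambda v: v[0])
        return best_path, best_logp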
Diffusion maps for changing data
Graph Laplacians and related nonlinear mappings into low dimensional spaces
have been shown to be powerful tools for organizing high dimensional data. Here
we consider a data set X in which the graph associated with it changes
depending on some set of parameters. We analyze this type of data in terms of
the diffusion distance and the corresponding diffusion map. As the data changes
over the parameter space, the low dimensional embedding changes as well. We
give a way to go between these embeddings, and furthermore, map them all into a
common space, allowing one to track the evolution of X in its intrinsic
geometry. A global diffusion distance is also defined, which gives a measure of
the global behavior of the data over the parameter space. Approximation
theorems in terms of randomly sampled data are presented, as are potential
applications.
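For reference, a minimal diffusion-map embedding of a single fixed data set can be sketched as below (Gaussian kernel, row-stochastic normalization, spectral embedding); relating such embeddings as the data change over a parameter space is the subject of the paper and is not covered by this sketch:

    import numpy as np

    def diffusion_map(X, eps, n_components=2, t=1):
        """Minimal diffusion map of X (n_samples, n_features) at diffusion time t."""
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq / eps)                    # graph affinities
        P = K / K.sum(axis=1, keepdims=True)     # row-stochastic diffusion operator
        evals, evecs = np.linalg.eig(P)
        order = np.argsort(-evals.real)
        evals, evecs = evals.real[order], evecs.real[:, order]
        # Skip the trivial eigenpair (eigenvalue 1, constant eigenvector).
        return (evals[1:n_components + 1] ** t) * evecs[:, 1:n_components + 1]

Euclidean distances between rows of the returned embedding approximate diffusion distances at time t.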
Cleaning the USNO-B Catalog through automatic detection of optical artifacts
The USNO-B Catalog contains spurious entries that are caused by diffraction
spikes and circular reflection halos around bright stars in the original
imaging data. These spurious entries appear in the Catalog as if they were real
stars; they are confusing for some scientific tasks. The spurious entries can
be identified by simple computer vision techniques because they produce
repeatable patterns on the sky. Some techniques employed here are variants of
the Hough transform, one of which is sensitive to (two-dimensional)
overdensities of faint stars in thin right-angle cross patterns centered on
bright (<13 mag) stars, and one of which is sensitive to thin annular
overdensities centered on very bright (<7 mag) stars. After enforcing
conservative statistical requirements on spurious-entry identifications, we
find that of the 1,042,618,261 entries in the USNO-B Catalog, 24,148,382 of
them (2.3%) are identified as spurious by diffraction-spike criteria
and 196,133 (0.02%) are identified as spurious by reflection-halo
criteria. The spurious entries are often detected in more than 2 bands and are
not overwhelmingly outliers in any photometric properties; they therefore
cannot be rejected easily on other grounds, i.e., without the use of computer
vision techniques. We demonstrate our method, and return to the community in
electronic form a table of spurious entries in the Catalog.
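A much-simplified version of the diffraction-spike test (an illustrative sketch, not the authors' published statistical criteria) checks, for each bright star, whether faint neighbours are overrepresented in a thin right-angle cross centred on it relative to the local surface density:

    import numpy as np

    def cross_overdensity(bright_xy, faint_xy, half_width=2.0, arm_length=60.0, factor=5.0):
        """Flag bright stars with a cross-shaped excess of faint neighbours (toy sketch)."""
        faint = np.asarray(faint_xy, dtype=float)
        flags = []
        for bx, by in np.asarray(bright_xy, dtype=float):
            dx, dy = faint[:, 0] - bx, faint[:, 1] - by
            near = np.hypot(dx, dy) < arm_length
            in_cross = near & ((np.abs(dx) < half_width) | (np.abs(dy) < half_width))
            # Expected cross counts if faint sources near the star were uniform.
            cross_area = 2 * (2 * half_width) * (2 * arm_length)  # two strips, overlap ignored
            disc_area = np.pi * arm_length ** 2
            expected = near.sum() * cross_area / disc_area
            flags.append(in_cross.sum() > factor * max(expected, 1.0))
        return np.array(flags)

Units here are arbitrary pixel coordinates; in practice the arm length, strip width, and significance threshold would be calibrated against the bright star's magnitude and the plate scale.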
Blind Normalization of Speech From Different Channels
We show how to construct a channel-independent representation of speech that
has propagated through a noisy reverberant channel. This is done by blindly
rescaling the cepstral time series by a non-linear function, with the form of
this scale function being determined by previously encountered cepstra from
that channel. The rescaled form of the time series is an invariant property of
it in the following sense: it is unaffected if the time series is transformed
by any time-independent invertible distortion. Because a linear channel with
stationary noise and impulse response transforms cepstra in this way, the new
technique can be used to remove the channel dependence of a cepstral time
series. In experiments, the method achieved greater channel-independence than
cepstral mean normalization, and it was comparable to the combination of
cepstral mean normalization and spectral subtraction, despite the fact that no
measurements of channel noise or reverberations were required (unlike spectral
subtraction).
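One rescaling with a closely related invariance property (an illustrative sketch; the paper's scale function is estimated differently) maps each cepstral coefficient to its empirical CDF computed from previously encountered frames of the same channel, which leaves the result unchanged under any monotone increasing per-coefficient distortion applied to both the history and the new frames:

    import numpy as np

    def blind_rescale(cepstra, history):
        """Rescale each cepstral coefficient by its empirical CDF from `history`.

        cepstra: (n_frames, n_coeffs) frames to normalize.
        history: (n_hist, n_coeffs) previously encountered frames from the channel.
        """
        cepstra = np.asarray(cepstra, dtype=float)
        history = np.asarray(history, dtype=float)
        out = np.empty_like(cepstra)
        for k in range(cepstra.shape[1]):
            ref = np.sort(history[:, k])
            # Empirical CDF value in (0, 1) for the k-th coefficient of each frame.
            out[:, k] = np.searchsorted(ref, cepstra[:, k], side="right") / (len(ref) + 1)
        return out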
