1,684 research outputs found
Streaming Coreset Constructions for M-Estimators
We introduce a new method of maintaining a (k,epsilon)-coreset for clustering M-estimators over insertion-only streams. Let (P,w) be a weighted set (where w : P - > [0,infty) is the weight function) of points in a rho-metric space (meaning a set X equipped with a positive-semidefinite symmetric function D such that D(x,z) <=rho(D(x,y) + D(y,z)) for all x,y,z in X). For any set of points C, we define COST(P,w,C) = sum_{p in P} w(p) min_{c in C} D(p,c). A (k,epsilon)-coreset for (P,w) is a weighted set (Q,v) such that for every set C of k points, (1-epsilon)COST(P,w,C) <= COST(Q,v,C) <= (1+epsilon)COST(P,w,C). Essentially, the coreset (Q,v) can be used in place of (P,w) for all operations concerning the COST function. Coresets, as a method of data reduction, are used to solve fundamental problems in machine learning of streaming and distributed data.
M-estimators are functions D(x,y) that can be written as psi(d(x,y)) where ({X}, d) is a true metric (i.e. 1-metric) space. Special cases of M-estimators include the well-known k-median (psi(x) =x) and k-means (psi(x) = x^2) functions. Our technique takes an existing offline construction for an M-estimator coreset and converts it into the streaming setting, where n data points arrive sequentially. To our knowledge, this is the first streaming construction for any M-estimator that does not rely on the merge-and-reduce tree. For example, our coreset for streaming metric k-means uses O(epsilon^{-2} k log k log n) points of storage. The previous state-of-the-art required storing at least O(epsilon^{-2} k log k log^{4} n) points
New Frameworks for Offline and Streaming Coreset Constructions
A coreset for a set of points is a small subset of weighted points that
approximately preserves important properties of the original set. Specifically,
if is a set of points, is a set of queries, and is a cost function, then a set with weights
is an -coreset for some parameter if
is a multiplicative approximation to
for all . Coresets are used to solve fundamental
problems in machine learning under various big data models of computation. Many
of the suggested coresets in the recent decade used, or could have used a
general framework for constructing coresets whose size depends quadratically on
what is known as total sensitivity .
In this paper we improve this bound from to . Thus our
results imply more space efficient solutions to a number of problems, including
projective clustering, -line clustering, and subspace approximation.
Moreover, we generalize the notion of sensitivity sampling for sup-sampling
that supports non-multiplicative approximations, negative cost functions and
more. The main technical result is a generic reduction to the sample complexity
of learning a class of functions with bounded VC dimension. We show that
obtaining an -sample for this class of functions with appropriate
parameters and suffices to achieve space efficient
-coresets.
Our result implies more efficient coreset constructions for a number of
interesting problems in machine learning; we show applications to
-median/-means, -line clustering, -subspace approximation, and the
integer -projective clustering problem
Improved Algorithms for Time Decay Streams
In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions.
We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores O(k log(h Delta)+h) points where h is the half-life of the decay function and Delta is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well
Clustering on Sliding Windows in Polylogarithmic Space
In PODS 2003, Babcock, Datar, Motwani and O\u27Callaghan gave the first streaming solution for the k-median problem on sliding windows using
O(frack k tau^4 W^2tau log^2 W) space, with a O(2^O(1/tau)) approximation factor, where W is the window size and tau in (0,1/2) is a user-specified parameter. They left as an open question whether it is possible to improve this to polylogarithmic space. Despite much progress on clustering and sliding windows, this question has remained open for more than a decade.
In this paper, we partially answer the main open question posed by Babcock, Datar, Motwani and O\u27Callaghan. We present an algorithm yielding an exponential improvement in space compared to the previous result given in Babcock, et al. In particular, we give the first polylogarithmic space (alpha,beta)-approximation for metric k-median clustering in the sliding window model, where alpha and beta are constants, under the assumption, also made by Babcock et al., that the optimal k-median cost on any given window is bounded by a polynomial in the window size. We justify this assumption by showing that when the cost is exponential in the window size, no sublinear space approximation is possible. Our main technical contribution is a simple but elegant extension of smooth functions as introduced by Braverman and Ostrovsky, which allows us to apply well-known techniques for solving problems in the sliding window model
to functions that are not smooth, such as the k-median cost
A computer based analysis of the effects of rhythm modification on the intelligibility of the speech of hearing and deaf subjects
The speech of profoundly deaf persons often exhibits acquired unnatural rhythms, or a random pattern of rhythms. Inappropriate pause-time and speech-time durations are common in their speech. Specific rhythm deficiencies include abnormal rate of syllable utterance, improper grouping, poor timing and phrasing of syllables and unnatural stress for accent and emphasis. Assuming that temporal features are fundamental to the naturalness of spoken language, these abnormal timing patterns are often detractive. They may even be important factors in the decreased intelligibility of the speech. This thesis explores the significance of temporal cues in the rhythmic patterns of speech. An analysis-synthesis approach was employed based on the encoding and decoding of speech by a tandem chain of digital computer operations. Rhythm as a factor in the speech intelligibility of deaf and normal-hearing subjects was investigated. The results of this study support the general hypothesis that rhythm and rhythmic intuition are important to the perception of speech
CTEQ Parton Distributions and Flavor Dependence of Sea Quarks
This paper describes salient features of new sets of parton distributions
obtained by the CTEQ Collaboration based on a comprehensive QCD global analysis
of all available data. The accuracy of the new data on deep inelastic
scattering structure functions obtained by the very high statistics NMC and
CCFR experiments provides unprecedented sensitivity to the flavor dependence of
the sea-quark distributions. In addition to much better determination of the
small x dependence of all parton distributions, we found: (i) the strange quark
distribution is much softer than the non-strange sea quarks and rises above the
latter at small-x; and (ii) the difference changes sign as a
function of x. A few alternative sets of viable distributions with conventional
assumptions are also discussed.Comment: 13 pages with figures, MSUHEP-92-27, Fermilab-Pub-92/371,
FSU-HEP-92-1225, ISU-NP-92-1
Deaf characters and deafness in science fiction
Through the years, many individual reports have been published which review the treatment of deafness and deaf characters in various literary works. More recently, comprehensive anthologies have also addressed this topic. The authors add a new dimension to this area of Deaf Studies with their review of science fiction literature. Selected nineteenth and twentieth-century works of science fiction are discussed, and several deaf writerd in this genre are introduced
- …
