Reducing UK-means to k-means
This paper proposes an optimisation to the UK-means algorithm, which generalises the k-means algorithm to handle objects whose locations are uncertain. The location of each object is described by a probability density function (pdf). The UK-means algorithm needs to compute expected distances (EDs) between each object and the cluster representatives. Evaluating an ED from first principles is a very costly operation, because the pdfs are different and arbitrary, yet UK-means must evaluate a large number of EDs; this is a major performance burden of the algorithm. In this paper, we derive a formula for evaluating EDs efficiently, which tremendously reduces the execution time of UK-means, as demonstrated by our preliminary experiments. We also illustrate that this optimised formula effectively reduces the UK-means problem to the traditional clustering problem addressed by the k-means algorithm. © 2007 IEEE. The 7th IEEE International Conference on Data Mining (ICDM) Workshops 2007, Omaha, NE, 28-31 October 2007. In Proceedings of the 7th ICDM, 2007, p. 483-48
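A minimal sketch of the reduction, assuming the ED is the expected squared Euclidean distance and that each object's pdf is summarised by its mean and per-dimension variances: the decomposition E||X - c||^2 = ||mu - c||^2 + sum(var) makes the variance term independent of the candidate centre, so cluster assignment coincides with k-means on the means. Function names are illustrative, not taken from the paper:

```python
import numpy as np

def expected_sq_dist(mu, var, c):
    """E||X - c||^2 for an object with mean mu and per-dimension variances var.
    It decomposes as ||mu - c||^2 + sum(var); the variance term is independent of c."""
    return float(np.sum((mu - c) ** 2) + np.sum(var))

def assign(mus, variances, centers):
    """Assign each uncertain object to the centre minimising the expected squared
    distance. Because the variance term does not depend on the centre, the result
    is identical to assigning the object's mean with ordinary k-means."""
    labels = []
    for mu, var in zip(mus, variances):
        eds = [expected_sq_dist(mu, var, c) for c in centers]
        labels.append(int(np.argmin(eds)))
    return labels

# Tiny example: two uncertain objects, two candidate centres.
mus = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
variances = [np.array([0.5, 0.5]), np.array([0.1, 0.1])]
centers = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
print(assign(mus, variances, centers))  # -> [0, 1], same as k-means on the means
```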
Efficient mining of frequent item sets on large uncertain databases
The data handled in emerging applications such as location-based services, sensor monitoring systems, and data integration are often inexact in nature. In this paper, we study the important problem of extracting frequent item sets from a large uncertain database, interpreted under the Possible World Semantics (PWS). This issue is technically challenging, since an uncertain database contains an exponential number of possible worlds. By observing that the mining process can be modeled as a Poisson binomial distribution, we develop an approximate algorithm, which can efficiently and accurately discover frequent item sets in a large uncertain database. We also study the important issue of maintaining the mining result for a database that is evolving (e.g., by inserting a tuple). Specifically, we propose incremental mining algorithms, which enable Probabilistic Frequent Item set (PFI) results to be refreshed. This reduces the need to re-execute the whole mining algorithm on the new database, which is often expensive and unnecessary. We examine how an existing algorithm that extracts exact item sets, as well as our approximate algorithm, can support incremental mining. All our approaches support both tuple and attribute uncertainty, which are two common uncertain database models. We also perform extensive evaluation on real and synthetic data sets to validate our approaches. © 1989-2012 IEEE.
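A hedged sketch of the frequentness computation under tuple uncertainty: if transaction i contains the item set independently with probability p_i, the support count follows a Poisson binomial distribution, which the approximate approach above models with a simpler distribution. Below, the Poisson binomial is approximated by a Poisson with matching mean; the function name and the choice of approximation are illustrative assumptions, not the paper's exact algorithm:

```python
import math

def frequentness_prob_poisson(probs, minsup):
    """P(support >= minsup) when transaction i contains the item set with
    probability probs[i]. The Poisson binomial support count is approximated
    by a Poisson distribution with lambda = sum(probs)."""
    lam = sum(probs)
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(minsup))
    return 1.0 - cdf  # 1 - P(support <= minsup - 1)

# Per-transaction probabilities that the item set is present.
probs = [0.9, 0.8, 0.4, 0.7, 0.2, 0.95]
print(frequentness_prob_poisson(probs, minsup=4))
```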
Predictive Crime Mapping: Arbitrary Grids or Street Networks?
OBJECTIVES: Decades of empirical research demonstrate that crime is concentrated at a range of spatial scales, including street segments. Further, the degree of clustering at particular geographic units remains noticeably stable and consistent; a finding that Weisburd (Criminology 53:133–157, 2015) has recently termed the 'law of crime concentration at places'. Such findings suggest that the future locations of crime should, to some extent at least, be predictable. To date, methods of forecasting where crime is most likely to next occur have focused either on area-level or grid-based predictions. No studies of which we are aware have developed and tested the accuracy of methods for predicting the future risk of crime at the street segment level. This is surprising given that it is at this level of place that many crimes are committed and policing resources are deployed.
METHODS: Using data for property crimes for a large UK metropolitan police force area, we introduce and calibrate a network-based version of prospective crime mapping [e.g. Bowers et al. (Br J Criminol 44:641–658, 2004)], and compare its performance against grid-based alternatives. We also examine how measures of predictive accuracy can be translated to the network context, and show how differences in performance between the two cases can be quantified and tested.
RESULTS: Findings demonstrate that the calibrated network-based model substantially outperforms a grid-based alternative in terms of predictive accuracy, with, for example, approximately 20% more crime identified at a coverage level of 5%. The improvement in accuracy is highly statistically significant at all coverage levels tested (from 1 to 10%).
CONCLUSIONS: This study suggests that, for property crime at least, network-based methods of crime forecasting are likely to outperform grid-based alternatives, and hence should be used in operational policing. More sophisticated variations of the model tested are possible and should be developed and tested in future research.
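For context, a minimal grid-based prospective-mapping sketch in the spirit of Bowers et al. (2004): each recent crime contributes to a cell's risk with a weight that decays with distance and elapsed time. The specific kernel, bandwidths, and function names below are illustrative assumptions and do not reproduce the calibrated network-based model introduced in the paper:

```python
import math

def cell_risk(cell_xy, crimes, cell_size=50.0):
    """Prospective-mapping style risk score for one grid cell.

    crimes: list of (x, y, weeks_ago) tuples for recent events.
    Each event contributes 1 / ((1 + d_cells) * (1 + weeks_ago)), an
    inverse distance-and-time kernel; the exact form is an assumption."""
    cx, cy = cell_xy
    risk = 0.0
    for x, y, weeks_ago in crimes:
        d_cells = math.hypot(x - cx, y - cy) / cell_size
        risk += 1.0 / ((1.0 + d_cells) * (1.0 + weeks_ago))
    return risk

recent_crimes = [(120.0, 80.0, 1), (400.0, 390.0, 3), (130.0, 95.0, 2)]
print(cell_risk((100.0, 100.0), recent_crimes))
```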
Naive bayes classification of uncertain data
Traditional machine learning algorithms assume that data are exact or precise. However, this assumption may not hold in some situations because of data uncertainty arising from measurement errors, data staleness, and repeated measurements. With uncertainty, the value of each data item is represented by a probability distribution function (pdf). In this paper, we propose a novel naive Bayes classification algorithm for uncertain data with a pdf. Our key solution is to extend the class conditional probability estimation in the Bayes model to handle pdfs. Extensive experiments on UCI datasets show that the accuracy of the naive Bayes model can be improved by taking into account the uncertainty information. © 2009 IEEE. The 9th IEEE International Conference on Data Mining (ICDM), Miami, FL, 6-9 December 2009. In Proceedings of the 9th ICDM, 2009, p. 944-94
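One common way to extend class-conditional estimation to pdfs, shown here as a hedged sketch for a single Gaussian feature: averaging a Gaussian class-conditional density over a Gaussian sample pdf simply adds the two variances. The Gaussian choice and the helper names are illustrative assumptions, not the paper's exact formulation:

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def expected_likelihood(sample_mean, sample_var, class_mean, class_var):
    """E_{x ~ N(sample_mean, sample_var)}[ N(x | class_mean, class_var) ]:
    the convolution of two Gaussians, i.e. a Gaussian with the variances added."""
    return gaussian_pdf(sample_mean, class_mean, class_var + sample_var)

def classify(sample_mean, sample_var, classes):
    """classes maps label -> (prior, class_mean, class_var) for a single feature."""
    scores = {label: prior * expected_likelihood(sample_mean, sample_var, m, v)
              for label, (prior, m, v) in classes.items()}
    return max(scores, key=scores.get)

classes = {"A": (0.5, 0.0, 1.0), "B": (0.5, 3.0, 1.0)}
print(classify(sample_mean=2.0, sample_var=0.5, classes=classes))  # -> "B"
```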
The mass area of jets
We introduce a new characteristic of jets called the mass area. It is defined so as to measure the susceptibility of the jet's mass to contamination from soft background. The mass area is a close relative of the recently introduced catchment area of jets. We define it also in two variants: passive and active. As a preparatory step, we generalise the results for passive and active areas of two-particle jets to the case where the two constituent particles have arbitrary transverse momenta. As the main part of our study, we use the mass area to analyse a range of modern jet algorithms acting on simple one- and two-particle systems. We find a whole variety of behaviours of passive and active mass areas depending on the algorithm, the relative hardness of the particles, and their separation. We also study the mass areas of jets from Monte Carlo simulations, and give an example of how the concept of mass area can be used to correct jets for contamination from pileup. Our results show that the information provided by the mass area can be very useful in a range of jet-based analyses.
Comment: 36 pages, 12 figures; v2: improved quality of two plots, added entry in acknowledgments, nicer form of formulae in appendix A; v3: added section with MC study and pileup correction, version accepted by JHE
Revisiting Scalar and Pseudoscalar Couplings with Nucleons
Certain dark matter interactions with nuclei are possibly mediated by a scalar or pseudoscalar Higgs boson. The estimation of the corresponding cross sections requires a correct evaluation of the couplings between the scalar or pseudoscalar Higgs boson and the nucleons. Progress has been made in two aspects relevant to this study in the past few years. First, recent lattice calculations show that the strange-quark sigma term and the strange-quark content in the nucleon are much smaller than previously expected. Second, lattice and model analyses imply sizable SU(3) breaking effects in the determination of the axial-vector coupling constant, which in turn affect the extraction of the isosinglet coupling and the strange quark spin component from polarized deep inelastic scattering experiments. Based on these new developments, we re-evaluate the relevant nucleon matrix elements and compute the scalar and pseudoscalar couplings of the proton and neutron. We also find that the strange quark contribution to both types of couplings is smaller than previously thought.
Comment: 17 pages, Sec. II is revised and the pion-nucleon sigma term extracted from the scattering data is discussed. Version to appear in JHE
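For orientation, the scalar couplings referred to above are commonly parametrised as follows (standard dark-matter-literature notation; the paper's conventions and numerical inputs may differ):

```latex
\begin{align}
  m_N\, f_{Tq}^{(N)} &= \langle N\,|\, m_q\, \bar{q} q \,|\, N\rangle ,
      \qquad q = u, d, s, \\
  f_{TG}^{(N)} &= 1 - \sum_{q=u,d,s} f_{Tq}^{(N)} ,
\end{align}
% with the light-quark pieces fixed by the pion-nucleon and strangeness
% sigma terms,
\begin{equation}
  \sigma_{\pi N} = \hat{m}\,\langle N\,|\, \bar{u}u + \bar{d}d \,|\, N\rangle ,
  \qquad
  \sigma_s = m_s\,\langle N\,|\, \bar{s}s \,|\, N\rangle ,
  \qquad \hat{m} = \tfrac{1}{2}(m_u + m_d).
\end{equation}
```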
Parity Doubling and the S Parameter Below the Conformal Window
We describe a lattice simulation of the masses and decay constants of the lowest-lying vector and axial resonances, and the electroweak S parameter, in an SU(3) gauge theory with N_f = 2 and 6 fermions in the fundamental representation. The spectrum becomes more parity doubled and the S parameter per electroweak doublet decreases when N_f is increased from 2 to 6, motivating study of these trends as N_f is increased further, toward the critical value for the transition from confinement to infrared conformality.
Comment: 4 pages, 5 figures; to be submitted to PR
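As a rough guide to why parity doubling reduces S, the familiar lowest-resonance (vector-dominance) estimate per electroweak doublet is shown below; normalisations vary between references and this is not necessarily the lattice definition used in the paper:

```latex
\begin{equation}
  S \;\approx\; 4\pi \left( \frac{F_V^2}{M_V^2} - \frac{F_A^2}{M_A^2} \right),
\end{equation}
% which vanishes as the vector and axial channels become degenerate
% (M_V -> M_A, F_V -> F_A).
```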
New poly(amino acid methacrylate) brush supports the formation of well-defined lipid membranes
A novel poly(amino acid methacrylate) brush comprising zwitterionic cysteine groups (PCysMA) was utilized as a support for lipid bilayers. The polymer brush provides a 12-nm-thick cushion between the underlying hard support and the aqueous phase. At neutral pH, the zeta potential of the PCysMA brush was ∼-10 mV. Cationic vesicles containing >25% DOTAP were found to form a homogeneous lipid bilayer, as determined by a combination of surface analytical techniques. The lipid mobility as measured by FRAP (fluorescence recovery after photobleaching) gave diffusion coefficients of ∼1.5 μm² s⁻¹, which are comparable to those observed for lipid bilayers on glass substrates.
Model-based probabilistic frequent itemset mining
Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle a large amount of imprecise information, uncertain databases have been recently developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods that capture the itemset mining process as a probability distribution, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach best fits different types of data sets. © 2012 The Author(s).
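A hedged sketch of the normal-approximation variant and the top-k selection described above: the Poisson binomial support count is replaced by a normal distribution with matching mean and variance, and itemsets are ranked by the resulting frequentness probability. Function names, the continuity correction, and the toy data are illustrative assumptions, not the paper's exact algorithms:

```python
import math

def frequentness_prob_normal(probs, minsup):
    """P(support >= minsup) for a Poisson binomial support count, approximated
    by a normal distribution with matching mean and variance (continuity-corrected)."""
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        return 1.0 if mean >= minsup else 0.0
    z = (minsup - 0.5 - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def top_k_pfi(itemset_probs, k, minsup):
    """itemset_probs maps itemset -> per-tuple containment probabilities.
    Returns the k itemsets with the highest frequentness probabilities."""
    scored = {i: frequentness_prob_normal(p, minsup) for i, p in itemset_probs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

itemset_probs = {
    ("milk",): [0.9, 0.8, 0.7, 0.6],
    ("milk", "bread"): [0.5, 0.4, 0.6, 0.3],
}
print(top_k_pfi(itemset_probs, k=1, minsup=2))  # -> [("milk",)]
```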
The Nuts and Bolts of Einstein-Maxwell Solutions
We find new non-supersymmetric solutions of five-dimensional ungauged supergravity coupled to two vector multiplets. The solutions are regular, horizonless, and have the same asymptotic charges as non-extremal charged black holes. An essential ingredient in our construction is a four-dimensional Euclidean base which is a solution to the Einstein-Maxwell equations. We construct stationary solutions based on the Euclidean dyonic Reissner-Nordstrom black hole as well as a six-parameter family with a dyonic Kerr-Newman-NUT base. These solutions can be viewed as compactifications of eleven-dimensional supergravity on a six-torus and we discuss their brane interpretation.
Comment: 29 pages, 3 figure
