Distributional Sentence Entailment Using Density Matrices
The categorical compositional distributional model of Coecke et al. (2010) suggests a way to combine the grammatical composition of formal, type-logical models with the corpus-based, empirical word representations of distributional semantics. This paper contributes to the project by extending the model to also capture entailment relations. This is achieved by generalizing the representations of words from points in meaning space to density operators, which are probability distributions over the subspaces of the space. A symmetric measure of similarity and an asymmetric measure of entailment are defined, where lexical entailment is measured using quantum relative entropy, the von Neumann analogue of Kullback-Leibler divergence. Lexical entailment, combined with the composition map on word representations, provides a method for obtaining entailment relations at the level of sentences. Truth-theoretic and corpus-based examples are provided. Comment: 11 pages
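A minimal sketch of the kind of entailment measure described above, assuming word representations are given as NumPy density matrices; the function names, regularization, and toy vectors are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.linalg import logm

def quantum_relative_entropy(rho, sigma, eps=1e-12):
    """S(rho || sigma) = Tr[rho (log rho - log sigma)].

    Asymmetric, like KL divergence; a small value suggests rho's support
    is contained in sigma's, i.e. rho is lexically entailed by sigma.
    """
    d = rho.shape[0]
    # Mix in a little identity so the matrix logarithm stays finite
    # on rank-deficient inputs.
    rho = (1 - eps) * rho + eps * np.eye(d) / d
    sigma = (1 - eps) * sigma + eps * np.eye(d) / d
    return float(np.trace(rho @ (logm(rho) - logm(sigma))).real)

def density_matrix(context_vectors):
    """Uniform mixture of rank-1 projectors built from context vectors."""
    vs = [np.asarray(v, dtype=float) for v in context_vectors]
    return sum(np.outer(v, v) / (v @ v) for v in vs) / len(vs)

# Toy example: 'dog' contexts span a subspace of 'animal' contexts, so
# S(dog || animal) is small while S(animal || dog) is large.
dog = density_matrix([[1, 0, 0], [0, 1, 0]])
animal = density_matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(quantum_relative_entropy(dog, animal))   # ~0.41, small
print(quantum_relative_entropy(animal, dog))   # large
```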
DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters
When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so that preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss or, even worse, reliability degradation of a datacenter. We further propose a two-stage framework, DC-Prophet, based on a One-Class Support Vector Machine and a Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and an F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score. Comment: 13 pages, 5 figures, accepted by ECML PKDD 2017
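The two-stage pattern can be sketched with scikit-learn. This is a hedged illustration of the general design (a One-Class SVM flags surprising samples, a Random Forest then predicts failures), not the authors' exact features or pipeline; the data below is a synthetic placeholder:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder data: rows = per-machine usage statistics,
# y = 1 if the machine fails soon, 0 otherwise.
X = rng.normal(size=(2000, 12))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 2.2).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stage 1: One-Class SVM fitted on healthy machines flags "surprising" samples.
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(X_tr[y_tr == 0])
surprising = ocsvm.predict(X_te) == -1  # -1 marks outliers

# Stage 2: Random Forest predicts failure only for the flagged samples.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
y_pred = np.zeros_like(y_te)
y_pred[surprising] = rf.predict(X_te[surprising])

# F3 weights recall 3x more than precision: missing a failure is costly.
print("F3:", fbeta_score(y_te, y_pred, beta=3))
```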
Using the quantum probability ranking principle to rank interdependent documents
A known limitation of the Probability Ranking Principle (PRP) is that it does not cater for dependence between documents. Recently, the Quantum Probability Ranking Principle (QPRP) has been proposed, which implicitly captures dependencies between documents through “quantum interference”. This paper explores whether this new ranking principle leads to improved performance for subtopic retrieval, where novelty and diversity are required. In a thorough empirical investigation, models based on the PRP, as well as other recently proposed ranking strategies for subtopic retrieval (i.e. Maximal Marginal Relevance (MMR) and Portfolio Theory (PT)), are compared against the QPRP. On the given task, it is shown that the QPRP outperforms these other ranking strategies. Unlike MMR and PT, the QPRP requires no parameter estimation or tuning, making it both simple and effective. This research demonstrates that the application of quantum theory to problems within information retrieval can lead to significant improvements.
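A sketch of greedy QPRP-style ranking, under the common approximation that the interference phase cos(theta) between two documents is estimated by their negative similarity; the probabilities and similarity matrix below are illustrative:

```python
import numpy as np

def qprp_rank(p_rel, sim):
    """Greedy QPRP ranking sketch.

    p_rel: estimated relevance probability per document.
    sim:   inter-document similarity matrix in [0, 1].
    The interference term 2*sqrt(p_i*p_j)*cos(theta_ij) is approximated
    with cos(theta_ij) = -sim[i, j]: similar documents interfere
    destructively, which promotes novelty and diversity.
    """
    ranked, remaining = [], set(range(len(p_rel)))
    while remaining:
        def score(i):
            interference = sum(-2 * np.sqrt(p_rel[i] * p_rel[j]) * sim[i, j]
                               for j in ranked)
            return p_rel[i] + interference
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

p = np.array([0.9, 0.85, 0.4])
S = np.array([[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
# Documents 0 and 1 are near-duplicates; QPRP promotes the novel doc 2.
print(qprp_rank(p, S))  # [0, 2, 1]
```

Note that, in line with the abstract, nothing here is tuned: the trade-off between relevance and diversity falls out of the interference term itself.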
Looking at Vector Space and Language Models for IR using Density Matrices
In this work, we conduct a joint analysis of both Vector Space and Language Models for IR using the mathematical framework of Quantum Theory. We shed light on how both models allocate the space of density matrices. A density matrix is shown to be a general representational tool capable of leveraging the capabilities of both VSM and LM representations, thus paving the way for a new generation of retrieval models. We analyze the possible implications suggested by our findings. Comment: In Proceedings of Quantum Interaction 2013
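As an illustrative construction (not the paper's exact one) of how both representations land in the space of density matrices: a VSM document maps to a rank-1 projector onto its direction, and a unigram language model to a diagonal density matrix; both are positive semidefinite with unit trace:

```python
import numpy as np

def vsm_density(doc_vector):
    """A VSM document becomes a rank-1 projector onto its direction."""
    v = np.asarray(doc_vector, dtype=float)
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

def lm_density(term_probs):
    """A unigram language model becomes a diagonal density matrix."""
    p = np.asarray(term_probs, dtype=float)
    return np.diag(p / p.sum())

rho_vsm = vsm_density([3, 1, 0, 2])
rho_lm = lm_density([0.5, 0.2, 0.1, 0.2])
for rho in (rho_vsm, rho_lm):
    assert np.isclose(np.trace(rho), 1.0)                 # trace 1
    assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)      # PSD
```

Mixed-rank density matrices interpolate between these two extremes, which is the sense in which they generalize both families of retrieval models.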
Algorithms for Hierarchical Clustering: An Overview, II
We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear-time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. This review adds to the earlier version, Murtagh and Contreras (2012).
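The survey points to R implementations; as a quick Python counterpart, SciPy's linkage covers the classical agglomerative criteria (the data below is synthetic, and the linear-time grid-based algorithm the survey describes is not part of this sketch):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two well-separated blobs of 2-D points.
X = np.vstack([rng.normal(0, 0.3, size=(10, 2)),
               rng.normal(3, 0.3, size=(10, 2))])

# Agglomerative clustering with Ward's criterion; "single", "complete",
# "average", "centroid", and "median" are the other classical linkages.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # the two blobs are recovered
```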
The good, the bad and the implicit: a comprehensive approach to annotating explicit and implicit sentiment
We present a fine-grained scheme for the annotation of polar sentiment in text that accounts for explicit sentiment (so-called private states) as well as implicit expressions of sentiment (polar facts). Polar expressions are annotated below sentence level and classified according to their subjectivity status. Additionally, they are linked to one or more targets with a specific polar orientation and intensity. Other components of the annotation scheme include source attribution and the identification and classification of expressions that modify polarity. In previous research, little attention has been given to implicit sentiment, which represents a substantial amount of the polar expressions encountered in our data. English and Dutch corpora of financial newswire, each consisting of over 45,000 words, were annotated using our scheme. A subset of these corpora was used to conduct an inter-annotator agreement study, which demonstrated that the proposed scheme can be used to reliably annotate explicit and implicit sentiment in real-world textual data, making the created corpora a useful resource for sentiment analysis.
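A hypothetical encoding of the kind of annotation the scheme describes; all field names and label values below are illustrative, not the scheme's actual vocabulary:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Target:
    span: str
    polarity: str          # "positive" | "negative"
    intensity: int         # e.g. 1 (weak) .. 3 (strong)

@dataclass
class PolarExpression:
    span: str
    subjectivity: str      # "private_state" (explicit) | "polar_fact" (implicit)
    source: Optional[str]  # attributed holder of the sentiment
    targets: List[Target] = field(default_factory=list)
    modifiers: List[str] = field(default_factory=list)  # negators, intensifiers

# An implicit polar fact: no opinion word, yet clearly negative for "profits".
ex = PolarExpression(
    span="profits fell sharply",
    subjectivity="polar_fact",
    source="writer",
    targets=[Target(span="profits", polarity="negative", intensity=3)],
)
```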
Quantum Particles as Conceptual Entities: A Possible Explanatory Framework for Quantum Theory
We put forward a possible new interpretation and explanatory framework for quantum theory. The basic hypothesis underlying this new framework is that quantum particles are conceptual entities. More concretely, we propose that quantum particles interact with ordinary matter (nuclei, atoms, molecules, macroscopic material entities, measuring apparatuses, ...) in a way similar to how human concepts interact with memory structures, human minds, or artificial memories. We analyze the most characteristic aspects of quantum theory, i.e., entanglement and non-locality, interference and superposition, and identity and individuality, in the light of this new interpretation, and we put forward a specific explanation and understanding of these aspects. The basic hypothesis of our framework gives rise in a natural way to a Heisenberg uncertainty principle, which introduces an understanding of the general situation of 'the one and the many' in quantum physics. A specific view on the macro and micro levels, different from the common one, follows from the basic hypothesis and leads to an analysis of Schrödinger's cat paradox and the measurement problem different from the existing ones. We reflect on the influence of this new quantum interpretation and explanatory framework on the global nature and evolutionary aspects of the world and human worldviews, and point out potential explanations for specific situations, such as the generation problem in particle physics, the confinement of quarks, and the existence of dark matter. Comment: 45 pages, 10 figures
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys
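The three problems the survey covers can be illustrated in miniature with a modern scikit-learn pipeline (a tool that postdates the survey, shown only to make the division of labor concrete): representation (TF-IDF), construction (a linear classifier), and evaluation (F1):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

cats = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

# Representation + construction: documents -> TF-IDF vectors -> linear model.
clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression(max_iter=1000))
clf.fit(train.data, train.target)

# Evaluation on held-out, preclassified documents.
print("F1:", f1_score(test.target, clf.predict(test.data)))
```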
Utilizing Online Social Network and Location-Based Data to Recommend Products and Categories in Online Marketplaces
Recent research has unveiled the importance of online social networks for improving the quality of recommender systems and encouraged the research community to investigate better ways of exploiting social information for recommendations. To contribute to this sparse field of research, in this paper we exploit users' interactions along three data sources (marketplace, social network, and location-based) to assess their performance in a barely studied domain: recommending products and domains of interest (i.e., product categories) to people in an online marketplace environment. To that end, we defined sets of content- and network-based user similarity features for each data source and studied them in isolation using a user-based Collaborative Filtering (CF) approach, as well as in combination via a hybrid recommender algorithm, to assess which one provides the best recommendation performance. Interestingly, in our experiments conducted on a rich dataset collected from SecondLife, a popular online virtual world, we found that recommenders relying on user similarity features obtained from the social network data clearly yielded the best results in terms of accuracy when predicting products, whereas the features obtained from the marketplace and location-based data sources also achieved very good results when predicting categories. This finding indicates that all three types of data sources are important and should be taken into account depending on the level of specialization of the recommendation task. Comment: 20 pages, book chapter
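A minimal sketch of the user-based CF building block that such a study combines across data sources; the matrices below are toy placeholders, and the per-source blending at the end is a generic hybrid pattern, not the paper's exact algorithm:

```python
import numpy as np

def user_based_cf_scores(ratings, user_sim, user):
    """Score items for one user by similarity-weighted neighbor interactions.

    ratings:  (n_users, n_items) implicit-feedback matrix (1 = purchased).
    user_sim: (n_users, n_users) similarity matrix; one such matrix per
              data source (marketplace, social network, location-based).
    """
    sims = user_sim[user].copy()
    sims[user] = 0.0                      # exclude the user themself
    scores = sims @ ratings               # weighted sum over neighbors
    scores[ratings[user] > 0] = -np.inf   # do not re-recommend owned items
    return scores

R = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1]])
S_social = np.array([[1.0, 0.8, 0.1],
                     [0.8, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
print(user_based_cf_scores(R, S_social, user=0).argmax())  # item 1

# A hybrid recommender can then blend per-source score vectors, e.g.
# alpha * social + beta * marketplace + gamma * location.
```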
Washington meets Wall Street: a closer examination of the presidential cycle puzzle
We show that average excess returns during the last two years of the presidential cycle are significantly higher than during the first two years: 9.8 percent over the period 1948-2008. This pattern in returns cannot be explained by business-cycle variables capturing time-varying risk premia, by differences in risk levels, or by consumer and investor sentiment. In this paper, we formally test the presidential election cycle (PEC) hypothesis, the alternative explanation found in the literature for the presidential cycle anomaly. The PEC hypothesis states that incumbent parties and presidents have an incentive to manipulate the economy (via budget expansions and taxes) to remain in power. We formulate eight empirically testable propositions relating to the fiscal, monetary, tax, unexpected-inflation, and political implications of the PEC hypothesis. We do not find statistically significant evidence confirming the PEC hypothesis as a plausible explanation for the presidential cycle effect. The existence of the presidential cycle effect in U.S. financial markets thus remains a puzzle that cannot be easily explained by politicians employing their economic influence to remain in power.
JEL Classification: E32; G14; P16. Keywords: Political Economy, Market Efficiency, Anomalies, Calendar Effects.
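A sketch of the core comparison behind the anomaly, a difference-in-means test between the two halves of the presidential cycle; the returns below are simulated placeholders, and the alignment of cycle years is illustrative rather than the paper's exact dating:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
years = np.arange(1948, 2009)
cycle_year = (years - 1948) % 4   # 0,1 = first half; 2,3 = second half

# Simulated annual excess returns with the documented 9.8% gap built in.
excess = rng.normal(loc=np.where(cycle_year >= 2, 0.098, 0.0), scale=0.15)

first_half = excess[cycle_year < 2]
second_half = excess[cycle_year >= 2]
t, p = stats.ttest_ind(second_half, first_half, equal_var=False)
print(f"diff = {second_half.mean() - first_half.mean():.3f}, "
      f"t = {t:.2f}, p = {p:.3f}")
```

The paper's contribution is what comes after this step: regressing the effect on fiscal, monetary, tax, inflation, and political proxies to see whether the PEC hypothesis absorbs it (it does not).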
