303 research outputs found
Folksonomies and clustering in the collaborative system CiteULike
We analyze CiteULike, an online collaborative tagging system where users
bookmark and annotate scientific papers. Such a system can be naturally
represented as a tripartite graph whose nodes represent papers, users and tags
connected by individual tag assignments. The semantics of tags is studied here,
in order to uncover the hidden relationships between tags. We find that the
clustering coefficient reflects the semantic patterns among tags, providing
useful ideas for designing more efficient methods of data classification
and spam detection.
Comment: 9 pages, 5 figures, iop style; corrected typo
Evaluating the semantic web: a task-based approach
The increased availability of online knowledge has led to the design of several algorithms that solve a variety of tasks by harvesting the Semantic Web, i.e. by dynamically selecting and exploring a multitude of online ontologies. Our hypothesis is that the performance of such novel algorithms implicitly provides an insight into the quality of the ontologies used and thus opens the way to a task-based evaluation of the Semantic Web. We have investigated this hypothesis by studying the lessons learnt about online ontologies when used to solve three tasks: ontology matching, folksonomy enrichment, and word sense disambiguation. Our analysis leads to a suite of conclusions about the status of the Semantic Web, which highlight a number of strengths and weaknesses of the semantic information available online and complement the findings of other analyses of the Semantic Web landscape.
A study on text-score disagreement in online reviews
In this paper, we focus on online reviews and employ artificial intelligence
tools, taken from the cognitive computing field, to help understand the
relationships between the textual part of the review and the assigned numerical
score. We move from the intuitions that 1) a set of textual reviews expressing
different sentiments may feature the same score (and vice-versa); and 2)
detecting and analyzing the mismatches between the review content and the
actual score may benefit both service providers and consumers, by highlighting
specific factors of satisfaction (and dissatisfaction) in texts.
To prove the intuitions, we adopt sentiment analysis techniques and we
concentrate on hotel reviews, to find polarity mismatches therein. In
particular, we first train a text classifier with a set of annotated hotel
reviews, taken from the Booking website. Then, we analyze a large dataset, with
around 160k hotel reviews collected from Tripadvisor, with the aim of detecting
a polarity mismatch, indicating if the textual content of the review is in
line, or not, with the associated score.
Using well established artificial intelligence techniques and analyzing in
depth the reviews featuring a mismatch between the text polarity and the score,
we find that, on a scale of five stars, those reviews ranked with middle scores
include a mixture of positive and negative aspects.
The approach proposed here, besides acting as a polarity detector, provides an
effective selection of reviews, from an initially very large dataset, that may
allow both consumers and providers to focus directly on the review subset
featuring a text/score disagreement, which conveniently conveys to the user a
summary of positive and negative features of the review target.
Comment: This is the accepted version of the paper. The final version will be
published in the Journal of Cognitive Computation, available at Springer via
http://dx.doi.org/10.1007/s12559-017-9496-
Using Semantic Technologies in Digital Libraries - A Roadmap to Quality Evaluation
Abstract. In digital libraries semantic techniques are often deployed to reduce the expensive manual overhead for indexing documents, maintaining metadata, or caching for future search. However, using such techniques may cause a decrease in a collection’s quality due to their statistical nature. Since data quality is a major concern in digital libraries, it is important to be able to measure the (loss of) quality of metadata automatically generated by semantic techniques. In this paper we present a user study based on a typical semantic technique use
Tag-Aware Recommender Systems: A State-of-the-art Survey
In the past decade, Social Tagging Systems have attracted increasing
attention from both physical and computer science communities. Besides the
underlying structure and dynamics of tagging systems, many efforts have been
devoted to unifying tagging information to reveal user behaviors and
preferences, extract the latent semantic relations among items, make
recommendations, and so on. Specifically, this article summarizes recent
progress on tag-aware recommender systems, emphasizing the contributions
from three mainstream perspectives and approaches: network-based methods,
tensor-based methods, and topic-based methods. Finally, we outline some
other tag-related works and future challenges of tag-aware recommendation
algorithms.
Comment: 19 pages, 3 figures
Niche as a determinant of word fate in online groups
Patterns of word use both reflect and influence a myriad of human activities
and interactions. Like other entities that are reproduced and evolve, words
rise or decline depending upon a complex interplay between their intrinsic
properties and the environments in which they function. Using Internet
discussion communities as model systems, we define the concept of a word niche
as the relationship between the word and the characteristic features of the
environments in which it is used. We develop a method to quantify two important
aspects of the size of the word niche: the range of individuals using the word
and the range of topics it is used to discuss. Controlling for word frequency,
we show that these aspects of the word niche are strong determinants of changes
in word frequency. Previous studies have already indicated that word frequency
itself is a correlate of word success at historical time scales. Our analysis
of changes in word frequencies over time reveals that the relative sizes of
word niches are far more important than word frequencies in the dynamics of the
entire vocabulary at shorter time scales, as the language adapts to new
concepts and social groupings. We also distinguish endogenous versus exogenous
factors as additional contributors to the fates of words, and demonstrate the
force of this distinction in the rise of novel words. Our results indicate that
short-term nonstationarity in word statistics is strongly driven by individual
proclivities, including inclinations to provide novel information and to
project a distinctive social identity.
Comment: Supporting Information is available here:
http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0019009.s00
