A large multilingual and multi-domain dataset for recommender systems
This paper presents a multi-domain interests dataset for training and testing recommender systems, and the methodology used to create it from Twitter messages in English and Italian. The English dataset includes an average of 90 preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million users. Preferences are either extracted from messages of users who use Spotify, Goodreads and other similar content-sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to derive a semantic categorization of the preferred items, exploiting available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others.
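As a rough sketch of how the Wikipedia matching can be exploited, the hypothetical helper below aggregates a user's matched preferences into a semantic category profile; `article_categories` stands in for a mapping derived from resources such as the Wikipedia Category Graph or DBpedia (the names and structure here are illustrative assumptions, not the paper's implementation):

```python
from collections import Counter

def categorize_preferences(preferences, article_categories):
    """Aggregate a user's preferred items into a semantic profile.

    preferences: list of item names, each matched to a Wikipedia article title
    article_categories: dict mapping article title -> list of category labels
        (e.g. drawn from the Wikipedia Category Graph or DBpedia)
    Returns a Counter of category -> number of supporting preferences.
    """
    profile = Counter()
    for item in preferences:
        for category in article_categories.get(item, []):
            profile[category] += 1
    return profile
```

A user whose messages mention two albums by the same band would then show a stronger count on the shared music categories than on any single-item category.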
A Topic Recommender for Journalists
The way in which people acquire information on events and form their own opinion about them has changed dramatically with the advent of social media. For many readers, the news gathered from online sources becomes an opportunity to share points of view and information within micro-blogging platforms such as Twitter, mainly aimed at satisfying their communication needs. Furthermore, the need to explore the aspects related to news stimulates a demand for additional information, which is often met through online encyclopedias such as Wikipedia. This behaviour has also influenced the way in which journalists write their articles, requiring a careful assessment of what actually interests the readers. The goal of this paper is to present a recommender system, What to Write and Why, capable of suggesting to a journalist, for a given event, the aspects still uncovered in news articles on which readers focus their interest. The basic idea is to characterize an event according to the echo it receives in online news sources and to associate it with the corresponding readers' communicative and informative patterns, detected through the analysis of Twitter and Wikipedia, respectively. Our methodology temporally aligns the results of this analysis and recommends the concepts that emerge as topics of interest from Twitter and Wikipedia that are either not covered or poorly covered in the published news articles.
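The recommendation step can be sketched as a coverage filter: concepts that emerge as reader interests on Twitter or Wikipedia, but fall below a coverage threshold in the published articles, are suggested to the journalist. The function below is an illustrative simplification under that assumption, not the actual What to Write and Why pipeline (the time alignment is assumed to have happened upstream):

```python
def recommend_uncovered(twitter_topics, wikipedia_topics, news_coverage, threshold=0.2):
    """Suggest concepts readers focus on but news articles cover poorly.

    twitter_topics / wikipedia_topics: sets of concepts that emerged as
        topics of interest for the event in each source (time-aligned upstream).
    news_coverage: dict mapping concept -> fraction of articles mentioning it.
    A concept is recommended if it interests readers in either source and its
    news coverage is below `threshold` (including concepts with zero coverage).
    """
    reader_interest = twitter_topics | wikipedia_topics
    return sorted(c for c in reader_interest
                  if news_coverage.get(c, 0.0) < threshold)
```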
Efficient pruning of large knowledge graphs
In this paper we present an efficient and highly accurate algorithm to prune noisy or over-ambiguous knowledge graphs, given as input an extensional definition of a domain of interest, namely a set of instances or concepts. Our method climbs the graph in a bottom-up fashion, iteratively layering the graph and pruning nodes and edges in each layer while not compromising the connectivity of the set of input nodes. Iterative layering and protection of pre-defined nodes allow us to extract semantically coherent DAG structures from noisy or over-ambiguous cyclic graphs, without loss of information and without incurring computational bottlenecks, which are the main problem of state-of-the-art methods for cleaning large, i.e., Web-scale, knowledge graphs. We apply our algorithm to the tasks of pruning automatically acquired taxonomies, using benchmarking data from a SemEval evaluation exercise, as well as to the extraction of a domain-adapted taxonomy from the Wikipedia category hierarchy. The results show the superiority of our approach over state-of-the-art algorithms in terms of both output quality and computational efficiency.
Large-scale homophily analysis in Twitter using a Twixonomy
In this paper we perform a large-scale homophily analysis on Twitter using a hierarchical representation of users' interests which we call a Twixonomy. In order to build a population, community, or single-user Twixonomy, we first associate "topical" friends in users' friendship lists (i.e. friends representing an interest rather than a social relation between peers) with Wikipedia categories. A word-sense disambiguation algorithm is used to select the appropriate Wikipedia page for each topical friend. Starting from the set of Wikipedia pages representing "primitive" interests, we extract all paths connecting these pages with the topmost Wikipedia category nodes, and we then prune the resulting graph G efficiently so as to induce a directed acyclic graph. This graph is the Twixonomy. Then, to analyze homophily, we compare different methods to detect communities in a peer-friends Twitter network, and for each community we compute the degree of homophily on the basis of a measure of pairwise semantic similarity. We show that the Twixonomy provides a means for describing users' interests in a compact and readable way and allows for a fine-grained homophily analysis. Furthermore, we show that mid-low-level categories in the Twixonomy represent the best balance between informativeness and compactness of the representation.
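The per-community homophily computation can be sketched as mean pairwise semantic similarity among members; the `jaccard` helper below is purely an illustrative stand-in for the paper's Twixonomy-based similarity measure:

```python
from itertools import combinations

def jaccard(a, b):
    """Illustrative similarity between two interest sets (stand-in only)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def community_homophily(members, interests, similarity):
    """Degree of homophily of one community: mean pairwise similarity.

    members: list of user ids in the community
    interests: dict mapping user id -> set of interest labels
    similarity: function comparing two interest sets (e.g. Twixonomy-based)
    """
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    return sum(similarity(interests[a], interests[b]) for a, b in pairs) / len(pairs)
```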
Freeze-drying modeling and monitoring using a new neuro-evolutive technique
This paper is focused on the design of a black-box model for the process of freeze-drying of pharmaceuticals. A new methodology based on a self-adaptive differential evolution scheme is combined with a back-propagation algorithm, as local search method, for the simultaneous structural and parametric optimization of the model represented by a neural network. Using the model of the freeze-drying process, both the temperature and the residual ice content in the product vs. time can be determined off-line, given the values of the operating conditions (the temperature of the heating shelf and the pressure in the drying chamber). This makes it possible to understand whether the maximum temperature allowed by the product is exceeded and when the sublimation drying is complete, thus providing a valuable tool for recipe design and optimization. Moreover, the black-box model can be applied to monitor the freeze-drying process: in this case, the measurement of product temperature is used as input variable of the neural network in order to provide in-line estimation of the state of the product (temperature and residual amount of ice). Various examples are presented and discussed, thus pointing out the strength of the tool.
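A minimal DE/rand/1/bin sketch illustrates the evolutionary part (plain differential evolution, not the paper's self-adaptive scheme combined with back-propagation); in the paper's setting, the real-valued candidate vector would encode the network's structure and weights and `fitness` would be the model's prediction error:

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, F=0.8, CR=0.9,
                           gens=100, seed=0):
    """DE/rand/1/bin: minimize `fitness` over a box given by `bounds`.

    bounds: list of (lo, hi) per dimension.
    Returns (best_vector, best_cost).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [fitness(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # mutation: three distinct members other than the target
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = [pop[a][k] + F * (pop[b][k] - pop[c][k])
                     if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            trial_cost = fitness(trial)
            if trial_cost < cost[i]:          # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

On a toy quadratic objective the scheme converges quickly; the self-adaptive variant in the paper additionally tunes F and CR during the run.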
Can Twitter be a source of information on allergy? Correlation of pollen counts with tweets reporting symptoms of allergic rhinoconjunctivitis and names of antihistamine drugs
Pollen forecasts are in use everywhere to inform therapeutic decisions for patients with allergic rhinoconjunctivitis (ARC). We exploited data derived from Twitter in order to identify tweets reporting a combination of symptoms consistent with a case definition of ARC and those reporting the name of an antihistamine drug. In order to increase the sensitivity of the system, we applied an algorithm aimed at automatically identifying jargon expressions related to medical terms. We compared weekly Twitter trends with National Allergy Bureau weekly pollen counts derived from US stations, and found a high correlation of the sum of the total pollen counts from each station with tweets reporting ARC symptoms (Pearson's correlation coefficient: 0.95) and with tweets reporting antihistamine drug names (Pearson's correlation coefficient: 0.93). Longitude and latitude of the pollen stations affected the strength of the correlation. Twitter and other social networks may play a role in allergic disease surveillance and in signaling drug consumption trends.
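The weekly correlations reported above are plain Pearson coefficients, which can be computed from two equal-length weekly series (e.g. total pollen counts vs. symptom-tweet counts) as:

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)
```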
Modelling of methanol synthesis in a network of forced unsteady-state ring reactors by artificial neural networks for control purposes
A numerical model based on artificial neural networks (ANN) was developed to simulate the dynamic behaviour of a three-reactor network (or ring reactor), with periodic change of the feed position, when low-pressure methanol synthesis is carried out. A multilayer, feedforward, fully connected ANN was designed, and the history stack adaptation algorithm was implemented and tested with quite good results, both in terms of model identification and learning rates. The influence of the ANN parameters was addressed, leading to simple guidelines for the selection of their values. A detailed model was used to generate the patterns adopted for the learning and testing phases. The simplified model was then used to develop a model predictive control scheme in order to maximise methanol yield and to fulfil process constraints.
Monitoring of the primary drying of a lyophilization process in vials
An innovative and modular system (LyoMonitor) for monitoring the primary drying of a lyophilization process in vials is illustrated: it integrates some commercial devices (pressure gauges, a moisture sensor and a mass spectrometer), an innovative balance, and a manometric temperature measurement system based on an improved algorithm (DPE) to estimate the sublimating interface temperature and position, the product temperature profile, and the heat and mass transfer coefficients. A soft-sensor using a multipoint wireless thermometer can also estimate these parameters in a large number of vials. The performance of these devices for determining the end of primary drying is compared. Finally, all these sensors can be used for control purposes and for the optimization of the process recipe; the use of DPE in a control loop is shown as an example.
