
    An efficient closed frequent itemset miner for the MOA stream mining system

    Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a PAC-style rigorous analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare in pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm both on synthetic and real data the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift.
    Postprint (published version)
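    To make the target of the mining task concrete, the following brute-force Python sketch computes the frequent closed itemsets of a small, static batch of transactions: an itemset is frequent if its support reaches a threshold, and closed if no proper superset has the same support. It is not IncMine, which maintains an approximation of this set incrementally as the stream evolves; the function name `closed_frequent_itemsets` and the use of an absolute support count are illustrative assumptions, and the exhaustive enumeration is only viable on toy data.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force miner (illustrative, exponential): {itemset: support} of all frequent closed itemsets.

    Frequent: contained in at least `min_support` transactions (absolute count).
    Closed: no proper superset has exactly the same support.
    """
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    # Support of every non-empty candidate itemset that reaches the threshold.
    support = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in sets if candidate <= t)
            if count >= min_support:
                support[candidate] = count
    # A superset with equal support is itself frequent, so checking frequent supersets suffices.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and support[other] == count for other in support)
    }


txns = [["a", "b"], ["a", "b", "c"], ["a", "b"], ["c"]]
# closed_frequent_itemsets(txns, 2) returns {frozenset({"a", "b"}): 3, frozenset({"c"}): 2}
```

    In the example, {a} and {b} are frequent but not closed, because {a, b} occurs in exactly the same three transactions; reporting only closed itemsets is a compact, lossless summary of the frequent ones.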

    Cross-domain recommendations without overlapping data: Myth or reality?

    Cross-domain recommender systems adopt different techniques to transfer knowledge from a source domain to a target domain in order to alleviate the sparsity problem and improve the accuracy of recommendations. Traditional techniques require the two domains to be linked by shared characteristics associated with either users or items. In collaborative filtering (CF), this happens when the two domains have overlapping users or items (at least partially). Recently, Li et al. [7] introduced codebook transfer (CBT), a cross-domain CF technique based on co-clustering, and presented experimental results showing that CBT is able to transfer knowledge between non-overlapping domains. In this paper, we disprove these results and show that CBT does not transfer knowledge when the source and target domains do not overlap.
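    The mechanism under dispute can be sketched as follows, assuming the codebook-expansion step of CBT as we understand it from Li et al.: a small k × l codebook B of cluster-level ratings, learned on the source domain by co-clustering (not shown here), is mapped onto the target matrix by assigning each target user to a codebook row and each target item to a codebook column, and missing target ratings are read off the matched codebook cells. The Python below is a simplified hard-assignment sketch; the function name, the explicit `item_init` argument and the toy data are hypothetical. It only illustrates the mechanics and says nothing about whether any knowledge is actually transferred, which is the question the paper examines.

```python
def codebook_transfer(X, B, item_init, iters=10):
    """Expand a k x l codebook B into target rating matrix X (None marks a missing rating).

    Alternates hard assignments: each target user gets the codebook row, and each
    target item the codebook column, minimising squared error on observed ratings.
    Returns (filled matrix, user cluster per row, item cluster per column).
    """
    n, m = len(X), len(X[0])
    k, l = len(B), len(B[0])
    item_c = list(item_init)  # the alternation is sensitive to this initial item clustering
    user_c = [0] * n
    for _ in range(iters):
        previous = (list(user_c), list(item_c))
        for i in range(n):
            user_c[i] = min(range(k), key=lambda r: sum(
                (X[i][q] - B[r][item_c[q]]) ** 2 for q in range(m) if X[i][q] is not None))
        for q in range(m):
            item_c[q] = min(range(l), key=lambda c: sum(
                (X[i][q] - B[user_c[i]][c]) ** 2 for i in range(n) if X[i][q] is not None))
        if (user_c, item_c) == previous:
            break  # assignments are stable
    # Keep observed ratings; fill missing ones from the matched codebook cell.
    filled = [[X[i][q] if X[i][q] is not None else B[user_c[i]][item_c[q]]
               for q in range(m)] for i in range(n)]
    return filled, user_c, item_c


B = [[5, 1], [1, 5]]  # hypothetical codebook "learned" on a source domain
X = [[5, 5, 1, None], [1, None, 5, 5], [None, 5, 1, 1], [1, 1, None, 5]]
filled, users, items = codebook_transfer(X, B, item_init=[0, 0, 1, 0])
```

    The toy codebook and target matrix share the same two-by-two block structure by construction, so the expansion recovers it; with a different `item_init` (for example `[0, 1, 0, 1]`) the same alternation stalls in a poor assignment.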

    Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks

    Session-based recommendations are highly relevant in many modern online services (e.g., e-commerce, video streaming) and recommendation settings. Recently, Recurrent Neural Networks (RNNs) have been shown to perform very well in session-based settings. While user identifiers are hard to come by in many session-based recommendation domains, there are also domains in which user profiles are readily available. We propose a seamless way to personalize RNN models with cross-session information transfer and devise a Hierarchical RNN model that relays and evolves the latent hidden states of the RNNs across user sessions. Results on two industry datasets show large improvements over session-only RNNs.
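    The cross-session relay can be sketched as an untrained NumPy forward pass, assuming a two-level design in the spirit of the abstract: a session-level GRU reads the items of a session, a user-level GRU consumes the final session state when the session ends, and the user state initialises the hidden state of the user's next session. Training, the loss, and the paper's exact parametrisation are omitted; all class names, dimensions and weight initialisations are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell with random (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.W = rng.normal(0.0, 0.1, (3, hidden_dim, input_dim))   # input weights: update, reset, candidate
        self.U = rng.normal(0.0, 0.1, (3, hidden_dim, hidden_dim))  # recurrent weights: update, reset, candidate
        self.b = np.zeros((3, hidden_dim))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])  # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])  # reset gate
        h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * h + z * h_tilde


class HierarchicalSessionModel:
    """Session-level GRU whose initial state comes from a user-level GRU (hypothetical sketch)."""

    def __init__(self, n_items, emb_dim, sess_dim, user_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.emb = rng.normal(0.0, 0.1, (n_items, emb_dim))       # item embeddings
        self.session_gru = GRUCell(emb_dim, sess_dim, rng)
        self.user_gru = GRUCell(sess_dim, user_dim, rng)
        self.W_init = rng.normal(0.0, 0.1, (sess_dim, user_dim))  # user state -> session initial state
        self.W_out = rng.normal(0.0, 0.1, (n_items, sess_dim))    # session state -> next-item scores
        self.user_dim = user_dim

    def score_sessions(self, sessions):
        """Process one user's sessions in order; return next-item scores after every event."""
        c_user = np.zeros(self.user_dim)
        all_scores = []
        for session in sessions:
            h = np.tanh(self.W_init @ c_user)  # session starts from the user's representation
            for item in session:
                h = self.session_gru.step(self.emb[item], h)
                all_scores.append(self.W_out @ h)
            c_user = self.user_gru.step(h, c_user)  # user state absorbs the finished session
        return all_scores


model = HierarchicalSessionModel(n_items=5, emb_dim=4, sess_dim=3, user_dim=2)
scores = model.score_sessions([[0, 1, 2], [3, 4]])  # one score vector per event
```

    Because the second session starts from a state derived from the first, the same item sequence can yield different scores for a returning user than for an anonymous one, which is the personalization the hierarchy is meant to provide.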

    Toward building a content-based video recommendation system based on low-level features

    One of the challenges in video recommendation systems is the New Item problem, which occurs when the system is unable to recommend video items for which no information is available. For example, on popular video-sharing websites such as YouTube, hundreds of millions of hours of video are uploaded every day, and a large portion of these videos may lack the metadata the system needs to generate recommendations. In this paper, we address this problem by proposing a method based on automatic analysis of the video content that extracts a number of representative low-level visual features. These features are then used to generate personalized content-based recommendations. Our evaluation shows that the proposed method can outperform the baselines by producing more relevant recommendations. Hence, a set of automatically extracted low-level features can be more descriptive and informative of the video content than a set of high-level, expert-annotated features.
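    A minimal content-based scheme in the spirit of the abstract can be sketched as follows: each video is represented by a vector of automatically extracted low-level visual features, a user profile is the mean vector of the videos the user liked, and unseen videos are ranked by cosine similarity to that profile. The feature values, item identifiers and function names are hypothetical, and the specific features and similarity used in the paper may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(features, liked, k=2):
    """features: {video_id: low-level feature vector}; liked: ids the user liked.

    The profile is the element-wise mean of the liked videos' vectors; every other
    video is ranked by cosine similarity to it. No metadata is needed.
    """
    dim = len(next(iter(features.values())))
    profile = [sum(features[i][d] for i in liked) / len(liked) for d in range(dim)]
    candidates = [i for i in features if i not in liked]
    candidates.sort(key=lambda i: cosine(profile, features[i]), reverse=True)
    return candidates[:k]


# Hypothetical 3-dimensional visual descriptors; "e" is a fresh upload with no metadata.
features = {
    "a": [1.0, 0.0, 0.2], "b": [0.9, 0.1, 0.3], "c": [0.0, 1.0, 0.8],
    "d": [0.1, 0.9, 0.9], "e": [0.95, 0.05, 0.25],
}
# recommend(features, ["a"], k=2) returns ["e", "b"]
```

    Video "e" carries no annotations at all, yet it is ranked first because its visual features are closest to the profile; this is how content-derived features sidestep the New Item problem.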

    The Arabidopsis thaliana mobilome and its impact at the species level

    Transposable elements (TEs) are powerful motors of genome evolution yet a comprehensive assessment of recent transposition activity at the species level is lacking for most organisms. Here, using genome sequencing data for 211 Arabidopsis thaliana accessions taken from across the globe, we identify thousands of recent transposition events involving half of the 326 TE families annotated in this plant species. We further show that the composition and activity of the 'mobilome' vary extensively between accessions in relation to climate and genetic factors. Moreover, TEs insert equally throughout the genome and are rapidly purged by natural selection from gene-rich regions because they frequently affect genes, in multiple ways. Remarkably, loci controlling adaptive responses to the environment are the most frequent transposition targets observed. These findings demonstrate the pervasive, species-wide impact that a rich mobilome can have and the importance of transposition as a recurrent generator of large-effect alleles

    Algorithms for Sequence-Aware Recommender Systems

    Recommender Systems are one of the most successful applications of data mining and machine learning technology in practice, and significant technological advances have been made over the last two decades. Academic research in the field was recently strongly fueled by the increasing availability of large datasets containing user-item rating matrices. Many of these works were therefore based on a problem abstraction in which only a single user-item interaction is considered in the recommendation process, and the recommendation problem is framed as matrix completion: the missing entries of the user-item interaction matrix have to be predicted. In many application domains, however, multiple user-item interactions of different types are recorded over time. Most algorithms optimized for the matrix-completion setting cannot exploit the rich information hidden in the sequentially ordered user interaction logs that are often available in practical applications. In addition, in some application domains the items have to be recommended in a certain order, a situation that research setups relying on a user-item rating matrix typically do not cover. To address these problems, researchers have recently developed a new breed of algorithms named sequence-aware recommender systems (SARS), which handle the information in user interaction logs by design, without resorting to abstractions such as the user-item matrix. This thesis focuses on the study of novel algorithms for sequence-aware recommender systems and their applications.
    We first provide a characterization of the problem and highlight its relations to and differences from related recommendation problems, namely recommendation based on matrix completion and context-aware and time-aware recommender systems. We provide an in-depth review of the state of the art, including a categorization of the existing approaches and evaluation methodologies. We then focus on session-based and session-aware recommendation, problems that have gained attention recently given their closeness to many real-world recommendation scenarios. We first validate the usefulness of personalized sequence-aware recommendations in session-based scenarios through a user study in the hotel booking domain. We then present novel sequence-aware algorithms for session-based and session-aware recommendation. In this setting, we are given the sequence of the most recent actions of a user, and the problem is to find items that are relevant in the context of the current session and, when historical information on the user is available, that also match the user's general interests and tastes. In particular, we investigate models based on Recurrent Neural Networks (RNNs), the neural network architecture of choice for processing sequentially ordered data. We show the effectiveness of RNN-based sequence-aware recommenders in several real-life scenarios, namely session-based recommendation with rich product descriptors, personalized session-based recommendation for returning users, music station recommendation, and automated playlist generation. We also investigate the importance of track order in automated playlist generation, shedding some light on this long-debated issue in the Music Information Retrieval community. We empirically evaluate the proposed models on large datasets from several domains, namely video, classified advertisement, hotel, job, and music recommendation. We also present a novel large-scale dataset for music recommendation over user listening sessions. The empirical results show that our sequence-aware models are indeed effective, in terms of recommendation accuracy, in several session-based recommendation scenarios.
    Dipartimento di Elettronica, Informazione e Bioingegneria; Computer Science and Engineering; 29; Ceri, Stefano; Bonarini, Andre
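    The thesis builds on RNNs, but the core idea of sequence awareness can be illustrated with a much simpler first-order transition model, used here purely as a hypothetical baseline: count how often each item immediately follows another across past sessions, then recommend the most frequent successors of the last item in the current session. The function names and toy sessions are illustrative assumptions, not the algorithms proposed in the thesis.

```python
from collections import Counter, defaultdict


def train_transitions(sessions):
    """Count how often item b immediately follows item a across all training sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            transitions[a][b] += 1
    return transitions


def recommend_next(transitions, current_session, k=3):
    """Rank candidates by transition count from the session's last item, skipping items already seen."""
    if not current_session:
        return []
    successors = transitions.get(current_session[-1], Counter())
    seen = set(current_session)
    return [item for item, _ in successors.most_common() if item not in seen][:k]


history = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"], ["x", "a", "b"]]
model = train_transitions(history)
# recommend_next(model, ["a", "b"]) returns ["c", "d"]
# recommend_next(model, ["b", "a"]) returns []
```

    Reversing the current session from ["a", "b"] to ["b", "a"] changes the recommendations, a distinction that a model built on an unordered user-item matrix cannot express.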

    E-tourism recommender systems

    Recommender Systems (RSs) are used in many e-commerce applications to help users discover interesting items in huge product catalogs by providing personalized recommendations that fit their interests and preferences. The purpose of this work is to develop an RS for the e-tourism domain. In e-tourism, hotel availability depends on contextual circumstances and varies over time, and the “best” hotels are often the first to become unavailable. Whereas RSs traditionally assume that items are always potentially available, we study the influence of variable item availability on the recommendation process. Unlike many current e-tourism applications, we use implicit elicitation to assess users' preferences and provide them with personalized recommendations. We evaluated the recommendations using both offline and online methods under different experimental conditions of item availability. We found that personalized recommendations keep users more satisfied even when hotels are scarce: more than 70% of users reported being satisfied with their experience when the system provided personalized recommendations. This finding can be of great practical and economic value for online travel agencies. Moreover, e-tourism implies a higher risk for users than other e-commerce applications, so users tend to rely more on other users' reviews than on the hotel descriptions given by service providers. We therefore describe a technique to analyze user reviews and summarize their content into a single numerical rating. This information can potentially be used to build more accurate RSs.
    Laurea Magistrale (Master's degree thesis)
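    One simple way to account for variable availability, sketched here under our own assumptions rather than as the thesis's method, is to compute personalized scores as usual and then restrict the ranking to hotels that are free for every night of the requested stay. The score dictionary, the representation of bookings as sets of fully booked nights, and all names are hypothetical.

```python
from datetime import date, timedelta


def nights(check_in, check_out):
    """All nights of a stay as dates (the check-out day itself is not a night)."""
    return {check_in + timedelta(days=d) for d in range((check_out - check_in).days)}


def recommend_available(scores, booked, check_in, check_out, k=3):
    """scores: {hotel: personalized score}; booked: {hotel: set of fully booked nights}.

    Drops hotels that are full on any night of the stay, then ranks the rest by score.
    """
    stay = nights(check_in, check_out)
    available = [h for h in scores if not (booked.get(h, set()) & stay)]
    return sorted(available, key=lambda h: scores[h], reverse=True)[:k]


scores = {"H1": 0.9, "H2": 0.8, "H3": 0.5, "H4": 0.4}  # hypothetical personalized scores
booked = {"H1": {date(2024, 7, 2)}, "H3": {date(2024, 7, 10)}}
# recommend_available(scores, booked, date(2024, 7, 1), date(2024, 7, 3)) returns ["H2", "H3", "H4"]
```

    When the top-scored hotel H1 is sold out for one night of the stay, the list falls back to the best remaining personalized options instead of surfacing an unavailable hotel; H3 stays eligible because its booked night lies outside the requested stay.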

    Naturally occurring epialleles determine vitamin E accumulation in tomato fruits

    Vitamin E (VTE) content is a low-heritability nutritional trait whose genetic determinants are poorly understood. Here, we focus on a previously detected major tomato VTE quantitative trait locus (QTL; mQTL9-2-6) and identify the causal gene as one encoding a 2-methyl-6-phytylquinol methyltransferase (namely VTE3(1)) that catalyses one of the final steps in the biosynthesis of γ- and α-tocopherols, the main forms of VTE. Using reverse genetic approaches, expression analyses, siRNA profiling and DNA methylation assays, we demonstrate that mQTL9-2-6 is an expression QTL associated with differential methylation of a SINE retrotransposon located in the promoter region of VTE3(1). Promoter DNA methylation can be spontaneously reverted, leading to different epialleles that affect VTE3(1) expression and VTE content in fruits. These findings therefore indicate that naturally occurring epialleles are responsible for the regulation of a nutritionally important metabolic QTL and provide direct evidence of a role for epigenetics in the determination of agronomic traits.
    L.Q. was the recipient of a fellowship from Agencia Nacional de Promoción Científica y Tecnológica and Consejo Nacional de Investigaciones Científicas y Técnicas in Argentina and was supported by a postdoctoral fellowship from Investissements d’Avenir ANR-10-LABX-54 MEMO LIFE in France. J.A. and L.B. were recipients of fellowships from Fundação de Amparo à Pesquisa do Estado de São Paulo (Brazil). J.V.C.d.S. was the recipient of a fellowship from Conselho Nacional de Desenvolvimento Científico e Tecnológico (Brazil). R.A., L.B. and F.C. are members of Consejo Nacional de Investigaciones Científicas y Técnicas (Argentina). This work was carried out in compliance with current laws governing genetic experimentation in Brazil and in Argentina. This work was supported by grants from Instituto Nacional de Tecnología Agropecuaria, Consejo Nacional de Investigaciones Científicas y Técnicas and Agencia Nacional de Promoción Científica y Tecnológica (Argentina); Fundação de Amparo à Pesquisa do Estado de São Paulo, Conselho Nacional de Desenvolvimento Científico e Tecnológico and Universidade de São Paulo (Brazil); Max Planck Society (Germany); the Agence Nationale de la Recherche (Investissements d’Avenir ANR-10-LABX-54 MEMO LIFE and ANR-11-IDEX-0001-02 PSL* Research University to V.C.); and the European Union (EpiGeneSys FP7 Network of Excellence number 257082 to V.C. and the European Solanaceae Integrated Project FOOD-CT-2006-016214 to F.C., M.R. and A.R.F.).

    Methods for frequent pattern mining in data streams within the MOA system

    IncMine is a robust, efficient, practical, usable and extensible solution for frequent itemset mining over data streams. It is implemented within the MOA (Massive Online Analysis) framework. The work also includes an analysis of its performance and of its reaction to synthetic and real concept drift.
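    The basic bookkeeping behind frequent itemset mining over a stream can be illustrated with an exact sliding-window counter: each arriving transaction increments the support of its small subsets, and each transaction leaving the window decrements them, so itemsets that stop occurring eventually fall below the threshold, which is how concept drift shows up in the output. This is a simplified, hypothetical sketch, not IncMine, whose closed-itemset maintenance is considerably more involved; the class name, `window`, `max_len` and the relative `min_support` threshold are assumptions.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowItemsets:
    """Exact supports of all itemsets with at most `max_len` items over the last `window` transactions."""

    def __init__(self, window, max_len=2):
        self.window = window
        self.max_len = max_len
        self.buffer = deque()    # transactions currently inside the window
        self.counts = Counter()  # itemset (sorted tuple) -> support in the window

    def _subsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, min(self.max_len, len(items)) + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        """Insert a new transaction and evict the oldest one once the window overflows."""
        self.buffer.append(transaction)
        self.counts.update(self._subsets(transaction))
        if len(self.buffer) > self.window:
            expired = self.buffer.popleft()
            for itemset in self._subsets(expired):
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]

    def frequent(self, min_support):
        """Itemsets whose support is at least `min_support` times the current window size."""
        threshold = min_support * len(self.buffer)
        return {s: c for s, c in self.counts.items() if c >= threshold}


sw = SlidingWindowItemsets(window=3)
for t in [["a", "b"], ["a", "b"], ["a", "c"], ["c", "d"], ["c", "d"]]:
    sw.add(t)
# sw.frequent(0.6) returns {("c",): 3, ("d",): 2, ("c", "d"): 2}
```

    After the two ["c", "d"] transactions push the older ["a", "b"] transactions out of a window of three, {a, b} is no longer reported and {c, d} is.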