
    Detecting Large Concept Extensions for Conceptual Analysis

    When performing a conceptual analysis of a concept, philosophers are interested in all forms of expression of that concept in a text---be it direct or indirect, explicit or implicit. In this paper, we experiment with topic-based methods for automating the detection of concept expressions in order to facilitate philosophical conceptual analysis. We propose six methods based on LDA and evaluate them on a new corpus of court decisions that we had annotated by experts and non-experts. Our results indicate that these methods can yield important improvements over the keyword heuristic, which is often used for concept detection in many contexts. While more work remains to be done, this indicates that detecting concepts through topics can serve as a general-purpose method for at least some forms of concept expression that are not captured by naive keyword approaches.
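    The topic-based detection idea can be sketched with scikit-learn's LDA implementation. The toy corpus below is an invented stand-in for the annotated court decisions, and the choice of which topic represents the target concept is an assumption; the sketch simply thresholds the document-topic mixture instead of requiring a literal keyword match:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in corpus; the paper uses expert-annotated court decisions.
docs = [
    "the court ruled on the privacy of personal data",
    "privacy and data protection were central to the decision",
    "the contract dispute concerned payment and delivery terms",
    "payment schedules and contract clauses were examined",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)   # per-document topic mixtures, rows sum to 1

# Flag documents whose weight on the chosen "concept topic" exceeds a
# threshold, rather than requiring a literal keyword to appear.
concept_topic = 0
flagged = [i for i, row in enumerate(theta) if row[concept_topic] > 0.5]
```

    Which topic corresponds to the concept of interest must still be decided by the analyst; the paper's six methods differ precisely in how topics are mapped to concepts.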

    Community detection based on links and node features in social networks

    © Springer International Publishing Switzerland 2015. Community detection is a significant but challenging task in the field of social network analysis. Many effective methods have been proposed to solve this problem; however, most of them are based mainly on topological structure or node attributes alone. In this paper, building on SPAEM [1], we propose a joint probabilistic model for community detection that combines node attributes and topological structure. In our model, we create a novel feature-based weighted network in which each edge weight is given by the feature similarity between the two nodes at its endpoints. We then fuse the original network and the created network with a parameter and employ the expectation-maximization (EM) algorithm to identify communities. Experiments on a diverse set of data collected from Facebook and Twitter demonstrate that our algorithm achieves promising results compared with other algorithms.
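    A minimal sketch of the fusion step, assuming cosine similarity for the feature-based edge weights and an invented four-node network (the EM community-identification step itself is omitted):

```python
import numpy as np

# Toy network: adjacency A and per-node feature vectors F.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
F = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.5, 0.5],
              [0.0, 1.0]])

# Feature-based weighted network: edge weight = cosine similarity
# between the endpoint feature vectors (zeroed on the diagonal).
unit = F / np.linalg.norm(F, axis=1, keepdims=True)
S = unit @ unit.T
np.fill_diagonal(S, 0.0)

alpha = 0.7                       # fusion parameter between the two networks
W = alpha * A + (1 - alpha) * S   # fused network handed to the EM step
```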

    Word Embeddings for Entity-annotated Texts

    Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance on entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance. Comment: This paper is accepted at the 41st European Conference on Information Retrieval.
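    The co-occurrence-graph route can be illustrated with a small PPMI-plus-SVD node embedding, treating linked entities as single tokens. The `ENT/` prefix and the toy sentences are illustrative assumptions, not the paper's corpus or its actual embedding models:

```python
import numpy as np
from itertools import combinations

# Entity mentions are pre-linked into single tokens (e.g. "ENT/Paris"),
# so an entity is a first-class vocabulary item alongside ordinary words.
sentences = [
    ["ENT/Paris", "is", "the", "capital", "of", "ENT/France"],
    ["ENT/Berlin", "is", "the", "capital", "of", "ENT/Germany"],
    ["ENT/France", "borders", "ENT/Germany"],
]

# Symmetric co-occurrence counts within each sentence.
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    for a, b in combinations(s, 2):
        C[idx[a], idx[b]] += 1
        C[idx[b], idx[a]] += 1

# PPMI weighting followed by truncated SVD gives simple node embeddings
# of the co-occurrence graph.
total = C.sum()
Pw = C.sum(axis=1) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / np.outer(Pw, Pw))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)
U, sv, _ = np.linalg.svd(ppmi)
emb = U[:, :4] * sv[:4]           # 4-dimensional node embeddings
```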

    Effect of Tuned Parameters on a LSA MCQ Answering Model

    This paper presents the current state of a work in progress whose objective is to better understand the effects of factors that significantly influence the performance of Latent Semantic Analysis (LSA). A difficult task, which consists in answering (French) biology Multiple Choice Questions, is used to test the semantic properties of the truncated singular space and to study the relative influence of the main parameters. Dedicated software has been designed to fine-tune the LSA semantic space for the Multiple Choice Questions task. With optimal parameters, the performance of our simple model is, quite surprisingly, equal or superior to that of 7th and 8th grade students. This indicates that the semantic spaces were quite good despite their low dimensions and the small sizes of the training data sets. In addition, we present an original entropy-based global weighting of the answer terms of each question, which was necessary to achieve the model's success. Comment: 9 pages
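    The core LSA pipeline for answer selection can be sketched as follows. The toy English corpus and invented question stand in for the French biology materials; the paper's entropy-based weighting and parameter tuning are not reproduced:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-in for the training corpus used to build the semantic space.
corpus = [
    "photosynthesis converts light energy into chemical energy",
    "chlorophyll absorbs light in plant leaves",
    "mitochondria produce energy through respiration",
    "respiration consumes oxygen and releases carbon dioxide",
    "plants use carbon dioxide during photosynthesis",
]

vec = TfidfVectorizer()
X = vec.fit_transform(corpus)
svd = TruncatedSVD(n_components=3, random_state=0)  # truncated singular space
svd.fit(X)

def project(texts):
    """Fold new text into the latent semantic space."""
    return svd.transform(vec.transform(texts))

# Pick the answer whose latent-space vector is closest to the question's.
question = ["which process converts light into chemical energy"]
answers = ["photosynthesis in plant leaves", "respiration in mitochondria"]
sims = cosine_similarity(project(question), project(answers))[0]
best = answers[int(sims.argmax())]
```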

    Meaning-focused and Quantum-inspired Information Retrieval

    In recent years, quantum-based methods have been promisingly integrated with traditional procedures in information retrieval (IR) and natural language processing (NLP). Inspired by our research on the identification and application of quantum structures in cognition, more specifically our work on the representation of concepts and their combinations, we put forward a 'quantum meaning based' framework for structured query retrieval in text corpora and standardized testing corpora. This scheme for IR rests on considering as basic notions (i) 'entities of meaning', e.g., concepts and their combinations, and (ii) traces of such entities of meaning, which is how documents are considered in this approach. The meaning content of these 'entities of meaning' is reconstructed by solving an 'inverse problem' in the quantum formalism, consisting of reconstructing the full states of the entities of meaning from their collapsed states identified as traces in relevant documents. The advantages with respect to traditional approaches, such as Latent Semantic Analysis (LSA), are discussed by means of concrete examples. Comment: 11 pages

    Looking at Vector Space and Language Models for IR using Density Matrices

    In this work, we conduct a joint analysis of both Vector Space and Language Models for IR using the mathematical framework of quantum theory. We shed light on how both models populate the space of density matrices. A density matrix is shown to be a general representational tool capable of leveraging the capabilities of both VSM and LM representations, thus paving the way for a new generation of retrieval models. We analyze the possible implications suggested by our findings. Comment: In Proceedings of Quantum Interaction 201
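    The representational object at the centre of this analysis can be built concretely: a density matrix over a toy term space as a probability-weighted mixture of rank-one projectors. The term vectors and weights below are illustrative assumptions, not from the paper:

```python
import numpy as np

# A document as a density matrix: a convex combination of rank-one
# projectors onto unit term vectors, weighted by term probabilities.
terms = {
    "quantum":   np.array([1.0, 0.0, 0.0]),
    "retrieval": np.array([0.0, 1.0, 0.0]),
    "model":     np.array([1.0, 1.0, 0.0]) / np.sqrt(2),
}
probs = {"quantum": 0.5, "retrieval": 0.3, "model": 0.2}

rho = sum(p * np.outer(terms[t], terms[t]) for t, p in probs.items())

# Properties of a valid density matrix: unit trace, symmetric, PSD.
assert np.isclose(np.trace(rho), 1.0)
assert np.allclose(rho, rho.T)
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)
```

    Pure states (single rank-one projectors) recover vector-space-style representations, while mixtures encode the distributional information a language model carries, which is the sense in which the density matrix generalizes both.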

    Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles

    We examine a network of learners that address the same classification task but must learn from different data sets. The learners cannot share data but instead share their models. Models are shared only once, so as to limit network load. We introduce DELCO (standing for Decentralized Ensemble Learning with COpulas), a new approach for aggregating the predictions of the classifiers trained by each learner. The proposed method aggregates the base classifiers using a probabilistic model relying on Gaussian copulas. Experiments on logistic regression ensembles demonstrate competitive accuracy and increased robustness in the case of dependent classifiers. A companion Python implementation can be downloaded at https://github.com/john-klein/DELC
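    The Gaussian-copula ingredient can be illustrated as follows: per-classifier probabilities are mapped to normal scores and a copula density captures their dependence. The probabilities and the correlation matrix are toy assumptions, and this shows the copula density itself, not DELCO's full aggregation rule:

```python
import numpy as np
from scipy.stats import norm

# Each base classifier outputs a positive-class probability; the copula
# captures dependence between classifiers trained on correlated data.
u = np.array([0.8, 0.75, 0.9])     # per-classifier probabilities (toy)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])    # assumed classifier correlation matrix

z = norm.ppf(u)                    # probit transform to normal scores
Rinv = np.linalg.inv(R)
# Gaussian copula density: c(u) = |R|^(-1/2) exp(-z^T (R^-1 - I) z / 2)
logc = -0.5 * np.log(np.linalg.det(R)) - 0.5 * z @ (Rinv - np.eye(3)) @ z
copula_density = np.exp(logc)
```

    A density above 1 indicates the observed combination of classifier outputs is more likely under the dependent model than under independence, which is the information the aggregation exploits.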

    Context as a non-ontological determinant of semantics

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-540-92235-3_11. Proceedings of the Third International Conference on Semantic and Digital Media Technologies, SAMT 2008, Koblenz, Germany, December 3-5, 2008. This paper proposes an alternative to formal annotation for the representation of semantics. Drawing on the position of most of last century's linguistics and interpretation theory, the article argues that meaning is not a property of a document but an outcome of a contextualized and situated process of interpretation. The consequence of this position is that one should not try to represent the meaning of a document (the way formal annotation does), but rather the context of the activity of which search is a part. We present some general considerations on the representation and use of context, along with a simple example of a technique that encodes the context represented by the documents collected on the computer on which one is working and uses them to direct search. We present preliminary results showing that even this rather simple-minded context representation can lead to considerable improvements with respect to commercial search engines.
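    The context-directed search idea can be sketched by building a TF-IDF centroid over local documents and re-ranking external results against it. The documents and results below are invented, and the paper's actual context encoding is richer than a single centroid:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Documents already on the user's machine stand in for the search context.
local_docs = [
    "training convolutional networks on image datasets",
    "gradient descent and learning rate schedules",
]
# Candidate results returned by an external engine for the query "python".
results = [
    "python tutorial for deep learning with neural networks",
    "python care sheet for keeping snakes as pets",
]

vec = TfidfVectorizer()
vec.fit(local_docs + results)
context = np.asarray(vec.transform(local_docs).mean(axis=0))  # centroid
scores = cosine_similarity(context, vec.transform(results))[0]
reranked = [r for _, r in sorted(zip(-scores, results))]
```

    The machine-learning-flavoured result rises to the top because it shares vocabulary with the local context, disambiguating the ambiguous query without any formal annotation.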

    Error threshold in optimal coding, numerical criteria and classes of universalities for complexity

    The free energy of the Random Energy Model at the transition point between the ferromagnetic and spin glass phases is calculated. At this point, equivalent to the decoding error threshold in optimal codes, the free energy has finite-size corrections proportional to the square root of the number of degrees of freedom. The response of the magnetization to the ferromagnetic couplings is maximal at magnetization equal to one half. We give several criteria of complexity and define different universality classes. According to our classification, in the lowest class of complexity are random graphs, Markov models, and hidden Markov models. At the next level is the Sherrington-Kirkpatrick spin glass, connected with neural-network models. On a higher level are critical theories, the spin glass phase of the Random Energy Model, percolation, and self-organized criticality (SOC). The top-level class involves HOT design, the error threshold in optimal coding, language, and, perhaps, financial markets. Living systems are also related to this last class. A concept of anti-resonance is suggested for complex systems. Comment: 17 pages
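    For orientation, the textbook free energy per spin of the standard REM (with 2^N i.i.d. Gaussian energy levels of variance NJ^2/2) is given below; note this is the standard result without the ferromagnetic coupling, and it does not capture the finite-size corrections at the transition that the paper computes:

```latex
f(\beta) =
\begin{cases}
-\dfrac{\ln 2}{\beta} - \dfrac{\beta J^2}{4}, & \beta < \beta_c = \dfrac{2\sqrt{\ln 2}}{J},\\[1ex]
-J\sqrt{\ln 2}, & \beta \ge \beta_c,
\end{cases}
```

    with the two branches matching continuously at the freezing temperature \beta_c.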

    A non-intrusive movie recommendation system

    Several recommendation systems have been developed to support the user in choosing an interesting movie from multimedia repositories. The widely used collaborative-filtering systems focus on the analysis of user profiles or user ratings of items. However, the performance of these systems degrades in the start-up phase and when a user hides most of their personal data due to privacy concerns. On the other hand, content-based recommendation systems compare movie features to suggest similar multimedia contents; these systems rely on less invasive observations, but they have difficulty supplying tailored suggestions. In this paper, we propose a plot-based recommendation system, based on an evaluation of the similarity between the plot of a video the user has watched and a large number of plots stored in a movie database. Since it is independent of the number of user ratings, it is able to propose famous and beloved movies as well as old or little-known movies/programs that are still strongly related to the content of the video the user has watched. We experimented with different methodologies for comparing natural language descriptions of movies (plots) and found Latent Semantic Analysis (LSA) to be the best at supporting the selection of similar plots. In order to increase the efficiency of LSA, different models were tested and, in the end, a recommendation system able to compare about two hundred thousand movie plots in less than a minute was developed.
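    The plot-similarity pipeline can be sketched with TF-IDF, truncated SVD, and a nearest-neighbour index. The toy plots and tiny dimensions are stand-ins; the described system indexes roughly two hundred thousand plots:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

# Invented plot summaries standing in for a movie database.
plots = [
    "a detective hunts a serial killer through a rainy city",
    "a retired detective is drawn back for one last murder case",
    "two robots fall in love while cleaning an abandoned earth",
    "a chef opens a restaurant and rediscovers family recipes",
]

vec = TfidfVectorizer(stop_words="english")
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(vec.fit_transform(plots))   # LSA plot vectors

# Index all plots once; at query time return the closest plots to the
# one the user just watched (index 0), excluding the plot itself.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(Z)
_, ind = nn.kneighbors(Z[:1])
recommended = [plots[i] for i in ind[0] if i != 0]
```

    Because similarity is computed purely over plot text, cold-start and privacy issues of rating-based systems do not arise.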