
500 research outputs found

Measuring Thematic Fit with Distributional Feature Overlap

Author: Blache Philippe
Chersoni Emmanuele
Lenci Alessandro
Santus Enrico
Publication date: 01/01/2017

In this paper, we introduce a new distributional method for modeling predicate-argument thematic fit judgments. We use a syntax-based DSM to build a prototypical representation of verb-specific roles: for every verb, we extract the most salient second-order contexts for each of its roles (i.e. the most salient dimensions of typical role fillers), and then we compute thematic fit as a weighted overlap between the top features of candidate fillers and role prototypes. Our experiments show that our method consistently outperforms a baseline re-implementing a state-of-the-art system, and achieves better or comparable results to those reported in the literature for the other unsupervised systems. Moreover, it provides an explicit representation of the features characterizing verb-specific semantic roles.

Comment: 9 pages, 2 figures, 5 tables, EMNLP, 2017, thematic fit, selectional preference, semantic role, DSMs, Distributional Semantic Models, Vector Space Models, VSMs, cosine, APSyn, similarity, prototyp

arXiv.org e-Print Archive

Crossref

HAL AMU

Archivio della Ricerca - Università di Pisa
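
The thematic-fit abstract above describes building a role prototype from the most salient dimensions of typical fillers and scoring a candidate by weighted overlap. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the toy vectors, the use of summed weights to rank prototype dimensions, and the inverse-average-rank weighting are all assumptions made for illustration.

```python
from collections import Counter

def role_prototype(filler_vectors, k):
    """Rank contexts by their summed weight over typical fillers (assumed
    aggregation) and return the top-k contexts mapped to 1-based ranks."""
    totals = Counter()
    for vec in filler_vectors:
        totals.update(vec)  # adds each context weight to the running total
    return {ctx: rank
            for rank, (ctx, _) in enumerate(totals.most_common(k), start=1)}

def thematic_fit(candidate_vec, prototype, k):
    """Score a candidate filler by the overlap between its top-k contexts and
    the prototype, weighting each shared context by inverse average rank
    (assumed weighting)."""
    top = sorted(candidate_vec, key=candidate_vec.get, reverse=True)[:k]
    cand_ranks = {ctx: rank for rank, ctx in enumerate(top, start=1)}
    shared = cand_ranks.keys() & prototype.keys()
    return sum(2.0 / (cand_ranks[c] + prototype[c]) for c in shared)

# Hypothetical context weights for typical objects of "eat".
pizza = {"tasty": 3.0, "bake": 2.0, "slice": 2.5}
apple = {"tasty": 2.0, "ripe": 3.0, "slice": 1.0}
eat_object = role_prototype([pizza, apple], k=3)

bread = {"bake": 4.0, "slice": 3.0, "tasty": 2.0}
stone = {"heavy": 3.0, "throw": 2.0, "grey": 1.0}
print(thematic_fit(bread, eat_object, k=3))
print(thematic_fit(stone, eat_object, k=3))
```

Because the prototype is an explicit ranked list of contexts, it can be inspected directly, which mirrors the abstract's point about an explicit representation of role features.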

Is Structure Necessary for Modeling Argument Expectations in Distributional Semantics?

Author: Blache Philippe
Chersoni Emmanuele
Lenci Alessandro
Santus Enrico
Publication date: 01/01/2017

Despite the number of NLP studies dedicated to thematic fit estimation, little attention has been paid to the related task of composing and updating verb argument expectations. The few exceptions have mostly modeled this phenomenon with structured distributional models, implicitly assuming a similarly structured representation of events. Recent experimental evidence, however, suggests that the human processing system could also exploit an unstructured "bag-of-arguments" type of event representation to predict upcoming input. In this paper, we re-implement a traditional structured model and adapt it to compare the different hypotheses concerning the degree of structure in our event knowledge, evaluating their relative performance on the task of updating argument expectations.

Comment: conference paper, IWC

arXiv.org e-Print Archive

HAL AMU

Archivio della Ricerca - Università di Pisa

Distributional Semantics Today: Introduction to the Special Issue

Author: Fabre Cécile
Lenci Alessandro
Publication venue: ATALA (Association pour le Traitement Automatique des Langues)
Publication date: 01/01/2015

This introduction to the special issue of the TAL journal on distributional semantics provides an overview of the current research topics in this field and gives a brief summary of the accepted contributions.

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Unsupervised Measure of Word Similarity: How to Outperform Co-occurrence and Vector Cosine in VSMs

Author: Chiu Tin-Shing
Huang Chu-Ren
Lenci Alessandro
Lu Qin
Santus Enrico
Publication date: 01/01/2016

In this paper, we claim that vector cosine, which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove it, we describe and evaluate APSyn, a variant of the Average Precision that, without any optimization, outperforms the vector cosine and the co-occurrence on the standard ESL test set, with an improvement ranging between +9.00% and +17.98%, depending on the number of chosen top contexts.

Comment: in AAAI 2016. arXiv admin note: substantial text overlap with arXiv:1603.0870

arXiv.org e-Print Archive

The Hong Kong Polytechnic University Pao Yue-kong Library

Association for the Advancement of Artificial Intelligence: AAAI Publications

What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets

Author: Chiu Tin-Shing
Huang Chu-Ren
Lenci Alessandro
Lu Qin
Santus Enrico
Publication date: 01/01/2016

In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting such intersection according to the rank of the shared contexts in the dependency ranked lists. This claim comes from the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts compared to other related words. To prove it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms the Vector Cosine and the co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.

Comment: in LREC 201

arXiv.org e-Print Archive

The Hong Kong Polytechnic University Pao Yue-kong Library

Archivio della Ricerca - Università di Pisa
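
The APSyn abstract above describes a score based on the intersection of the two words' top-ranked contexts, weighted by the rank of each shared context. A minimal sketch of such a rank-weighted intersection follows. It assumes each shared context contributes the inverse of its average rank in the two lists, which is how APSyn is commonly stated; the function names, the default cutoff, and the toy association weights are hypothetical.

```python
def top_contexts(weights, n):
    """Return the n most strongly associated contexts mapped to 1-based ranks."""
    ranked = sorted(weights, key=weights.get, reverse=True)[:n]
    return {ctx: rank for rank, ctx in enumerate(ranked, start=1)}

def rank_weighted_overlap(weights_a, weights_b, n=1000):
    """Sum, over contexts shared by both top-n lists, the inverse of the
    average rank of that context in the two lists (assumed weighting)."""
    ranks_a = top_contexts(weights_a, n)
    ranks_b = top_contexts(weights_b, n)
    shared = ranks_a.keys() & ranks_b.keys()
    return sum(2.0 / (ranks_a[c] + ranks_b[c]) for c in shared)

# Hypothetical association scores (e.g. PMI-like weights) per context.
car = {"drive": 5.0, "road": 4.0, "engine": 3.0, "wheel": 2.0}
automobile = {"drive": 4.5, "engine": 4.0, "road": 3.0, "dealer": 1.0}
banana = {"eat": 5.0, "peel": 4.0, "yellow": 3.0, "road": 0.5}

print(rank_weighted_overlap(car, automobile, n=3))
print(rank_weighted_overlap(car, banana, n=3))
```

Unlike cosine, the score ignores everything outside the top-n lists, so the cutoff n is the main parameter; the abstracts report results as a function of the number of chosen top contexts.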

Evalita'09 Parsing Task: comparing dependency parsers and treebanks

Author: Bosco Cristina
Dell'Orletta Felice
Lenci Alessandro
Mazzei Alessandro
Montemagni Simonetta
Publication venue: EVALITA 2009 organizer

The aim of the Evalita Parsing Task is to define and extend the Italian state of the art in parsing by encouraging the application of existing models and approaches. As in Evalita'07, the Task is organized around two tracks, i.e. Dependency Parsing and Constituency Parsing. As a main novelty with respect to the previous edition, the Dependency Parsing track has been articulated into two subtasks that differ in the treebanks used, thus creating the prerequisites for assessing the impact of different annotation schemes on parser performance. In this paper, we describe the Dependency Parsing track by presenting the data sets for development and testing, reporting the test results and providing a first comparative analysis of these results, also with respect to state-of-the-art parsing technologies.

PUblication MAnagement

Unsupervised Acquisition of Verb Subcategorization Frames from Shallow-Parsed Corpora

Author: Lenci Alessandro
McGillivray Barbara
Montemagni Simonetta
Pirrelli Vito
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2008

In this paper, we report experiments on the unsupervised automatic acquisition of Italian and English verb subcategorization frames (SCFs) from general and domain corpora. The proposed technique operates on syntactically shallow-parsed corpora on the basis of a limited number of search heuristics that do not rely on any previous lexico-syntactic knowledge about SCFs. Although preliminary, the reported results are in line with state-of-the-art lexical acquisition systems. The issue of whether verbs sharing similar SCF distributions happen to share similar semantic properties as well was also explored by clustering verbs that share frames with the same distribution using the Minimum Description Length Principle (MDL). First experiments in this direction were carried out on Italian verbs with encouraging results.

Archivio della Ricerca - Università di Pisa

PUblication MAnagement

Ontology learning from Italian legal texts

Author: Lenci Alessandro
Montemagni Simonetta
Pirrelli Vito
Venturi Giulia
Publication venue: New IOS Press Publication

The paper reports on the methodology and preliminary results of a case study in automatically extracting ontological knowledge from Italian legislative texts. We use a fully-implemented ontology learning system (T2K) that includes a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine language learning. Tools are dynamically integrated to provide an incremental representation of the content of vast repositories of unstructured documents. Evaluated results, however preliminary, show the great potential of NLP-powered incremental systems like T2K for accurate large-scale semi-automatic extraction of legal ontologies

PUblication MAnagement

Acquiring Legal Ontologies from Domain-specific Texts

Author: Dell\u27Orletta Felice
Lenci Alessandro
Marchi Simone
Montemagni Simonetta
Pirrelli Vito
Venturi Giulia

The paper reports on the methodology and preliminary results of a case study in automatically extracting ontological knowledge from Italian legislative texts in the environmental domain. We use a fully-implemented ontology learning system (T2K) that includes a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine language learning. Tools are dynamically integrated to provide an incremental representation of the content of vast repositories of unstructured documents. Evaluated results, however preliminary, are very encouraging, showing the great potential of NLP-powered incremental systems like T2K for accurate large-scale semi-automatic extraction of legal ontologies.

PUblication MAnagement

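Dal testo alla conoscenza e ritorno: estrazione terminologica e annotazione semantica di basi documentali di dominio [From text to knowledge and back: terminology extraction and semantic annotation of domain document bases]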

Author: Dell\u27Orletta Felice
Lenci Alessandro
Marchi Simone
Montemagni Simonetta
Pirrelli Vito
Venturi Giulia
Publication venue: Associazione Italiana per la Documentazione Avanzata
Publication date: 01/01/2008

The paper focuses on the automatic extraction of domain knowledge from Italian legal texts and presents a fully-implemented ontology learning system (T2K, Text-2-Knowledge) that includes a battery of tools for Natural Language Processing, statistical text analysis and machine learning. Evaluated results show the considerable potential of systems like T2K, which interleave NLP and machine learning techniques incrementally for accurate large-scale semi-automatic extraction and structuring of domain-specific knowledge.

Archivio della Ricerca - Università di Pisa

PUblication MAnagement