Measuring Thematic Fit with Distributional Feature Overlap
In this paper, we introduce a new distributional method for modeling
predicate-argument thematic fit judgments. We use a syntax-based DSM to build a
prototypical representation of verb-specific roles: for every verb, we extract
the most salient second order contexts for each of its roles (i.e. the most
salient dimensions of typical role fillers), and then we compute thematic fit
as a weighted overlap between the top features of candidate fillers and role
prototypes. Our experiments show that our method consistently outperforms a
baseline re-implementing a state-of-the-art system, and achieves results better
than or comparable to those reported in the literature for the other
unsupervised systems. Moreover, it provides an explicit representation of the
features characterizing verb-specific semantic roles.
Comment: 9 pages, 2 figures, 5 tables, EMNLP, 2017, thematic fit, selectional
preference, semantic role, DSMs, Distributional Semantic Models, Vector Space
Models, VSMs, cosine, APSyn, similarity, prototyp
Is Structure Necessary for Modeling Argument Expectations in Distributional Semantics?
Despite the number of NLP studies dedicated to thematic fit estimation,
little attention has been paid to the related task of composing and updating
verb argument expectations. The few exceptions have mostly modeled this
phenomenon with structured distributional models, implicitly assuming a
similarly structured representation of events. Recent experimental evidence,
however, suggests that the human processing system could also exploit an
unstructured "bag-of-arguments" type of event representation to predict
upcoming input. In this paper, we re-implement a traditional structured model
and adapt it to compare the different hypotheses concerning the degree of
structure in our event knowledge, evaluating their relative performance in the
task of updating argument expectations.
Comment: conference paper, IWC
Distributional Semantics Today: Introduction to the special issue
This introduction to the special issue of the TAL journal on distributional semantics provides an overview of the current topics in this field and gives a brief summary of the accepted contributions
Unsupervised Measure of Word Similarity: How to Outperform Co-occurrence and Vector Cosine in VSMs
In this paper, we claim that vector cosine, which is generally considered
among the most efficient unsupervised measures for identifying word similarity
in Vector Space Models, can be outperformed by an unsupervised measure that
calculates the extent of the intersection among the most mutually dependent
contexts of the target words. To prove it, we describe and evaluate APSyn, a
variant of the Average Precision that, without any optimization, outperforms
the vector cosine and the co-occurrence on the standard ESL test set, with an
improvement ranging between +9.00% and +17.98%, depending on the number of
chosen top contexts.
Comment: in AAAI 2016. arXiv admin note: substantial text overlap with
arXiv:1603.0870
What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
In this paper, we claim that Vector Cosine, which is generally considered one
of the most efficient unsupervised measures for identifying word similarity in
Vector Space Models, can be outperformed by a completely unsupervised measure
that evaluates the extent of the intersection among the most associated
contexts of two target words, weighting such intersection according to the rank
of the shared contexts in the dependency ranked lists. This claim comes from
the hypothesis that similar words do not simply occur in similar contexts, but
they share a larger portion of their most relevant contexts compared to other
related words. To prove it, we describe and evaluate APSyn, a variant of
Average Precision that, independently of the adopted parameters, outperforms
the Vector Cosine and the co-occurrence on the ESL and TOEFL test sets. In the
best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy
on the TOEFL dataset, thereby beating the non-English US college applicants
(whose average, as reported in the literature, is 64.50%) and several
state-of-the-art approaches.
Comment: in LREC 201
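The rank-weighted intersection described in this abstract can be illustrated with a minimal sketch. It assumes that each word is represented as a mapping from contexts to an association score (the weighting scheme, the toy data, and the function names `top_contexts` and `apsyn` are illustrative assumptions, not taken from the paper's code), and that each shared context contributes the inverse of its average rank in the two top-N lists.

```python
from typing import Dict, List


def top_contexts(weights: Dict[str, float], n: int) -> List[str]:
    """Return the n contexts with the highest association score, best first.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ctx for ctx, _ in ranked[:n]]


def apsyn(w1: Dict[str, float], w2: Dict[str, float], n: int = 1000) -> float:
    """Rank-weighted overlap of the top-n contexts of two words.

    Each shared context adds 1 / (average of its two ranks), so contexts
    that are highly ranked for both words contribute the most.
    """
    rank1 = {ctx: r for r, ctx in enumerate(top_contexts(w1, n), start=1)}
    rank2 = {ctx: r for r, ctx in enumerate(top_contexts(w2, n), start=1)}
    shared = rank1.keys() & rank2.keys()
    return sum(1.0 / ((rank1[c] + rank2[c]) / 2.0) for c in shared)


# Toy association scores (hypothetical values, for illustration only).
car = {"drive": 5.0, "road": 4.0, "wheel": 3.0, "buy": 1.0}
truck = {"drive": 4.5, "wheel": 4.0, "load": 3.5, "road": 1.0}
banana = {"eat": 5.0, "peel": 4.0, "buy": 2.0, "yellow": 1.5}

score_related = apsyn(car, truck, n=3)     # shares "drive" and "wheel"
score_unrelated = apsyn(car, banana, n=3)  # no shared top-3 contexts
```

With N = 3, "car" and "truck" share two top contexts and receive a positive score, while "car" and "banana" share none and score zero. The score is unbounded above and depends on N, so values are only comparable when computed with the same N.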
Evalita'09 Parsing Task: comparing dependency parsers and treebanks
The aim of the Evalita Parsing Task is to define and extend the Italian state of the art in parsing by encouraging the application of existing models and approaches. As in Evalita'07, the Task is organized around two tracks, i.e. Dependency Parsing and Constituency Parsing. As a main novelty with respect to the previous edition, the Dependency Parsing track has been articulated into two subtasks that differ in the treebanks used, thus creating the prerequisites for assessing the impact of different annotation schemes on parser performance. In this paper, we describe the Dependency Parsing track by presenting the data sets for development and testing, reporting the test results and providing a first comparative analysis of these results, also with respect to state-of-the-art parsing technologies
Unsupervised Acquisition of Verb Subcategorization Frames from Shallow-Parsed Corpora
In this paper, we report experiments on the unsupervised automatic acquisition of Italian and English verb subcategorization frames (SCFs) from general and domain corpora. The proposed technique operates on syntactically shallow-parsed corpora on the basis of a limited number of search heuristics that do not rely on any previous lexico-syntactic knowledge about SCFs. Although preliminary, the reported results are in line with state-of-the-art lexical acquisition systems. The question of whether verbs with similar SCF distributions also share similar semantic properties was explored by clustering verbs that share frames with the same distribution using the Minimum Description Length (MDL) principle. First experiments in this direction were carried out on Italian verbs with encouraging results
Ontology learning from Italian legal texts
The paper reports on the methodology and preliminary results of a case study in automatically extracting ontological knowledge from Italian legislative texts. We use a fully-implemented ontology learning system (T2K) that includes a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine language learning. Tools are dynamically integrated to provide an incremental representation of the content of vast repositories of unstructured documents. Evaluated results, however preliminary, show the great potential of NLP-powered incremental systems like T2K for accurate large-scale semi-automatic extraction of legal ontologies
Acquiring Legal Ontologies from Domain-specific Texts
The paper reports on the methodology and preliminary results of a case study in automatically extracting ontological knowledge from Italian legislative texts in the environmental domain. We use a fully-implemented ontology learning system (T2K) that includes a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine language learning. Tools are dynamically integrated to provide an incremental representation of the content of vast repositories of unstructured documents. Evaluated results, however preliminary, are very encouraging, showing the great potential of NLP-powered incremental systems like T2K for accurate large-scale semi-automatic extraction of legal ontologies
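Dal testo alla conoscenza e ritorno: estrazione terminologica e annotazione semantica di basi documentali di dominio [From text to knowledge and back: terminology extraction and semantic annotation of domain document bases]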
The paper focuses on the automatic extraction of domain knowledge from Italian legal texts and presents a fully-implemented ontology learning system (T2K, Text-2-Knowledge) that includes a battery of tools for Natural Language Processing, statistical text analysis and machine learning. Evaluated results show the considerable potential of systems like T2K, exploiting an incremental interleaving of NLP and machine learning techniques for accurate large-scale semi-automatic extraction and structuring of domain-specific knowledge
