Multilingual Word Sense Induction to Improve Web Search Result Clustering
In [12], a novel approach to Web search result clustering based on Word Sense Induction (WSI), i.e. the automatic discovery of word senses from raw text, was presented; key to the proposed approach is the idea of, first, automatically inducing senses for the target query and, second, clustering the search results based on their semantic similarity to the induced word senses. In [1] we proposed an innovative Word Sense Induction method based on multilingual data; key to our approach was the idea that a multilingual context representation, where the context of the words is expanded by considering its translations in different languages, may improve the WSI results; the experiments showed a clear performance gain. In this paper we give some preliminary ideas for applying our multilingual Word Sense Induction method to Web search result clustering.
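The second step described above, assigning each search result to the induced sense it most resembles, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the senses are hand-written word lists rather than automatically induced ones, Jaccard word overlap stands in for whatever semantic similarity measure the papers use, and the names `jaccard` and `cluster_by_sense` are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token collections (a stand-in measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_sense(snippets, senses):
    """Assign each snippet to the sense whose word list it overlaps most.

    `senses` maps a sense label to a list of words; in a real WSI system
    these would be induced from raw text, here they are supplied by hand.
    """
    clusters = {label: [] for label in senses}
    for snippet in snippets:
        tokens = snippet.lower().split()
        best = max(senses, key=lambda label: jaccard(tokens, senses[label]))
        clusters[best].append(snippet)
    return clusters


# Hypothetical senses for the ambiguous query "jaguar".
senses = {
    "animal": ["cat", "wild", "feline"],
    "car": ["car", "engine", "british"],
}
snippets = ["Jaguar wild cat", "Jaguar car engine"]
clusters = cluster_by_sense(snippets, senses)
```

With these toy inputs, the first snippet falls into the "animal" cluster and the second into the "car" cluster.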
Semantic Fuzzing with Zest
Programs expecting structured inputs often consist of both a syntactic
analysis stage, which parses raw input, and a semantic analysis stage, which
conducts checks on the parsed input and executes the core logic of the program.
Generator-based testing tools in the lineage of QuickCheck are a promising way
to generate random syntactically valid test inputs for these programs. We
present Zest, a technique which automatically guides QuickCheck-like
random-input generators to better explore the semantic analysis stage of test
programs. Zest converts random-input generators into deterministic parametric
generators. We present the key insight that mutations in the untyped parameter
domain map to structural mutations in the input domain. Zest leverages program
feedback in the form of code coverage and input validity to perform
feedback-directed parameter search. We evaluate Zest against AFL and QuickCheck
on five Java programs: Maven, Ant, BCEL, Closure, and Rhino. Zest covers
1.03x-2.81x as many branches within the benchmarks' semantic analysis stages as
baseline techniques. Further, we find 10 new bugs in the semantic analysis
stages of these benchmarks. Zest is the most effective technique in finding
these bugs reliably and quickly, requiring at most 10 minutes on average to
find each bug.
Comment: To appear in Proceedings of the 28th ACM SIGSOFT International Symposium
on Software Testing and Analysis (ISSTA'19)
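The idea of a parametric generator can be illustrated with a small sketch. It is not Zest's implementation (Zest targets Java programs): the toy XML-like generator, the `ParamSource` class, the zero-padding of exhausted parameters, and the `mutate` helper are all assumptions made for illustration, and the coverage- and validity-guided search loop is omitted. The sketch only shows that replacing the random source with a fixed byte sequence makes generation deterministic, and that changing one byte produces a structurally different but still well-formed input.

```python
import random


class ParamSource:
    """Replays a fixed byte sequence wherever a generator would ask for randomness."""

    def __init__(self, params):
        self.params = list(params)
        self.pos = 0

    def next_byte(self):
        # Assumption for this sketch: pad with zeros once the sequence runs out.
        value = self.params[self.pos] if self.pos < len(self.params) else 0
        self.pos += 1
        return value

    def choose(self, n):
        """Pick an index in range(n) from the next parameter byte."""
        return self.next_byte() % n


def gen_xml(src, depth=0):
    """Toy QuickCheck-style generator: always emits well-formed nested tags."""
    tag = ["a", "b", "c"][src.choose(3)]
    n_children = src.choose(3) if depth < 2 else 0
    children = "".join(gen_xml(src, depth + 1) for _ in range(n_children))
    return f"<{tag}>{children}</{tag}>"


def generate(params):
    """Deterministic parametric generator: same parameters, same input."""
    return gen_xml(ParamSource(params))


def mutate(params, rng):
    """Overwrite one parameter byte; the generator turns this into a structural change."""
    out = list(params)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return out


parent = [1, 1, 2, 0]
print(generate(parent))            # <b><c></c></b>
print(generate([1, 0, 2, 0]))      # <b></b>  (one byte changed, one child removed)
print(generate(mutate(parent, random.Random(0))))
```

A feedback-directed search in the spirit of the abstract would keep a mutated parameter sequence only when the input it generates is valid or reaches new code coverage.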
A linked data approach for linking and aligning sign language and spoken language data
We present work dealing with a Linked
Open Data (LOD)-compliant representation of Sign Language (SL) data, with
the goal of supporting the cross-lingual
alignment of SL data and their linking to
Spoken Language (SpL) data. The proposed representation is based on activities
of groups of researchers in the field of
SL who have investigated the use of Open
Multilingual Wordnet (OMW) datasets for
(manually) cross-linking SL data or for
linking SL and SpL data. Another group
of researchers is proposing an XML encoding of articulatory elements of SLs and
(manually) linking those to an SpL lexical
resource. We propose an RDF-based representation of those various kinds of data.
This unified formal representation offers
a semantic repository of information on
SL and SpL data that could be accessed
for supporting the creation of datasets for
training or evaluating NLP applications
dealing with SLs, for example
Machine Translation (MT) between SLs
and between SLs and SpLs.
peer-reviewed
