Byte-based Language Identification with Deep Convolutional Networks
We report on our system for the shared task on discriminating between similar
languages (DSL 2016). The system uses only byte representations in a deep
residual network (ResNet). The system, named ResIdent, is trained only on the
data released with the task (closed training). We obtain 84.88% accuracy on
subtask A, 68.80% accuracy on subtask B1, and 69.80% accuracy on subtask B2. A
large difference in accuracy on development data can be observed with
relatively minor changes in our network's architecture and hyperparameters. We
therefore expect fine-tuning of these parameters to yield higher accuracies.
Comment: 7 pages. Adapted reviewer comments. arXiv admin note: text overlap with arXiv:1609.0705
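The abstract does not spell out the architecture, so the following is only a minimal sketch of a byte-level residual convolutional classifier in PyTorch; the layer sizes, number of blocks, and pooling choice are illustrative assumptions, not the actual ResIdent configuration.

```python
# Minimal sketch of a byte-level ResNet classifier (hypothetical hyperparameters).
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Residual connection: output = ReLU(x + F(x)).
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class ByteResNet(nn.Module):
    def __init__(self, num_classes, channels=64, num_blocks=4):
        super().__init__()
        self.embed = nn.Embedding(256, channels)          # one embedding per byte value
        self.blocks = nn.Sequential(*[ResBlock(channels) for _ in range(num_blocks)])
        self.out = nn.Linear(channels, num_classes)

    def forward(self, byte_ids):                          # byte_ids: (batch, seq_len)
        x = self.embed(byte_ids).transpose(1, 2)          # -> (batch, channels, seq_len)
        x = self.blocks(x)
        return self.out(x.mean(dim=2))                    # average-pool over the sequence

# Usage: encode a sentence as raw UTF-8 bytes and score the candidate languages.
model = ByteResNet(num_classes=12)
byte_ids = torch.tensor([list("ett litet exempel".encode("utf-8"))])
logits = model(byte_ids)
```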
Tracking Typological Traits of Uralic Languages in Distributed Language Representations
Although linguistic typology has a long history, computational approaches
have only recently gained popularity. The use of distributed representations in
computational linguistics has also become increasingly popular. A recent
development is to learn distributed representations of language, such that
typologically similar languages are spatially close to one another. Although
empirical successes have been shown for such language representations, they
have not been subjected to much typological probing. In this paper, we first
examine whether this type of language representation is empirically useful
for model transfer between Uralic languages in deep neural networks. We then
investigate which typological features are encoded in these representations by
attempting to predict features in the World Atlas of Language Structures, at
various stages of fine-tuning of the representations. We focus on Uralic
languages, and find that some typological traits can be automatically inferred
with accuracies well above a strong baseline.
Comment: Finnish abstract included in the paper.
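As a rough illustration of the probing setup, the sketch below fits a simple linear probe to predict one WALS feature from language vectors; the data here are random stand-ins, and the probe choice (logistic regression) is an assumption rather than the method reported in the paper.

```python
# Probing sketch: predict a WALS feature value from distributed language vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
language_vectors = rng.normal(size=(30, 64))   # one vector per language (stand-ins)
wals_values = np.tile([0, 1, 2], 10)           # e.g. a three-valued word-order feature

probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, language_vectors, wals_values, cv=5)
print("probe accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```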
SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological Inflection with Attentional Sequence-to-Sequence Models
This paper describes the Stockholm University/University of Groningen
(SU-RUG) system for the SIGMORPHON 2017 shared task on morphological
inflection. Our system is based on an attentional sequence-to-sequence neural
network model using Long Short-Term Memory (LSTM) cells, with joint training of
morphological inflection and the inverse transformation, i.e. lemmatization and
morphological analysis. Our system outperforms the baseline by a large margin,
and our submission ranks 4th among the participating teams for the track we
participated in (task 1, high-resource).
Comment: 4 pages, to appear at CoNLL-SIGMORPHON 2017
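The sketch below shows the general shape of an attentional LSTM sequence-to-sequence model of the kind the abstract refers to, with dot-product attention and teacher forcing; the sizes, the attention variant, and the omission of the joint inverse (lemmatization) objective are all simplifications, not the SU-RUG system itself.

```python
# Sketch of an attentional LSTM seq2seq inflection model (illustrative sizes).
import torch
import torch.nn as nn

class Seq2SeqInflector(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder_cell = nn.LSTMCell(hidden, hidden)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the lemma and morphological tag characters.
        enc_out, (h, c) = self.encoder(self.embed(src_ids))          # (B, S, H)
        h, c = h.squeeze(0), c.squeeze(0)
        logits = []
        for t in range(tgt_ids.size(1) - 1):                         # teacher forcing
            h, c = self.decoder_cell(self.embed(tgt_ids[:, t]), (h, c))
            # Dot-product attention over the encoder states.
            scores = torch.bmm(enc_out, h.unsqueeze(2)).squeeze(2)   # (B, S)
            context = torch.bmm(scores.softmax(1).unsqueeze(1), enc_out).squeeze(1)
            logits.append(self.out(torch.cat([h, context], dim=1)))
        return torch.stack(logits, dim=1)                            # predicts tgt_ids[:, 1:]

model = Seq2SeqInflector(vocab_size=60)
src = torch.randint(0, 60, (2, 10))
tgt = torch.randint(0, 60, (2, 8))
loss = nn.CrossEntropyLoss()(model(src, tgt).flatten(0, 1), tgt[:, 1:].flatten())
```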
Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for
In earlier work, we have shown that articulation rate in Swedish
child-directed speech (CDS) increases as a function of the age of the child,
even when utterance length and differences in articulation rate between
subjects are controlled for. In this paper we show at the utterance level, in
spontaneous Swedish speech, that i) for the youngest children, articulation rate
in CDS is lower than in adult-directed speech (ADS), ii) there is a significant
negative correlation between articulation rate and surprisal (the negative log
probability) in ADS, and iii) the increase in articulation rate in Swedish CDS
as a function of the age of the child holds even when surprisal, utterance
length, and differences in articulation rate between speakers are controlled
for. These results indicate that adults adjust their articulation rate to match
the linguistic capacity of the child.
Comment: 5 pages, Interspeech 201
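Surprisal here is simply the negative log probability of an utterance under a language model. The toy sketch below computes utterance-level surprisal with an add-one-smoothed unigram model; the actual language model used in the study is not specified in the abstract.

```python
# Toy surprisal computation: -log2 P(utterance) under a smoothed unigram model.
import math
from collections import Counter

def unigram_surprisal(utterance, counts, total):
    # surprisal(w1..wn) = -sum_i log2 P(wi), with add-one smoothing.
    return -sum(math.log2((counts[w] + 1) / (total + len(counts)))
                for w in utterance.split())

corpus = ["titta en boll", "var är bollen", "titta där"]
counts = Counter(w for u in corpus for w in u.split())
total = sum(counts.values())

# Utterance-level surprisal can then be correlated with articulation rate
# (e.g. syllables per second) across CDS and ADS utterances.
for u in corpus:
    print(u, round(unigram_surprisal(u, counts, total), 2))
```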
Zero-Shot Cross-Lingual Transfer with Meta Learning
Learning what to share between tasks has been a topic of great importance
recently, as strategic sharing of knowledge has been shown to improve
downstream task performance. This is particularly important for multilingual
applications, as most languages in the world are under-resourced. Here, we
consider the setting of training models on multiple different languages at the
same time, when little or no data is available for languages other than
English. We show that this challenging setup can be approached using
meta-learning, where, in addition to training a source language model, another
model learns to select which training instances are the most beneficial to the
first. We experiment using standard supervised, zero-shot cross-lingual, as
well as few-shot cross-lingual settings for different natural language
understanding tasks (natural language inference, question answering). Our
extensive experimental setup demonstrates the consistent effectiveness of
meta-learning for a total of 15 languages. We improve upon the state-of-the-art
for zero-shot and few-shot NLI (on MultiNLI and XNLI) and QA (on the MLQA
dataset). A comprehensive error analysis indicates that the correlation of
typological features between languages can partly explain when parameter
sharing learned via meta-learning is beneficial.
Comment: Accepted as a long paper at the EMNLP 2020 main conference.
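As a very schematic illustration of one model learning which training instances benefit another, the sketch below weights per-example losses with a small scorer network; this is an assumption-laden simplification (in particular, how the scorer receives its own learning signal from target-language performance is not shown), not the method of the paper.

```python
# Schematic instance-weighting sketch: a meta-scorer assigns a weight to each
# source-language training example, and the task loss is weighted accordingly.
import torch
import torch.nn as nn

task_model = nn.Linear(768, 3)          # e.g. an NLI head over sentence-pair encodings
meta_scorer = nn.Sequential(nn.Linear(768, 64), nn.ReLU(), nn.Linear(64, 1))
optim = torch.optim.Adam(list(task_model.parameters()) + list(meta_scorer.parameters()))

features = torch.randn(32, 768)         # stand-in encodings of training instances
labels = torch.randint(0, 3, (32,))

optim.zero_grad()
weights = torch.sigmoid(meta_scorer(features)).squeeze(1)                  # (32,)
per_example = nn.functional.cross_entropy(task_model(features), labels,
                                          reduction="none")               # (32,)
loss = (weights * per_example).mean()
loss.backward()
optim.step()
```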
Parameter sharing between dependency parsers for related languages
Previous work has suggested that parameter sharing between transition-based
neural dependency parsers for related languages can lead to better performance,
but there is no consensus on what parameters to share. We present an evaluation
of 27 different parameter sharing strategies across 10 languages, representing
five pairs of related languages, each pair from a different language family. We
find that sharing transition classifier parameters always helps, whereas the
usefulness of sharing word and/or character LSTM parameters varies. Based on
this result, we propose an architecture where the transition classifier is
shared, and the sharing of word and character parameters is controlled by a
parameter that can be tuned on validation data. This model is linguistically
motivated and obtains significant improvements over a monolingually trained
baseline. We also find that sharing transition classifier parameters helps when
training a parser on unrelated language pairs, although in that case sharing
too many parameters does not help.
Comment: EMNLP 2018
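The sketch below illustrates the proposed sharing scheme for a pair of related languages: the transition classifier is always shared, while word-level LSTM sharing is toggled by a flag tuned on validation data. Everything else (feature extraction from parser configurations, character-level components, sizes) is omitted or replaced by placeholders, so this is not the actual parser.

```python
# Sharing sketch: shared transition classifier, optionally shared word LSTM.
import torch
import torch.nn as nn

class SharedPairParser(nn.Module):
    def __init__(self, vocab_sizes, share_word_lstm, dim=100, n_transitions=80):
        super().__init__()
        self.embeds = nn.ModuleDict({lang: nn.Embedding(v, dim)
                                     for lang, v in vocab_sizes.items()})
        if share_word_lstm:
            shared = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
            self.lstms = nn.ModuleDict({lang: shared for lang in vocab_sizes})
        else:
            self.lstms = nn.ModuleDict({lang: nn.LSTM(dim, dim, batch_first=True,
                                                      bidirectional=True)
                                        for lang in vocab_sizes})
        # The transition classifier is always shared across the language pair.
        self.classifier = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(),
                                        nn.Linear(dim, n_transitions))

    def forward(self, lang, word_ids):
        states, _ = self.lstms[lang](self.embeds[lang](word_ids))
        return self.classifier(states)     # transition scores per token representation

parser = SharedPairParser({"sv": 20000, "da": 18000}, share_word_lstm=True)
scores = parser("sv", torch.randint(0, 20000, (1, 7)))
```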
Semantic Tagging with Deep Residual Networks
We propose a novel semantic tagging task, sem-tagging, tailored for the
purpose of multilingual semantic parsing, and present the first tagger using
deep residual networks (ResNets). Our tagger uses both word and character
representations and includes a novel residual bypass architecture. We evaluate
the tagset both intrinsically, on the new task of semantic tagging, and
on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an
auxiliary loss function predicting our semantic tags, significantly outperforms
prior results on English Universal Dependencies POS tagging (95.71% accuracy on
UD v1.2 and 95.67% accuracy on UD v1.3).
Comment: COLING 2016, camera-ready version.
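The auxiliary-loss setup can be pictured as a shared encoder with two output heads, one for the main POS tags and one for the semantic tags; in the sketch below the encoder is a plain BiLSTM stand-in rather than the ResNet with residual bypass described above, and the tag inventory sizes and loss weighting are illustrative.

```python
# Sketch of a tagger trained with an auxiliary semantic-tagging loss.
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    def __init__(self, vocab, n_pos, n_semtags, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.pos_head = nn.Linear(2 * dim, n_pos)
        self.semtag_head = nn.Linear(2 * dim, n_semtags)

    def forward(self, word_ids):
        states, _ = self.encoder(self.embed(word_ids))
        return self.pos_head(states), self.semtag_head(states)

model = MultiTaskTagger(vocab=10000, n_pos=17, n_semtags=70)
words = torch.randint(0, 10000, (4, 12))
pos_gold = torch.randint(0, 17, (4, 12))
sem_gold = torch.randint(0, 70, (4, 12))

pos_logits, sem_logits = model(words)
ce = nn.CrossEntropyLoss()
# Main POS loss plus a down-weighted auxiliary semantic-tagging loss.
loss = (ce(pos_logits.flatten(0, 1), pos_gold.flatten())
        + 0.1 * ce(sem_logits.flatten(0, 1), sem_gold.flatten()))
```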
The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations
The Parallel Meaning Bank is a corpus of translations annotated with shared,
formal meaning representations comprising over 11 million words divided over
four languages (English, German, Italian, and Dutch). Our approach is based on
cross-lingual projection: automatically produced (and manually corrected)
semantic annotations for English sentences are mapped onto their word-aligned
translations, assuming that the translations are meaning-preserving. The
semantic annotation consists of five main steps: (i) segmentation of the text
into sentences and lexical items; (ii) syntactic parsing with Combinatory
Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and
(v) compositional semantic analysis based on Discourse Representation Theory.
These steps are performed using statistical models trained in a semi-supervised
manner. The employed annotation models are all language-neutral. Our first
results are promising.
Comment: To appear at EACL 2017.
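The five annotation steps read naturally as a pipeline; the sketch below arranges them as one, with every function a trivial, hypothetical stand-in (toy behaviour, invented names) meant only to show the ordering of the steps, not the PMB toolchain.

```python
# Toy pipeline mirroring the five annotation steps (all functions are stubs).
def segment(text):                        # (i) split into sentences and lexical items
    return text.split()

def parse_ccg(tokens):                    # (ii) CCG parsing (stub: no real derivation)
    return {"tokens": tokens, "derivation": None}

def tag_semantics(tokens):                # (iii) universal semantic tagging (stub)
    return ["CON"] * len(tokens)

def symbolize(tokens, semtags):           # (iv) map tokens to non-logical symbols (stub)
    return [t.lower() for t in tokens]

def compose_drs(derivation, symbols):     # (v) compositional DRT analysis (stub)
    return {"conditions": symbols}

def annotate(english_sentence):
    tokens = segment(english_sentence)
    derivation = parse_ccg(tokens)
    semtags = tag_semantics(tokens)
    symbols = symbolize(tokens, semtags)
    # The resulting annotation layers are then projected onto word-aligned translations.
    return compose_drs(derivation, symbols)

print(annotate("Every dog barks"))
```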
One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis
When learning a new skill, you take advantage of your preexisting skills and
knowledge. For instance, if you are a skilled violinist, you will likely have
an easier time learning to play cello. Similarly, when learning a new language
you take advantage of the languages you already speak. For instance, if your
native language is Norwegian and you decide to learn Dutch, the lexical overlap
between these two languages will likely benefit your rate of language
acquisition. This thesis deals with the intersection of learning multiple tasks
and learning multiple languages in the context of Natural Language Processing
(NLP), which can be defined as the study of computational processing of human
language. Although these two types of learning may seem different on the
surface, we will see that they share many similarities.
The traditional approach in NLP is to consider a single task for a single
language at a time. However, recent advances allow for broadening this
approach, by considering data for multiple tasks and languages simultaneously.
This is an important approach to explore further as the key to improving the
reliability of NLP, especially for low-resource languages, is to take advantage
of all relevant data whenever possible. In doing so, the hope is that in the
long term, low-resource languages can benefit from the advances made in NLP
which are currently to a large extent reserved for high-resource languages.
This, in turn, may have positive consequences for, e.g., language
preservation, as speakers of minority languages will be under less pressure to
switch to high-resource languages. In the short term, answering the specific
research questions posed should be of use to NLP researchers working towards
the same goal.
Comment: PhD thesis, University of Groningen.
