306 research outputs found
New Embedded Representations and Evaluation Protocols for Inferring Transitive Relations
Beyond word embeddings, continuous representations of knowledge graph (KG)
components, such as entities, types and relations, are widely used for entity
mention disambiguation, relation inference and deep question answering. Great
strides have been made in modeling general, asymmetric or antisymmetric KG
relations using Gaussian, holographic, and complex embeddings. None of these
directly enforces the transitivity inherent in the is-instance-of and is-subtype-of
relations. A recent proposal, called order embedding (OE), demands that the
vector representing a subtype elementwise dominates the vector representing a
supertype. However, the manner in which such constraints are asserted and
evaluated has some limitations. In this short research note, we make three
contributions specific to representing and inferring transitive relations.
First, we propose and justify a significant improvement to the OE loss
objective. Second, we propose a new representation of types as
hyper-rectangular regions, which generalizes and improves on OE. Third, we show
that some current protocols to evaluate transitive relation inference can be
misleading, and offer a sound alternative. Rather than use off-the-shelf,
black-box deep learning modules, we develop our training networks using
elementary geometric considerations.
Comment: Accepted at SIGIR 201
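To make the elementwise-dominance constraint and the hyper-rectangle view concrete, here is a minimal sketch of the vanilla OE penalty with a conventional max-margin treatment of corrupted pairs, plus a box-containment check in the spirit of the region representation. It is not the improved objective or the exact parameterization proposed in the note; the margin and names are assumptions.

```python
import torch

def oe_violation(sub: torch.Tensor, sup: torch.Tensor) -> torch.Tensor:
    # Squared elementwise shortfall of the subtype vector below the supertype
    # vector; zero exactly when `sub` dominates `sup` in every coordinate.
    return torch.relu(sup - sub).pow(2).sum(dim=-1)

def oe_loss(pos_sub, pos_sup, neg_sub, neg_sup, margin=1.0):
    # Drive violations of true (subtype, supertype) pairs to zero and keep
    # violations of corrupted pairs above the margin.
    pos = oe_violation(pos_sub, pos_sup)
    neg = torch.relu(margin - oe_violation(neg_sub, neg_sup))
    return (pos + neg).mean()

def box_contains(inner_lo, inner_hi, outer_lo, outer_hi):
    # Region-based variant of the same idea: a subtype's hyper-rectangle
    # should lie entirely inside its supertype's hyper-rectangle.
    return bool(((outer_lo <= inner_lo) & (inner_hi <= outer_hi)).all())

# Usage: four positive and four corrupted pairs in a 16-dimensional space.
d = 16
emb = torch.rand(4, d, requires_grad=True)
loss = oe_loss(emb, torch.rand(4, d), torch.rand(4, d), torch.rand(4, d))
loss.backward()
```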
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
Much of the recent success in natural language processing (NLP) has been
driven by distributed vector representations of words trained on large amounts
of text in an unsupervised manner. These representations are typically used as
general purpose features for words across a range of NLP problems. However,
extending this success to learning representations of sequences of words, such
as sentences, remains an open problem. Recent work has explored unsupervised as
well as supervised learning techniques with different training objectives to
learn general purpose fixed-length sentence representations. In this work, we
present a simple, effective multi-task learning framework for sentence
representations that combines the inductive biases of diverse training
objectives in a single model. We train this model on several data sources with
multiple training objectives on over 100 million sentences. Extensive
experiments demonstrate that sharing a single recurrent sentence encoder across
weakly related tasks leads to consistent improvements over previous methods. We
show that our learned general-purpose representations yield substantial
improvements in transfer learning and low-resource settings.
Comment: Accepted at ICLR 201
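The core architectural idea, one recurrent sentence encoder shared across several task-specific heads, can be sketched briefly. The task names, dimensions, GRU choice, and max-pooling below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SharedSentenceEncoder(nn.Module):
    """One recurrent encoder shared by all tasks, with a small head per task."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, task_output_dims):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The bidirectional GRU is the shared component; its parameters receive
        # gradients from every task's objective.
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.heads = nn.ModuleDict({name: nn.Linear(2 * hidden_dim, out_dim)
                                    for name, out_dim in task_output_dims.items()})

    def encode(self, token_ids):
        # Fixed-length sentence vector: max-pool the recurrent states over time.
        states, _ = self.encoder(self.embed(token_ids))
        return states.max(dim=1).values

    def forward(self, token_ids, task):
        return self.heads[task](self.encode(token_ids))

# Usage: alternate minibatches drawn from different tasks through the one encoder.
model = SharedSentenceEncoder(vocab_size=10000, emb_dim=128, hidden_dim=256,
                              task_output_dims={"nli": 3, "paraphrase": 2})
batch = torch.randint(0, 10000, (8, 20))   # 8 sentences of 20 token ids each
logits = model(batch, task="nli")          # shape (8, 3)
```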
Adversarial Generation of Natural Language
Generative Adversarial Networks (GANs) have attracted considerable attention
from the computer vision community, yielding impressive results for image
generation. Advances in the adversarial generation of natural language from
noise, however, are not commensurate with the progress made in generating
images, and still lag far behind likelihood-based methods. In this paper, we take a
step towards generating natural language with a GAN objective alone. We
introduce a simple baseline that addresses the discrete output space problem
without relying on gradient estimators and show that it is able to achieve
state-of-the-art results on a Chinese poem generation dataset. We present
quantitative results on generating sentences from context-free and
probabilistic context-free grammars, and qualitative language modeling results.
A conditional version is also described that can generate sequences conditioned
on sentence characteristics.
Comment: 11 pages, 3 figures, 5 tables
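One way to make the discrete-output workaround concrete is to let the generator emit a softmax distribution over the vocabulary at each position and feed those soft vectors (and one-hot real sentences) directly to the discriminator, so no gradient estimator is needed. The architectures, sizes, and the standard GAN loss below are illustrative assumptions, not the paper's exact models or training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN, NOISE = 1000, 16, 64

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NOISE, 512), nn.ReLU(),
                                 nn.Linear(512, SEQ_LEN * VOCAB))

    def forward(self, z):
        # Emit a probability distribution over the vocabulary at each position,
        # so the whole generator-discriminator path stays differentiable.
        logits = self.net(z).view(-1, SEQ_LEN, VOCAB)
        return F.softmax(logits, dim=-1)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(SEQ_LEN * VOCAB, 512), nn.ReLU(),
                                 nn.Linear(512, 1))

    def forward(self, x):
        # x: (batch, SEQ_LEN, VOCAB) soft distributions or one-hot real tokens.
        return self.net(x.flatten(1))

G, D = Generator(), Discriminator()
fake = G(torch.randn(4, NOISE))
real = F.one_hot(torch.randint(0, VOCAB, (4, SEQ_LEN)), VOCAB).float()
ones, zeros = torch.ones(4, 1), torch.zeros(4, 1)
d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
          + F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
```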
Deep Complex Networks
At present, the vast majority of building blocks, techniques, and
architectures for deep learning are based on real-valued operations and
representations. However, recent work on recurrent neural networks and older
fundamental theoretical analysis suggests that complex numbers could have a
richer representational capacity and could also facilitate noise-robust memory
retrieval mechanisms. Despite their attractive properties and potential for
opening up entirely new neural architectures, complex-valued deep neural
networks have been marginalized due to the absence of the building blocks
required to design such models. In this work, we provide the key atomic
components for complex-valued deep neural networks and apply them to
convolutional feed-forward networks and convolutional LSTMs. More precisely, we
rely on complex convolutions and present algorithms for complex
batch-normalization and complex weight initialization strategies for
complex-valued neural nets, and we use them in experiments with end-to-end
training schemes. We demonstrate that such complex-valued models are
competitive with their real-valued counterparts. We test deep complex models on
several computer vision tasks, on music transcription using the MusicNet
dataset, and on speech spectrum prediction using the TIMIT dataset. We achieve
state-of-the-art performance on these audio-related tasks.
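To illustrate the central building block, here is a minimal sketch of a complex convolution realized with two real convolutions. It omits the complex batch-normalization and complex weight initialization the abstract mentions, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution built from two real convolutions:
    for kernel W = A + iB and input h = x + iy,
    W * h = (A*x - B*y) + i(B*x + A*y)."""

    def __init__(self, in_ch, out_ch, kernel_size, **kwargs):
        super().__init__()
        self.conv_re = nn.Conv2d(in_ch, out_ch, kernel_size, **kwargs)  # A
        self.conv_im = nn.Conv2d(in_ch, out_ch, kernel_size, **kwargs)  # B

    def forward(self, x_re, x_im):
        out_re = self.conv_re(x_re) - self.conv_im(x_im)   # A*x - B*y
        out_im = self.conv_im(x_re) + self.conv_re(x_im)   # B*x + A*y
        return out_re, out_im

# Usage: a complex feature map with 3 channels, kept as separate real and
# imaginary tensors.
layer = ComplexConv2d(3, 8, kernel_size=3, padding=1)
re, im = layer(torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32))
```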
