395 research outputs found

Visually Grounded Meaning Representations

Authors: Ferrari Vittorio; Lapata Mirella; Silberer Carina
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level representations from textual and visual input. The visual modality is encoded via vectors of attributes obtained automatically from images. We create a new large-scale taxonomy of 600 visual attributes representing more than 500 concepts and 700K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We evaluate our model on its ability to simulate word similarity judgments and concept categorization. On both tasks, our model yields a better fit to behavioral data than baselines and related models which either rely on a single modality or do not make use of attribute-based input.

Crossref

Edinburgh Research Explorer

UPF Digital Repository

Grounded Models of Semantic Representation

Authors: Lapata Mirella; Silberer Carina
Publication date: 01/01/2012

A popular tradition of studying semantic representation has been driven by the assumption that word meaning can be learned from the linguistic environment, despite ample evidence suggesting that language is grounded in perception and action. In this paper we present a comparative study of models that represent word meaning based on linguistic and perceptual data. Linguistic information is approximated by naturally occurring corpora and sensorimotor experience by feature norms (i.e., attributes native speakers consider important in describing the meaning of a word). The models differ in terms of the mechanisms by which they integrate the two modalities. Experimental results show that a closer correspondence to human data can be obtained by uncovering latent information shared among the textual and perceptual modalities rather than arriving at semantic knowledge by concatenating the two.

Edinburgh Research Explorer

Eine venezianischer Filmregisseur

Author: Silberer Rose
Publication venue: BYU ScholarsArchive
Publication date: 23/12/1924

BYU ScholarsArchive (Brigham Young University)

On the Complementarity of Images and Text for the Expression of Emotions in Social Media

Authors: Khlyzova Anna; Klinger Roman; Silberer Carina
Publication date: 01/01/2022

Authors of posts in social media communicate their emotions and what causes them with text and images. While there is work on emotion and stimulus detection for each modality separately, it is not yet known whether the modalities contain complementary emotion information in social media. We aim to fill this research gap and contribute a novel, annotated corpus of English multimodal Reddit posts. On this resource, we develop models to automatically detect the relation between image and text, an emotion stimulus category, and the emotion class. We evaluate whether these tasks require both modalities and find that, for the image-text relations, text alone is sufficient for most categories (complementary, illustrative, opposing): the information in the text makes it possible to predict whether an image is required for emotion understanding. The emotions of anger and sadness are best predicted with a multimodal model, while text alone is sufficient for disgust, joy, and surprise. Stimuli depicted by objects, animals, food, or a person are best predicted by image-only models, while multimodal models are most effective on art, events, memes, places, or screenshots. Comment: accepted for WASSA 2022 at ACL 202

arXiv.org e-Print Archive

Forschungsinformationssystem der Universität Bamberg