Location Prediction of Social Images via Generative Model
The vast amount of geo-tagged social images has attracted great attention to
predicting image location from rich image content, such as visual content and
textual descriptions. Most existing work uses either text-based or vision-based
methods to predict location, leaving an open problem: how to effectively
exploit the correlation between different types of content, as well as their
geographical distributions, for location prediction. In this paper, we propose
to predict image location by learning the latent relation between geographical
location and multiple types of image content. In particular, we propose a
geographical topic model of social images (GTMI) that integrates multiple types
of image content together with their geographical distributions. In GTMI, image
topics are modeled over both the text vocabulary and visual features. Each
region has its own distribution over topics and hence its own language model
and vision pattern. The location of a new image is estimated from the joint
probability of its content and a similarity measure on the topic distributions
between images. Experimental results demonstrate the performance of location
prediction based on GTMI.
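
To make the estimation step concrete, the sketch below (Python, with entirely
hypothetical parameters) scores each candidate region by the joint probability
of a new image's textual and visual words under that region's topic
distribution. It illustrates the idea described in the abstract, not the
authors' implementation; in GTMI the region-topic and per-topic word
distributions would come from the generative model's inference, whereas here
they are random placeholders only to keep the snippet runnable.

```python
# Sketch of GTMI-style location prediction (illustrative, not the authors' code).
import numpy as np

def predict_region(text_ids, visual_ids, region_topic, topic_word, topic_vis):
    """Return the index of the region maximizing the joint content probability.

    region_topic : (R, K) topic distribution of each region
    topic_word   : (K, V_text) per-topic distribution over text vocabulary
    topic_vis    : (K, V_vis)  per-topic distribution over visual words
    text_ids, visual_ids : observed word / visual-word indices of the image
    """
    scores = []
    for theta in region_topic:                # theta: (K,) topic mix of one region
        p_word = theta @ topic_word           # (V_text,) word probabilities
        p_vis = theta @ topic_vis             # (V_vis,)  visual-word probabilities
        log_joint = (np.log(p_word[text_ids]).sum()
                     + np.log(p_vis[visual_ids]).sum())
        scores.append(log_joint)
    return int(np.argmax(scores))

# Toy usage with random (hypothetical) parameters.
rng = np.random.default_rng(0)
R, K, V_text, V_vis = 5, 10, 100, 50
region_topic = rng.dirichlet(np.ones(K), size=R)
topic_word = rng.dirichlet(np.ones(V_text), size=K)
topic_vis = rng.dirichlet(np.ones(V_vis), size=K)
print(predict_region([3, 17, 42], [5, 9], region_topic, topic_word, topic_vis))
```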
Learning Social Image Embedding with Deep Multimodal Attention Networks
Learning social media data embeddings with deep models has attracted extensive
research interest and enabled many applications, such as link prediction,
classification, and cross-modal search. However, for social images, which
contain both link information and multimodal content (e.g., text descriptions
and visual content), simply employing an embedding learned from the network
structure or the data content alone yields a sub-optimal social image
representation. In this paper, we propose a novel social image embedding
approach called Deep Multimodal Attention Networks (DMAN), which employs a deep
model to jointly embed multimodal content and link information. Specifically,
to effectively capture the correlations between multimodal contents, we propose
a multimodal attention network that encodes the fine-grained relations between
image regions and textual words. To leverage the network structure for
embedding learning, a novel Siamese-Triplet neural network is proposed to model
the links among images. With the joint deep model, the learned embedding
captures both the multimodal content and the nonlinear network information.
Extensive experiments investigate the effectiveness of our approach on
multi-label classification and cross-modal search. Compared to state-of-the-art
image embeddings, the proposed DMAN achieves significant improvements on both
tasks.
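
As an illustration of the link-modeling component described above, the sketch
below shows a Siamese-Triplet-style objective in PyTorch: linked images act as
anchor/positive pairs and unlinked images as negatives, pulling linked images
together in the embedding space. All names, dimensions, and the fused
multimodal input features are hypothetical assumptions; this is a minimal
sketch of the general technique, not the DMAN implementation.

```python
# Sketch of a Siamese-Triplet objective over linked images (illustrative only).
import torch
import torch.nn as nn

class TripletEmbedder(nn.Module):
    """Maps (hypothetical) fused multimodal features to a unit-norm embedding."""
    def __init__(self, in_dim=512, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, emb_dim))

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=-1)

def triplet_link_loss(model, anchor, linked, unlinked, margin=0.2):
    """Pull linked images together, push unlinked images apart."""
    a, p, n = model(anchor), model(linked), model(unlinked)
    return nn.functional.triplet_margin_loss(a, p, n, margin=margin)

# Toy usage with random (hypothetical) multimodal features for 8 image triplets.
model = TripletEmbedder()
anchor, linked, unlinked = (torch.randn(8, 512) for _ in range(3))
loss = triplet_link_loss(model, anchor, linked, unlinked)
loss.backward()
```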
