Learning a Recurrent Visual Representation for Image Caption Generation
In this paper we explore the bi-directional mapping between images and their
sentence-based descriptions. We propose learning this mapping using a recurrent
neural network. Unlike previous approaches that map both sentences and images
to a common embedding, we enable the generation of novel sentences given an
image. Using the same model, we can also reconstruct the visual features
associated with an image given its visual description. We use a novel recurrent
visual memory that automatically learns to remember long-term visual concepts
to aid in both sentence generation and visual feature reconstruction. We
evaluate our approach on several tasks. These include sentence generation,
sentence retrieval and image retrieval. State-of-the-art results are shown for
the task of generating novel image descriptions. When compared to human
generated captions, our automatically generated captions are preferred by
humans over of the time. Results are better than or comparable to
state-of-the-art results on the image and sentence retrieval tasks for methods
using similar visual features.
Improving Small Object Proposals for Company Logo Detection
Many modern approaches for object detection are two-staged pipelines. The
first stage identifies regions of interest which are then classified in the
second stage. Faster R-CNN is such an approach for object detection which
combines both stages into a single pipeline. In this paper we apply Faster
R-CNN to the task of company logo detection. Motivated by its weak performance
on small object instances, we examine in detail both the proposal and the
classification stage with respect to a wide range of object sizes. We
investigate the influence of feature map resolution on the performance of those
stages.
Based on theoretical considerations, we introduce an improved scheme for
generating anchor proposals and propose a modification to Faster R-CNN which
leverages higher-resolution feature maps for small objects. We evaluate our
approach on the FlickrLogos dataset improving the RPN performance from 0.52 to
0.71 (MABO) and the detection performance from 0.52 to 0.67 (mAP).
Comment: 8 Pages, ICMR 201
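The MABO score quoted in this abstract (mean average best overlap) can be illustrated with a small sketch. It assumes the common definition: for each ground-truth box, take the best intersection-over-union (IoU) with any proposal, average those values per class, then average across classes. The function names `iou` and `mabo`, the `(x1, y1, x2, y2)` box layout, and the single shared proposal list are assumptions made for illustration, not the authors' evaluation code.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mabo(ground_truth, proposals):
    """Mean average best overlap.

    ground_truth: list of (class_label, box) pairs (hypothetical layout).
    proposals:    list of candidate boxes, e.g. from a region proposal stage.
    """
    best_per_class = defaultdict(list)
    for label, gt_box in ground_truth:
        # Best overlap this ground-truth box achieves with any proposal.
        best = max((iou(gt_box, p) for p in proposals), default=0.0)
        best_per_class[label].append(best)
    if not best_per_class:
        return 0.0
    # Average best overlap per class, then the mean over classes.
    abo = [sum(v) / len(v) for v in best_per_class.values()]
    return sum(abo) / len(abo)
```

Under this definition, a proposal set that covers large objects well but misses small ones lowers the score through the small objects' low best-overlap values, which is the effect the abstract is concerned with.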
