Semantic Part Segmentation using Compositional Model combining Shape and Appearance
In this paper, we study the problem of semantic part segmentation for
animals. This is more challenging than standard object detection, object
segmentation and pose estimation tasks because semantic parts of animals often
have similar appearance and highly varying shapes. To tackle these challenges,
we build a mixture of compositional models to represent the object boundary and
the boundaries of semantic parts, incorporating edge, appearance, and
semantic part cues into the compositional model. Given part-level segmentation
annotation, we develop a novel algorithm to learn a mixture of compositional
models under various poses and viewpoints for certain animal classes.
Furthermore, a linear complexity algorithm is offered for efficient inference
of the compositional model using dynamic programming. We evaluate our method
on horses and cows using a newly annotated Pascal VOC 2010 dataset that has
pixelwise part labels. Experimental results demonstrate the effectiveness of
our method.
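The linear-complexity inference mentioned above relies on dynamic programming over a tree-structured model. A generic max-product sketch is below; the two-node tree, states, and scores are illustrative toys, not the paper's actual compositional model.

```python
# Max-product dynamic programming on a tree: each node sends one message
# to its parent, so inference is linear in the number of nodes.
def tree_dp(children, unary, pairwise, root=0):
    """Return the maximum total score of a joint labeling of the tree.

    children: dict node -> list of child nodes
    unary:    dict node -> list of per-state scores
    pairwise: dict (parent, child) -> 2-D list of compatibility scores
    """
    def message(node):
        # Best achievable score at `node` for each of its states,
        # including the best labelings of the subtrees below it.
        scores = list(unary[node])
        for child in children.get(node, []):
            child_scores = message(child)
            for s in range(len(scores)):
                scores[s] += max(pairwise[(node, child)][s][c] + child_scores[c]
                                 for c in range(len(child_scores)))
        return scores

    return max(message(root))

# Toy two-node tree: root 0 with one child 1, two states per node.
children = {0: [1]}
unary = {0: [1.0, 0.0], 1: [0.0, 2.0]}
pairwise = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
best = tree_dp(children, unary, pairwise)
```

Because each edge is visited exactly once, the cost grows linearly with the number of parts (times the square of the number of states per part).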
Genetic CNN
The deep Convolutional Neural Network (CNN) is the state-of-the-art solution
for large-scale visual recognition. Following basic principles such as
increasing the depth and constructing highway connections, researchers have
manually designed many fixed network structures and verified their
effectiveness.
In this paper, we discuss the possibility of learning deep network structures
automatically. Note that the number of possible network structures increases
exponentially with the number of layers in the network, which inspires us to
adopt the genetic algorithm to efficiently traverse this large search space. We
first propose an encoding method to represent each network structure in a
fixed-length binary string, and initialize the genetic algorithm by generating
a set of randomized individuals. In each generation, we define standard genetic
operations, e.g., selection, mutation and crossover, to eliminate weak
individuals and then generate more competitive ones. The competitiveness of
each individual is defined as its recognition accuracy, which is obtained via
training the network from scratch and evaluating it on a validation set. We run
the genetic process on two small datasets, i.e., MNIST and CIFAR10,
demonstrating its ability to evolve and find high-quality structures that have
received little prior study. These structures are also transferable to the
large-scale ILSVRC2012 dataset.
Comment: Submitted to CVPR 2017 (10 pages, 5 figures)
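The genetic search over binary-encoded structures can be sketched as follows. The string length, population size, and fitness function here are illustrative stand-ins: in the paper, fitness is the validation accuracy of the decoded network after training it from scratch, which a toy bit-count cannot capture.

```python
import random

STRING_LEN = 12      # bits encoding a network structure (length is illustrative)
POP_SIZE = 8
GENERATIONS = 5
MUTATION_RATE = 0.05

def random_individual():
    # Each candidate network structure is a fixed-length binary string.
    return [random.randint(0, 1) for _ in range(STRING_LEN)]

def fitness(individual):
    # Toy stand-in for recognition accuracy of the decoded, trained network.
    return sum(individual)

def select(population):
    # Fitness-proportional (roulette-wheel) selection of one parent.
    total = sum(fitness(ind) for ind in population)
    r = random.uniform(0, total)
    acc = 0.0
    for ind in population:
        acc += fitness(ind)
        if acc >= r:
            return ind
    return population[-1]

def crossover(a, b):
    # Single-point crossover between two parent strings.
    point = random.randint(1, STRING_LEN - 1)
    return a[:point] + b[point:]

def mutate(individual):
    # Flip each bit independently with a small probability.
    return [1 - bit if random.random() < MUTATION_RATE else bit
            for bit in individual]

def evolve():
    population = [random_individual() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        # Weak individuals are unlikely to be selected as parents,
        # so competitive structures dominate later generations.
        population = [mutate(crossover(select(population), select(population)))
                      for _ in range(POP_SIZE)]
    return max(population, key=fitness)

best = evolve()
```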
Parsing Occluded People by Flexible Compositions
This paper presents an approach to parsing humans when there is significant
occlusion. We model humans using a graphical model which has a tree structure
building on recent work [32, 6] and exploit the connectivity prior that, even
in the presence of occlusion, the visible nodes form a connected subtree of the
graphical model. We call each connected subtree a flexible composition of
object parts. This involves a novel method for learning occlusion cues. During
inference we need to search over a mixture of different flexible models. By
exploiting part sharing, we show that this inference can be done extremely
efficiently requiring only twice as many computations as searching for the
entire object (i.e., not modeling occlusion). We evaluate our model on the
standard benchmark "We Are Family" Stickmen dataset and obtain significant
performance improvements over the best alternative algorithms.
Comment: CVPR 2015 Camera Ready
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
We present a method for estimating articulated human pose from a single
static image based on a graphical model with novel pairwise relations that make
adaptive use of local image measurements. More precisely, we specify a
graphical model for human pose that exploits the fact that local image
measurements can be used both to detect parts (or joints) and also to predict
the spatial relationships between them (Image Dependent Pairwise Relations).
These spatial relationships are represented by a mixture model. We use Deep
Convolutional Neural Networks (DCNNs) to learn conditional probabilities for
the presence of parts and their spatial relationships within image patches.
Hence our model combines the representational flexibility of graphical models
with the efficiency and statistical power of DCNNs. Our method significantly
outperforms state-of-the-art methods on the LSP and FLIC datasets and also
performs very well on the Buffy dataset without any training.
Comment: NIPS 2014 Camera Ready
Statistical physics, mixtures of distributions, and the EM algorithm
We show that there are strong relationships between approaches to optimization and learning based on statistical physics or mixtures of experts. In particular, the EM algorithm can be interpreted as converging either to a local maximum of the mixture model or to a saddle-point solution of the statistical physics system. An advantage of the statistical physics approach is that it naturally gives rise to a heuristic continuation method, deterministic annealing, for finding good solutions.
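The EM-for-mixtures view can be illustrated with a minimal two-component Gaussian mixture in one dimension. The unit variances, synthetic data, and iteration count are illustrative choices, not taken from the paper.

```python
import math
import random

random.seed(0)

# Toy 1-D data drawn from two well-separated clusters.
data = ([random.gauss(0.0, 1.0) for _ in range(100)] +
        [random.gauss(5.0, 1.0) for _ in range(100)])

# Two-component Gaussian mixture with fixed unit variances.
mu = [-1.0, 1.0]   # component means (initial guesses)
pi = [0.5, 0.5]    # mixing weights

def gauss_pdf(x, m):
    return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    resp = []
    for x in data:
        w = [pi[k] * gauss_pdf(x, mu[k]) for k in range(2)]
        s = sum(w)
        resp.append([wk / s for wk in w])
    # M-step: re-estimate means and mixing weights from responsibilities,
    # increasing (or keeping) the mixture likelihood at every iteration.
    for k in range(2):
        nk = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        pi[k] = nk / len(data)
```

Deterministic annealing, in the statistical-physics reading, would soften the E-step responsibilities with a temperature parameter and gradually cool it toward this standard EM update.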
Few-Shot Image Recognition by Predicting Parameters from Activations
In this paper, we are interested in the few-shot learning problem. In
particular, we focus on a challenging scenario where the number of categories
is large and the number of examples per novel category is very limited, e.g. 1,
2, or 3. Motivated by the close relationship between the parameters and the
activations in a neural network associated with the same category, we propose a
novel method that can adapt a pre-trained neural network to novel categories by
directly predicting the parameters from the activations. Zero training is
required in adaptation to novel categories, and fast inference is realized by a
single forward pass. We evaluate our method on few-shot image recognition
using the ImageNet dataset, where it achieves state-of-the-art classification
accuracy on novel categories by a significant margin while keeping comparable
performance on the large-scale categories. We also test our method on the
MiniImageNet dataset, where it strongly outperforms the previous
state-of-the-art methods.
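The idea of predicting a novel category's parameters from its activations can be sketched in its simplest form: use the normalized mean activation of the few examples as the new classifier row. The paper learns this activation-to-parameter mapping; the mean is a hedged stand-in, and all shapes and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 64

# Pretend penultimate-layer activations for 3 examples of a novel category.
novel_activations = rng.normal(size=(3, FEAT_DIM))

# Existing classifier weights for 5 base categories (unit-normalized rows).
base_weights = rng.normal(size=(5, FEAT_DIM))
base_weights /= np.linalg.norm(base_weights, axis=1, keepdims=True)

# Simplest activation-to-parameter mapping: the normalized mean activation
# of the novel category's examples becomes its predicted weight vector.
w_novel = novel_activations.mean(axis=0)
w_novel /= np.linalg.norm(w_novel)

# Extend the classifier with the predicted row: no training, and inference
# for any input remains a single forward pass.
weights = np.vstack([base_weights, w_novel[None, :]])

def classify(activation):
    # Score the activation against every category's weight vector.
    return int(np.argmax(weights @ activation))
```

This mirrors the abstract's claim that zero training is needed for adaptation: adding a category only appends one predicted row to the classifier.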
