Matching neural paths: transfer from recognition to correspondence search
Many machine learning tasks require finding per-part correspondences between objects. In this work we focus on low-level correspondences, a highly ambiguous matching problem. We propose to resolve this ambiguity using a hierarchical semantic representation of the objects coming from a convolutional neural network. Training such a network directly for low-level correspondence prediction may not be an option in domains where ground-truth correspondences are hard to obtain. We show how transfer from recognition can be used to avoid such training. Our idea is to mark parts as "matching" if their features are close to each other at all levels of the convolutional feature hierarchy (neural paths). Although the overall number of such paths is exponential in the number of layers, we propose a polynomial algorithm that aggregates all of them in a single backward pass. The empirical validation is done on the task of stereo correspondence and demonstrates that we achieve competitive results among methods that do not use labeled target-domain data.
Comment: Accepted at NIPS 2017
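To make the aggregation idea concrete, here is a minimal sketch (not the authors' code) of how the sum over exponentially many neural paths can factorize into a single coarse-to-fine pass, in the spirit of the polynomial algorithm described above. The per-layer match scores and the single-parent (nearest-neighbour) connectivity are simplifying assumptions; in the paper each position connects to all coarse positions in its receptive field.

```python
# Hypothetical sketch: dynamic-programming aggregation of path scores.
# match_scores holds one 2-D array per layer (coarsest first), where
# each entry is a per-position match score between the two images.
import numpy as np

def aggregate_paths(match_scores, upsample_factor=2):
    """Sum of per-path score products, factorized layer by layer.

    Assumes each fine position has a single coarse parent (nearest-
    neighbour upsampling); the real receptive-field connectivity would
    sum over several parents instead."""
    agg = match_scores[0]
    for m in match_scores[1:]:
        # carry the coarse aggregate down to the finer grid ...
        agg = np.kron(agg, np.ones((upsample_factor, upsample_factor)))
        # ... and extend every path by this layer's match score
        agg = agg[:m.shape[0], :m.shape[1]] * m
    return agg
```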
Joint Learning of Intrinsic Images and Semantic Segmentation
Semantic segmentation of outdoor scenes is problematic when there are variations in imaging conditions. It is known that albedo (reflectance) is invariant to all kinds of illumination effects. Thus, using reflectance images for the semantic segmentation task can be favorable. Additionally, not only may segmentation benefit from reflectance, but reflectance computation may also benefit from segmentation. Therefore, in this paper, the tasks of semantic segmentation and intrinsic image decomposition are considered as a combined process by exploring their mutual relationship in a joint fashion. To that end, we propose a supervised end-to-end CNN architecture to jointly learn intrinsic image decomposition and semantic segmentation. We analyze the gains of addressing those two problems jointly. Moreover, new cascade CNN architectures for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as single tasks. Furthermore, a dataset of 35K synthetic images of natural environments is created with corresponding albedo and shading (intrinsics), as well as semantic labels (segmentation) assigned to each object/scene. The experiments show that joint learning of intrinsic image decomposition and semantic segmentation is beneficial for both tasks for natural scenes. Dataset and models are available at: https://ivi.fnwi.uva.nl/cv/intrinseg
Comment: ECCV 2018
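As an illustration of the joint formulation, the following is a hedged sketch of a shared encoder with intrinsic (albedo and shading) and segmentation heads trained under one combined loss. All layer sizes, names and loss weights are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical sketch of joint intrinsic decomposition + segmentation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointIntrinsicSegNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        # tiny shared encoder standing in for the paper's CNN backbone
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.albedo_head = nn.Conv2d(64, 3, 3, padding=1)        # reflectance
        self.shading_head = nn.Conv2d(64, 1, 3, padding=1)       # shading
        self.seg_head = nn.Conv2d(64, n_classes, 3, padding=1)   # class logits

    def forward(self, x):
        f = self.encoder(x)
        return self.albedo_head(f), self.shading_head(f), self.seg_head(f)

def joint_loss(albedo, shading, logits, albedo_gt, shading_gt, labels,
               w_intrinsic=1.0, w_seg=1.0):
    # supervised intrinsics (regression) + segmentation (classification)
    l_int = F.mse_loss(albedo, albedo_gt) + F.mse_loss(shading, shading_gt)
    l_seg = F.cross_entropy(logits, labels)
    return w_intrinsic * l_int + w_seg * l_seg
```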
Combining appearance and structure from motion features for road scene understanding
In this paper we present a framework for pixel-wise object segmentation of road scenes that combines motion and appearance features. It is designed to handle street-level imagery such as that on Google Street View and Microsoft Bing Maps. We formulate the problem in a CRF framework in order to probabilistically model the label likelihoods and the a priori knowledge. An extended set of appearance-based features is used, which consists of textons, colour, location and HOG descriptors. A novel boosting approach is then applied to combine the motion and appearance-based features. We also incorporate higher-order potentials in our CRF model, which produce segmentations with precise object boundaries. We evaluate our method both quantitatively and qualitatively on the challenging Cambridge-driving Labeled Video dataset. Our approach shows an overall recognition accuracy of 84%, compared to the state-of-the-art accuracy of 69%.
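The following hedged sketch shows the flavour of the feature combination: per-pixel appearance and motion responses merged by boosting weights into a strong score, whose negative log-probability serves as the CRF unary term. The two-stream split and the weights are illustrative, not the paper's learned ensemble.

```python
# Hypothetical sketch: boosted combination of appearance and motion cues.
import numpy as np

def unary_potentials(appearance_scores, motion_scores, a_app, a_mot):
    """appearance_scores, motion_scores: (H, W, n_classes) weak responses.
    a_app, a_mot: boosting weights. Returns -log P(class) per pixel."""
    strong = a_app * appearance_scores + a_mot * motion_scores
    strong -= strong.max(axis=2, keepdims=True)      # stable softmax
    p = np.exp(strong)
    p /= p.sum(axis=2, keepdims=True)
    return -np.log(p + 1e-12)                        # CRF unary costs
```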
Graph cut based inference with co-occurrence statistics
Markov and Conditional Random Fields (CRFs) used in computer vision typically model only local interactions between variables, as this is computationally tractable. In this paper we consider a class of global potentials defined over all variables in the CRF. We show how they can be readily optimised using standard graph cut algorithms at little extra expense compared to a standard pairwise field. This result can be directly used for the problem of class-based image segmentation, which has seen increasing recent interest within computer vision. Here the aim is to assign a label to each pixel of a given image from a set of possible object classes. Typically these methods use random fields to model local interactions between pixels or super-pixels. One of the cues that helps recognition is global object co-occurrence statistics, a measure of which classes (such as chair or motorbike) are likely to occur in the same image together. There have been several approaches proposed to exploit this property, but all of them suffer from different limitations and typically carry a high computational cost, preventing their application on large images. We find that the new model we propose produces an improvement in the labelling compared to just using a pairwise model.
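A minimal sketch of the resulting energy may help: the usual unary and pairwise terms plus one global term C(L(x)) that charges the set of labels present in the labelling, derived from co-occurrence statistics. The Potts pairwise term and the dictionary lookup are simplifying assumptions; the paper optimises such energies with graph-cut moves.

```python
# Hypothetical sketch of an energy with a global co-occurrence potential:
#   E(x) = sum_i psi_i(x_i) + sum_(i,j) psi_ij(x_i, x_j) + C(L(x))
import numpy as np

def energy(labels, unary, pairwise_weight, cooc_cost, edges):
    """labels: (n,) labelling; unary: (n, k) costs; edges: (i, j) pairs;
    cooc_cost: maps a frozenset of labels to its co-occurrence cost."""
    e = unary[np.arange(len(labels)), labels].sum()
    # Potts pairwise term: penalise neighbours with different labels
    e += pairwise_weight * sum(labels[i] != labels[j] for i, j in edges)
    # global term: cost of the set of classes used in the whole image
    e += cooc_cost.get(frozenset(labels.tolist()), 0.0)
    return float(e)
```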
NODIS: Neural Ordinary Differential Scene Understanding
Semantic image understanding is a challenging topic in computer vision. It requires detecting all objects in an image, but also identifying all the relations between them. Detected objects, their labels and the discovered relations can be used to construct a scene graph which provides an abstract semantic interpretation of an image. In previous works, relations were identified by solving an assignment problem formulated as a Mixed-Integer Linear Program. In this work, we interpret that formulation as an Ordinary Differential Equation (ODE). The proposed architecture performs scene graph inference by solving a neural variant of an ODE by end-to-end learning. It achieves state-of-the-art results on all three tasks of the Visual Genome benchmark: scene graph generation (SGGen), scene graph classification (SGCls) and visual relationship detection (PredCls).
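To illustrate what "solving a neural variant of an ODE" can look like, here is a minimal fixed-step sketch (explicit Euler, not the NODIS architecture or its solver): pairwise object features are evolved by a learned dynamics function and the terminal state is classified into predicates. Dimensions and the classifier head are assumptions.

```python
# Hypothetical neural-ODE block with explicit Euler integration.
import torch
import torch.nn as nn

class NeuralODEBlock(nn.Module):
    def __init__(self, dim, n_predicates, steps=10):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                               nn.Linear(dim, dim))   # learned dynamics
        self.classify = nn.Linear(dim, n_predicates)
        self.steps = steps

    def forward(self, y):                  # y: (batch, dim) pair features
        h = 1.0 / self.steps
        for _ in range(self.steps):        # dy/dt = f(y), integrated 0..1
            y = y + h * self.f(y)
        return self.classify(y)            # predicate logits
```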
Squeeze-SegNet: A new fast Deep Convolutional Neural Network for Semantic Segmentation
Recent research on deep convolutional neural networks has focused on improving accuracy, yielding significant advances. While initially limited to classification tasks, these networks have become very useful for higher-level tasks such as object detection and pixel-wise semantic segmentation. Deep-learning architectures have thus advanced the state of the art in segmentation accuracy, but they remain difficult to deploy in embedded systems, as is the case for autonomous driving. We present a new deep fully convolutional neural network for pixel-wise semantic segmentation which we call Squeeze-SegNet. The architecture follows an encoder-decoder style: we use a SqueezeNet-like encoder and a decoder formed by our proposed squeeze-decoder module and an upsampling layer that uses downsampling indices as in SegNet, and we add a deconvolution layer to produce the final multi-channel feature map. On datasets like CamVid and Cityscapes, our network achieves SegNet-level accuracy with 10 times fewer parameters than SegNet.
Comment: The 10th International Conference on Machine Vision (ICMV 2017). arXiv admin note: text overlap with arXiv:1704.06857 by other authors
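The following is a hedged sketch of what one squeeze-decoder stage could look like: SegNet-style unpooling from stored max-pool indices followed by a SqueezeNet-style fire module (1x1 squeeze, parallel 1x1/3x3 expand). Channel counts and ordering are illustrative, not the Squeeze-SegNet specification.

```python
# Hypothetical squeeze-decoder stage: unpool-by-indices + fire module.
import torch
import torch.nn as nn

class FireDecoder(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch // 2, 1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch // 2, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, indices, output_size):
        # indices come from the matching encoder MaxPool2d(return_indices=True)
        x = self.unpool(x, indices, output_size=output_size)
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1(x)),
                          self.relu(self.expand3(x))], dim=1)
```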
Semantic3D.net: A new large-scale point cloud classification benchmark
This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a workhorse, which already show remarkable performance improvements over the state of the art. CNNs have become the de-facto standard for many tasks in computer vision and machine learning, like semantic segmentation or object detection in images, but have not yet led to a true breakthrough for 3D point cloud labelling tasks due to a lack of training data. With the massive data set presented in this paper, we aim at closing this data gap to help unleash the full potential of deep learning methods for 3D labelling tasks. Our semantic3D.net data set consists of dense point clouds acquired with static terrestrial laser scanners. It contains 8 semantic classes and covers a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. We describe our labelling interface and show that our data set provides denser and more complete point clouds, with a much higher overall number of labelled points, than those already available to the research community. We further provide baseline method descriptions and a comparison between methods submitted to our online system. We hope semantic3D.net will pave the way for deep learning methods in 3D point cloud labelling to learn richer, more general 3D representations, and first submissions after only a few months indicate that this might indeed be the case.
Comment: Accepted to ISPRS Annals. The benchmark website is available at http://www.semantic3d.net/. The baseline code is available at https://github.com/nsavinov/semantic3dnet
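For readers who want to try the data, a hedged loader sketch follows. The format assumption (to be checked against the benchmark site) is one "x y z intensity r g b" line per point, with per-point class ids in a parallel .labels file where 0 marks unlabelled points.

```python
# Hypothetical loader for the semantic3d.net plain-text scan format.
import numpy as np

def load_scan(points_path, labels_path):
    pts = np.loadtxt(points_path)                      # (n, 7) per assumption
    labels = np.loadtxt(labels_path, dtype=np.int64)   # (n,) class ids
    mask = labels > 0                                  # drop unlabelled points
    return pts[mask, :3], pts[mask, 3:], labels[mask]  # xyz, attrs, labels
```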
A high performance CRF model for clothes parsing
In this paper we tackle the problem of clothing parsing: our goal is to segment and classify the different garments a person is wearing. We frame the problem as one of inference in a pose-aware Conditional Random Field (CRF) which exploits appearance, figure/ground segmentation, shape and location priors for each garment, as well as similarities between segments and symmetries between different human body parts. We demonstrate the effectiveness of our approach on the Fashionista dataset and show that we can obtain a significant improvement over the state of the art.
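One ingredient of such a pose-aware CRF can be sketched as follows: a garment unary that combines an appearance likelihood with a location prior centred on a body-pose keypoint. The Gaussian prior and the anchor choice are illustrative assumptions, not the paper's exact potentials.

```python
# Hypothetical pose-aware unary: appearance likelihood x location prior.
import numpy as np

def pose_aware_unary(appearance_prob, anchor_xy, sigma):
    """appearance_prob: (H, W) garment likelihood; anchor_xy: (x, y) of
    the pose joint the garment attaches to (e.g. hips for a skirt)."""
    h, w = appearance_prob.shape
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - anchor_xy[0]) ** 2 + (ys - anchor_xy[1]) ** 2
    location_prior = np.exp(-d2 / (2.0 * sigma ** 2))
    return -np.log(appearance_prob * location_prior + 1e-12)  # CRF cost
```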
