GSLAM: Initialization-robust Monocular Visual SLAM via Global Structure-from-Motion
Many monocular visual SLAM algorithms are derived from incremental
structure-from-motion (SfM) methods. This work proposes a novel monocular SLAM
method which integrates recent advances made in global SfM. In particular, we
present two main contributions to visual SLAM. First, we solve the visual
odometry problem by a novel rank-1 matrix factorization technique which is more
robust to the errors in map initialization. Second, we adopt a recent global
SfM method for the pose-graph optimization, which leads to a multi-stage linear
formulation and enables L1 optimization for better robustness to false loops.
The combination of these two approaches generates more robust reconstruction
and is significantly faster (4X) than recent state-of-the-art SLAM systems. We
also present a new dataset recorded with ground truth camera motion in a Vicon
motion capture room, and compare our method to prior systems on it and
established benchmark datasets.
Comment: 3DV 2017 Project Page: https://frobelbest.github.io/gsla
Deep Learning Face Attributes in the Wild
Predicting face attributes in the wild is challenging due to complex face
variations. We propose a novel deep learning framework for attribute prediction
in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly
with attribute tags, but pre-trained differently. LNet is pre-trained by
massive general object categories for face localization, while ANet is
pre-trained by massive face identities for attribute prediction. This framework
not only outperforms the state-of-the-art by a large margin, but also reveals
valuable facts on learning face representation.
(1) It shows how the performances of face localization (LNet) and attribute
prediction (ANet) can be improved by different pre-training strategies.
(2) It reveals that although the filters of LNet are fine-tuned only with
image-level attribute tags, their response maps over entire images strongly
indicate face locations. This fact enables training LNet for face
localization with only image-level annotations, but without face bounding boxes
or landmarks, which are required by all attribute recognition works.
(3) It also demonstrates that the high-level hidden neurons of ANet
automatically discover semantic concepts after pre-training with massive face
identities, and such concepts are significantly enriched after fine-tuning with
attribute tags. Each attribute can be well explained with a sparse linear
combination of these concepts.
Comment: To appear in International Conference on Computer Vision (ICCV) 201
Linear Global Translation Estimation with Feature Tracks
This paper derives a novel linear position constraint for cameras seeing a
common scene point, which leads to a direct linear method for global camera
translation estimation. Unlike previous solutions, this method deals with
collinear camera motion and weak image association at the same time. The final
linear formulation does not involve the coordinates of scene points, which
makes it efficient even for large scale data. We solve the linear equation
based on the L1 norm, which makes our system more robust to outliers in
essential matrices and feature correspondences. We evaluate this method on
both sequentially captured images and unordered Internet images. The
experiments demonstrate its strength in robustness, accuracy, and efficiency.
Comment: Changes: 1. Adopt BMVC2015 style; 2. Combine sections 3 and 5; 3.
Move "Evaluation on synthetic data" out to supplementary file; 4. Divide
subsection "Evaluation on general data" to subsections "Experiment on
sequential data" and "Experiment on unordered Internet data"; 5. Change Fig.
1 and Fig. 8; 6. Move Fig. 6 and Fig. 7 to supplementary file; 7. Change some
symbols; 8. Correct some typo
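
The abstract above credits its robustness to outliers to the norm used when solving the linear system. As a rough illustration of that general idea only, the sketch below fits a line to data containing one gross outlier, once with ordinary least squares and once with an approximate L1 (least absolute deviations) objective. The L1 fit uses iteratively reweighted least squares (IRLS), which is a stand-in chosen for brevity and is not claimed to be the solver used in the paper. The function names, the toy data, and the `iters` and `eps` parameters are all assumptions made for this example.

```python
def solve_weighted_line(xs, ys, ws):
    """Weighted least squares for y = a*x + b via the 2x2 normal equations."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    a = (sw * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


def fit_line_l1(xs, ys, iters=100, eps=1e-8):
    """Approximate L1 line fit using IRLS (hypothetical helper for this sketch).

    Each pass reweights every sample by 1/|residual|, so samples with large
    residuals (likely outliers) contribute less to the next solve.
    """
    a, b = solve_weighted_line(xs, ys, [1.0] * len(xs))  # start from the L2 fit
    for _ in range(iters):
        # eps caps the weights so an exactly-fitted point does not divide by zero
        ws = [1.0 / max(abs(y - (a * x + b)), eps) for x, y in zip(xs, ys)]
        a, b = solve_weighted_line(xs, ys, ws)
    return a, b


# Toy data: five points on y = 2x + 1 plus one gross outlier at x = 5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 40.0]

a_l2, b_l2 = solve_weighted_line(xs, ys, [1.0] * len(xs))
a_l1, b_l1 = fit_line_l1(xs, ys)

print("L2 fit: a=%.3f b=%.3f" % (a_l2, b_l2))
print("L1 fit: a=%.3f b=%.3f" % (a_l1, b_l1))
```

On this toy data the least-squares line is pulled strongly toward the outlier, while the IRLS estimate settles close to the line through the five consistent points. The same contrast is what motivates L1 objectives when some input measurements are wrong.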