193 research outputs found
Tied factor analysis for face recognition across large pose differences
Face recognition algorithms perform very unreliably when the pose of the probe face is different from the gallery face: typical feature vectors vary more with pose than with identity. We propose a generative model that creates a one-to-many mapping from an idealized “identity” space to the observed data space. In identity space, the representation for each individual does not vary with pose. We model the measured feature vector as being generated by a pose-contingent linear transformation of the identity variable in the presence of Gaussian noise. We term this model “tied” factor analysis. The choice of linear transformation (factors) depends on the pose, but the loadings are constant (tied) for a given individual. We use the EM algorithm to estimate the linear transformations and the noise parameters from training data.
We propose a probabilistic distance metric that allows a full posterior over possible matches to be established. We introduce a novel feature extraction process and investigate recognition performance by using the FERET, XM2VTS, and PIE databases. Recognition performance compares favorably with contemporary approaches
ImageSpirit: Verbal Guided Image Parsing
Humans describe images in terms of nouns and adjectives while algorithms
operate on images represented as sets of pixels. Bridging this gap between how
humans would like to access images versus their typical representation is the
goal of image parsing, which involves assigning object and attribute labels to
pixel. In this paper we propose treating nouns as object labels and adjectives
as visual attribute labels. This allows us to formulate the image parsing
problem as one of jointly estimating per-pixel object and attribute labels from
a set of training images. We propose an efficient (interactive time) solution.
Using the extracted labels as handles, our system empowers a user to verbally
refine the results. This enables hands-free parsing of an image into pixel-wise
object/attribute labels that correspond to human semantics. Verbally selecting
objects of interests enables a novel and natural interaction modality that can
possibly be used to interact with new generation devices (e.g. smart phones,
Google Glass, living room devices). We demonstrate our system on a large number
of real-world images with varying complexity. To help understand the tradeoffs
compared to traditional mouse based interactions, results are reported for both
a large scale quantitative evaluation and a user study.Comment: http://mmcheng.net/imagespirit
Filter-based mean-field inference for random fields with higher-order terms and product label-spaces
Recently, a number of cross bilateral filtering methods have been proposed for solving multi-label problems in computer vision, such as stereo, optical flow and object class segmentation that show an order of magnitude improvement in speed over previous methods. These methods have achieved good results despite using models with only unary and/or pairwise terms. However, previous work has shown the value of using models with higher-order terms e.g. to represent label consistency over large regions, or global co-occurrence relations. We show how these higher-order terms can be formulated such that filter-based inference remains possible. We demonstrate our techniques on joint stereo and object labelling problems, as well as object class segmentation, showing in addition for joint object-stereo labelling how our method provides an efficient approach to inference in product label-spaces. We show that we are able to speed up inference in these models around 10–30 times with respect to competing graph-cut/move-making methods, as well as maintaining or improving accuracy in all cases. We show results on PascalVOC-10 for object class segmentation, and Leuven for joint object-stereo labelling
Proposal generation for object detection using cascaded ranking SVMs
Object recognition has made great strides recently. However, the best methods, such as those based on kernel-SVMs are highly computationally intensive. The problem of how to accelerate the evaluation process without decreasing accuracy is thus of current interest. In this paper, we deal with this problem by using the idea of ranking. We propose a cascaded architecture which using the ranking SVM generates an ordered set of proposals for windows containing object instances. The top ranking windows may then be fed to a more complex detector. Our experiments demonstrate that our approach is robust, achieving higher overlap-recall values using fewer output proposals than the state-of-the-art. Our use of simple gradient features and linear convolution indicates that our method is also faster than the state-of-the-art
Improved initialization and Gaussian mixture pairwise terms for dense random fields with mean-field inference
Recently, Krahenbuhl and Koltun proposed an efficient inference method for densely connected pairwise random fields using the mean-field approximation for a Conditional Random Field (CRF). However, they restrict their pairwise weights to take the form of a weighted combination of Gaussian kernels where each Gaussian component is allowed to take only zero mean, and can only be rescaled by a single value for each label pair. Further, their method is sensitive to initialization. In this paper, we propose methods to alleviate these issues. First, we propose a hierarchical mean-field approach where labelling from the coarser level is propagated to the finer level for better initialisation. Further, we use SIFT-flow based label transfer to provide a good initial condition at the coarsest level. Second, we allow our approach to take general Gaussian pairwise weights, where we learn the mean, the co-variance matrix, and the mixing co-efficient for every mixture component. We propose a variation of Expectation Maximization (EM) for piecewise learning of the parameters of the mixture model determined by the maximum likelihood function. Finally, we demonstrate the efficiency and accuracy offered by our method for object class segmentation problems on two challenging datasets: PascalVOC-10 segmentation and CamVid datasets. We show that we are able to achieve state of the art performance on the CamVid dataset, and an almost 3% improvement on the PascalVOC- 10 dataset compared to baseline graph-cut and mean-field methods, while also reducing the inference time by almost a factor of 3 compared to graph-cuts based methods
- …
