193 research outputs found

    Tied factor analysis for face recognition across large pose differences

    Get PDF
    Face recognition algorithms perform very unreliably when the pose of the probe face is different from the gallery face: typical feature vectors vary more with pose than with identity. We propose a generative model that creates a one-to-many mapping from an idealized “identity” space to the observed data space. In identity space, the representation for each individual does not vary with pose. We model the measured feature vector as being generated by a pose-contingent linear transformation of the identity variable in the presence of Gaussian noise. We term this model “tied” factor analysis. The choice of linear transformation (factors) depends on the pose, but the loadings are constant (tied) for a given individual. We use the EM algorithm to estimate the linear transformations and the noise parameters from training data. We propose a probabilistic distance metric that allows a full posterior over possible matches to be established. We introduce a novel feature extraction process and investigate recognition performance by using the FERET, XM2VTS, and PIE databases. Recognition performance compares favorably with contemporary approaches

    ImageSpirit: Verbal Guided Image Parsing

    Get PDF
    Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images versus their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixel. In this paper we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interests enables a novel and natural interaction modality that can possibly be used to interact with new generation devices (e.g. smart phones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the tradeoffs compared to traditional mouse based interactions, results are reported for both a large scale quantitative evaluation and a user study.Comment: http://mmcheng.net/imagespirit

    Filter-based mean-field inference for random fields with higher-order terms and product label-spaces

    Get PDF
    Recently, a number of cross bilateral filtering methods have been proposed for solving multi-label problems in computer vision, such as stereo, optical flow and object class segmentation that show an order of magnitude improvement in speed over previous methods. These methods have achieved good results despite using models with only unary and/or pairwise terms. However, previous work has shown the value of using models with higher-order terms e.g. to represent label consistency over large regions, or global co-occurrence relations. We show how these higher-order terms can be formulated such that filter-based inference remains possible. We demonstrate our techniques on joint stereo and object labelling problems, as well as object class segmentation, showing in addition for joint object-stereo labelling how our method provides an efficient approach to inference in product label-spaces. We show that we are able to speed up inference in these models around 10–30 times with respect to competing graph-cut/move-making methods, as well as maintaining or improving accuracy in all cases. We show results on PascalVOC-10 for object class segmentation, and Leuven for joint object-stereo labelling

    Proposal generation for object detection using cascaded ranking SVMs

    Get PDF
    Object recognition has made great strides recently. However, the best methods, such as those based on kernel-SVMs are highly computationally intensive. The problem of how to accelerate the evaluation process without decreasing accuracy is thus of current interest. In this paper, we deal with this problem by using the idea of ranking. We propose a cascaded architecture which using the ranking SVM generates an ordered set of proposals for windows containing object instances. The top ranking windows may then be fed to a more complex detector. Our experiments demonstrate that our approach is robust, achieving higher overlap-recall values using fewer output proposals than the state-of-the-art. Our use of simple gradient features and linear convolution indicates that our method is also faster than the state-of-the-art

    Improved initialization and Gaussian mixture pairwise terms for dense random fields with mean-field inference

    Get PDF
    Recently, Krahenbuhl and Koltun proposed an efficient inference method for densely connected pairwise random fields using the mean-field approximation for a Conditional Random Field (CRF). However, they restrict their pairwise weights to take the form of a weighted combination of Gaussian kernels where each Gaussian component is allowed to take only zero mean, and can only be rescaled by a single value for each label pair. Further, their method is sensitive to initialization. In this paper, we propose methods to alleviate these issues. First, we propose a hierarchical mean-field approach where labelling from the coarser level is propagated to the finer level for better initialisation. Further, we use SIFT-flow based label transfer to provide a good initial condition at the coarsest level. Second, we allow our approach to take general Gaussian pairwise weights, where we learn the mean, the co-variance matrix, and the mixing co-efficient for every mixture component. We propose a variation of Expectation Maximization (EM) for piecewise learning of the parameters of the mixture model determined by the maximum likelihood function. Finally, we demonstrate the efficiency and accuracy offered by our method for object class segmentation problems on two challenging datasets: PascalVOC-10 segmentation and CamVid datasets. We show that we are able to achieve state of the art performance on the CamVid dataset, and an almost 3% improvement on the PascalVOC- 10 dataset compared to baseline graph-cut and mean-field methods, while also reducing the inference time by almost a factor of 3 compared to graph-cuts based methods
    corecore