154 research outputs found

What is Holding Back Convnets for Detection?

Author: D Hoiem
H Li
J Xu
M Everingham
P Agrawal
Y Bengio
Publication date: 01/01/2015

Convolutional neural networks have recently shown excellent results in general object detection and many other tasks. Albeit very effective, they involve many user-defined design choices. In this paper we want to better understand these choices by inspecting two key aspects: "what did the network learn?" and "what can the network learn?". We exploit new annotations (Pascal3D+) to enable a new empirical analysis of the R-CNN detector. Contrary to common belief, our results indicate that existing state-of-the-art convnet architectures are not invariant to various appearance factors. In fact, all considered networks have similar weak points which cannot be mitigated by simply increasing the training data (architectural changes are needed). We show that overall performance can improve when using image renderings for data augmentation. We report the best known results on the Pascal3D+ detection and view-point estimation tasks.

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery

Author: A Newell
D Hoiem
G Cheng
G Ghiasi
J Ma
K Liu
M Everingham
PO Pinheiro
T Tang
TY Lin
W Liu
X Yang
Publication date: 01/01/2018

Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery. Comment: ACCV 201
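
The abstract above mentions non-maximum suppression as the final step for removing redundant detections. As a rough illustration only, the sketch below shows the standard greedy, axis-aligned variant; it is not the rotational version the paper describes, and the function names `iou` and `nms` and the 0.5 threshold are assumptions chosen for this example.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap any already-kept box by more than iou_threshold.
    Returns the indices of the kept boxes (hypothetical helper)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one distant box (made-up values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

A rotational variant would replace `iou` with an overlap measure between oriented rectangles; the greedy loop itself would stay the same.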

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Scipedia

'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems

Author: A Borji
A Khosla
A Torralba
CW Tyler
D Hoiem
GA Miller
L Xu
M Everingham
MD Zeiler
PF Felzenszwalb
SE Palmer
T Tommasi
T-Y Lin
Publication date: 24/11/2016

An examination of object recognition challenge leaderboards (ILSVRC, PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small differences amongst themselves in terms of error rate/mAP. To better differentiate the top performers, additional criteria are required. Moreover, the (test) images, on which the performance scores are based, predominantly contain fully visible objects. Therefore, 'harder' test images, mimicking the challenging conditions (e.g. occlusion) in which humans routinely recognize objects, need to be utilized for benchmarking. To address the concerns mentioned above, we make two contributions. First, we systematically vary the level of local object-part content, global detail and spatial context in images from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12. Second, we propose an object-part based benchmarking procedure which quantifies classifiers' robustness to a range of visibility and contextual settings. The benchmarking procedure relies on a semantic similarity measure that naturally addresses potential semantic granularity differences between the category labels in training and test datasets, thus eliminating manual mapping. We use our procedure on the PPSS-12 dataset to benchmark top-performing classifiers trained on the ILSVRC-2012 dataset. Our results show that the proposed benchmarking procedure enables additional differentiation among state-of-the-art object classifiers in terms of their ability to handle missing content and insufficient object detail. Given this capability for additional differentiation, our approach can potentially supplement existing benchmarking procedures used in object recognition challenge leaderboards. Comment: Extended version of our ACCV-2016 paper. Author formatting modifie

arXiv.org e-Print Archive

Crossref

Contextual Object Detection with a Few Relevant Neighbors

Author: C Desai
D Hoiem
D Koller
G Heitz
J Li
J Oramas
J Pearl
L Wolf
N Komodakis
PF Felzenszwalb
R Mairon
R Perko
RG Cinbis
S Shalev-Shwartz
T-Y Lin
Publication date: 17/10/2018

A natural way to improve the detection of objects is to consider the contextual constraints imposed by the detection of additional objects in a given scene. In this work, we exploit the spatial relations between objects in order to improve detection capacity, as well as analyze various properties of the contextual object detection problem. To precisely calculate context-based probabilities of objects, we developed a model that examines the interactions between objects in an exact probabilistic setting, in contrast to previous methods that typically utilize approximations based on pairwise interactions. Such a scheme is facilitated by the realistic assumption that the existence of an object in any given location is influenced by only a few informative locations in space. Based on this assumption, we suggest a method for identifying these relevant locations and integrating them into a mostly exact calculation of probability based on their raw detector responses. This scheme is shown to improve detection results and provides unique insights about the process of contextual inference for object detection. We show that it is generally difficult to learn that a particular object reduces the probability of another, and that in cases when the context and detector strongly disagree this learning becomes virtually impossible for the purposes of improving the results of an object detector. Finally, we demonstrate improved detection results through use of our approach as applied to the PASCAL VOC and COCO datasets.

arXiv.org e-Print Archive

Crossref

Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution

Author: A Joulin
C Galleguillos
C Rother
CL Zitnick
D Hoiem
JH Hosang
JRR Uijlings
K He
M Everingham
O Russakovsky
PF Felzenszwalb
R Girshick
T Deselaers
W Ren
Y Boykov
Publication date: 01/01/2016

Given a set of images containing objects from the same category, the task of image co-localization is to identify and localize each instance. This paper shows that this problem can be solved by a simple but intriguing idea, that is, a common object detector can be learnt by making its detection confidence scores distributed like those of a strongly supervised detector. More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them. Thus, we devise an entropy-based objective function to enforce the above property when learning the common object detector. Once the detector is learnt, we resort to a segmentation approach to refine the localization. We show that despite its simplicity, our approach outperforms state-of-the-art methods. Comment: Accepted to Proc. European Conf. Computer Vision 201
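
To make the intuition behind an entropy-based objective concrete, the sketch below compares the Shannon entropy of two score distributions over proposals: one peaked on a single proposal and one uniform. This is not the paper's objective function; normalising scores with a softmax, and the helper names `softmax` and `entropy`, are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Turn raw proposal scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical scores: a detector confident about one proposal
# versus a detector that cannot tell the proposals apart.
peaked = entropy(softmax([5.0, 0.0, 0.0, 0.0]))
uniform = entropy(softmax([1.0, 1.0, 1.0, 1.0]))
```

Under these assumptions the peaked distribution has lower entropy than the uniform one, so penalising entropy pushes a detector towards giving high scores to only a few proposals, which is the property the abstract describes.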

arXiv.org e-Print Archive

Crossref

Adelaide Research & Scholarship

Recovering 6D Object Pose: A Review and Multi-modal Analysis

Author: A Tejani
C Sahin
D Hoiem
E Brachmann
H Azizpour
M Everingham
MY Liu
N Correll
O Russakovsky
S Hinterstoisser
T Hodaň
U Bonde
W Kehl
Publication date: 15/08/2018

A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing the effects of challenges such as occlusion, clutter, and texture on the performance of methods that work in the RGB modality. Interpreting the depth data, the study in this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performance of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take for improving "autonomy" in robotics while handling objects? Our findings include: (i) reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds; (ii) heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering instances' 6D pose; (iii) template-based methods and random forest-based learning algorithms underlie object detection and 6D pose estimation, while the recent paradigm is to learn deep discriminative feature representations and to adopt CNNs taking RGB images as input; (iv) depending on the availability of large-scale 6D annotated depth datasets, feature representations can be learnt on these datasets, and then the learnt representations can be customized for the 6D problem.

arXiv.org e-Print Archive

Crossref

3D Spatial Layout Propagation in a Video Sequence

Author: A Gupta
A Saxena
C Rother
D Hoiem
V Hedau
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2014

Crossref

eScholarship - University of California

Modelling search for people in 900 scenes: A combined source model of eye guidance

Author: Antonio Torralba
Aude Oliva
Barbara Hidalgo-Sotelo
Bruce N.
Buswell G. T.
Chun M. M.
Dalal N.
Einhäuser W.
Fei Fei L.
Harel J.
Henderson J. M.
Hoiem D.
Itti L.
Koch C.
Krista A. Ehinger
Kumar S.
Lazebnik S.
Renninger L. W.
Torralba A.
Ullman S.
Wolfe J. M.
Yarbus A.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2009

How predictable are human eye movements during search in real world scenes? We recorded 14 observers’ eye movements as they performed a search task (person detection) in 912 outdoor scenes. Observers were highly consistent in the regions fixated during search, even when the target was absent from the scene. These eye movements were used to evaluate computational models of search guidance from three sources: saliency, target features, and scene context. Each of these models independently outperformed a cross-image control in predicting human fixations. Models that combined sources of guidance ultimately predicted 94% of human agreement, with the scene context component providing the most explanatory power. None of the models, however, could reach the precision and fidelity of an attentional map defined by human fixations. This work puts forth a benchmark for computational models of search in real world scenes. Further improvements in modelling should capture mechanisms underlying the selectivity of observers’ fixations during search. Funding: National Eye Institute (Integrative Training Program in Vision grant T32 EY013935); Massachusetts Institute of Technology (Singleton Graduate Research Fellowship); National Science Foundation (U.S.) (Graduate Research Fellowship); National Science Foundation (U.S.) (CAREER Award (0546262)); National Science Foundation (U.S.) (NSF contract (0705677)); National Science Foundation (U.S.) (Career Award (0747120))

Crossref

DSpace@MIT

PubMed Central

A Comparative Study of Modern Inference Techniques for Structured Discrete Energy Minimization Problems

Author: A Jaimovich
B Savchynskyy
Bernhard X. Kausler
Bjoern Andres
Bogdan Savchynskyy
BW Kernighan
C Chekuri
C Nieuwenhuis
C Yanover
CA Cocosco
Carsten Rother
Christoph Schnörr
D Goldberg
D Hoiem
D Koller
Dhruv Batra
Fred A. Hamprecht
G Călinescu
J Besag
J Lellmann
J Pearl
Jan Lellmann
Jörg H. Kappes
K Alahari
M Guignard
M Schlesinger
MJ Wainwright
N Komodakis
Nikos Komodakis
OJ Woodford
P Kohli
PF Felzenszwalb
S Nowozin
Sebastian Nowozin
SL Lauritzen
Sungwoong Kim
T Achterberg
T Bonato
T Werner
Thorben Kröger
U Brandes
V Kolmogorov
V Lempitsky
Y Boykov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/04/2014

Szeliski et al. published an influential study in 2006 on energy minimization methods for Markov Random Fields (MRF). This study provided valuable insights in choosing the best optimization technique for certain classes of problems. While these insights remain generally useful today, the phenomenal success of random field models means that the kinds of inference problems that have to be solved changed significantly. Specifically, the models today often include higher order interactions, flexible connectivity structures, large label-spaces of different cardinalities, or learned energy tables. To reflect these changes, we provide a modernized and enlarged study. We present an empirical comparison of more than 27 state-of-the-art optimization techniques on a corpus of 2,453 energy minimization instances from diverse applications in computer vision. To ensure reproducibility, we evaluate all methods in the OpenGM 2 framework and report extensive results regarding runtime and solution quality. Key insights from our study agree with the results of Szeliski et al. for the types of models they studied. However, on new and challenging types of models our findings disagree and suggest that polyhedral methods and integer programming solvers are competitive in terms of runtime and solution quality over a large range of model types.
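
For readers unfamiliar with the problem class, the sketch below shows what a discrete energy minimization instance looks like on a toy scale: a pairwise MRF energy with unary costs and a Potts smoothness term, minimised by exhaustive search. It does not use OpenGM and is not one of the techniques compared in the study; the names `energy`, `potts`, and `brute_force_min` and all numeric values are assumptions for illustration, and exhaustive search is only feasible for a handful of variables.

```python
from itertools import product


def potts(a, b, weight=1.0):
    """Potts pairwise term: no cost for equal labels, fixed cost otherwise."""
    return 0.0 if a == b else weight


def energy(labels, unary, edges, pairwise):
    """Sum of unary costs plus pairwise costs over the given edges."""
    total = sum(unary[i][label] for i, label in enumerate(labels))
    total += sum(pairwise(labels[i], labels[j]) for i, j in edges)
    return total


def brute_force_min(unary, edges, pairwise, num_labels):
    """Enumerate every labeling and return the one with the lowest energy."""
    candidates = product(range(num_labels), repeat=len(unary))
    return min(candidates, key=lambda ls: energy(ls, unary, edges, pairwise))


# Three variables on a chain, two labels each (made-up costs).
# Variable 1 slightly prefers label 1 on its own, but its neighbours
# strongly prefer label 0, so smoothing pulls it to label 0.
unary = [[0.0, 2.0], [1.5, 1.0], [0.0, 2.0]]
edges = [(0, 1), (1, 2)]
best = brute_force_min(unary, edges, potts, num_labels=2)
best_energy = energy(best, unary, edges, potts)
```

The number of labelings grows exponentially with the number of variables, which is why benchmark studies of this kind compare approximate and exact solvers that scale far beyond enumeration.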

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL: Hyper Article en Ligne

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

One-Shot Unsupervised Cross-Domain Detection

Author: A Geiger
B Sun
C Sakaridis
D Hoiem
M Everingham
M Noroozi
R Zhang
S Ben-David
S Bucci
S Liu
W Liu
Y Ganin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)