227 research outputs found
AnchorNet: a weakly supervised network to learn geometry-sensitive features for semantic matching
Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HOG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we propose a deep network, termed AnchorNet, that produces image representations that are well-suited for semantic matching. It relies on a set of filters whose response is geometrically consistent across different object instances, even in the presence of strong intra-class, scale, or viewpoint variations. Trained only with weak image-level labels, the final representation successfully captures information about the object structure and improves the results of state-of-the-art semantic matching methods such as Deformable Spatial Pyramid or Proposal Flow. We show positive results on the cross-instance matching task, where different instances of the same object category are matched, as well as on a new cross-category semantic matching task that aligns pairs of instances, each from a different object class.
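For context on the kind of pipeline such representations plug into, the following is a minimal sketch of dense semantic matching with off-the-shelf CNN features. It is not AnchorNet itself: the truncated ResNet-50 backbone and the image file names are illustrative assumptions, and the matching rule is plain nearest neighbour in feature space.

```python
# Hypothetical sketch: dense semantic matching with generic CNN features.
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

def dense_features(img_path, model, size=224):
    tf = T.Compose([T.Resize((size, size)), T.ToTensor(),
                    T.Normalize(mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])])
    x = tf(Image.open(img_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feats = model(x)                 # (1, C, H, W) feature map
    return F.normalize(feats, dim=1)     # unit-norm channels for cosine similarity

# Stand-in feature extractor: ResNet-50 truncated after layer3 (an assumption).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:-3]).eval()

fa = dense_features("instance_a.jpg", extractor)   # hypothetical file names
fb = dense_features("instance_b.jpg", extractor)

# For every spatial cell of image A, find the most similar cell in image B.
C, H, W = fa.shape[1:]
fa_flat = fa.view(C, -1).t()        # (H*W, C)
fb_flat = fb.view(C, -1).t()        # (H*W, C)
sim = fa_flat @ fb_flat.t()         # cosine similarities between all cell pairs
matches = sim.argmax(dim=1)         # best match in B for each cell of A
```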
Deep image prior
Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is attributed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. To do so, we show that a randomly initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, super-resolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash/no-flash input pairs. Apart from its diverse applications, our approach highlights the inductive bias captured by standard generator network architectures. It also bridges the gap between two very popular families of image restoration methods: learning-based methods using deep convolutional networks and learning-free methods based on handcrafted image priors such as self-similarity.
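The core recipe can be sketched in a few lines: fit a randomly initialised convolutional generator to a single corrupted image and stop early, so that image structure is reproduced before the noise is. The tiny architecture and iteration budget below are illustrative assumptions, not the paper's configuration.

```python
# Minimal deep-image-prior-style sketch with a toy generator and early stopping.
import torch
import torch.nn as nn

def small_generator(channels=64):
    # Toy stand-in for the paper's generator architecture.
    return nn.Sequential(
        nn.Conv2d(32, channels, 3, padding=1), nn.ReLU(),
        nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        nn.Conv2d(channels, 3, 3, padding=1), nn.Sigmoid(),
    )

noisy = torch.rand(1, 3, 128, 128)          # stand-in for a noisy observation
z = torch.randn(1, 32, 128, 128)            # fixed random input code
net = small_generator()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(500):                     # stop early: structure fits before noise
    opt.zero_grad()
    out = net(z)
    loss = ((out - noisy) ** 2).mean()      # reconstruction loss against the corrupted image
    loss.backward()
    opt.step()

denoised = net(z).detach()                  # restored image estimate
```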
SNeS: learning probably symmetric neural surfaces from incomplete data
We present a method for the accurate 3D reconstruction of partly symmetric objects. We build on the strengths of recent advances in neural reconstruction and rendering such as Neural Radiance Fields (NeRF). A major shortcoming of such approaches is that they fail to reconstruct any part of the object that is not clearly visible in the training images, which is often the case for in-the-wild images and videos. When evidence is lacking, structural priors such as symmetry can be used to complete the missing information. However, exploiting such priors in neural rendering is highly non-trivial: while geometry and non-reflective materials may be symmetric, shadows and reflections from the ambient scene are not symmetric in general. To address this, we apply a soft symmetry constraint to the 3D geometry and material properties, having factored appearance into lighting, albedo colour and reflectivity. We evaluate our method on the recently introduced CO3D dataset, focusing on the car category due to the challenge of reconstructing highly reflective materials. We show that it can reconstruct unobserved regions with high fidelity and render high-quality novel-view images.
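A minimal sketch of what a soft symmetry constraint of this kind can look like is given below, assuming a hypothetical `field` network that maps 3D points to geometry and albedo predictions and an assumed symmetry plane at x = 0. It illustrates the idea of symmetrising geometry and reflectance while leaving view-dependent shading unconstrained; it is not the SNeS implementation.

```python
# Illustrative soft symmetry loss on a toy implicit field (not SNeS code).
import torch

def reflect_x(points):
    """Mirror points across the x = 0 plane (assumed symmetry plane)."""
    mirrored = points.clone()
    mirrored[..., 0] = -mirrored[..., 0]
    return mirrored

def soft_symmetry_loss(field, points, weight=0.1):
    sdf, albedo = field(points)
    sdf_m, albedo_m = field(reflect_x(points))
    # Only geometry and reflectance are symmetrised; shading terms are excluded.
    return weight * (((sdf - sdf_m) ** 2).mean() + ((albedo - albedo_m) ** 2).mean())

# Toy stand-in field: 3D point -> (signed distance, albedo colour).
mlp = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4))
def toy_field(p):
    out = mlp(p)
    return out[..., :1], out[..., 1:]

pts = torch.rand(1024, 3) * 2 - 1           # random sample points in a unit cube
loss = soft_symmetry_loss(toy_field, pts)   # added to the usual rendering losses
```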
HoloFusion: Towards Photo-realistic 3D Generative Modeling
Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D-consistent outputs, or detailed 2D views of 3D objects that may have structural defects and lack view consistency or realism. We present HoloFusion, a method that combines the best of these approaches to produce high-fidelity, plausible, and diverse 3D samples while learning only from a collection of multi-view 2D images. The method first generates coarse 3D samples using a variant of the recently proposed HoloDiffusion generator. Then, it independently renders and upsamples a large number of views of the coarse 3D model, super-resolves them to add detail, and distills those into a single, high-fidelity implicit 3D representation, which also ensures view consistency of the final renders. The super-resolution network is trained as an integral part of HoloFusion, end-to-end, and the final distillation uses a new sampling scheme to capture the space of super-resolved signals. We compare our method against existing baselines, including DreamFusion, Get3D, EG3D, and HoloDiffusion, and achieve, to the best of our knowledge, the most realistic results on the challenging CO3Dv2 dataset.
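The coarse-then-refine recipe can be illustrated with toy stand-ins: generate a coarse 3D sample, render views of it, super-resolve them, and distil the detailed views back into a single higher-resolution representation. Every component below (the voxel "scene", the axis-averaging "renderer", the bilinear "super-resolver") is a placeholder for illustration, not HoloFusion code.

```python
# Highly simplified sketch of a render -> super-resolve -> distil loop.
import torch
import torch.nn.functional as F

def coarse_generator():
    # Stand-in for the coarse 3D generator: a low-resolution RGB voxel grid.
    return torch.rand(3, 16, 16, 16)

def render(voxels, view):
    # Toy "renderer": average the grid along one axis chosen by the view index.
    return voxels.mean(dim=1 + view % 3)          # (3, H, W) image

def super_resolve(image):
    # Stand-in for the learned super-resolution network.
    return F.interpolate(image.unsqueeze(0), scale_factor=4,
                         mode="bilinear", align_corners=False).squeeze(0)

coarse = coarse_generator()
targets = [(v, super_resolve(render(coarse, v))) for v in range(12)]

# Distillation: fit a single high-resolution grid so that its renders match the
# super-resolved views, giving one consistent, detailed 3D representation.
fine = torch.rand(3, 64, 64, 64, requires_grad=True)
opt = torch.optim.Adam([fine], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = sum(F.mse_loss(render(fine, v), tgt) for v, tgt in targets)
    loss.backward()
    opt.step()
```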
Shape Retrieval of Non-rigid 3D Human Models
3D models of humans are commonly used within computer graphics and vision, so the ability to distinguish between body shapes is an important shape retrieval problem. We extend our recent paper, which provided a benchmark for testing non-rigid 3D shape retrieval algorithms on 3D human models. This benchmark provided a far stricter challenge than previous shape benchmarks. We have added 145 new models for use as a separate training set, in order to standardise the training data used and provide a fairer comparison. We have also included experiments with the FAUST dataset of human scans. All participants of the previous benchmark study have taken part in the new tests reported here, many providing updated results using the new data. In addition, further participants have also taken part, and we provide extra analysis of the retrieval results. A total of 25 different shape retrieval methods are compared.
Learning the 3D fauna of the web
Learning 3D models of all animals in nature requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly. One crucial bottleneck of modeling animals is the limited availability of training data, which we overcome by learning our model from 2D Internet images. We show that prior approaches, which are category-specific, fail to generalize to rare species with limited training images. We address this challenge by introducing the Semantic Bank of Skinned Models (SBSM), which automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge implicitly captured by an off-the-shelf self-supervised feature extractor. To train such a model, we also contribute a new large-scale dataset of diverse animal species. At inference time, given a single image of any quadruped animal, our model reconstructs an articulated 3D mesh in a feed-forward manner in seconds.
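The semantic-bank idea can be sketched as a soft selection over a small set of learnable base shapes, weighted by the similarity between an image-level semantic feature and a key stored per base shape. The dimensions and the blending rule below are assumptions for illustration, not the 3D-Fauna implementation.

```python
# Hypothetical sketch of a semantic bank of base shapes with soft selection.
import torch
import torch.nn.functional as F

num_bases, num_vertices, feat_dim = 8, 1000, 384

base_shapes = torch.nn.Parameter(torch.randn(num_bases, num_vertices, 3))  # learnable bank
base_keys = torch.nn.Parameter(torch.randn(num_bases, feat_dim))           # one key per base

def select_base_shape(semantic_feature):
    """Blend base shapes with weights given by similarity to the image feature."""
    weights = F.softmax(base_keys @ semantic_feature / feat_dim ** 0.5, dim=0)  # (num_bases,)
    return (weights[:, None, None] * base_shapes).sum(dim=0)                    # (num_vertices, 3)

# Example: a stand-in for a frozen self-supervised feature of one quadruped image.
feature = torch.randn(feat_dim)
rest_shape = select_base_shape(feature)   # category-adaptive rest shape before articulation
```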
Replay: multi-modal multi-view acted videos for casual holography
We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. Each scene is filmed in high production quality, from different viewpoints with several static cameras as well as wearable action cameras, and recorded with a large array of microphones at different positions in the room. Overall, the dataset contains over 4000 minutes of footage and over 7 million timestamped high-resolution frames annotated with camera poses and partially with foreground masks. The Replay dataset has many potential applications, such as novel-view synthesis, 3D reconstruction, novel-view acoustic synthesis, human body and face analysis, and training generative models. We provide a benchmark for training and evaluating novel-view synthesis, with two scenarios of different difficulty. Finally, we evaluate several baseline state-of-the-art methods on the new benchmark.
Text-to-4D dynamic scene generation
We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment. MAV3D does not require any 3D or 4D data and the T2V model is trained only on Text-Image pairs and unlabeled videos. We demonstrate the effectiveness of our approach using comprehensive quantitative and qualitative experiments and show an improvement over previously established internal baselines. To the best of our knowledge, our method is the first to generate 3D dynamic scenes given a text description. Generated samples can be viewed at make-a-video3d.github.i
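The general mechanism of querying a frozen diffusion model to optimise a scene representation can be sketched with a score-distillation-style loop. The "denoiser" below is a toy stand-in rather than an actual text-to-video model, and directly optimised frames stand in for renders of the 4D NeRF; this is an illustration of the optimisation pattern, not the MAV3D method.

```python
# Toy score-distillation-style loop: push rendered frames towards what a frozen
# text-conditioned model considers likely.
import torch

def toy_denoiser(noisy_frames, t, text_embedding):
    # Stand-in for a frozen text-to-video diffusion model's noise prediction.
    return noisy_frames * 0.1 + text_embedding.mean() * 0.01

frames = torch.rand(8, 3, 32, 32, requires_grad=True)  # stand-in for 4D NeRF renders
text_embedding = torch.randn(64)                       # stand-in for a text prompt encoding
opt = torch.optim.Adam([frames], lr=0.01)

for step in range(100):
    t = torch.rand(())                          # random diffusion "timestep" in (0, 1)
    noise = torch.randn_like(frames)
    noisy = (1 - t) * frames + t * noise        # toy forward-diffusion mixing
    with torch.no_grad():
        pred_noise = toy_denoiser(noisy, t, text_embedding)
    # Distillation-style update: the gradient on the frames is the residual
    # between predicted and injected noise, bypassing the denoiser itself.
    grad = pred_noise - noise
    loss = (grad.detach() * frames).sum()       # surrogate whose gradient equals `grad`
    opt.zero_grad()
    loss.backward()
    opt.step()
```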
