A framework for realistic 3D tele-immersion
Meeting, socializing and conversing online with a group of people using teleconferencing systems is still quite different from the experience of meeting face to face. We are abruptly aware that we are online and that the people we are engaging with are not in close proximity, analogous to how talking on the telephone does not replicate the experience of talking in person. Several causes for these differences have been identified, and we propose inspiring and innovative solutions to these hurdles in an attempt to provide a more realistic, believable and engaging online conversational experience. We present the distributed and scalable framework REVERIE, which provides a balanced mix of these solutions. Applications built on top of the REVERIE framework will be able to provide interactive, immersive, photo-realistic experiences to a multitude of users that will feel much more similar to meeting face to face than the experience offered by conventional teleconferencing systems.
BTSeg: Barlow Twins Regularization for Domain Adaptation in Semantic Segmentation
Semantic image segmentation is a critical component in many computer vision
systems, such as autonomous driving. In such applications, adverse conditions
(heavy rain, nighttime, snow, extreme lighting) pose specific challenges, yet
are typically underrepresented in the available datasets.
Generating more training data is cumbersome and expensive, and the process
itself is error-prone due to the inherent aleatoric uncertainty. To address
this challenging problem, we propose BTSeg, which exploits image-level
correspondences as a weak supervision signal to learn a segmentation model that
is agnostic to adverse conditions. To this end, our approach uses the Barlow
twins loss from the field of unsupervised learning and treats images taken at
the same location but under different adverse conditions as "augmentations" of
the same unknown underlying base image. This allows the training of a
segmentation model that is robust to appearance changes introduced by different
adverse conditions. We evaluate our approach on ACDC and the new challenging
ACG benchmark to demonstrate its robustness and generalization capabilities.
Our approach performs favorably when compared to the current state-of-the-art
methods, while also being simpler to implement and train. The code will be
released upon acceptance.
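For context, the Barlow Twins objective the abstract builds on can be sketched as follows. This is a minimal illustration of the published loss applied to embeddings of the same location under two conditions, not BTSeg's actual code; the tensor shapes and the `lambda_offdiag` weight (the value used in the original Barlow Twins paper) are assumptions here.

```python
import torch

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Barlow Twins objective between two batches of embeddings (N, D).

    z_a and z_b are embeddings of the same scenes under two different
    conditions (e.g. clear weather vs. heavy rain), treated as two
    "augmentations" of the same unknown underlying base image.
    """
    n = z_a.shape[0]
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    # Empirical cross-correlation matrix between the two views, (D, D).
    c = (z_a.T @ z_b) / n
    # Invariance term: diagonal entries should approach 1.
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    # Redundancy-reduction term: off-diagonal entries should approach 0.
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()
    return on_diag + lambda_offdiag * off_diag
```

Driving the diagonal of the cross-correlation matrix to 1 makes the embedding invariant to the condition change, which is what lets the downstream segmentation model ignore adverse-weather appearance.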
SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
In this paper, we present SPVLoc, a global indoor localization method that
accurately determines the six-dimensional (6D) camera pose of a query image and
requires minimal scene-specific prior knowledge and no scene-specific training.
Our approach employs a novel matching procedure to localize the perspective
camera's viewport, given as an RGB image, within a set of panoramic semantic
layout representations of the indoor environment. The panoramas are rendered
from an untextured 3D reference model, which only comprises approximate
structural information about room shapes, along with door and window
annotations. We demonstrate that a straightforward convolutional network
structure can successfully achieve image-to-panorama and ultimately
image-to-model matching. Through a viewport classification score, we rank
reference panoramas and select the best match for the query image. Then, a 6D
relative pose is estimated between the chosen panorama and query image. Our
experiments demonstrate that this approach not only efficiently bridges the
domain gap but also generalizes well to previously unseen scenes that are not
part of the training data. Moreover, it achieves superior localization accuracy
compared to state-of-the-art methods and also estimates more degrees of
freedom of the camera pose. Our source code is publicly available at
https://fraunhoferhhi.github.io/spvloc
Comment: ECCV 2024. 24 pages, 11 figures, 8 tables. Includes the paper and supplementary material.
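To illustrate the ranking step described above, here is a minimal sketch of scoring a query-image embedding against a set of panorama embeddings and picking the best match. The `score_head` module and all shapes are hypothetical stand-ins, not SPVLoc's architecture.

```python
import torch
import torch.nn as nn

def rank_panoramas(query_feat, pano_feats, score_head):
    """Rank reference panoramas by a viewport-matching score.

    query_feat: (D,) embedding of the query RGB image.
    pano_feats: (K, D) embeddings of K semantic panorama renderings.
    score_head: module mapping a concatenated (query, panorama) feature
                pair to a scalar score; a hypothetical stand-in for the
                paper's viewport classification score.
    """
    pairs = torch.cat(
        [query_feat.expand_as(pano_feats), pano_feats], dim=-1)  # (K, 2D)
    scores = score_head(pairs).squeeze(-1)                       # (K,)
    order = torch.argsort(scores, descending=True)
    return order, scores

# Usage sketch: a linear head over concatenated 256-d embeddings.
d = 256
head = nn.Linear(2 * d, 1)
order, scores = rank_panoramas(torch.randn(d), torch.randn(8, d), head)
best_panorama = order[0]  # passed on to 6D relative pose estimation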
CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation
Applications in the field of augmented reality or robotics often require
joint localisation and 6D pose estimation of multiple objects. However, most
algorithms need to train one network per object class to provide the best
results. Analysing all visible objects then demands multiple inferences, which
is memory- and time-consuming. We present a new single-stage architecture
called CASAPose that determines 2D-3D correspondences for pose estimation of
multiple different objects in RGB images in one pass. It is fast and memory
efficient, and achieves high accuracy for multiple objects by exploiting the
output of a semantic segmentation decoder as control input to a keypoint
recognition decoder via local class-adaptive normalisation. Our new
differentiable regression of keypoint locations significantly contributes to a
faster closing of the domain gap between real test and synthetic training data.
We apply segmentation-aware convolutions and upsampling operations to increase
the focus inside the object mask and to reduce mutual interference of occluding
objects. For each inserted object, the network grows by only one output
segmentation map and a negligible number of parameters. We outperform
state-of-the-art approaches in challenging multi-object scenes with
inter-object occlusion and synthetic training.
Comment: BMVC 2022, camera-ready version (this submission includes the paper and supplementary material).
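The class-adaptive normalisation idea can be sketched as a conditional normalisation layer in the spirit of SPADE/CLADE-style conditioning: the segmentation decoder's per-pixel class predictions select the scale and shift applied to the keypoint decoder's features. This is an illustrative approximation under assumed shapes, not CASAPose's exact layer.

```python
import torch.nn as nn

class ClassAdaptiveNorm(nn.Module):
    """Normalization whose scale/shift depend on the per-pixel class."""

    def __init__(self, num_classes, num_features):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        # Per-class modulation parameters, looked up by class index.
        self.gamma = nn.Embedding(num_classes, num_features)
        self.beta = nn.Embedding(num_classes, num_features)

    def forward(self, x, class_map):
        # x: (N, C, H, W) keypoint-decoder features.
        # class_map: (N, H, W) class indices from the segmentation decoder.
        h = self.norm(x)
        g = self.gamma(class_map).permute(0, 3, 1, 2)  # (N, C, H, W)
        b = self.beta(class_map).permute(0, 3, 1, 2)
        return g * h + b
```

Because the modulation is a per-class lookup, adding an object class grows the layer by only one embedding row, consistent with the abstract's claim of a negligible parameter increase per inserted object.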
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
3D Gaussian Splatting has recently emerged as a highly promising technique
for modeling of static 3D scenes. In contrast to Neural Radiance Fields, it
utilizes efficient rasterization allowing for very fast rendering at
high quality. However, the storage size is significantly higher, which hinders
practical deployment, e.g. on resource-constrained devices. In this paper, we
introduce a compact scene representation organizing the parameters of 3D
Gaussian Splatting (3DGS) into a 2D grid with local homogeneity, ensuring a
drastic reduction in storage requirements without compromising visual quality
during rendering. Central to our idea is the explicit exploitation of
perceptual redundancies present in natural scenes. In essence, the inherent
nature of a scene allows for numerous permutations of Gaussian parameters to
equivalently represent it. To this end, we propose a novel highly parallel
algorithm that regularly arranges the high-dimensional Gaussian parameters into
a 2D grid while preserving their neighborhood structure. During training, we
further enforce local smoothness between the sorted parameters in the grid. The
uncompressed Gaussians use the same structure as 3DGS, ensuring a seamless
integration with established renderers. Our method achieves a reduction factor
of 17x to 42x in size for complex scenes with no increase in training time,
marking a substantial leap forward in the domain of 3D scene distribution and
consumption. Additional information can be found on our project page:
https://fraunhoferhhi.github.io/Self-Organizing-Gaussians/
Comment: Added compression of spherical harmonics; updated the compression method with improved results (all attributes are now compressed with JPEG XL); added a qualitative comparison of additional scenes; moved the compression explanation and comparison to the main paper; added a comparison with "Making Gaussian Splats smaller".
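A minimal illustration of the kind of local smoothness term described above: once the Gaussian parameters are sorted onto a 2D grid, neighbouring cells can be regularized towards similar values, which is what makes the grid compress well. This total-variation-style penalty is a sketch, not the paper's exact formulation.

```python
import torch

def grid_smoothness_loss(param_grid):
    """Encourage local homogeneity on a 2D grid of Gaussian parameters.

    param_grid: (H, W, D) tensor holding D attributes per Gaussian
    (position, scale, color, ...) arranged on the grid by the sorting
    step. Penalizes absolute differences between grid neighbours.
    """
    dy = (param_grid[1:, :, :] - param_grid[:-1, :, :]).abs().mean()
    dx = (param_grid[:, 1:, :] - param_grid[:, :-1, :]).abs().mean()
    return dx + dy
```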
Automatic Reconstruction of Semantic 3D Models from 2D Floor Plans
Digitalization of existing buildings and the creation of 3D BIM models for
them has become crucial for many tasks. Of particular importance are floor
plans, which contain information about building layouts and are vital for
processes such as construction, maintenance or refurbishing. However, this data
is not always available in digital form, especially for older buildings
constructed before CAD tools were widely available, or lacks semantic
information. The digitalization of such information usually requires manual
work of an expert that must reconstruct the layouts by hand, which is a
cumbersome and error-prone process. In this paper, we present a pipeline for
reconstruction of vectorized 3D models from scanned 2D plans, aiming at
increasing the efficiency of this process. The presented method achieves state-of-the-art results on the public CubiCasa5k dataset and shows good generalization to different types of plans. Our vectorization approach is particularly effective, outperforming previous methods.
Comment: 5 pages, 1 figure.
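As a generic illustration of a mask-to-vector step (not the authors' specific approach), a predicted wall or room mask can be turned into simplified polygons with standard contour extraction and Douglas-Peucker simplification; the `epsilon_frac` tolerance is an assumed knob.

```python
import cv2
import numpy as np

def vectorize_mask(mask, epsilon_frac=0.01):
    """Turn a binary room/wall mask into simplified polygons."""
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for c in contours:
        # Simplification tolerance proportional to the contour length.
        eps = epsilon_frac * cv2.arcLength(c, closed=True)
        poly = cv2.approxPolyDP(c, eps, closed=True)
        polygons.append(poly.reshape(-1, 2))
    return polygons
```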
