70 research outputs found

    Formalization of the General Video Temporal Synchronization Problem

    Get PDF
    In this work, we present a theoretical formalization of the temporal synchronization problem and a method to temporally synchronize multiple stationary video cameras with overlapping views of the same scene. The method uses a two-stage approach that first approximates the synchronization by tracking moving objects and identifying curvature points. It then refines the estimate using a consensus-based matching heuristic to find the frames that best agree with camera geometries pre-computed from stationary background image features. By using the fundamental matrix and the trifocal tensor in the second, refinement stage, we improve the estimate of the first stage and handle a broader, more generic range of input scenarios and camera conditions. The method is relatively simple compared with current techniques, requiring no more than feature tracking in stage one and accurate geometry computation in stage two. We also provide a robust method to assist synchronization in the presence of inaccurate geometry computation, and a theoretical limit on the accuracy that can be expected from any synchronization system.
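
    A minimal sketch of the second, geometry-based refinement stage described above, assuming tracked object trajectories in each camera and a pre-computed fundamental matrix F: candidate frame offsets around the first-stage estimate are scored by how well corresponding points satisfy the epipolar constraint (here via the first-order Sampson error). The trajectory format, the offset search, and the error measure are illustrative assumptions, not the authors' exact formulation.

        import numpy as np

        def sampson_error(F, x1, x2):
            """First-order (Sampson) error of the epipolar constraint x2^T F x1 = 0."""
            Fx1 = F @ x1           # epipolar line of x1 in image 2
            Ftx2 = F.T @ x2        # epipolar line of x2 in image 1
            num = (x2 @ Fx1) ** 2
            den = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
            return num / den

        def score_offset(traj1, traj2, F, offset):
            """Mean epipolar error when camera 2 is shifted by `offset` frames.

            traj1, traj2: dicts mapping frame index -> (x, y) of a tracked object.
            """
            errors = []
            for t, p1 in traj1.items():
                p2 = traj2.get(t + offset)
                if p2 is None:
                    continue
                x1 = np.array([p1[0], p1[1], 1.0])
                x2 = np.array([p2[0], p2[1], 1.0])
                errors.append(sampson_error(F, x1, x2))
            return np.mean(errors) if errors else np.inf

        def refine_sync(traj1, traj2, F, coarse_offset, search_radius=5):
            """Search around the first-stage estimate for the offset that best
            agrees with the pre-computed two-view geometry."""
            candidates = range(coarse_offset - search_radius, coarse_offset + search_radius + 1)
            return min(candidates, key=lambda d: score_offset(traj1, traj2, F, d))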

    Multi-Robot SLAM: A Vision-Based Approach

    Get PDF

    What You See Is What You Detect: Towards better Object Densification in 3D detection

    Full text link
    Recent works have demonstrated the importance of object completion in 3D perception from LiDAR signals. Several methods have been proposed in which modules densify the point clouds produced by laser scanners, leading to better recall and more accurate results. Pursuing that direction, we present in this work a counter-intuitive perspective: the widely used full-shape completion approach actually leads to a higher error upper bound, especially for far-away objects and small objects such as pedestrians. Based on this observation, we introduce a visible-part completion method that requires only 11.3% of the prediction points that previous methods generate. To recover the dense representation, we propose a mesh-deformation-based method to augment the point set associated with visible foreground objects. Since our approach focuses only on the visible part of foreground objects to achieve accurate 3D detection, we name our method What You See Is What You Detect (WYSIWYD). It is a detector-independent model that consists of two parts: an Intra-Frustum Segmentation Transformer (IFST) and a Mesh Depth Completion Network (MDCNet) that predicts foreground depth from mesh deformation. This way, our model does not require the time-consuming full-depth completion task used by most pseudo-LiDAR-based methods. Our experimental evaluation shows that our approach provides up to 12.2% performance improvement over most public baseline models on the KITTI and NuScenes datasets, bringing the state of the art to a new level. The code will be available at https://github.com/Orbis36/WYSIWYD.
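
    The schematic sketch below illustrates the detector-independent data flow described in the abstract: points inside each 2D-detection frustum are segmented into visible foreground, a dense visible-part depth is predicted from mesh deformation, and the recovered points are appended to the LiDAR cloud before an unmodified 3D detector runs. The interfaces of ifst, mdcnet, calib, and the 2D boxes are hypothetical placeholders for illustration; the real architectures are defined in the paper and repository.

        import numpy as np

        def densify_visible_foreground(points, image, boxes_2d, calib, ifst, mdcnet):
            """Return the LiDAR cloud augmented with completed visible-part points.

            points   : (N, 3) LiDAR points
            boxes_2d : 2D detections used to carve per-object frustums
            calib    : hypothetical helper projecting between LiDAR and image space
            ifst     : Intra-Frustum Segmentation Transformer (foreground mask)
            mdcnet   : Mesh Depth Completion Network (dense visible-part depth)
            """
            augmented = [points]
            uv = calib.project_to_image(points)               # (N, 2) pixel coordinates
            for box in boxes_2d:
                in_frustum = box.contains(uv)                 # boolean mask over the cloud
                frustum_pts = points[in_frustum]
                if len(frustum_pts) == 0:
                    continue
                fg_mask = ifst(frustum_pts, image, box)       # visible-foreground segmentation
                dense_depth = mdcnet(frustum_pts[fg_mask], image, box)   # mesh-based completion
                augmented.append(calib.lift_to_lidar(dense_depth, box))  # back to 3D points
            return np.concatenate(augmented, axis=0)

        # Any off-the-shelf LiDAR detector can then consume the densified cloud unchanged:
        #   detections = lidar_detector(densify_visible_foreground(...))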

    FSMDet: Vision-guided feature diffusion for fully sparse 3D detector

    Full text link
    Fully sparse 3D detection has attracted increasing interest in recent years. However, the sparsity of the features in these frameworks challenges proposal generation because of the limited diffusion process. In addition, the quest for efficiency has led to only a few works on vision-assisted fully sparse models. In this paper, we propose FSMDet (Fully Sparse Multi-modal Detection), which uses visual information to guide the LiDAR feature diffusion process while still maintaining the efficiency of the pipeline. Specifically, most fully sparse works focus on complex, customized center-fusion diffusion/regression operators. However, we observe that if adequate object completion is performed, even the simplest interpolation operator leads to satisfactory results. Inspired by this observation, we split the vision-guided diffusion process into two modules: a Shape Recover Layer (SRLayer) and a Self Diffusion Layer (SDLayer). The former uses RGB information to recover the shape of the visible part of an object, and the latter uses a visual prior to further spread the features toward the center region. Experiments demonstrate that our approach successfully improves the performance of previous LiDAR-only fully sparse models and reaches state-of-the-art performance among multimodal models. At the same time, thanks to the sparse architecture, our method can be up to 5 times more efficient than previous state-of-the-art methods at inference.
    Accepted by the European Conference on Computer Vision (ECCV) 2024 workshop on VCA.
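
    As a toy illustration of the observation above that, once the visible shape is recovered, even simple interpolation suffices to spread features toward the object center, the sketch below densifies a recovered visible-part point set by linear interpolation toward a crude center estimate. The mean-based center and the inherited features are assumptions for illustration, not the SRLayer/SDLayer architectures themselves.

        import numpy as np

        def diffuse_to_center(coords, feats, steps=3):
            """Spread sparse visible-part features toward the object center by
            linear interpolation.

            coords : (N, 3) coordinates of recovered visible-part points
            feats  : (N, C) features attached to those points
            """
            center = coords.mean(axis=0)                  # crude center estimate
            new_coords, new_feats = [coords], [feats]
            for k in range(1, steps + 1):
                alpha = k / (steps + 1)                   # interpolation ratio toward the center
                new_coords.append(coords * (1 - alpha) + center * alpha)
                new_feats.append(feats)                   # interpolated points inherit source features
            return np.concatenate(new_coords), np.concatenate(new_feats)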