1,337 research outputs found

    Composite Texture Synthesis

    Many textures require complex models to describe their intricate structures. Their modeling can be simplified if they are considered composites of simpler subtextures. After an initial, unsupervised segmentation of the composite texture into its subtextures, it can be described at two levels. One is a label map texture, which captures the layout of the different subtextures. The other consists of the different subtextures themselves. This scheme also has to account for mutual influences between subtextures, found mainly near their boundaries, and the proposed composite texture model includes these. The paper describes an improved implementation of this idea: whereas a previous implementation synthesized subtextures and their interactions sequentially, this paper proposes a parallel implementation, which yields results of higher quality.
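
    A minimal sketch of the two-level idea, ignoring the boundary interactions the paper adds: synthesize a label map first, then fill each labeled region from the corresponding subtexture model. The synthesizer objects and their .synthesize() interface are hypothetical placeholders, not the paper's implementation.

        import numpy as np

        def synthesize_composite(label_map_model, subtexture_models, out_shape):
            """Two-level composite texture synthesis (sketch).

            1) Synthesize a label map giving the spatial layout of subtextures.
            2) Fill each labeled region using its subtexture model.
            Mutual influences near subtexture boundaries are omitted here.
            """
            # Level 1: layout of subtextures as an integer label image.
            labels = label_map_model.synthesize(out_shape)       # (H, W) ints
            out = np.zeros(out_shape + (3,), dtype=np.uint8)
            # Level 2: per-subtexture synthesis, masked into its region.
            for lab, model in subtexture_models.items():
                mask = labels == lab
                texture = model.synthesize(out_shape)            # (H, W, 3)
                out[mask] = texture[mask]
            return out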

    Single-File Diffusion of Externally Driven Particles

    We study the 1-D diffusion of N hard-core interacting Brownian particles driven by a space- and time-dependent external force. We give the exact solution of the N-particle Smoluchowski diffusion equation. In particular, we investigate the nonequilibrium energetics of two interacting particles under time-periodic driving. The hard-core interaction induces an entropic repulsion which differentiates the energetics of the two particles. We present exact time-asymptotic results describing the mean energy, the accepted work and heat, and the entropy production of the interacting particles, and we contrast these quantities against the corresponding ones for non-interacting particles.
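
    For identical hard-core particles in 1-D there is a standard simulation shortcut: evolve non-interacting trajectories and sort the positions after each step, since a collision between identical particles is statistically equivalent to a label exchange. A minimal Brownian-dynamics sketch under that assumption follows; the particular driving force is an illustrative choice, not the one studied in the paper, and tagged-particle identities would need extra bookkeeping.

        import numpy as np

        def single_file_step(x, t, dt, force, D=1.0, rng=None):
            """One Euler-Maruyama step for N driven Brownian particles in 1-D.

            The hard-core constraint is imposed by sorting the free
            (non-interacting) update, which preserves single-file order.
            """
            rng = rng or np.random.default_rng()
            noise = np.sqrt(2.0 * D * dt) * rng.standard_normal(x.shape)
            x = x + force(x, t) * dt + noise    # free update
            return np.sort(x)                   # restore particle ordering

        # Example: assumed time-periodic driving F(x, t) = A*sin(w*t) - k*x.
        force = lambda x, t: 0.5 * np.sin(2.0 * t) - 1.0 * x
        x = np.sort(np.random.default_rng(0).uniform(-1.0, 1.0, size=10))
        t, dt = 0.0, 1e-3
        for _ in range(10_000):
            x = single_file_step(x, t, dt, force)
            t += dt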

    Semantic-Context-Based Augmented Descriptor For Image Feature Matching

    Abstract. This paper proposes an augmented version of local features that enhances the discriminative power of the feature without affecting its invariance to image deformations. The idea is to learn the semantics of local features, which are then exploited in conjunction with the bag-of-words paradigm to build an augmented feature descriptor. Any local descriptor can be cast in the proposed context, so the approach is easily generalized to fit any local approach. The semantic-context signature is a 2D histogram which accumulates the spatial distribution of the visual words around each local feature. The obtained semantic-context component is concatenated with the local feature to generate the proposed feature descriptor. This is expected to handle ambiguities occurring in images with multiple similar motifs and exhibiting slight non-affine distortions, outliers, and detector errors. The approach is evaluated on two data sets. The first is intentionally selected with images containing multiple similar regions and exhibiting slight non-affine distortions. The second is the standard data set of Mikolajczyk. The evaluation results show that the approach performs significantly better than existing methods.
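
    A sketch of the augmentation as described: a 2D histogram over (spatial cell, visual word) accumulated from the bag-of-words labels of neighboring features, flattened, normalized, and concatenated with the raw local descriptor. The grid size, neighborhood radius, normalization, and the exact binning interpretation are assumptions, not taken from the paper.

        import numpy as np

        def semantic_context_descriptor(kp_xy, desc, nbr_xy, nbr_words,
                                        n_words, radius=50.0, grid=4):
            """Augment a local descriptor with a semantic-context signature.

            nbr_xy / nbr_words: positions and visual-word labels of features
            in the neighborhood of the keypoint at kp_xy.
            """
            hist = np.zeros((grid * grid, n_words), dtype=np.float32)
            # Map neighbor positions into the unit square around the keypoint.
            rel = (np.asarray(nbr_xy) - kp_xy) / (2.0 * radius) + 0.5
            inside = np.all((rel >= 0.0) & (rel < 1.0), axis=1)
            cells = np.floor(rel[inside] * grid).astype(int)
            for (cx, cy), w in zip(cells, np.asarray(nbr_words)[inside]):
                hist[cy * grid + cx, w] += 1.0
            sig = hist.ravel()
            n = np.linalg.norm(sig)
            return np.concatenate([desc, sig / n if n > 0 else sig])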

    DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

    As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in unsupervised domain adaptation (UDA). Even though a large number of methods propose new adaptation strategies, they are mostly based on outdated network architectures. As the influence of recent network architectures has not been systematically studied, we first benchmark different network architectures for UDA and then propose a novel UDA method, DAFormer, based on the benchmark results. The DAFormer network consists of a Transformer encoder and a multi-level context-aware feature fusion decoder. It is enabled by three simple but crucial training strategies to stabilize the training and to avoid overfitting DAFormer to the source domain: While the Rare Class Sampling on the source domain improves the quality of pseudo-labels by mitigating the confirmation bias of self-training towards common classes, the Thing-Class ImageNet Feature Distance and a learning rate warmup promote feature transfer from ImageNet pretraining. DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes and enables learning even difficult classes such as train, bus, and truck well. The implementation is available at https://github.com/lhoyer/DAFormer.
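
    Of the three training strategies, Rare Class Sampling is easy to sketch: classes that are rare on the source domain are sampled with higher probability, and a source image containing the sampled class is then drawn. The softmax-with-temperature form below follows the spirit of the paper's description but is an assumption here.

        import numpy as np

        def rare_class_sample(class_freq, images_with_class,
                              temperature=0.01, rng=None):
            """Rare Class Sampling (sketch).

            class_freq: per-class pixel frequencies f_c on the source domain
            (summing to 1). Rare classes get higher sampling probability; an
            image containing the sampled class is then drawn uniformly.
            """
            rng = rng or np.random.default_rng()
            logits = (1.0 - np.asarray(class_freq)) / temperature
            p = np.exp(logits - logits.max())
            p /= p.sum()
            c = rng.choice(len(class_freq), p=p)      # sample a (rare) class
            return rng.choice(images_with_class[c])   # image containing it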

    Direct Dense Pose Estimation

    Sound and Visual Representation Learning with Multiple Pretraining Tasks

    Different self-supervised learning (SSL) tasks reveal different features from the data, and the learned feature representations can exhibit different performance on each downstream task. In this light, this work aims to combine multiple SSL tasks (Multi-SSL) into a representation that generalizes well across downstream tasks. Specifically, we investigate binaural sounds and image data in isolation. For binaural sounds, we propose three SSL tasks: spatial alignment, temporal synchronization of foreground objects and binaural audio, and temporal gap prediction. We investigate several approaches to Multi-SSL and give insights into the downstream task performance on video retrieval, spatial sound super-resolution, and semantic prediction on the OmniAudio dataset. Our experiments on binaural sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single-SSL-task models and fully supervised models in downstream task performance. As a check of applicability to another modality, we also formulate our Multi-SSL models for image representation learning, using the recently proposed SSL tasks MoCov2 and DenseCL. Here, Multi-SSL surpasses recent methods such as MoCov2, DenseCL, and DetCo by 2.06%, 3.27%, and 1.19% on VOC07 classification and by +2.83, +1.56, and +1.61 AP on COCO detection. Code will be made publicly available.
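
    A high-level outline of the incremental-learning variant as described: SSL tasks are learned one after another on a shared backbone, each stage continuing from the previous weights. The task interface (a head module with a .loss() method), batch layout, and optimizer choice are all assumptions in this PyTorch sketch; any distillation or regularization against forgetting is omitted.

        import torch

        def multi_ssl_incremental(backbone, ssl_tasks, loaders,
                                  epochs=10, lr=1e-3):
            """Train SSL tasks incrementally on a shared backbone (sketch).

            ssl_tasks: list of task-specific head modules, each with a
            .loss(features, batch) method (hypothetical interface).
            """
            for task, loader in zip(ssl_tasks, loaders):
                params = list(backbone.parameters()) + list(task.parameters())
                opt = torch.optim.Adam(params, lr=lr)
                for _ in range(epochs):
                    for batch in loader:
                        feats = backbone(batch["input"])
                        loss = task.loss(feats, batch)
                        opt.zero_grad()
                        loss.backward()
                        opt.step()
            return backbone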

    Scribble-Supervised LiDAR Semantic Segmentation

    MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation

    In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotation. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain, as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces the consistency between predictions of masked target images, where random patches are withheld, and pseudo-labels that are generated based on the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an improvement of +2.1 and +3.0 percentage points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.
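
    The masked-consistency mechanism described in the abstract reduces to a few lines: an EMA teacher produces pseudo-labels from the full target image, the student sees the same image with random patches blanked out, and a cross-entropy consistency loss ties the two. A PyTorch sketch follows; the patch size, masking ratio, and omission of any pseudo-label quality weighting are assumptions, not the paper's exact settings.

        import torch
        import torch.nn.functional as F

        def random_patch_mask(img, patch=32, ratio=0.5):
            """Zero out a random subset of patch x patch cells of the image."""
            b, _, h, w = img.shape
            gh, gw = h // patch, w // patch
            keep = (torch.rand(b, 1, gh, gw, device=img.device) > ratio).float()
            keep = F.interpolate(keep, size=(h, w), mode="nearest")
            return img * keep

        @torch.no_grad()
        def ema_update(teacher, student, alpha=0.999):
            """Exponential-moving-average teacher update."""
            for pt, ps in zip(teacher.parameters(), student.parameters()):
                pt.mul_(alpha).add_(ps, alpha=1.0 - alpha)

        def mic_loss(student, teacher, target_img):
            """Masked Image Consistency loss on an unlabeled target image."""
            with torch.no_grad():
                pseudo = teacher(target_img).argmax(dim=1)  # full-image labels
            masked = random_patch_mask(target_img)
            logits = student(masked)            # must infer masks from context
            return F.cross_entropy(logits, pseudo)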