
    Resolving Semantic Confusions for Improved Zero-Shot Detection

    Zero-shot detection (ZSD) is a challenging task in which we aim to recognize and localize objects simultaneously, even when our model has not been trained with visual samples of some target ("unseen") classes. Recently, methods employing generative models such as GANs have shown some of the best results: unseen-class samples are generated from their semantics by a GAN trained on seen-class data, enabling vanilla object detectors to recognize unseen objects. However, the problem of semantic confusion remains, where the model is sometimes unable to distinguish between semantically similar classes. In this work, we propose to train the generative model with a triplet loss that acknowledges the degree of dissimilarity between classes and reflects it in the generated samples. Moreover, a cyclic-consistency loss is enforced to ensure that generated visual samples of a class correspond closely to their own semantics. Extensive experiments on two benchmark ZSD datasets - MSCOCO and PASCAL-VOC - demonstrate significant gains over current ZSD methods, reducing semantic confusion and improving detection for the unseen classes.
    Comment: Accepted to BMVC 2022 (Oral). 15 pages, 5 figures. Project page: https://github.com/sandipan211/ZSD-SC-Resolve
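    The two losses described above can be sketched numerically. This is a minimal NumPy illustration of a triplet loss and a cycle-consistency term, not the authors' implementation; the margin value, Euclidean distance, and embedding sizes are assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: pull the anchor toward the positive (same class)
    and push it away from the negative (a semantically similar class),
    by at least `margin` in Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def cycle_consistency_loss(reconstructed_semantics, semantics):
    """Cycle consistency: semantics regressed back from a generated
    visual sample should match the original class semantics (MSE)."""
    return float(np.mean((reconstructed_semantics - semantics) ** 2))
```

    In a GAN-based ZSD pipeline, the triplet term would be applied to generated unseen-class features, using a semantically similar class as the negative so that the generator is discouraged from producing confusable samples.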

    View invariant DIBR-3D image watermarking using DT-CWT

    In 3D image compression, depth image based rendering (DIBR) is one of the latest techniques, in which the centre image (the main view, from which the left and right view images are synthesised) and the depth image are communicated to the receiver side. It has been observed in the literature that most existing 3D image watermarking schemes are not resilient to the view synthesis process used in the DIBR technique. In this paper, a 3D image watermarking scheme is proposed which is invariant to the DIBR view synthesis process. In the proposed scheme, 2D dual-tree complex wavelet transform (2D DT-CWT) coefficients of the centre view are used for watermark embedding, so that the shift invariance and directional selectivity of the DT-CWT can be exploited to make the scheme robust against the view synthesis process. A comprehensive set of experiments has been carried out to justify the robustness of the proposed scheme over related existing schemes with respect to JPEG compression and the synthesis view attack.
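    The embed/detect cycle of a coefficient-domain watermark can be illustrated as follows. This is a generic additive spread-spectrum sketch, not the paper's scheme: the actual method embeds in 2D DT-CWT coefficients of the centre view, while here a flat NumPy array of coefficients stands in, and the strength `alpha` is a hypothetical parameter.

```python
import numpy as np

def embed(coeffs, watermark, alpha=0.05):
    """Additively embed a +/-1 watermark into transform coefficients,
    scaling by coefficient magnitude for perceptual shaping."""
    return coeffs + alpha * np.abs(coeffs) * watermark

def detect(coeffs, watermark):
    """Blind correlation detector: a clearly positive correlation
    between coefficients and watermark indicates its presence."""
    return float(np.corrcoef(coeffs, watermark)[0, 1])
```

    Robustness to view synthesis would come from where the embedding happens (shift-invariant DT-CWT subbands), not from this additive rule itself.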

    Segmentation of tibiofemoral joint tissues from knee MRI using MtRA-Unet and incorporating shape information: Data from the Osteoarthritis Initiative

    Knee Osteoarthritis (KOA) is the third most prevalent Musculoskeletal Disorder (MSD) after neck and back pain. To monitor such a severe MSD, a segmentation map of the femur, tibia and tibiofemoral cartilage is usually obtained by an automated segmentation algorithm from Magnetic Resonance Imaging (MRI) of the knee. In recent works, however, such segmentation is achievable only with multistage frameworks, which creates data handling issues and requires continuous manual intervention, preventing quick and precise clinical diagnosis. To solve these issues, in this paper the Multi-Resolution Attentive-Unet (MtRA-Unet) is proposed to segment the femur, tibia and tibiofemoral cartilage automatically. The proposed work includes a novel Multi-Resolution Feature Fusion (MRFF) module and a Shape Reconstruction (SR) loss that focus on multi-contextual information and the structural anatomical details of the femur, tibia and tibiofemoral cartilage. Unlike previous approaches, the proposed work is a single-stage, end-to-end framework producing a Dice Similarity Coefficient (DSC) of 98.5% for the femur, 98.4% for the tibia, 89.1% for Femoral Cartilage (FC) and 86.1% for Tibial Cartilage (TC) on critical MRI slices, which can be helpful to clinicians for KOA grading. The time to segment an MRI volume (160 slices) per subject is 22 seconds, which is among the fastest of the state-of-the-art. Moreover, comprehensive experimentation on the segmentation of FC and TC, which is of utmost importance for morphology-based studies of KOA progression, reveals that the proposed method produces excellent results for binary segmentation.
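    The DSC figures quoted above follow the usual binary-mask definition, which can be sketched as below; the epsilon smoothing term is a common convention, not taken from the paper.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks:
    2*|A ∩ B| / (|A| + |B|), smoothed by eps to avoid 0/0."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

    For multi-tissue evaluation (femur, tibia, FC, TC) the score is computed per class against the ground-truth mask of that tissue and then reported per class, as in the abstract.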

    ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross black Attention with Image Super-resolving Transformer

    Recently, transformers have captured significant interest in the area of single-image super-resolution, demonstrating substantial gains in performance. Current models depend heavily on the network's ability to extract high-level semantic details from images while overlooking the effective utilization of multi-scale image details and intermediate information within the network. Furthermore, it has been observed that high-frequency areas in images present significant complexity for super-resolution compared to low-frequency areas. This work proposes a transformer-based super-resolution architecture called ML-CrAIST that addresses this gap by utilizing low-high frequency information at multiple scales. Unlike most previous work, which uses either spatial or channel self-attention, we operate spatial and channel self-attention concurrently, modelling pixel interactions from both the spatial and channel dimensions and exploiting the inherent correlations across the spatial and channel axes. Further, we devise a cross-attention block for super-resolution, which explores the correlations between low- and high-frequency information. Quantitative and qualitative assessments indicate that the proposed ML-CrAIST surpasses state-of-the-art super-resolution methods (e.g., a 0.15 dB gain on Manga109 ×4). Code is available at: https://github.com/Alik033/ML-CrAIST
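    The two ingredients named above, a low/high-frequency decomposition and cross-attention between the two branches, can be sketched in NumPy. This is not the paper's block: a box blur stands in for the frequency split, and a single-head scaled dot-product attention stands in for the learned cross-attention.

```python
import numpy as np

def low_high_split(img, k=3):
    """Split a 2D image into low-frequency (box-blurred) and
    high-frequency (residual) parts; low + high == img exactly."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    h, w = img.shape
    low = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            low[i, j] = padded[i:i + k, j:j + k].mean()
    return low, img - low

def cross_attention(queries, keys_values):
    """Scaled dot-product cross-attention: queries from one branch
    (e.g. low frequency), keys/values from the other branch."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ keys_values
```

    In the architecture described, such a block would let the high-frequency branch attend to complementary context in the low-frequency branch before reconstruction.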

    Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration

    Underwater imagery is often compromised by factors such as color distortion and low contrast, posing challenges for high-level vision tasks. Recent underwater image restoration (UIR) methods either analyze the input image at full resolution, resulting in spatial richness but contextual weakness, or progressively from high to low resolution, yielding reliable semantic information but reduced spatial accuracy. Here, we propose a lightweight multi-stage network called Lit-Net that focuses on multi-resolution and multi-scale image analysis for restoring underwater images, retaining the original resolution in the first stage, refining features in the second, and focusing on reconstruction in the final stage. Our novel encoder block utilizes parallel 1×1 convolution layers to capture local information and speed up operations. Further, we incorporate a modified weighted color-channel-specific l_1 loss (cl_1) function to recover color and detail information. Extensive experiments on publicly available datasets suggest our model's superiority over recent state-of-the-art methods, with significant improvements in qualitative and quantitative measures, such as 29.477 dB PSNR (a 1.92% improvement) and 0.851 SSIM (a 2.87% improvement) on the EUVP dataset. The contributions of Lit-Net offer a more robust approach to underwater image enhancement and super-resolution, which is of considerable importance for underwater autonomous vehicles and surveillance. The code is available at: https://github.com/Alik033/Lit-Net
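    A channel-weighted l_1 loss of the kind mentioned above can be sketched as follows. The weights here are hypothetical placeholders, not the paper's cl_1 coefficients; the idea is only that channels can be weighted unequally (e.g. emphasising red, which attenuates fastest underwater).

```python
import numpy as np

def weighted_channel_l1(pred, target, weights=(0.4, 0.3, 0.3)):
    """Channel-specific weighted l1 loss over an H x W x 3 image:
    mean absolute error per channel, combined with per-channel weights."""
    loss = 0.0
    for c, w in enumerate(weights):
        loss += w * np.mean(np.abs(pred[..., c] - target[..., c]))
    return loss
```

    With uniform weights this reduces to a scaled ordinary l_1 loss; the channel-specific weighting is what steers the network toward recovering the distorted color channels.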

    Scalable Video Watermarking

    In recent times, enormous advancements in communication and hardware technologies have made video communication very popular. With the increasing diversity among end-user media players and their associated network bandwidths, the requirements on video streams with respect to quality, resolution and frame rate have become more heterogeneous. This heterogeneity makes the scalable adaptation of the video stream at the receiver end a real problem. Scalable video coding (SVC) has been introduced as a countermeasure to this practical problem: the main video stream is designed in a hierarchical fashion such that a set of independent bit streams can be produced as required by different end-user devices. SVC has become very popular in recent times, and consequently, efficient and secure transmission of scalable video streams has become a requirement. Watermarking has been considered an efficient DRM tool for almost a decade. Although video watermarking is a well-studied research domain, comparatively little attention has been paid to scalable watermarking in recent times. In this book chapter, a comprehensive survey of scalable video watermarking is presented. The main objective of this survey is to analyse the robustness of different existing video watermarking schemes against scalable video adaptation and to define the open research problems. First, a few existing scalable image watermarking schemes are discussed to understand the advantages and limitations of directly extending such schemes to frame-by-frame video watermarking. Similarly, a few video watermarking and some recent scalable video watermarking schemes are also described, with their pros and cons. Finally, a summary of this survey is presented, pointing out possible countermeasures to the existing problems.