43 research outputs found

    Shot-based object retrieval from video with compressed Fisher vectors

    This paper addresses the problem of retrieving, from a database of video sequences, those shots that match a query image. Existing architectures are mainly based on the Bag of Words model, which consists of matching the query image with a high-level representation of local features extracted from the video database. However, such architectures lack the capability to scale up to very large databases. Recently, Fisher Vectors have shown promising results in large-scale image retrieval problems, but it is still not clear how they can best be exploited in video-related applications. In our work, we use compressed Fisher Vectors to represent the video shots, and we show that the inherent correlation between video frames can be proficiently exploited. Experiments show that our proposal achieves better performance with lower computational requirements than similar architectures.
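    As a rough illustration of the kind of representation the abstract describes, the sketch below computes a mean-gradient-only Fisher Vector over local descriptors pooled from all frames of a shot and then compresses it to one bit per dimension by keeping only the signs. The GMM parameters, the frame-pooling strategy, and the sign-based compression are illustrative assumptions, not the paper's actual pipeline for exploiting inter-frame correlation.

    import numpy as np

    def fisher_vector(descriptors, weights, means, sigmas):
        # descriptors: (N, D) local features pooled from every frame of one shot.
        # weights: (K,), means: (K, D), sigmas: (K, D) -- diagonal GMM parameters.
        N = descriptors.shape[0]
        diff = (descriptors[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (N, K, D)
        log_prob = (-0.5 * np.sum(diff ** 2, axis=2)
                    - np.sum(np.log(sigmas), axis=1)
                    + np.log(weights))
        gamma = np.exp(log_prob - log_prob.max(axis=1, keepdims=True))
        gamma /= gamma.sum(axis=1, keepdims=True)          # soft assignments, shape (N, K)
        # Gradient of the log-likelihood w.r.t. the GMM means, the usual FV component.
        fv = np.einsum('nk,nkd->kd', gamma, diff) / (N * np.sqrt(weights)[:, None])
        fv = fv.ravel()
        fv = np.sign(fv) * np.sqrt(np.abs(fv))              # power normalisation
        return fv / (np.linalg.norm(fv) + 1e-12)            # L2 normalisation

    def compress_to_bits(fv):
        # One simple compression choice: keep only the sign of each dimension.
        return np.packbits(fv > 0)

    In this toy setup, retrieval reduces to comparing the query image's compressed vector against one compressed vector per shot in the database.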

    Learnable Descriptors for Visual Search

    This work proposes LDVS, a learnable binary local descriptor devised for matching natural images within the MPEG CDVS framework. LDVS descriptors are learned so that they can be sign-quantized and compared using the Hamming distance. The underlying convolutional architecture has a moderate parameter count, suitable for operation on mobile devices. Our experiments show that LDVS descriptors perform favorably against comparable learned binary descriptors at patch matching on two different datasets. A complete pair-wise image matching pipeline is then designed around LDVS descriptors, integrating them into the reference CDVS evaluation framework. Experiments show that LDVS descriptors outperform the compressed CDVS SIFT-like descriptors at pair-wise image matching on the challenging CDVS image dataset.
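    The two operations the abstract names, sign quantization and Hamming-distance comparison, can be sketched as follows; the descriptor dimensionality, the byte packing, and the nearest-neighbour search are illustrative assumptions, not the LDVS or CDVS specification.

    import numpy as np

    def sign_quantize(features):
        # features: (N, D) real-valued network outputs, one row per local descriptor.
        # Each dimension becomes one bit, packed into ceil(D / 8) bytes per descriptor.
        return np.packbits(features > 0, axis=1)

    def hamming_distance(a, b):
        # Number of differing bits between two packed binary descriptors.
        return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

    # Toy usage with hypothetical 256-dimensional descriptors.
    rng = np.random.default_rng(0)
    query = sign_quantize(rng.standard_normal((1, 256)))[0]
    database = sign_quantize(rng.standard_normal((100, 256)))
    best = min(range(len(database)), key=lambda i: hamming_distance(query, database[i]))

    A pair-wise image matching pipeline would repeat this descriptor-level comparison across the descriptors of the two images and typically follow it with a geometric consistency check, as done in the CDVS evaluation framework.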

    WiGNet: Windowed Vision Graph Neural Network
