
    Prompting Visual-Language Models for Dynamic Facial Expression Recognition

    This paper presents a novel visual-language model called DFER-CLIP, which is based on the CLIP model and designed for in-the-wild Dynamic Facial Expression Recognition (DFER). Specifically, the proposed DFER-CLIP consists of a visual part and a textual part. For the visual part, based on the CLIP image encoder, a temporal model consisting of several Transformer encoders is introduced for extracting temporal facial expression features, and the final feature embedding is obtained as a learnable "class" token. For the textual part, we use as inputs textual descriptions of the facial behaviour related to the classes (facial expressions) that we are interested in recognising – those descriptions are generated using large language models, like ChatGPT. This is in contrast to works that use only the class names, and it captures the relationships between the classes more accurately. Alongside the textual descriptions, we introduce a learnable token which helps the model learn relevant context information for each expression during training. Extensive experiments demonstrate the effectiveness of the proposed method and show that our DFER-CLIP achieves state-of-the-art results compared with current supervised DFER methods on the DFEW, FERV39k, and MAFW benchmarks.
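
    Based only on the architecture described above, the following is a minimal PyTorch sketch of the visual part (frame-level CLIP features passed through Transformer encoders with a learnable "class" token) and the cosine-similarity classification against text embeddings. All module names, dimensions, and the temperature value are illustrative assumptions, not the paper's released implementation.

    # Sketch of the DFER-CLIP visual head described in the abstract (assumed details).
    import torch
    import torch.nn as nn

    class TemporalHead(nn.Module):
        """Transformer encoders over per-frame CLIP features, read out via a
        learnable "class" token."""
        def __init__(self, dim=512, depth=2, heads=8):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))

        def forward(self, frame_feats):                  # frame_feats: (B, T, dim) from the CLIP image encoder
            cls = self.cls_token.expand(frame_feats.size(0), -1, -1)
            x = torch.cat([cls, frame_feats], dim=1)     # prepend the learnable class token
            return self.encoder(x)[:, 0]                 # video-level embedding

    def classify(video_emb, text_embs, temperature=0.01):
        """Cosine-similarity logits against per-expression text embeddings
        (e.g. encoded LLM-generated descriptions plus learnable context tokens);
        the temperature value is an assumption."""
        v = video_emb / video_emb.norm(dim=-1, keepdim=True)
        t = text_embs / text_embs.norm(dim=-1, keepdim=True)
        return (v @ t.t()) / temperature                 # (B, num_classes)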

    TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition

    In this paper we propose a novel Temporal Attentive Relation Network (TARN) for the problems of few-shot and zero-shot action recognition. At the heart of our network is a meta-learning approach that learns to compare representations of variable temporal length, that is, either two videos of different lengths (in the case of few-shot action recognition) or a video and a semantic representation such as a word vector (in the case of zero-shot action recognition). In contrast to other works in few-shot and zero-shot action recognition, we a) utilise attention mechanisms to perform temporal alignment, and b) learn a deep distance measure on the aligned representations at the video segment level. We adopt an episode-based training scheme and train our network in an end-to-end manner. The proposed method does not require any fine-tuning in the target domain or maintaining additional representations, as is the case with memory networks. Experimental results show that the proposed architecture outperforms the state of the art in few-shot action recognition and achieves competitive results in zero-shot action recognition.
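
    As a rough illustration of the comparison mechanism described above, the sketch below aligns the segments of a query video to those of a support example with cross-attention and scores the aligned pairs with a small relation network. Layer sizes, the number of attention heads, and the mean over segments are assumptions for illustration, not the paper's exact design.

    # Sketch of attention-based alignment + a learned segment-level relation measure.
    import torch
    import torch.nn as nn

    class SegmentRelation(nn.Module):
        def __init__(self, dim=256, hidden=128):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
            self.relation = nn.Sequential(
                nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, query_segs, support_segs):
            # query_segs: (B, Tq, dim); support_segs: (B, Ts, dim); Tq and Ts may differ
            aligned, _ = self.attn(query_segs, support_segs, support_segs)  # align support to query segments
            pairs = torch.cat([query_segs, aligned], dim=-1)                # segment-level aligned pairs
            scores = self.relation(pairs).squeeze(-1)                       # (B, Tq) per-segment relation scores
            return scores.mean(dim=1)                                       # video-level similarity (assumed pooling)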

    Prioritizing the Propagation of Identity Beliefs for Multi-object Tracking

    Multi-object tracking requires locating the targets as well as labeling their identities. Inferring the identities of targets from their appearances is a challenge when the availability and reliability of the observation process vary over time and space. The purpose of this paper is to assign identities to those appearance measurements using a graph-based formalism. Each node of the graph corresponds to a tracklet, defined as a sequence of positions that very likely correspond to the same physical target. Tracklets are pre-computed, and our work investigates how to assign them identities, knowing the reference appearance of each target. Initially, each node is assigned a probability distribution over the set of possible identities, based on the observed appearance features. Afterwards, belief propagation is used to infer the identities of more ambiguous nodes from those of less ambiguous nodes, by exploiting the graph constraints and the measures of similarity between the nodes. In contrast to standard belief propagation, which treats the nodes in an arbitrary order, the proposed method uses a priority-based belief propagation, in which less ambiguous nodes are scheduled to transmit their messages first. Validation is performed on a real-life basketball dataset. The proposed method achieves an 89% identification rate, an improvement of 21% and 16% over individual identity assignment and standard belief propagation, respectively.
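
    The scheduling idea above can be sketched in a few lines: each tracklet node holds a distribution over identities, the least ambiguous nodes (measured here by entropy, an assumption) send their messages first, and a neighbour's belief is updated by a normalised product with the incoming message. The data structures and the exact message update are illustrative, not the paper's implementation.

    # Sketch of priority-based belief propagation over a tracklet graph (assumed details).
    import heapq
    import numpy as np

    def ambiguity(belief):
        """Shannon entropy of an identity distribution; lower means less ambiguous."""
        p = np.clip(belief, 1e-12, 1.0)
        return float(-(p * np.log(p)).sum())

    def priority_propagation(beliefs, edges, compatibility, n_rounds=3):
        """beliefs: {node: array over identities}; edges: {node: [neighbours]};
        compatibility(u, v): matrix encoding how the identity of u constrains v."""
        for _ in range(n_rounds):
            queue = [(ambiguity(b), node) for node, b in beliefs.items()]
            heapq.heapify(queue)                          # least ambiguous nodes transmit first
            while queue:
                _, u = heapq.heappop(queue)
                for v in edges.get(u, []):
                    msg = compatibility(u, v).T @ beliefs[u]
                    updated = beliefs[v] * msg
                    beliefs[v] = updated / updated.sum()  # normalised product update
        return beliefs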

    Ordinal pooling

    In the framework of convolutional neural networks, downsampling is often performed with an average-pooling operation, where all the activations are treated equally, or with a max-pooling operation that only retains the element with maximum activation while discarding the others. Both of these operations are restrictive and have previously been shown to be sub-optimal. To address this issue, a novel pooling scheme, named ordinal pooling, is introduced in this work. Ordinal pooling rearranges all the elements of a pooling region in a sequence and assigns a different weight to each element based upon its order in the sequence. These weights are used to compute the pooling operation as a weighted sum of the rearranged elements of the pooling region. They are learned via standard gradient-based training, allowing the network to learn a behavior anywhere in the spectrum from average-pooling to max-pooling in a differentiable manner. Our experiments suggest that it is advantageous for networks to perform different types of pooling operations within a pooling layer and that a hybrid behavior between average- and max-pooling is often beneficial. More importantly, they also demonstrate that ordinal pooling leads to consistent improvements in accuracy over average- or max-pooling operations, while speeding up training and alleviating the issue of choosing the pooling operations and activation functions to be used in the network. In particular, ordinal pooling mainly helps on lightweight or quantized deep learning architectures, as typically considered, e.g., for embedded applications.
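
    A minimal sketch of the operation described above: the values of each pooling window are sorted and combined with one learnable weight per rank, so a single layer can behave like average-pooling, max-pooling, or anything in between. The 2x2 window, the softmax constraint on the weights, and the uniform initialisation are assumptions made for illustration.

    # Sketch of an ordinal pooling layer (assumed constraints and initialisation).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OrdinalPool2d(nn.Module):
        def __init__(self, kernel_size=2):
            super().__init__()
            self.k = kernel_size
            n = kernel_size * kernel_size
            self.weights = nn.Parameter(torch.full((n,), 1.0 / n))   # uniform start = average-pooling

        def forward(self, x):                                         # x: (B, C, H, W), H and W divisible by k
            patches = F.unfold(x, kernel_size=self.k, stride=self.k)  # (B, C*k*k, L) non-overlapping windows
            B, C = x.size(0), x.size(1)
            patches = patches.view(B, C, self.k * self.k, -1)
            ranked, _ = patches.sort(dim=2, descending=True)          # order the values within each window
            w = torch.softmax(self.weights, dim=0)                    # non-negative weights summing to 1 (assumed)
            pooled = (ranked * w.view(1, 1, -1, 1)).sum(dim=2)        # weighted sum over ranks
            return pooled.view(B, C, x.size(2) // self.k, x.size(3) // self.k)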