Advances in complex systems and their applications to cybersecurity
Cybersecurity is one of the largest and fastest-growing technology sectors and is increasingly recognized as a major concern across many industries, so companies are raising their security budgets to guarantee the security of their processes. Successful threats to the security of information systems can lead to safety, environmental, production, and quality problems.
One of the most harmful aspects of attacks and intrusions is the ever-changing nature of attack technologies and strategies, which makes protecting computer systems increasingly difficult. As a result, advanced systems are required to cope with the ever-increasing complexity of attacks and to protect systems and information.
Hypercomplex multimodal emotion recognition from EEG and peripheral physiological signals
Multimodal emotion recognition from physiological signals is receiving increasing attention because, unlike behavioral reactions, physiological signals cannot be controlled at will and therefore provide more reliable information. Existing deep learning-based methods still rely on extracted handcrafted features, not taking full advantage of the learning ability of neural networks, and often adopt a single-modality approach, whereas human emotions are inherently expressed in a multimodal way. In this paper, we propose a hypercomplex multimodal network equipped with a novel fusion module comprising parameterized hypercomplex multiplications. Indeed, by operating in a hypercomplex domain, the operations follow algebraic rules that allow modeling latent relations among learned feature dimensions, yielding a more effective fusion step. We classify valence and arousal from electroencephalogram (EEG) and peripheral physiological signals, employing the publicly available MAHNOB-HCI database, and surpass a state-of-the-art multimodal network. The code of our work is freely available at https://github.com/ispamm/MHyEEG.
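As a concrete illustration of the fusion idea, the sketch below implements a generic parameterized hypercomplex multiplication (PHM) layer in PyTorch and uses it to fuse concatenated EEG and peripheral features. The class name, dimensions, and fusion-by-concatenation setup are illustrative assumptions, not the authors' code (see https://github.com/ispamm/MHyEEG for the official implementation).

```python
# A minimal sketch of a PHM layer: the full weight matrix is built as a sum of
# Kronecker products A_i (x) F_i, so the algebra rules (the A_i) are learned
# together with the component weights (the F_i).
import torch
import torch.nn as nn

class PHMLinear(nn.Module):
    def __init__(self, n, in_features, out_features):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # n learned n x n algebra-mixing matrices.
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.1)
        # n component weight blocks of shape (out/n) x (in/n).
        self.F = nn.Parameter(torch.randn(n, out_features // n, in_features // n) * 0.1)

    def forward(self, x):
        # W = sum_i kron(A_i, F_i), shape (out_features, in_features).
        W = torch.stack([torch.kron(self.A[i], self.F[i]) for i in range(self.n)]).sum(0)
        return x @ W.T

# Hypothetical fusion step: concatenate the two modalities, then mix with PHM.
eeg, periph = torch.randn(8, 64), torch.randn(8, 64)
fused = PHMLinear(n=4, in_features=128, out_features=64)(torch.cat([eeg, periph], dim=-1))
```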
Compressing deep-quaternion neural networks with targeted regularisation
In recent years, hypercomplex deep networks, such as complex-valued and quaternion-valued neural networks (QVNNs), have received renewed interest in the literature. They find applications in multiple fields, ranging from image reconstruction to 3D audio processing. Like their real-valued counterparts, quaternion neural networks require custom regularisation strategies to avoid overfitting. In addition, many real-world applications and embedded implementations require sufficiently compact networks, with few weights and neurons. However, the problem of regularising and/or sparsifying QVNNs has not yet been properly addressed in the literature. In this study, the authors show how to address both problems by designing targeted regularisation strategies that can minimise the number of connections and neurons of the network during training. To this end, they investigate two extensions of ℓ1 and structured regularisations to the quaternion domain. In the authors' experimental evaluation, they show that these tailored strategies significantly outperform classical (real-valued) regularisation approaches, resulting in small networks especially suitable for low-power and real-time applications.
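As a rough sketch of what such targeted regularisers can look like, the snippet below penalises quaternion weights at the level of whole quaternions and whole neurons, assuming weights stored with a trailing dimension of four real components (r, i, j, k). The function names and grouping are illustrative assumptions, not the authors' implementation.

```python
# Quaternion-aware sparsity penalties: an l1 on quaternion magnitudes removes
# whole quaternion connections, while a structured (group) variant removes
# whole neurons, enabling pruning after training.
import torch

def quaternion_l1(W, eps=1e-8):
    # l1 on quaternion magnitudes: all 4 components of a weight go to zero together.
    return torch.sqrt((W ** 2).sum(dim=-1) + eps).sum()

def quaternion_group_sparsity(W, eps=1e-8):
    # Group all quaternions leaving a neuron (one row of W) so entire neurons
    # can be zeroed out and pruned.
    return torch.sqrt((W ** 2).sum(dim=(1, 2)) + eps).sum()

W = torch.randn(32, 16, 4, requires_grad=True)  # 32 neurons, 16 inputs, 4 components
penalty = 1e-4 * quaternion_l1(W) + 1e-4 * quaternion_group_sparsity(W)
penalty.backward()  # in practice, added to the task loss during training
```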
PHYDI: Initializing parameterized hypercomplex neural networks as identity functions
Neural models based on hypercomplex algebra systems are growing and proliferating across a plethora of applications, ranging from computer vision to natural language processing. Hand in hand with their adoption, parameterized hypercomplex neural networks (PHNNs) are growing in size, and no techniques have so far been adopted to control their convergence at large scale. In this paper, we study PHNN convergence and propose parameterized hypercomplex identity initialization (PHYDI), a method to improve convergence at different scales, leading to more robust performance as the number of layers scales up, while also reaching the same performance in fewer iterations. We show the effectiveness of this approach on different benchmarks and with common PHNNs based on ResNet and Transformer architectures. The code is available at https://github.com/ispamm/PHYDI.
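One common way to realise identity initialization in residual architectures, and a plausible minimal reading of the idea above, is to zero the final layer of each residual branch so that every block computes the identity at step zero. The sketch below illustrates this on a plain real-valued block; it is not the official PHYDI code (see https://github.com/ispamm/PHYDI).

```python
# Identity-at-init residual block: with the last linear layer zeroed,
# branch(x) = 0 and therefore forward(x) = x at initialization, which keeps
# deep stacks well-conditioned early in training.
import torch
import torch.nn as nn

class IdentityInitBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        nn.init.zeros_(self.branch[-1].weight)
        nn.init.zeros_(self.branch[-1].bias)

    def forward(self, x):
        return x + self.branch(x)

x = torch.randn(2, 64)
block = IdentityInitBlock(64)
assert torch.allclose(block(x), x)  # the block is the identity function at init
```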
Dual quaternion rotational and translational equivariance in 3D rigid motion modelling
Objects' rigid motions in 3D space are described by rotations and translations of a highly correlated set of points, each with associated x, y, z coordinates that real-valued networks treat as separate entities, losing information. Previous works exploit quaternion algebra and its ability to model rotations in 3D space. However, these algebras do not properly encode translations, leading to sub-optimal performance in 3D learning tasks. To overcome these limitations, we employ a dual quaternion representation of rigid motions in 3D space that jointly describes rotations and translations of point sets, processing each point as a single entity. Our approach is translation and rotation equivariant, so it does not suffer from shifts in the data and better learns object trajectories, as we validate in the experimental evaluations. Models endowed with this formulation outperform previous approaches in a human pose forecasting application, attesting to the effectiveness of the proposed dual quaternion formulation for rigid motions in 3D space.
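For readers unfamiliar with the representation, the NumPy sketch below shows how a dual quaternion (q_r, q_d) with q_d = ½ (0, t) q_r jointly encodes a rotation q_r and a translation t, and how it transforms a point. Function names are illustrative assumptions, unrelated to the paper's models.

```python
# Dual quaternion rigid motion: the real part q_r is a unit rotation
# quaternion; the dual part q_d = 0.5 * (0, t) * q_r folds the translation t
# into the same algebraic object, so rotation and translation travel together.
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions stored as (w, x, y, z).
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rigid_motion(qr, t):
    qd = 0.5 * qmul(np.concatenate([[0.0], t]), qr)
    return qr, qd

def apply(qr, qd, p):
    conj = qr * np.array([1, -1, -1, -1])
    rotated = qmul(qmul(qr, np.concatenate([[0.0], p])), conj)[1:]  # q p q*
    t = 2 * qmul(qd, conj)[1:]  # recover translation: t = 2 q_d q_r*
    return rotated + t

qr = np.array([np.cos(np.pi/8), 0, 0, np.sin(np.pi/8)])  # 45 deg about z
qr, qd = rigid_motion(qr, t=np.array([1.0, 2.0, 0.0]))
print(apply(qr, qd, np.array([1.0, 0.0, 0.0])))  # ~ [1.707, 2.707, 0.0]
```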
Visual odometry with depth-wise separable convolution and quaternion neural networks
Monocular visual odometry is a fundamental problem in computer vision and has been extensively studied in the literature. The vast majority of visual odometry algorithms follow a standard pipeline consisting of feature detection, feature matching, motion estimation, and local optimization. Only recently have deep learning approaches shown cutting-edge performance, replacing the standard pipeline with an end-to-end solution. One of the main advantages of deep learning approaches over standard methods is the reduced inference time, an important requirement for applying visual odometry in real time. Less emphasis, however, has been placed on memory requirements and training efficiency. The memory footprint, in particular, matters for real-world applications such as robot navigation or autonomous driving, where devices have limited memory resources. In this paper we tackle both aspects, introducing novel architectures based on a depth-wise separable convolutional neural network and a deep quaternion recurrent convolutional neural network. In particular, we obtain accuracy equal to or better than other state-of-the-art methods on the KITTI VO dataset, with a reduction in the number of parameters and a speed-up in inference time.
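A depth-wise separable convolution of the kind named above factorises a standard convolution into a per-channel (depthwise) convolution followed by a 1×1 (pointwise) convolution, cutting the parameter count. The PyTorch sketch below is a generic illustration of the building block, not the paper's architecture.

```python
# Depthwise separable convolution: groups=in_ch gives one spatial filter per
# channel; the 1x1 conv then mixes channels. Parameters drop from
# in*out*k*k to in*k*k + in*out versus a standard convolution.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 32, 32)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```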
L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality
The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research on machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount practical interest: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. The aim of this paper is to describe the main results obtained from this challenge. We provide the L3DAS23 dataset, which comprises a collection of first-order Ambisonics recordings in reverberant simulated environments. We maintain some general characteristics of the previous L3DAS challenges, featuring a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. However, in this new edition, we introduce audio-visual scenarios by including images that depict the frontal view of the environments as captured from the perspective of the microphones. This addition aims to enrich the challenge, giving participants tools to explore combining audio and images for solving the 3DSE and 3DSELD tasks. In addition to a brand-new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply a supporting API for effortless replication of our results. We support the dataset download and the use of the baseline models via extensive instructions provided on the official GitHub repository at https://github.com/l3das/L3DAS23. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge. For more comprehensive information and in-depth details about the challenge, we invite the reader to visit the L3DAS Project website at http://www.l3das.com/icassp2023.
Semantic Communications Based on Adaptive Generative Models and Information Bottleneck
Semantic communications represent a significant breakthrough with respect to the current communication paradigm, as they focus on recovering the meaning behind the transmitted sequence of symbols, rather than the symbols themselves. In semantic communications, the goal of the destination is not to recover a list of symbols symbolically identical to the transmitted ones, but rather to recover a message that is semantically equivalent to the one emitted by the source. This paradigm shift introduces many degrees of freedom into the encoding and decoding rules, which can be exploited to make the design of communication systems much more efficient. In this paper, we present an approach to semantic communication building on three fundamental ideas: 1) represent data over a topological space as a formal way to capture semantics, as expressed through relations; 2) use the information bottleneck principle to identify relevant information, and adapt the bottleneck online, as a function of the wireless channel state, in order to strike an optimal trade-off between transmit power, reconstruction accuracy, and delay; 3) exploit probabilistic generative models as a general tool to adapt the transmission rate to the wireless channel state and make it possible to regenerate the transmitted images or run classification tasks at the receiver side. (To appear in IEEE Communications Magazine, special issue on Semantic Communications: Transmission beyond Shannon.)
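To make the second ingredient concrete, the information bottleneck objective in its standard form reads as follows; the paper's contribution is adapting the trade-off parameter β online to the wireless channel state, whereas the formula below is the classical static version.

```latex
% Information bottleneck: compress the source X into a representation Z that
% stays informative about the relevance variable Y; beta trades compression
% (rate/power) against relevance (reconstruction accuracy).
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
```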
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
Sound design involves creatively selecting, recording, and editing sound effects for various media such as cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no reference audio exists, requiring manual annotation of event timings from the video. We propose a system to extract repetitive action onsets from a video, which are then used, in conjunction with audio or textual embeddings, to condition a diffusion model trained to generate a new synchronized sound-effects audio track. In this way, we leave complete creative control to the sound designer while removing the burden of synchronization with video. Furthermore, editing the onset track or changing the conditioning embedding requires much less effort than editing the audio track itself, simplifying the sonification process. We provide sound examples, source code, and pretrained models to facilitate reproducibility.
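To illustrate the kind of conditioning signal at the heart of the method, the sketch below rasterises a list of detected onset times into a binary track that a generative model could be conditioned on. The function name, frame rate, and representation are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical onset-track construction: onsets extracted from video (here
# given as times in seconds) become a sparse binary vector at a fixed frame
# rate. Editing the sound design then reduces to editing this vector.
import numpy as np

def onset_track(onset_times, duration, frame_rate=100):
    # One frame per 1/frame_rate seconds; 1.0 marks an action onset.
    track = np.zeros(int(duration * frame_rate), dtype=np.float32)
    for t in onset_times:
        track[min(int(t * frame_rate), len(track) - 1)] = 1.0
    return track

cond = onset_track([0.50, 1.25, 2.10], duration=3.0)
print(cond.nonzero()[0] / 100.0)  # [0.5  1.25 2.1 ]
```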
Group sparse regularization for deep neural networks
In this paper, we address the challenging task of simultaneously optimizing (i) the weights of a neural network, (ii) the number of neurons in each hidden layer, and (iii) the subset of active input features (i.e., feature selection). While these problems are traditionally dealt with separately, we propose an efficient regularized formulation that enables solving them simultaneously, using standard optimization routines. Specifically, we extend the group Lasso penalty, originally proposed in the linear regression literature, to impose group-level sparsity on the network's connections, where each group is defined as the set of outgoing weights from a unit. Depending on the specific case, the weights can be related to an input variable, a hidden neuron, or a bias unit, thus performing all the aforementioned tasks simultaneously in order to obtain a compact network. We carry out an extensive experimental evaluation, in comparison with classical weight decay and Lasso penalties, both on a toy dataset for handwritten digit recognition and on multiple realistic mid-scale classification benchmarks. Comparative results demonstrate the potential of the proposed sparse group Lasso penalty in producing extremely compact networks, with a significantly lower number of input features and a classification accuracy that is equal to or only slightly below that of standard regularization terms.
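A minimal sketch of the penalty described above, assuming groups formed by the outgoing weights of each unit, i.e., the columns of the next layer's weight matrix in PyTorch; the variable names and coefficients are illustrative, not the paper's code.

```python
# Group Lasso on outgoing weights: column j of a layer's weight matrix holds
# all weights leaving unit j of the previous layer, so penalising column-wise
# l2 norms pushes whole units (input features or hidden neurons) to zero.
import torch
import torch.nn as nn

def group_lasso(layer: nn.Linear, eps=1e-8):
    return torch.sqrt((layer.weight ** 2).sum(dim=0) + eps).sum()

net = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
task = nn.functional.cross_entropy(net(x), y)
# 784 groups on the first layer perform feature selection; 300 groups on the
# second layer prune hidden neurons; both run alongside weight optimization.
loss = task + 1e-3 * (group_lasso(net[0]) + group_lasso(net[2]))
loss.backward()
```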
