Advances in complex systems and their applications to cybersecurity
Cybersecurity is one of the largest and fastest-growing technology sectors and is increasingly recognized as a major concern across many industries, so companies are raising their security budgets to guarantee the security of their processes. Successful threats to the security of information systems can lead to safety, environmental, production, and quality problems.
One of the most harmful aspects of attacks and intrusions is the ever-changing nature of attack technologies and strategies, which makes protecting computer systems increasingly difficult. As a result, advanced systems are required to cope with the ever-increasing complexity of attacks and to protect systems and information.
Hypercomplex multimodal emotion recognition from EEG and peripheral physiological signals
Multimodal emotion recognition from physiological signals is receiving increasing attention because, unlike behavioral reactions, physiological signals cannot be controlled at will and therefore provide more reliable information. Existing deep learning-based methods still rely on extracted handcrafted features, not taking full advantage of the learning ability of neural networks, and often adopt a single-modality approach, whereas human emotions are inherently expressed in a multimodal way. In this paper, we propose a hypercomplex multimodal network equipped with a novel fusion module comprising parameterized hypercomplex multiplications. Indeed, by operating in a hypercomplex domain, the operations follow algebraic rules that allow modeling latent relations among learned feature dimensions, yielding a more effective fusion step. We classify valence and arousal from electroencephalogram (EEG) and peripheral physiological signals, employing the publicly available MAHNOB-HCI database, and surpass a state-of-the-art multimodal network. The code of our work is freely available at https://github.com/ispamm/MHyEEG.
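As a concrete illustration of the fusion idea, the sketch below implements a generic parameterized hypercomplex multiplication (PHM) layer in PyTorch and uses it to fuse concatenated EEG and peripheral features. The class name, dimensions, and fusion-by-concatenation setup are illustrative assumptions, not the authors' code (see https://github.com/ispamm/MHyEEG for the official implementation).

```python
# A minimal sketch of a PHM layer: the full weight matrix is built as a sum of
# Kronecker products A_i (x) F_i, so the algebra rules (the A_i) are learned
# together with the component weights (the F_i).
import torch
import torch.nn as nn

class PHMLinear(nn.Module):
    def __init__(self, n, in_features, out_features):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # n learned n x n algebra-mixing matrices.
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.1)
        # n component weight blocks of shape (out/n) x (in/n).
        self.F = nn.Parameter(torch.randn(n, out_features // n, in_features // n) * 0.1)

    def forward(self, x):
        # W = sum_i kron(A_i, F_i), shape (out_features, in_features).
        W = torch.stack([torch.kron(self.A[i], self.F[i]) for i in range(self.n)]).sum(0)
        return x @ W.T

# Hypothetical fusion step: concatenate the two modalities, then mix with PHM.
eeg, periph = torch.randn(8, 64), torch.randn(8, 64)
fused = PHMLinear(n=4, in_features=128, out_features=64)(torch.cat([eeg, periph], dim=-1))
```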
Compressing deep-quaternion neural networks with targeted regularisation
In recent years, hypercomplex deep networks, such as complex-valued and quaternion-valued neural networks (QVNNs), have received renewed interest in the literature. They find applications in multiple fields, ranging from image reconstruction to 3D audio processing. Like their real-valued counterparts, quaternion neural networks require custom regularisation strategies to avoid overfitting. In addition, many real-world applications and embedded implementations require sufficiently compact networks, with few weights and neurons. However, the problem of regularising and/or sparsifying QVNNs has not yet been properly addressed in the literature. In this study, the authors show how to address both problems by designing targeted regularisation strategies that can minimise the number of connections and neurons of the network during training. To this end, they investigate two extensions of ℓ1 and structured regularisations to the quaternion domain. In the authors' experimental evaluation, they show that these tailored strategies significantly outperform classical (real-valued) regularisation approaches, resulting in small networks especially suitable for low-power and real-time applications.
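As a rough sketch of what such targeted regularisers can look like, the snippet below penalises quaternion weights at the level of whole quaternions and whole neurons, assuming weights stored with a trailing dimension of four real components (r, i, j, k). The function names and grouping are illustrative assumptions, not the authors' implementation.

```python
# Quaternion-aware sparsity penalties: an l1 on quaternion magnitudes removes
# whole quaternion connections, while a structured (group) variant removes
# whole neurons, enabling pruning after training.
import torch

def quaternion_l1(W, eps=1e-8):
    # l1 on quaternion magnitudes: all 4 components of a weight go to zero together.
    return torch.sqrt((W ** 2).sum(dim=-1) + eps).sum()

def quaternion_group_sparsity(W, eps=1e-8):
    # Group all quaternions leaving a neuron (one row of W) so entire neurons
    # can be zeroed out and pruned.
    return torch.sqrt((W ** 2).sum(dim=(1, 2)) + eps).sum()

W = torch.randn(32, 16, 4, requires_grad=True)  # 32 neurons, 16 inputs, 4 components
penalty = 1e-4 * quaternion_l1(W) + 1e-4 * quaternion_group_sparsity(W)
penalty.backward()  # in practice, added to the task loss during training
```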
PHYDI: Initializing parameterized hypercomplex neural networks as identity functions
Neural models based on hypercomplex algebra systems are growing and proliferating across a plethora of applications, ranging from computer vision to natural language processing. Hand in hand with their adoption, parameterized hypercomplex neural networks (PHNNs) are growing in size, and no techniques have so far been adopted to control their convergence at large scale. In this paper, we study PHNN convergence and propose parameterized hypercomplex identity initialization (PHYDI), a method to improve convergence at different scales, leading to more robust performance as the number of layers scales up, while also reaching the same performance in fewer iterations. We show the effectiveness of this approach on different benchmarks and with common PHNNs based on ResNet and Transformer architectures. The code is available at https://github.com/ispamm/PHYDI.
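One common way to realise identity initialization in residual architectures, and a plausible minimal reading of the idea above, is to zero the final layer of each residual branch so that every block computes the identity at step zero. The sketch below illustrates this on a plain real-valued block; it is not the official PHYDI code (see https://github.com/ispamm/PHYDI).

```python
# Identity-at-init residual block: with the last linear layer zeroed,
# branch(x) = 0 and therefore forward(x) = x at initialization, which keeps
# deep stacks well-conditioned early in training.
import torch
import torch.nn as nn

class IdentityInitBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        nn.init.zeros_(self.branch[-1].weight)
        nn.init.zeros_(self.branch[-1].bias)

    def forward(self, x):
        return x + self.branch(x)

x = torch.randn(2, 64)
block = IdentityInitBlock(64)
assert torch.allclose(block(x), x)  # the block is the identity function at init
```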
Dual quaternion rotational and translational equivariance in 3D rigid motion modelling
Objects' rigid motions in 3D space are described by rotations and translations of a highly correlated set of points, each with associated x, y, z coordinates that real-valued networks treat as separate entities, losing information. Previous works exploit quaternion algebra and its ability to model rotations in 3D space. However, these algebras do not properly encode translations, leading to sub-optimal performance in 3D learning tasks. To overcome these limitations, we employ a dual quaternion representation of rigid motions in 3D space that jointly describes rotations and translations of point sets, processing each point as a single entity. Our approach is translation and rotation equivariant, so it does not suffer from shifts in the data and better learns object trajectories, as we validate in the experimental evaluations. Models endowed with this formulation outperform previous approaches in a human pose forecasting application, attesting to the effectiveness of the proposed dual quaternion formulation for rigid motions in 3D space.
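For readers unfamiliar with the representation, the NumPy sketch below shows how a dual quaternion (q_r, q_d) with q_d = ½ (0, t) q_r jointly encodes a rotation q_r and a translation t, and how it transforms a point. Function names are illustrative assumptions, unrelated to the paper's models.

```python
# Dual quaternion rigid motion: the real part q_r is a unit rotation
# quaternion; the dual part q_d = 0.5 * (0, t) * q_r folds the translation t
# into the same algebraic object, so rotation and translation travel together.
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions stored as (w, x, y, z).
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rigid_motion(qr, t):
    qd = 0.5 * qmul(np.concatenate([[0.0], t]), qr)
    return qr, qd

def apply(qr, qd, p):
    conj = qr * np.array([1, -1, -1, -1])
    rotated = qmul(qmul(qr, np.concatenate([[0.0], p])), conj)[1:]  # q p q*
    t = 2 * qmul(qd, conj)[1:]  # recover translation: t = 2 q_d q_r*
    return rotated + t

qr = np.array([np.cos(np.pi/8), 0, 0, np.sin(np.pi/8)])  # 45 deg about z
qr, qd = rigid_motion(qr, t=np.array([1.0, 2.0, 0.0]))
print(apply(qr, qd, np.array([1.0, 0.0, 0.0])))  # ~ [1.707, 2.707, 0.0]
```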
Visual odometry with depth-wise separable convolution and quaternion neural networks
Monocular visual odometry is a fundamental problem in computer vision and has been extensively studied in the literature. The vast majority of visual odometry algorithms follow a standard pipeline consisting of feature detection, feature matching, motion estimation, and local optimization. Only recently have deep learning approaches shown cutting-edge performance, replacing the standard pipeline with an end-to-end solution. One of the main advantages of deep learning approaches over standard methods is the reduced inference time, an important requirement for applying visual odometry in real time. Less emphasis, however, has been placed on memory requirements and training efficiency. The memory footprint, in particular, matters for real-world applications such as robot navigation or autonomous driving, where devices have limited memory resources. In this paper we tackle both aspects, introducing novel architectures based on a depth-wise separable convolutional neural network and a deep quaternion recurrent convolutional neural network. In particular, we obtain accuracy equal to or better than other state-of-the-art methods on the KITTI VO dataset, with a reduction in the number of parameters and a speed-up in inference time.
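A depth-wise separable convolution of the kind named above factorises a standard convolution into a per-channel (depthwise) convolution followed by a 1×1 (pointwise) convolution, cutting the parameter count. The PyTorch sketch below is a generic illustration of the building block, not the paper's architecture.

```python
# Depthwise separable convolution: groups=in_ch gives one spatial filter per
# channel; the 1x1 conv then mixes channels. Parameters drop from
# in*out*k*k to in*k*k + in*out versus a standard convolution.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 32, 32)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 32, 32])
```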
L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality
The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research on machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount practical interest: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. The aim of this paper is to describe the main results obtained from this challenge. We provide the L3DAS23 dataset, which comprises a collection of first-order Ambisonics recordings in reverberant simulated environments. We maintain some general characteristics of the previous L3DAS challenges, featuring a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. However, in this new edition, we introduce audio-visual scenarios by including images that depict the frontal view of the environments as captured from the perspective of the microphones. This addition aims to enrich the challenge, giving participants tools to explore combining audio and images for solving the 3DSE and 3DSELD tasks. In addition to a brand-new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply a supporting API for effortless replication of our results. We support the dataset download and the use of the baseline models via extensive instructions provided on the official GitHub repository at https://github.com/l3das/L3DAS23. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge. For more comprehensive information and in-depth details about the challenge, we invite the reader to visit the L3DAS Project website at http://www.l3das.com/icassp2023.
Semantic Communications Based on Adaptive Generative Models and Information Bottleneck
Semantic communications represent a significant breakthrough with respect to the current communication paradigm, as they focus on recovering the meaning behind the transmitted sequence of symbols, rather than the symbols themselves. In semantic communications, the goal of the destination is not to recover a list of symbols symbolically identical to the transmitted ones, but rather to recover a message that is semantically equivalent to the one emitted by the source. This paradigm shift introduces many degrees of freedom into the encoding and decoding rules, which can be exploited to make the design of communication systems much more efficient. In this paper, we present an approach to semantic communication building on three fundamental ideas: 1) represent data over a topological space as a formal way to capture semantics, as expressed through relations; 2) use the information bottleneck principle to identify relevant information, and adapt the bottleneck online, as a function of the wireless channel state, in order to strike an optimal trade-off between transmit power, reconstruction accuracy, and delay; 3) exploit probabilistic generative models as a general tool to adapt the transmission rate to the wireless channel state and make it possible to regenerate the transmitted images or run classification tasks at the receiver side. (To appear in IEEE Communications Magazine, special issue on Semantic Communications: Transmission beyond Shannon.)
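To make the second ingredient concrete, the information bottleneck objective in its standard form reads as follows; the paper's contribution is adapting the trade-off parameter β online to the wireless channel state, whereas the formula below is the classical static version.

```latex
% Information bottleneck: compress the source X into a representation Z that
% stays informative about the relevance variable Y; beta trades compression
% (rate/power) against relevance (reconstruction accuracy).
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
```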
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
Sound design involves creatively selecting, recording, and editing sound effects for various media such as cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no reference audio exists, requiring manual annotation of event timings from the video. We propose a system to extract repetitive action onsets from a video, which are then used, in conjunction with audio or textual embeddings, to condition a diffusion model trained to generate a new synchronized sound-effects audio track. In this way, we leave complete creative control to the sound designer while removing the burden of synchronization with video. Furthermore, editing the onset track or changing the conditioning embedding requires much less effort than editing the audio track itself, simplifying the sonification process. We provide sound examples, source code, and pretrained models to facilitate reproducibility.
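To illustrate the kind of conditioning signal at the heart of the method, the sketch below rasterises a list of detected onset times into a binary track that a generative model could be conditioned on. The function name, frame rate, and representation are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical onset-track construction: onsets extracted from video (here
# given as times in seconds) become a sparse binary vector at a fixed frame
# rate. Editing the sound design then reduces to editing this vector.
import numpy as np

def onset_track(onset_times, duration, frame_rate=100):
    # One frame per 1/frame_rate seconds; 1.0 marks an action onset.
    track = np.zeros(int(duration * frame_rate), dtype=np.float32)
    for t in onset_times:
        track[min(int(t * frame_rate), len(track) - 1)] = 1.0
    return track

cond = onset_track([0.50, 1.25, 2.10], duration=3.0)
print(cond.nonzero()[0] / 100.0)  # [0.5  1.25 2.1 ]
```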
Group sparse regularization for deep neural networks
In this paper, we address the challenging task of simultaneously optimizing (i) the weights of a neural network, (ii) the number of neurons in each hidden layer, and (iii) the subset of active input features (i.e., feature selection). While these problems are traditionally dealt with separately, we propose an efficient regularized formulation that enables solving them simultaneously, using standard optimization routines. Specifically, we extend the group Lasso penalty, originally proposed in the linear regression literature, to impose group-level sparsity on the network's connections, where each group is defined as the set of outgoing weights from a unit. Depending on the specific case, the weights can be related to an input variable, a hidden neuron, or a bias unit, thus performing all the aforementioned tasks simultaneously in order to obtain a compact network. We carry out an extensive experimental evaluation, in comparison with classical weight decay and Lasso penalties, both on a toy dataset for handwritten digit recognition and on multiple realistic mid-scale classification benchmarks. Comparative results demonstrate the potential of the proposed sparse group Lasso penalty in producing extremely compact networks, with a significantly lower number of input features and a classification accuracy that is equal to or only slightly below that of standard regularization terms.
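A minimal sketch of the penalty described above, assuming groups formed by the outgoing weights of each unit, i.e., the columns of the next layer's weight matrix in PyTorch; the variable names and coefficients are illustrative, not the paper's code.

```python
# Group Lasso on outgoing weights: column j of a layer's weight matrix holds
# all weights leaving unit j of the previous layer, so penalising column-wise
# l2 norms pushes whole units (input features or hidden neurons) to zero.
import torch
import torch.nn as nn

def group_lasso(layer: nn.Linear, eps=1e-8):
    return torch.sqrt((layer.weight ** 2).sum(dim=0) + eps).sum()

net = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
task = nn.functional.cross_entropy(net(x), y)
# 784 groups on the first layer perform feature selection; 300 groups on the
# second layer prune hidden neurons; both run alongside weight optimization.
loss = task + 1e-3 * (group_lasso(net[0]) + group_lasso(net[2]))
loss.backward()
```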
