Sound Event Detection in Synthetic Audio: Analysis of the DCASE 2016 Task Results
As part of the 2016 public evaluation challenge on Detection and
Classification of Acoustic Scenes and Events (DCASE 2016), the second task
focused on evaluating sound event detection systems using synthetic mixtures of
office sounds. This task, which follows the 'Event Detection - Office
Synthetic' task of DCASE 2013, studies the behaviour of tested algorithms when
facing controlled levels of audio complexity with respect to background noise
and polyphony/density, with the added benefit of a very accurate ground truth.
This paper presents the task formulation, evaluation metrics, and submitted
systems, and provides a statistical analysis of the achieved results with
respect to various aspects of the evaluation dataset.
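Sound event detection systems of this kind are often scored with segment-based metrics. As an illustration only (the exact metrics used in the task are defined in the paper, and the event tuples, helper names, and one-second segment length below are all hypothetical), a segment-based F-score can be sketched as:

```python
def segment_labels(events, n_segments, seg_dur=1.0):
    """Convert (onset, offset, label) events into per-segment label sets."""
    segs = [set() for _ in range(n_segments)]
    for onset, offset, label in events:
        for i in range(n_segments):
            t0, t1 = i * seg_dur, (i + 1) * seg_dur
            if onset < t1 and offset > t0:  # event overlaps this segment
                segs[i].add(label)
    return segs

def segment_f1(ref_events, est_events, n_segments):
    """F-score over segment-level label intersections (illustrative)."""
    ref = segment_labels(ref_events, n_segments)
    est = segment_labels(est_events, n_segments)
    tp = sum(len(r & e) for r, e in zip(ref, est))
    fp = sum(len(e - r) for r, e in zip(ref, est))
    fn = sum(len(r - e) for r, e in zip(ref, est))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0

ref = [(0.0, 2.5, "phone"), (3.0, 4.0, "door")]
est = [(0.0, 2.0, "phone"), (3.2, 5.0, "door")]
print(round(segment_f1(ref, est, 5), 3))  # 0.75
```

With one-second segments, a detection counts as correct whenever it overlaps the same segment as a reference event of the same class, which is why segment-based scores forgive small onset/offset errors.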
The bag-of-frames approach: a not so sufficient model for urban soundscapes
The "bag-of-frames" approach (BOF), which encodes audio signals as the
long-term statistical distribution of short-term spectral features, is commonly
regarded as an effective and sufficient way to represent environmental sound
recordings (soundscapes) since its introduction in an influential 2007 article.
The present paper describes a conceptual replication of this seminal article
using several new soundscape datasets, with results strongly questioning the
adequacy of the BOF approach for the task. We show that the good accuracy
originally reported with BOF likely results from a particularly favorable
dataset with low within-class variability, and that for more realistic
datasets, BOF in fact does not perform significantly better than a mere
one-point average of the signal's features. Soundscape modeling, therefore,
may not be the closed case it was once thought to be. Progress, we argue,
could lie in reconsidering the problem so as to account for the individual
acoustic events within each soundscape.
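The two representations compared above can be sketched on dummy data. This is a minimal illustration, assuming a simplified instantiation of BOF as long-term statistics (here per-dimension mean and variance) of short-term features; the original article models the frame distribution with a Gaussian mixture, and the feature matrix below merely stands in for e.g. MFCC frames:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 13))  # 500 frames x 13 coefficients (dummy)

# "Bag-of-frames": summarize the long-term distribution of the short-term
# features; here, per-dimension mean and variance as a simple stand-in.
bof = np.concatenate([features.mean(axis=0), features.var(axis=0)])

# The "one-point average" baseline the paper compares against:
# a single mean feature vector for the whole recording.
one_point = features.mean(axis=0)

print(bof.shape)        # (26,)
print(one_point.shape)  # (13,)
```

The paper's finding is that, on realistic urban soundscapes, classifiers built on the first representation do not significantly outperform ones built on the second.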
On the visual display of audio data using stacked graphs
Visualisation is an important tool for many steps of a research project. In this paper, we present several displays of audio data based on stacked graphs. Thanks to a careful use of layering, the proposed displays concisely convey a large amount of information. Many flavours are presented, each useful for a specific type of data, from spectral and chromatic data to multi-source and multi-channel data. We demonstrate that, for spectral and chromatic data, such displays offer a different compromise than the traditional spectrogram and chromagram, emphasizing timing information over frequency.
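As an illustration of the kind of data such a display stacks, the sketch below computes per-band spectral energy over time and the cumulative baseline each layer would sit on; the signal, frame size, and band count are arbitrary, and the rendering call is only indicated in a comment:

```python
import numpy as np

rng = np.random.default_rng(2)
sig = rng.normal(size=4096)       # dummy audio signal
frame, n_bands = 256, 4

# Power spectrogram: non-overlapping frames, magnitude-squared rFFT.
frames = sig[: len(sig) // frame * frame].reshape(-1, frame)
spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2

# Sum the spectrum into a few frequency bands per frame.
bands = np.array_split(spec, n_bands, axis=1)
energy = np.stack([b.sum(axis=1) for b in bands], axis=0)  # (bands, time)

# Lower edge of each stacked layer: cumulative sum of the layers below.
baselines = np.cumsum(energy, axis=0) - energy
# With matplotlib, one would then call:
#   plt.stackplot(range(energy.shape[1]), energy)
print(energy.shape)  # (4, 16)
```

Stacking the band energies this way trades exact frequency readout for a compact view of how the energy distribution evolves over time, which matches the compromise the paper describes.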
A novel interface for audio based sound data mining
In this paper, the design of a web interface for audio-based sound data mining is studied. The interface allows the user to explore a sound dataset without any written textual hint. Dataset sounds are grouped into semantic classes which are themselves clustered to build a semantic hierarchical structure. Each class is represented by a circle placed in a two-dimensional space according to its semantic level. Several means of displaying sounds following this template are presented and evaluated with a crowdsourcing experiment.
Large-scale feature selection with Gaussian mixture models for the classification of high dimensional remote sensing images
A large-scale feature selection wrapper is discussed for the classification of high dimensional remote sensing images. An efficient implementation is proposed based on intrinsic properties of Gaussian mixture models and block matrices. The criterion function is split into two parts: one that is updated to test each feature and one that needs to be updated only once per feature selection step. This split saves a large amount of computation for each test. The algorithm is implemented in C++ and integrated into the Orfeo Toolbox. It has been compared to other classification algorithms on two high dimension remote sensing images. Results show that the approach provides good classification accuracies with low computation time.
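The wrapper scheme itself can be sketched as a greedy forward selection; the sketch below deliberately omits the paper's contribution (the incremental update of the Gaussian mixture criterion) and uses a plain class-conditional Gaussian classifier's training accuracy on synthetic data as the criterion, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 8
X = rng.normal(size=(n, d))
y = (X[:, 2] + 0.5 * X[:, 5] > 0).astype(int)  # only features 2 and 5 matter

def score(subset):
    """Wrapper criterion (illustrative): training accuracy of a
    diagonal class-conditional Gaussian classifier on `subset`."""
    Xs = X[:, subset]
    ll = np.zeros((n, 2))
    for c in (0, 1):
        Xc = Xs[y == c]
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + 1e-6
        ll[:, c] = -0.5 * (((Xs - mu) ** 2) / var + np.log(var)).sum(axis=1)
    return (ll.argmax(axis=1) == y).mean()

selected = []
for _ in range(3):  # greedily add the feature that most improves the criterion
    remaining = [j for j in range(d) if j not in selected]
    best = max(remaining, key=lambda j: score(selected + [j]))
    selected.append(best)

print(selected)
```

Each outer iteration re-scores every remaining feature against the current subset; the paper's split of the criterion makes exactly this inner loop cheap, since most of the computation can be shared across the candidate features.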
SimScene: a web-based acoustic scenes simulator
We introduce in this paper a soundscape simulator called SimScene, designed to be used as an experimental tool to characterize the mental representation of sound environments. The simulator allows a subject to generate a full sonic environment by sequencing and mixing sound elements, and by manipulating their sound level and time positioning. To make the simulation process effective, SimScene has not been designed to manipulate individual parameters of individual sounds, but to specify high-level parameters for whole classes of sounds, organized into a hierarchical, semantically structured dataset. To avoid any linguistic bias, a listening-oriented interface allows subjects to explore the dataset without any written textual help. The entire software is developed in JavaScript using the standard Web Audio technology, and is thus fully supported by most modern web browsers. This should allow experimenters to adopt a crowdsourcing approach to experimentation in order to assess hypotheses on large populations, and facilitate the development of experimental protocols to investigate the influence of socio-cultural background on soundscape perception.
