Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression
This paper addresses the problem of localizing audio sources using binaural
measurements. We propose a supervised formulation that simultaneously localizes
multiple sources at different locations. The approach is intrinsically
efficient because, contrary to prior work, it relies neither on source
separation nor on monaural segregation. The method starts with a training
stage that establishes a locally-linear Gaussian regression model between the
directional coordinates of all the sources and the auditory features extracted
from binaural measurements. While fixed-length wide-spectrum sounds (white
noise) are used for training to reliably estimate the model parameters, we show
that the testing (localization) can be extended to variable-length
sparse-spectrum sounds (such as speech), thus enabling a wide range of
realistic applications. Indeed, we demonstrate that the method can be used for
audio-visual fusion, namely to map speech signals onto images and hence to
spatially align the audio and visual modalities, thus enabling discrimination
between speaking and non-speaking faces. We release a novel corpus of real-room
recordings that allow quantitative evaluation of the co-localization method in
the presence of one or two sound sources. Experiments demonstrate increased
accuracy and speed relative to several state-of-the-art methods.
Comment: 15 pages, 8 figures
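The core of the training stage is a locally-linear regression from binaural feature vectors to source directions. Below is a minimal sketch of that idea, not the paper's probabilistic Gaussian formulation: locality is approximated by clustering the feature space and fitting one linear map per cluster, and all names and dimensions (N_CLUSTERS, the feature dimension, the 2-D directional output) are illustrative assumptions.

```python
# Locally-linear regression sketch: cluster the binaural feature space,
# then fit one regularized linear model per cluster. Illustrative only;
# the paper uses a richer Gaussian locally-linear formulation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

N_CLUSTERS = 32  # number of local linear regions (assumed)

def fit_local_linear(X, Y):
    """X: (n, D) binaural features; Y: (n, 2) source directions."""
    km = KMeans(n_clusters=N_CLUSTERS, n_init=10).fit(X)
    models = [Ridge(alpha=1.0).fit(X[km.labels_ == k], Y[km.labels_ == k])
              for k in range(N_CLUSTERS)]
    return km, models

def predict(km, models, X):
    """Route each test feature to its cluster's linear model."""
    labels = km.predict(X)
    Y_hat = np.empty((len(X), 2))
    for k in range(N_CLUSTERS):
        mask = labels == k
        if mask.any():
            Y_hat[mask] = models[k].predict(X[mask])
    return Y_hat
```

Routing test features to the nearest local model keeps prediction cheap, which mirrors why the approach avoids explicit source separation at test time.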
Full- & Reduced-Order State-Space Modeling of Wind Turbine Systems with Permanent-Magnet Synchronous Generator
Wind energy is an integral part of the modern energy supply and one of the
fastest-growing sources of electricity in the world today. Accurate models for
wind energy conversion systems (WECSs) are of key interest for the analysis and
control design of present and future energy systems. Existing control-oriented
WECS models are subject to unstructured simplifications, which have not been
discussed in the literature so far. Thus, this technical note presents a thorough
derivation of a physical state-space model for permanent magnet synchronous
generator WECSs. The physical model considers all dynamic effects that
significantly influence the system's power output, including the switching of
the power electronics. Alternatively, the model is formulated in the abc-
and dq-reference frames. Secondly, a complete control and operation
management system for the wind regimes II and III and the transition between
the regimes is presented. The control takes practical effects such as input
saturation and integral windup into account. Thirdly, by a structured model
reduction procedure, two state-space models of WECS with reduced complexity are
derived: a non-switching model and a non-switching reduced-order model. The
validity of the models is illustrated and compared through a numerical
simulation study.
Comment: 23 pages, 11 figures
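As a rough illustration of the kind of non-switching (averaged) model the note derives, the sketch below simulates standard dq-frame PMSG dynamics. The parameter values, sign conventions, and constant inputs are assumptions for illustration, not taken from the paper.

```python
# Averaged (non-switching) PMSG electrical + mechanical dynamics in the
# dq-reference frame. All parameters are illustrative placeholders.
import numpy as np
from scipy.integrate import solve_ivp

R, Ld, Lq = 0.05, 2e-3, 2e-3   # stator resistance [Ohm], dq inductances [H]
psi_pm, p, J = 1.2, 4, 50.0    # PM flux linkage [Wb], pole pairs, inertia [kg m^2]

def pmsg(t, x, ud, uq, T_turbine):
    id_, iq, wm = x                      # dq currents [A], rotor speed [rad/s]
    we = p * wm                          # electrical angular frequency
    # Standard dq voltage equations (sign conventions vary; illustrative only)
    did = (ud - R * id_ + we * Lq * iq) / Ld
    diq = (uq - R * iq - we * Ld * id_ - we * psi_pm) / Lq
    # Electromagnetic torque and mechanical dynamics
    Te = 1.5 * p * (psi_pm * iq + (Ld - Lq) * id_ * iq)
    dwm = (T_turbine - Te) / J
    return [did, diq, dwm]

# Simulate 2 s with constant dq voltages and constant turbine torque
sol = solve_ivp(pmsg, (0.0, 2.0), [0.0, 0.0, 10.0],
                args=(0.0, -100.0, 1e3), max_step=1e-3)
```

Replacing the switched power-electronic stage with such averaged voltage inputs is exactly the kind of structured simplification that separates the full model from the reduced ones.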
Separating Reflection and Transmission Images in the Wild
The reflections caused by common semi-reflectors, such as glass windows, can
impact the performance of computer vision algorithms. State-of-the-art methods
can remove reflections on synthetic data and in controlled scenarios. However,
they are based on strong assumptions and do not generalize well to real-world
images. Contrary to a common misconception, real-world images are challenging
even when polarization information is used. We present a deep learning approach
to separate the reflected and the transmitted components of the recorded
irradiance, which explicitly uses the polarization properties of light. To
train it, we introduce an accurate synthetic data generation pipeline, which
simulates realistic reflections, including those generated by curved and
non-ideal surfaces, non-static scenes, and high-dynamic-range scenes.
Comment: accepted at ECCV 2018
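A minimal sketch of the kind of network this suggests: a CNN that ingests captures at several polarizer orientations and regresses the reflection and transmission layers. The architecture, channel counts, and the choice of three polarizer angles are assumptions for illustration, not the paper's actual design.

```python
# Sketch: map stacked polarization captures to two RGB output layers.
import torch
import torch.nn as nn

class PolarSeparationNet(nn.Module):
    def __init__(self, n_angles=3):
        super().__init__()
        # input: n_angles RGB captures stacked along the channel axis
        self.net = nn.Sequential(
            nn.Conv2d(3 * n_angles, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 6, 3, padding=1),  # 3ch reflection + 3ch transmission
        )

    def forward(self, x):
        out = self.net(x)
        return out[:, :3], out[:, 3:]  # (reflection, transmission)

# usage: three 256x256 RGB captures at different polarizer angles
x = torch.randn(1, 9, 256, 256)
reflection, transmission = PolarSeparationNet()(x)
```

Stacking the polarizer angles as input channels is one simple way to hand the network the polarization cue the abstract emphasizes; the synthetic pipeline would supply the ground-truth layer pairs for supervision.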
Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs
The human visual system relies on both binocular stereo cues and monocular
focusness cues to gain effective 3D perception. In computer vision, the two
problems are traditionally solved in separate tracks. In this paper, we present
a unified learning-based technique that simultaneously uses both types of cues
for depth inference. Specifically, we use a pair of focal stacks as input to
emulate human perception. We first construct a comprehensive focal stack
training dataset synthesized by depth-guided light field rendering. We then
construct three individual networks: a Focus-Net to extract depth from a single
focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from
the focal stack, and a Stereo-Net to conduct stereo matching. We show how to
integrate them into a unified BDfF-Net to obtain high-quality depth maps.
Comprehensive experiments show that our approach outperforms the
state-of-the-art in both accuracy and speed and effectively emulates the human
visual system.
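As a rough sketch of how the three sub-networks could be composed, the code below wires placeholder Focus-Net, EDoF-Net, and Stereo-Net modules into a single BDfF-Net-style depth estimator. The module internals, channel counts, and the late-fusion step are illustrative assumptions, not the paper's architectures.

```python
# Compose depth-from-focus, EDoF extraction, and stereo matching into one
# network, mirroring the Focus-Net / EDoF-Net / Stereo-Net decomposition.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class BDfFNet(nn.Module):
    def __init__(self, n_slices=16):
        super().__init__()
        self.focus_net = nn.Sequential(conv_block(n_slices, 32),
                                       nn.Conv2d(32, 1, 3, padding=1))
        self.edof_net = nn.Sequential(conv_block(n_slices, 32),
                                      nn.Conv2d(32, 1, 3, padding=1))
        self.stereo_net = nn.Sequential(conv_block(2, 32),
                                        nn.Conv2d(32, 1, 3, padding=1))
        self.fusion = nn.Conv2d(3, 1, 3, padding=1)  # fuse the three depth cues

    def forward(self, left_stack, right_stack):
        d_left = self.focus_net(left_stack)      # depth-from-focus, left view
        d_right = self.focus_net(right_stack)    # depth-from-focus, right view
        edof = torch.cat([self.edof_net(left_stack),
                          self.edof_net(right_stack)], dim=1)
        d_stereo = self.stereo_net(edof)         # stereo matching on EDoF pair
        return self.fusion(torch.cat([d_left, d_right, d_stereo], dim=1))

# usage: a 16-slice grayscale focal stack per view
left = torch.randn(1, 16, 128, 128)
right = torch.randn(1, 16, 128, 128)
depth = BDfFNet()(left, right)  # (1, 1, 128, 128) fused depth map
```

Sharing the Focus-Net weights across the two views and running stereo on the EDoF images keeps the focus and stereo cues complementary, which is the point of fusing them rather than solving the two problems separately.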
