    Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

    This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, which makes it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods. Comment: 15 pages, 8 figures

    Full- & Reduced-Order State-Space Modeling of Wind Turbine Systems with Permanent-Magnet Synchronous Generator

    Wind energy is an integral part of today's energy supply and one of the fastest growing sources of electricity in the world. Accurate models for wind energy conversion systems (WECSs) are of key interest for the analysis and control design of present and future energy systems. Existing control-oriented WECS models are subject to unstructured simplifications, which have not been discussed in the literature so far. Thus, this technical note presents a thorough derivation of a physical state-space model for permanent magnet synchronous generator WECSs. The physical model considers all dynamic effects that significantly influence the system's power output, including the switching of the power electronics. The model is formulated alternatively in the (a,b,c)- and (d,q)-reference frames. Secondly, a complete control and operation management system for the wind regimes II and III and the transition between the regimes is presented. The control takes practical effects such as input saturation and integral windup into account. Thirdly, by a structured model reduction procedure, two state-space models of WECS with reduced complexity are derived: a non-switching model and a non-switching reduced-order model. The validity of the models is illustrated and compared through a numerical simulation study. Comment: 23 pages, 11 figures

    Separating Reflection and Transmission Images in the Wild

    The reflections caused by common semi-reflectors, such as glass windows, can impact the performance of computer vision algorithms. State-of-the-art methods can remove reflections on synthetic data and in controlled scenarios. However, they are based on strong assumptions and do not generalize well to real-world images. Contrary to a common misconception, real-world images are challenging even when polarization information is used. We present a deep learning approach to separate the reflected and the transmitted components of the recorded irradiance, which explicitly uses the polarization properties of light. To train it, we introduce an accurate synthetic data generation pipeline, which simulates realistic reflections, including those generated by curved and non-ideal surfaces, non-static scenes, and high-dynamic-range scenes. Comment: accepted at ECCV 201

    Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs

    The human visual system relies on both binocular stereo cues and monocular focusness cues to gain effective 3D perception. In computer vision, the two problems are traditionally solved in separate tracks. In this paper, we present a unified learning-based technique that simultaneously uses both types of cues for depth inference. Specifically, we use a pair of focal stacks as input to emulate human perception. We first construct a comprehensive focal stack training dataset synthesized by depth-guided light field rendering. We then construct three individual networks: a Focus-Net to extract depth from a single focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from the focal stack, and a Stereo-Net to conduct stereo matching. We show how to integrate them into a unified BDfF-Net to obtain high-quality depth maps. Comprehensive experiments show that our approach outperforms the state-of-the-art in both accuracy and speed and effectively emulates human vision systems.