
    A Dilated Inception Network for Visual Saliency Prediction

    Recently, with the advent of deep convolutional neural networks (DCNNs), the improvements in visual saliency prediction research have been impressive. One promising direction for further improvement is to fully characterize the multi-scale saliency-influential factors with a computationally friendly module in DCNN architectures. In this work, we propose an end-to-end dilated inception network (DINet) for visual saliency prediction. It captures multi-scale contextual features effectively with very limited extra parameters. Instead of the parallel standard convolutions with different kernel sizes used in the existing inception module, our proposed dilated inception module (DIM) uses parallel dilated convolutions with different dilation rates, which significantly reduces the computational load while enriching the diversity of receptive fields in the feature maps. Moreover, the performance of our saliency model is further improved by using a set of linear normalization-based probability distribution distance metrics as loss functions. As such, we formulate saliency prediction as a probability distribution prediction task for global saliency inference rather than as a typical pixel-wise regression problem. Experimental results on several challenging saliency benchmark datasets demonstrate that our DINet with the proposed loss functions achieves state-of-the-art performance with shorter inference time.

    Comment: Accepted by IEEE Transactions on Multimedia. The source codes are available at https://github.com/ysyscool/DINe
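    The parameter saving claimed for the DIM can be checked with a little arithmetic: a 3 x 3 convolution with dilation rate d has an effective receptive field of 3 + 2(d - 1) but always carries only 9 weights per channel pair. The sketch below compares a standard inception branch (kernels 3, 5, 7) against dilated 3 x 3 branches with rates 1, 2, 3; the channel sizes are illustrative assumptions, not figures from the paper.

```python
# Sketch: weight counts of parallel standard convolutions vs parallel
# dilated convolutions with matching receptive fields (bias ignored).

def conv_params(k, c_in, c_out):
    """Number of weights of a k x k convolution layer."""
    return k * k * c_in * c_out

def effective_kernel(k, d):
    """Effective receptive field of a k x k conv with dilation rate d."""
    return k + (k - 1) * (d - 1)

c_in, c_out = 256, 256  # illustrative channel sizes

# Standard inception branch: kernels 3, 5, 7.
standard = sum(conv_params(k, c_in, c_out) for k in (3, 5, 7))

# Dilated inception branch: 3 x 3 kernels with dilation rates 1, 2, 3
# cover the same receptive fields (3, 5, 7) at constant cost.
assert [effective_kernel(3, d) for d in (1, 2, 3)] == [3, 5, 7]
dilated = sum(conv_params(3, c_in, c_out) for _ in (1, 2, 3))

print(standard, dilated, standard / dilated)  # roughly a 3x reduction
```

    The ratio (3*3 + 5*5 + 7*7) / (3 * 3*3) = 83/27 is independent of the channel sizes, which is why dilation pays off regardless of where the module sits in the network.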

    Towards Robust Curve Text Detection with Conditional Spatial Expansion

    Detecting curved text is challenging due to its irregular shapes and varying sizes. In this paper, we first investigate the deficiencies of existing curve text detection methods and then propose a novel Conditional Spatial Expansion (CSE) mechanism to improve curve text detection performance. Instead of treating curve text detection as a polygon regression or segmentation problem, we treat it as a region expansion process. Our CSE starts with a seed arbitrarily initialized within a text region and progressively merges neighboring regions based on local features extracted by a CNN and the contextual information of the already-merged regions. The CSE is highly parameterized and can be seamlessly integrated into existing object detection frameworks. Enhanced by the data-dependent CSE mechanism, our curve text detection system provides robust instance-level text region extraction with minimal post-processing. Analysis experiments show that our CSE can handle text of various shapes, sizes, and orientations, and can effectively suppress false positives arising from text-like textures or unexpected text included in the same RoI. Compared with existing curve text detection algorithms, our method is more robust and enjoys a simpler processing flow. It also sets a new state-of-the-art on curve text benchmarks, with an F-score of up to 78.4%.

    Comment: This paper has been accepted by the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2019).
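    The region expansion process can be pictured as a seeded flood fill. In the paper the merge decision is conditioned on CNN features and merged-region context; in the minimal sketch below a plain score map and a fixed threshold stand in for that learned decision, so only the control flow (seed, progressive 4-connected merging) reflects the abstract.

```python
# Minimal region-expansion sketch: grow from a seed, merging 4-connected
# neighbours whose score passes a threshold (a stand-in for the learned
# CSE merge decision).
from collections import deque

def expand_region(score, seed, threshold):
    """Grow a region from `seed` over a 2-D score map (list of lists)."""
    h, w = len(score), len(score[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w
                    and (nr, nc) not in region
                    and score[nr][nc] >= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

score = [
    [0.1, 0.8, 0.9, 0.1],
    [0.2, 0.9, 0.7, 0.1],
    [0.1, 0.1, 0.2, 0.1],
]
print(sorted(expand_region(score, (0, 1), 0.5)))
# only the connected block of high-score cells is merged into the region
```

    Because expansion stops wherever the merge test fails, a text-like texture disconnected from the seed never joins the region, which is the intuition behind the false-positive suppression noted above.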

    Do Personality and Culture Influence Perceived Video Quality and Enjoyment?

    The interplay between system, context, and human factors is important in the perception of multimedia quality. However, studies on human factors are very limited compared with those on system and context factors. This article presents an attempt to explore the influence of personality and cultural traits on the perception of multimedia quality. As a first step, a database consisting of 144 video sequences from 12 short movie excerpts was assembled and rated by 114 participants from a cross-cultural population, providing a useful ground truth for this (as well as future) study. As a second step, three statistical models are compared: (i) a baseline model considering only system factors; (ii) an extended model that also includes personality and culture; and (iii) an optimistic model in which each participant is modeled individually. As a third step, predictive models based on content, affect, system, and human factors are trained to generalize the statistical findings. Statistical analysis shows that personality and cultural traits represent 9.3% of the variance attributable to human factors, and that human factors overall predict an equal or higher proportion of variance than system factors. Moreover, the quality-enjoyment correlation varies across the excerpts. Predictive models trained with human factors demonstrate about 3% and 9% improvement over models trained solely on system factors for predicting perceived quality and enjoyment, respectively. Human factors are thus indeed important in perceptual multimedia quality, but the results suggest that further investigation of moderation effects and a broader range of human factors is necessary.
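    The baseline-vs-extended comparison can be illustrated on synthetic data: fitting ordinary least squares to the same ratings, a model that adds a human-factor column can never explain less training variance than one using system factors alone. Everything below (the single "bitrate" system factor, the "openness" trait, the coefficients) is invented for illustration; only the nested-model comparison mirrors the study design.

```python
# Toy nested-model comparison: training R^2 of OLS with system factors
# only vs system + one human factor, on synthetic ratings.
import random

def ols_r2(X, y):
    """Training R^2 of a least-squares fit with intercept (normal equations)."""
    n, p = len(y), len(X[0]) + 1
    A = [[1.0] + list(row) for row in X]
    G = [[sum(A[i][j] * A[i][k] for i in range(n)) for k in range(p)]
         for j in range(p)]
    c = [sum(A[i][j] * y[i] for i in range(n)) for j in range(p)]
    for j in range(p):                       # Gaussian elimination with pivoting
        piv = max(range(j, p), key=lambda r: abs(G[r][j]))
        G[j], G[piv] = G[piv], G[j]
        c[j], c[piv] = c[piv], c[j]
        for r in range(j + 1, p):
            f = G[r][j] / G[j][j]
            for k in range(j, p):
                G[r][k] -= f * G[j][k]
            c[r] -= f * c[j]
    b = [0.0] * p
    for j in range(p - 1, -1, -1):           # back substitution
        b[j] = (c[j] - sum(G[j][k] * b[k] for k in range(j + 1, p))) / G[j][j]
    pred = [sum(A[i][k] * b[k] for k in range(p)) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

random.seed(0)
n = 200
bitrate = [random.uniform(0.5, 8.0) for _ in range(n)]     # system factor
openness = [random.gauss(0.0, 1.0) for _ in range(n)]      # human factor
rating = [2.0 + 0.4 * b + 0.3 * o + random.gauss(0.0, 0.5)
          for b, o in zip(bitrate, openness)]

r2_base = ols_r2([[b] for b in bitrate], rating)                       # system only
r2_ext = ols_r2([[b, o] for b, o in zip(bitrate, openness)], rating)   # + human
print(round(r2_base, 3), round(r2_ext, 3))
assert r2_ext > r2_base   # human factor carries real signal here
```

    On training data the inequality holds by construction; the study's 3% and 9% gains are the non-trivial part, since they hold under prediction rather than in-sample fit.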

    An Iterative Co-Saliency Framework for RGBD Images

    As a newly emerging and significant topic in the computer vision community, co-saliency detection aims at discovering the common salient objects in multiple related images. Existing methods often generate the co-saliency map through a direct forward pipeline based on designed cues or initialization, but lack a refinement-cycle scheme. Moreover, they mainly focus on RGB images and ignore the depth information of RGBD images. In this paper, we propose an iterative RGBD co-saliency framework, which utilizes existing single-image saliency maps as the initialization and generates the final RGBD co-saliency map using a refinement-cycle model. Three schemes are employed in the proposed framework: an addition scheme, a deletion scheme, and an iteration scheme. The addition scheme highlights the salient regions based on intra-image depth propagation and saliency propagation, while the deletion scheme filters the saliency regions and removes the non-common salient regions based on an inter-image constraint. The iteration scheme is proposed to obtain a more homogeneous and consistent co-saliency map. Furthermore, a novel descriptor, named the depth shape prior, is proposed in the addition scheme to introduce depth information and enhance the identification of co-salient objects. The proposed method can effectively exploit any existing 2D saliency model to work well in RGBD co-saliency scenarios. Experiments on two RGBD co-saliency datasets demonstrate the effectiveness of the proposed framework.

    Comment: 13 pages, 13 figures. Accepted by IEEE Transactions on Cybernetics 2017. Project URL: https://rmcong.github.io/proj_RGBD_cosal_tcyb.htm
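    The refinement cycle can be sketched as alternating addition and deletion passes over the per-image maps: values far from the cross-image consensus are suppressed (deletion), values near it are reinforced toward it (addition), and the cycle repeats (iteration). The depth propagation and shape prior from the paper are omitted, and the threshold below is an arbitrary illustrative choice.

```python
# Minimal refinement-cycle sketch: flat saliency vectors, one per image,
# alternately pruned and pulled toward the cross-image consensus.

def refine_cosaliency(maps, iters=5, tau=0.3):
    """maps: list of equal-length saliency vectors (one per image)."""
    maps = [m[:] for m in maps]
    for _ in range(iters):
        consensus = [sum(col) / len(maps) for col in zip(*maps)]
        for m in maps:
            for i, c in enumerate(consensus):
                if abs(m[i] - c) > tau:     # deletion: non-common region
                    m[i] = 0.0
                else:                       # addition: reinforce consensus
                    m[i] = 0.5 * (m[i] + c)
    return maps

maps = [
    [0.9, 0.8, 0.1, 0.7],   # last entry is salient only in this image
    [0.8, 0.9, 0.2, 0.0],
    [0.9, 0.7, 0.1, 0.1],
]
for m in refine_cosaliency(maps):
    print([round(v, 2) for v in m])
```

    Over the iterations the genuinely common regions (the first two entries) stay high and converge, while the region present in only one image is driven down, which is the qualitative behavior the addition/deletion/iteration schemes aim for.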

    Saliency detection for stereoscopic images

    Saliency detection techniques have been widely used in various 2D multimedia processing applications. The emerging applications of stereoscopic displays now require new saliency detection models for stereoscopic images. Unlike saliency detection for 2D images, depth features have to be taken into account for stereoscopic images. In this paper, we propose a new stereoscopic saliency detection framework based on the feature contrast of color, luminance, texture, and depth. These four feature types are extracted from DCT coefficients to represent the energy of image patches. A Gaussian model of the spatial distance between image patches is adopted for the local and global contrast calculation. A new fusion method is designed to combine the feature maps into the final saliency map for stereoscopic images. Experimental results on a recent eye-tracking database show the superior performance of the proposed method over existing ones in saliency estimation for 3D images.
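    The Gaussian-weighted contrast step admits a compact sketch: a patch's saliency is its feature distance to every other patch, weighted by a Gaussian of their spatial distance so that nearby patches dominate (local contrast) while distant ones still contribute (global contrast). The DCT feature extraction and the fusion step are omitted, and the feature vectors and sigma below are illustrative.

```python
# Gaussian-weighted feature-contrast sketch: saliency of patch i is
# sum_j exp(-||p_i - p_j||^2 / (2 sigma^2)) * ||f_i - f_j||.
import math

def patch_saliency(features, positions, sigma=2.0):
    """features: one vector per patch; positions: (x, y) per patch."""
    out = []
    for i, fi in enumerate(features):
        s = 0.0
        for j, fj in enumerate(features):
            if i == j:
                continue
            d_sp = math.dist(positions[i], positions[j])   # spatial distance
            d_ft = math.dist(fi, fj)                       # feature contrast
            s += math.exp(-d_sp ** 2 / (2 * sigma ** 2)) * d_ft
        out.append(s)
    return out

# Three patches on a line; the middle one differs strongly in features.
feats = [[0.1, 0.1], [0.9, 0.8], [0.1, 0.2]]
pos = [(0, 0), (1, 0), (2, 0)]
sal = patch_saliency(feats, pos)
assert sal[1] == max(sal)   # the odd-one-out patch is the most salient
```

    In the full framework this computation runs once per feature channel (color, luminance, texture, depth), and the resulting maps are then combined by the fusion method.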

    The CP-QAE-I: A Video Dataset for Exploring the Effects of Personality and Culture on Perceived Quality and Affect in Multimedia

    Perception of quality and affect is subjective, driven by a complex interplay between system and human factors. Is it, however, possible to model these factors to predict subjective perception? Pursuing this question requires broader collaboration to sample all aspects of personality, culture, and other human factors, and thus an appropriate dataset to integrate such efforts. Here, the CP-QAE-I is proposed: a video dataset containing 144 video sequences based on 12 short movie clips, varying in frame rate, frame dimension, bit rate, and affect. An evaluation by 76 participants drawn from the United Kingdom, Singapore, India, and China suggests adequate distinction between the video sequences in terms of perceived quality as well as positive and negative affect. Nationality also emerged as a significant predictor, supporting the rationale for further study. By sharing the dataset, this paper aims to promote work on modeling human factors in multimedia perception.

    One symbol blind synchronization in SIMO molecular communication systems

    Molecular communication offers new possibilities in micro- and nano-scale application environments. As in other communication paradigms, molecular communication requires clock synchronization between the transmitter and receiver nanomachines in many time- and control-sensitive applications. This letter presents a novel, high-efficiency blind clock synchronization mechanism. Without knowing the channel parameters of the diffusion coefficient and the transmitter-receiver distance, the receiver requires only one symbol to achieve synchronization. The samples are used to estimate the propagation delay by the least squares method and thereby achieve clock synchronization. A single-input multiple-output (SIMO) diversity design is then proposed to mitigate channel noise and thus improve the synchronization accuracy. Simulation results show that the proposed clock synchronization mechanism performs well and may benefit chronopharmaceutical drug delivery applications.
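    The blind least-squares idea can be illustrated with the standard 3-D diffusion response c(t) = A t^(-3/2) exp(-a/t), where A and a = r^2/(4D) absorb the unknown diffusion coefficient and distance. For each candidate a the amplitude enters linearly, so it has a closed-form least-squares solution, and the best-fitting a gives the propagation (peak) delay 2a/3. This is a simplified, noise-free sketch in the spirit of the letter, not the letter's estimator; the sampling grid and parameter values are illustrative.

```python
# Blind delay estimation sketch: grid-search the nonlinear parameter `a`
# of c(t) = A * t^(-3/2) * exp(-a/t), solving the amplitude A in closed
# form for each candidate, then report the peak delay 2a/3.
import math

def diffusion(t, amp, a):
    """3-D diffusion impulse response, a = r^2 / (4D)."""
    return amp * t ** -1.5 * math.exp(-a / t)

def estimate_delay(times, samples, a_grid):
    """Least-squares fit of (amp, a); returns the estimated peak delay."""
    best = None
    for a in a_grid:
        g = [t ** -1.5 * math.exp(-a / t) for t in times]
        amp = sum(y * gi for y, gi in zip(samples, g)) / sum(gi * gi for gi in g)
        err = sum((y - amp * gi) ** 2 for y, gi in zip(samples, g))
        if best is None or err < best[0]:
            best = (err, a)
    return 2 * best[1] / 3   # peak of c(t) is at t = r^2/(6D) = 2a/3

true_a = 1.2                                   # unknown to the receiver
times = [0.1 * k for k in range(1, 40)]        # samples within one symbol
samples = [diffusion(t, 5.0, true_a) for t in times]
a_grid = [0.05 * k for k in range(1, 80)]
print(estimate_delay(times, samples, a_grid))  # recovers 2 * 1.2 / 3 = 0.8
```

    With noisy samples the same fit would be run per receiver in the SIMO design and the per-branch estimates combined, which is where the diversity gain described above comes from.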