A Dilated Inception Network for Visual Saliency Prediction
Recently, with the advent of deep convolutional neural networks (DCNN), the
improvements in visual saliency prediction research are impressive. One
possible direction to approach the next improvement is to fully characterize
the multi-scale saliency-influential factors with a computationally-friendly
module in DCNN architectures. In this work, we propose an end-to-end dilated
inception network (DINet) for visual saliency prediction. It captures
multi-scale contextual features effectively with very limited extra parameters.
Instead of utilizing parallel standard convolutions with different kernel sizes
as in the existing inception module, our proposed dilated inception module (DIM)
uses parallel dilated convolutions with different dilation rates, which can
significantly reduce the computation load while enriching the diversity of
receptive fields in feature maps. Moreover, the performance of our saliency
model is further improved by using a set of linear normalization-based
probability distribution distance metrics as loss functions. As such, we can
formulate saliency prediction as a probability distribution prediction task for
global saliency inference instead of a typical pixel-wise regression problem.
Experimental results on several challenging saliency benchmark datasets
demonstrate that our DINet with the proposed loss functions achieves
state-of-the-art performance with shorter inference time.

Comment: Accepted by IEEE Transactions on Multimedia. The source code is available at https://github.com/ysyscool/DINe
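As a rough illustration of the idea behind the DIM (a NumPy stand-in, not the authors' implementation), the sketch below runs parallel 3x3 dilated convolutions with dilation rates 1, 2, and 3. The three branches cover 3x3, 5x5, and 7x7 effective receptive fields while each keeps only a 3x3 weight budget (27 weights total, versus 9 + 25 + 49 = 83 for standard 3x3/5x5/7x7 kernels in a conventional inception module).

```python
import numpy as np

def dilated_conv2d(x, k, rate):
    """Naive 'valid' dilated convolution (cross-correlation) of a
    single-channel image x with kernel k at the given dilation rate."""
    kh, kw = k.shape
    eff_h = (kh - 1) * rate + 1  # effective receptive-field height
    eff_w = (kw - 1) * rate + 1
    oh, ow = x.shape[0] - eff_h + 1, x.shape[1] - eff_w + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # sample the input with stride `rate` inside the window
            patch = x[i:i + eff_h:rate, j:j + eff_w:rate]
            out[i, j] = np.sum(patch * k)
    return out

x = np.random.rand(16, 16)
k = np.ones((3, 3)) / 9.0  # toy averaging kernel
# parallel branches with dilation rates 1, 2, 3 (as in a DIM-style module)
branches = [dilated_conv2d(x, k, r) for r in (1, 2, 3)]
print([b.shape for b in branches])  # [(14, 14), (12, 12), (10, 10)]
```

In the paper the branch outputs feed into the saliency head; here the sketch only demonstrates how the dilation rate enlarges the receptive field without adding parameters.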
Towards Robust Curve Text Detection with Conditional Spatial Expansion
It is challenging to detect curve texts due to their irregular shapes and
varying sizes. In this paper, we first investigate the deficiency of the
existing curve detection methods and then propose a novel Conditional Spatial
Expansion (CSE) mechanism to improve the performance of curve text detection.
Instead of regarding curve text detection as a polygon regression or a
segmentation problem, we treat it as a region expansion process. Our CSE starts
with a seed arbitrarily initialized within a text region and progressively
merges neighboring regions based on local features extracted by a CNN and the
contextual information of the merged regions. The CSE is highly parameterized and
can be seamlessly integrated into existing object detection frameworks.
Enhanced by the data-dependent CSE mechanism, our curve text detection system
provides robust instance-level text region extraction with minimal
post-processing. Analysis experiments show that our CSE can handle texts
with various shapes, sizes, and orientations, and can effectively suppress the
false-positives coming from text-like textures or unexpected texts included in
the same RoI. Compared with the existing curve text detection algorithms, our
method is more robust and enjoys a simpler processing flow. It also creates a
new state-of-the-art performance on curve text benchmarks with an F-score of up to
78.4.

Comment: This paper has been accepted by the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2019).
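To make the region-expansion view concrete, here is a minimal sketch of seed-based expansion. It is not the CSE itself: the learned, data-dependent merge condition (CNN features plus context of already-merged regions) is replaced by a simple hypothetical intensity test passed in as `accept`.

```python
from collections import deque

import numpy as np

def expand_region(img, seed, accept):
    """Grow a region from `seed`, merging 4-connected neighbours
    whenever the (stand-in) condition `accept` approves the merge."""
    h, w = img.shape
    region = {seed}
    frontier = deque([seed])
    while frontier:
        y, x = frontier.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region:
                if accept(img, region, (ny, nx)):
                    region.add((ny, nx))
                    frontier.append((ny, nx))
    return region

# toy image: a bright curved "text stroke" on a dark background
img = np.zeros((5, 8))
img[2, 1:7] = 1.0
grown = expand_region(img, (2, 3), lambda im, reg, p: im[p] > 0.5)
print(sorted(grown))  # the six bright pixels of the stroke
```

Because the merge decision is conditioned on the region grown so far, an expansion process like this can follow an arbitrarily curved instance, which is the property the CSE exploits.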
Do Personality and Culture Influence Perceived Video Quality and Enjoyment?
The interplay between system, context, and human factors is important in the perception of multimedia quality. However, studies on human factors are very limited in comparison to those on system and context factors. This article presents an attempt to explore the influence of personality and cultural traits on the perception of multimedia quality. As a first step, a database consisting of 144 video sequences from 12 short movie excerpts was assembled and rated by 114 participants from a cross-cultural population, thereby providing a useful ground truth for this and future studies. As a second step, three statistical models are compared: (i) a baseline model that considers only system factors; (ii) an extended model that adds personality and culture; and (iii) an optimistic model in which each participant is modeled individually. As a third step, predictive models based on content, affect, system, and human factors are trained to generalize the statistical findings. Statistical analysis shows that personality and cultural traits represent 9.3% of the variance attributable to human factors, and that human factors overall predict an equal or higher proportion of variance than system factors. Moreover, the quality-enjoyment correlation varies across the excerpts. Predictive models trained with human factors demonstrate improvements of about 3% and 9% over models trained solely on system factors for predicting perceived quality and enjoyment, respectively. This is evidence that human factors are indeed important in perceptual multimedia quality, but the results suggest that further investigation of moderation effects and a broader range of human factors is necessary.
An Iterative Co-Saliency Framework for RGBD Images
As a newly emerging and significant topic in computer vision community,
co-saliency detection aims at discovering the common salient objects in
multiple related images. The existing methods often generate the co-saliency
map through a direct forward pipeline which is based on the designed cues or
initialization, but lack the refinement-cycle scheme. Moreover, they mainly
focus on RGB image and ignore the depth information for RGBD images. In this
paper, we propose an iterative RGBD co-saliency framework, which utilizes the
existing single saliency maps as the initialization, and generates the final
RGBD co-saliency map by using a refinement-cycle model. Three schemes are
employed in the proposed RGBD co-saliency framework, which include the addition
scheme, deletion scheme, and iteration scheme. The addition scheme is used to
highlight the salient regions based on intra-image depth propagation and
saliency propagation, while the deletion scheme filters the saliency regions
and removes the non-common salient regions based on an inter-image constraint. The
iteration scheme is proposed to obtain a more homogeneous and consistent
co-saliency map. Furthermore, a novel descriptor, named depth shape prior, is
proposed in the addition scheme to introduce the depth information to enhance
identification of co-salient objects. The proposed method can effectively
exploit any existing 2D saliency model to work well in RGBD co-saliency
scenarios. Experiments on two RGBD co-saliency datasets demonstrate the
effectiveness of our proposed framework.

Comment: 13 pages, 13 figures. Accepted by IEEE Transactions on Cybernetics 2017. Project URL: https://rmcong.github.io/proj_RGBD_cosal_tcyb.htm
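The refinement cycle can be sketched in a few lines. This is a toy version under strong simplifying assumptions, not the paper's model: the depth shape prior is replaced by a given per-image prior map, the addition/deletion rules use fixed hypothetical weights, and the inter-image constraint is approximated by the mean map across images.

```python
import numpy as np

def refine_cosaliency(maps, depth_priors, iters=5, thr=0.6):
    """Toy refinement cycle: the 'addition' step boosts pixels supported
    by a depth prior, the 'deletion' step suppresses regions that are not
    salient across all images (inter-image constraint), and both are
    applied repeatedly (iteration scheme)."""
    maps = [m.copy() for m in maps]
    for _ in range(iters):
        consensus = np.mean(maps, axis=0)        # inter-image agreement
        for i, m in enumerate(maps):
            m = 0.5 * m + 0.5 * m * depth_priors[i]  # addition scheme
            m[consensus < thr] *= 0.5                # deletion scheme
            maps[i] = m / (m.max() + 1e-8)           # renormalize
    return maps

# toy example: two maps share one salient region; m0 has an extra one
m0 = np.zeros((4, 4)); m0[:2, :2] = 1.0; m0[2:, 2:] = 1.0
m1 = np.zeros((4, 4)); m1[:2, :2] = 1.0
priors = [np.ones((4, 4)), np.ones((4, 4))]  # neutral depth priors
out = refine_cosaliency([m0, m1], priors)
print(out[0][0, 0], out[0][3, 3])  # common region kept, extra one fades
```

After a few iterations the region present in only one map is progressively suppressed, which mirrors the intended effect of the deletion scheme.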
Saliency detection for stereoscopic images
Saliency detection techniques have been widely used in various 2D multimedia processing applications. Currently, the emerging applications of stereoscopic display require new saliency detection models for stereoscopic images. Differently from saliency detection for 2D images, depth features have to be taken into account in saliency detection for stereoscopic images. In this paper, we propose a new stereoscopic saliency detection framework based on the feature contrast of color, intensity, texture, and depth. Four types of features, namely color, luminance, texture, and depth, are extracted from DCT coefficients to represent the energy of image patches. A Gaussian model of the spatial distance between image patches is adopted to account for both local and global contrast calculation. A new fusion method is designed to combine the feature maps into the final saliency map for stereoscopic images. Experimental results on a recent eye-tracking database show the superior performance of the proposed method over existing ones in saliency estimation for 3D images.
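The Gaussian-weighted patch contrast can be illustrated with a small sketch. This is a simplified stand-in (generic feature vectors instead of the paper's DCT-based color/luminance/texture/depth features, and a single hypothetical `sigma`): each patch's saliency is its feature distance to every other patch, weighted by a Gaussian of their spatial distance, so nearby patches dominate (local contrast) while distant ones still contribute (global contrast).

```python
import numpy as np

def patch_contrast_saliency(features, positions, sigma=2.0):
    """Saliency of patch i = sum over patches j of the feature distance
    ||f_i - f_j||, weighted by exp(-d_ij^2 / (2 sigma^2)) where d_ij is
    the spatial distance between the two patches."""
    n = len(features)
    sal = np.zeros(n)
    for i in range(n):
        d_feat = np.linalg.norm(features - features[i], axis=1)
        d_sp = np.linalg.norm(positions - positions[i], axis=1)
        w = np.exp(-d_sp ** 2 / (2 * sigma ** 2))
        sal[i] = np.sum(w * d_feat)
    return sal / sal.max()  # normalize to [0, 1]

# toy 3x3 grid of patches; the centre patch has different features
pos = np.array([[i // 3, i % 3] for i in range(9)], dtype=float)
feat = np.zeros((9, 4)); feat[4] = 1.0
sal = patch_contrast_saliency(feat, pos)
print(int(np.argmax(sal)))  # 4 -- the odd-one-out patch is most salient
```

In the full framework a map like this would be computed per feature type and then fused; the sketch only shows the contrast computation itself.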
The CP-QAE-I: A Video Dataset for Exploring the Effects of Personality and Culture on Perceived Quality and Affect in Multimedia
Perception of quality and affect is subjective, driven by a complex interplay between system and human factors. Is it, however, possible to model these factors to predict subjective perception? To pursue this question, broader collaboration is needed to sample all aspects of personality, culture, and other human factors, and an appropriate dataset is needed to integrate such efforts. Here, the CP-QAE-I is proposed. This is a video dataset containing 144 video sequences based on 12 short movie clips, which vary in frame rate, frame dimension, bit rate, and affect. An evaluation by 76 participants drawn from the United Kingdom, Singapore, India, and China suggests adequate distinction between the video sequences in terms of perceived quality as well as positive and negative affect. Nationality also emerged as a significant predictor, supporting the rationale for further study. By sharing the dataset, this paper aims to promote work modeling human factors in multimedia perception.
One symbol blind synchronization in SIMO molecular communication systems
Molecular communication offers new possibilities in micro- and nano-scale application environments. Similar to other communication paradigms, molecular communication also requires clock synchronization between the transmitter and the receiver nanomachine in many time- and control-sensitive applications. This letter presents a novel high-efficiency blind clock synchronization mechanism. Without knowing the channel parameters of the diffusion coefficient and the transmitter-receiver distance, the receiver requires only one symbol to achieve synchronization. The samples are used to estimate the propagation delay by the least-squares method and thereby achieve clock synchronization. A single-input multiple-output (SIMO) diversity design is then proposed to mitigate channel noise and thereby improve the synchronization accuracy. The simulation results show that the proposed clock synchronization mechanism performs well and may benefit chronopharmaceutical drug-delivery applications.
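A generic least-squares delay estimate can be sketched as follows. This is not the letter's blind estimator (which avoids knowing the diffusion coefficient and distance); the sketch assumes a known pulse shape up to delay and amplitude, uses a made-up diffusion-like pulse, and searches a delay grid for the best fit. A SIMO extension would run the same estimator at each receiver and combine the per-receiver estimates (e.g., by averaging) to suppress noise.

```python
import numpy as np

def ls_delay_estimate(samples, t, pulse, candidates):
    """For each candidate delay tau, solve the closed-form LS amplitude
    a = <p, samples> / <p, p> with p = pulse(t - tau), and keep the tau
    whose residual ||samples - a * p||^2 is smallest."""
    best_tau, best_err = None, np.inf
    for tau in candidates:
        p = pulse(t - tau)
        denom = p @ p
        if denom == 0:
            continue
        a = (p @ samples) / denom
        err = np.sum((samples - a * p) ** 2)
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau

rng = np.random.default_rng(0)
# hypothetical diffusion-like received pulse shape
pulse = lambda t: np.where(t > 0, t * np.exp(-t), 0.0)
t = np.linspace(0, 10, 200)
true_tau = 2.0
obs = 3.0 * pulse(t - true_tau) + 0.05 * rng.normal(size=t.size)
est = ls_delay_estimate(obs, t, pulse, np.arange(0, 5, 0.1))
print(round(est, 1))  # close to 2.0
```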
