Why Are Deep Representations Good Perceptual Quality Features?
Recently, intermediate feature maps of pre-trained convolutional neural
networks have been shown to yield significant perceptual quality improvements
when used in the loss function for training new networks. It is believed that
these features encode perceptual quality better and provide more efficient
representations of input images than other perceptual metrics such as SSIM
and PSNR. However, there have been no systematic studies to
determine the underlying reason. Due to the lack of such an analysis, it is not
possible to evaluate the performance of a particular set of features or to
further improve perceptual quality by carefully selecting a subset of
features from a pre-trained CNN. This work shows that the capabilities of
pre-trained deep CNN features in optimizing the perceptual quality are
correlated with their success in capturing basic human visual perception
characteristics. In particular, we focus our analysis on fundamental aspects of
human perception, such as the contrast sensitivity and orientation selectivity.
We introduce two new formulations to measure the frequency and orientation
selectivity of the features learned by convolutional layers for evaluating deep
features learned by widely-used deep CNNs such as VGG-16. We demonstrate that
the pre-trained CNN features which receive higher scores are better at
predicting human quality judgments. Furthermore, we show that our method can
be used to select deep features that form a new loss function, which improves
image reconstruction quality for the well-known single-image super-resolution
problem.
Comment: To be presented at ECCV 2020
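The abstract does not state the two formulations explicitly, so the following is a minimal sketch of one plausible Fourier-domain proxy for the frequency and orientation selectivity of a convolutional filter: a filter scores high if its spectral energy concentrates in a narrow band of frequencies and orientations. The padding size, ring width, and 15-degree orientation window are illustrative assumptions, not the paper's definitions.

```python
# Hypothetical selectivity scores for a single 2D conv kernel, based on
# how concentrated its power spectrum is around its dominant frequency
# and orientation. Not the paper's exact formulation.
import numpy as np

def selectivity_scores(kernel, pad=32):
    """kernel: 2D array (one channel of a conv filter)."""
    # Remove the DC component so the mean response does not dominate.
    kernel = kernel - kernel.mean()
    # Zero-pad so small kernels get a usable frequency resolution.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(kernel, s=(pad, pad)))) ** 2
    cy = cx = pad // 2
    ys, xs = np.mgrid[0:pad, 0:pad]
    radius = np.hypot(ys - cy, xs - cx)            # spatial frequency
    angle = np.arctan2(ys - cy, xs - cx) % np.pi   # orientation in [0, pi)

    total = spec.sum() + 1e-12
    peak = np.argmax(spec)
    # Frequency selectivity: energy fraction in a narrow ring (width 2 bins,
    # an assumption) around the dominant spatial frequency.
    freq_sel = spec[np.abs(radius - radius.flat[peak]) < 2].sum() / total
    # Orientation selectivity: energy fraction within +/- 15 degrees
    # (an assumption) of the dominant orientation.
    dist = np.abs(angle - angle.flat[peak])
    dist = np.minimum(dist, np.pi - dist)
    orient_sel = spec[dist < np.deg2rad(15)].sum() / total
    return freq_sel, orient_sel

# An oriented Gabor-like filter should score high; random noise should not.
ys, xs = np.mgrid[-4:5, -4:5]
gabor = np.exp(-(xs**2 + ys**2) / 8.0) * np.cos(1.5 * xs)
print(selectivity_scores(gabor))
print(selectivity_scores(np.random.randn(9, 9)))
```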
Fabrication of composite polyaniline/CNT nanofibers using an ultrasonically assisted dynamic inverse emulsion polymerization technique
Structure and properties of multi-walled carbon nanotube porous sheets with enhanced elongation
Study of interactions between single-wall carbon nanotubes and surfactant using molecular simulations
Integrating Cross-modality Hallucinated MRI with CT to Aid Mediastinal Lung Tumor Segmentation
TENet: Triple Excitation Network for video salient object detection
In this paper, we propose a simple yet effective approach, named the Triple
Excitation Network, to reinforce the training of video salient object
detection (VSOD) from three aspects: spatial, temporal, and online
excitations. These excitation mechanisms are designed in the spirit of
curriculum learning and aim to reduce learning ambiguities at the beginning
of training by selectively exciting feature activations using the ground
truth. We then gradually reduce the weight of the ground-truth excitations by
a curriculum rate and replace them with a curriculum complementary map for
better and faster convergence. In
particular, the spatial excitation strengthens feature activations for clear
object boundaries, while the temporal excitation imposes motion information
to emphasize spatio-temporal salient regions. Together, the spatial and
temporal excitations combat the saliency-shifting problem and the conflict
between spatial and temporal features in VSOD. Furthermore, our
semi-curriculum learning design enables the first online refinement strategy
for VSOD, which excites and boosts saliency responses at test time without
re-training. The proposed triple excitations can easily be plugged into
different VSOD methods. Extensive experiments show the effectiveness of all
three excitation mechanisms, and the proposed method outperforms
state-of-the-art image and video salient object detection methods.
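Since the abstract describes the curriculum mechanism only in prose, here is a minimal sketch of how ground-truth excitation with a decaying curriculum rate could look; the linear decay schedule, the residual blending rule, and all names (curriculum_excite, pred_map) are hypothetical illustrations rather than the authors' exact formulation.

```python
# Hypothetical curriculum-style excitation: reweight features with a blend
# of the ground-truth saliency map and a predicted "complementary" map,
# where the ground-truth weight decays over training.
import torch

def curriculum_excite(features, gt_map, pred_map, step, total_steps):
    """features: (B, C, H, W); gt_map, pred_map: (B, 1, H, W) in [0, 1]."""
    # Curriculum rate: starts at 1 (pure ground truth), decays linearly to 0
    # (a linear schedule is an assumption).
    rate = max(0.0, 1.0 - step / total_steps)
    excitation = rate * gt_map + (1.0 - rate) * pred_map
    # Residual form keeps the unexcited signal while boosting salient regions.
    return features * (1.0 + excitation)

# Example usage with random tensors.
feats = torch.randn(2, 64, 32, 32)
gt = torch.rand(2, 1, 32, 32)
pred = torch.rand(2, 1, 32, 32)
out = curriculum_excite(feats, gt, pred, step=100, total_steps=1000)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

At test time the ground-truth map is unavailable, so the rate is zero and the excitation relies entirely on the predicted map, which is consistent with the online refinement idea of boosting saliency responses without re-training.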
