Deep SimNets
We present a deep layered architecture that generalizes convolutional neural
networks (ConvNets). The architecture, called SimNets, is driven by two
operators: (i) a similarity function that generalizes inner-product, and (ii) a
log-mean-exp function called MEX that generalizes maximum and average. The two
operators applied in succession give rise to a standard neuron but in "feature
space". The feature spaces realized by SimNets depend on the choice of the
similarity operator. The simplest setting, which corresponds to a convolution,
realizes the feature space of the Exponential kernel, while other settings
realize feature spaces of more powerful kernels (Generalized Gaussian, which
includes as special cases RBF and Laplacian), or even dynamically learned
feature spaces (Generalized Multiple Kernel Learning). As a result, the SimNet
contains a higher abstraction level compared to a traditional ConvNet. We argue
that enhanced expressiveness is important when the networks are small due to
run-time constraints (such as those imposed by mobile applications). Empirical
evaluation validates the superior expressiveness of SimNets, showing a
significant gain in accuracy over ConvNets when computational resources at
run-time are limited. We also show that in large-scale settings, where
computational complexity is less of a concern, the additional capacity of
SimNets can be controlled with proper regularization, yielding accuracies
comparable to state-of-the-art ConvNets.
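
The abstract describes MEX only as a log-mean-exp function that generalizes maximum and average. A minimal sketch of that idea follows, assuming the common parameterization (1/beta) * log(mean(exp(beta * x))); the parameter name `beta`, the function name `mex`, and the exact form are illustrative assumptions, not taken from the abstract.

```python
import math

def mex(values, beta):
    """Log-mean-exp of `values` with sharpness `beta` (assumed parameterization).

    Large positive beta approaches max(values), large negative beta
    approaches min(values), and beta near zero approaches the mean.
    """
    if beta == 0:
        # Limit case: the log-mean-exp tends to the arithmetic mean.
        return sum(values) / len(values)
    scaled = [beta * v for v in values]
    shift = max(scaled)  # subtract the largest exponent for numerical stability
    total = sum(math.exp(s - shift) for s in scaled)
    return (shift + math.log(total / len(values))) / beta

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    for b in (-1000.0, 1e-6, 1000.0):
        print(b, mex(xs, b))
```

Under this assumed form, a single scalar moves the operator smoothly between min-like, average-like, and max-like pooling, which is one way to read the claim that MEX generalizes maximum and average.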
Example Based Image Analysis and Synthesis
Image analysis and graphics synthesis can be achieved with learning techniques that use image examples directly, without physically based 3D models. In our technique: (i) the mapping from novel images to a vector of "pose" and "expression" parameters can be learned from a small set of example images using a function approximation technique that we call an analysis network; (ii) the inverse mapping from input "pose" and "expression" parameters to output images can be synthesized from a small set of example images and used to produce new images using a similar synthesis network. The techniques described here have several applications in computer graphics, special effects, interactive multimedia, and very low bandwidth teleconferencing.
Algebraic Functions For Recognition
In the general case, a trilinear relationship between three perspective views is shown to exist. The trilinearity result is shown to be of much practical use in visual recognition by alignment, yielding a direct method that cuts through the computations of camera transformation, scene structure and epipolar geometry. The proof of the central result may be of further interest as it demonstrates certain regularities across homographies of the plane and introduces new view invariants. Experiments on simulated and real image data were conducted, including a comparative analysis with epipolar intersection and the linear combination methods, with results indicating a greater degree of robustness in practice and a higher level of performance in re-projection tasks.
