Physical temperature and the meaning of the q parameter in Tsallis statistics
We show that the function beta(E) derived from the density of states of a
constant heat capacity reservoir coupled to some system of interest is not
identical to the physically measurable (transitive) temperature. There are,
however, connections between the two quantities as well as with the Tsallis
parameter q. We exemplify these connections using the one-dimensional Ising
model in the "dynamical ensemble".
Comment: 12 pages, 2 figures
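The connection between beta(E), the physical temperature, and q can be illustrated with the standard finite-bath argument. The following is a minimal sketch (with k_B = 1) assuming a reservoir of constant heat capacity C; it is not necessarily the derivation or the sign convention for q used in the paper:

    % Constant heat capacity C: E_R = C T_R, hence S_R(E_R) = C \ln E_R + \mathrm{const.}
    \beta(E) = \frac{\partial S_R}{\partial E_R}\bigg|_{E_R = E_{\rm tot}-E}
             = \frac{C}{E_{\rm tot}-E}, \qquad
    p(E) \propto e^{S_R(E_{\rm tot}-E)} \propto \Bigl(1-\frac{E}{E_{\rm tot}}\Bigr)^{C}
         = \bigl[1-(1-q)\,\beta_0 E\bigr]^{\frac{1}{1-q}},
    \qquad \beta_0 = \frac{C}{E_{\rm tot}},\quad q = 1-\frac{1}{C}.

In this sketch beta(E) depends on the system energy and only reduces to a single transitive temperature in the infinite-bath limit (C to infinity, q to 1).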
A Lagrangian Gauss-Newton-Krylov Solver for Mass- and Intensity-Preserving Diffeomorphic Image Registration
We present an efficient solver for diffeomorphic image registration problems
in the framework of Large Deformations Diffeomorphic Metric Mappings (LDDMM).
We use an optimal control formulation, in which the velocity field of a
hyperbolic PDE needs to be found such that the distance between the final state
of the system (the transformed/transported template image) and the observation
(the reference image) is minimized. Our solver supports both stationary and
non-stationary (i.e., transient or time-dependent) velocity fields. As
transformation models, we consider both the transport equation (assuming
intensities are preserved during the deformation) and the continuity equation
(assuming mass-preservation).
We consider the reduced form of the optimal control problem and solve the
resulting unconstrained optimization problem using a discretize-then-optimize
approach. A key contribution is the elimination of the PDE constraint using a
Lagrangian hyperbolic PDE solver. Lagrangian methods rely on the concept of
characteristic curves that we approximate here using a fourth-order Runge-Kutta
method. We also present an efficient algorithm for computing the derivatives of
the final state of the system with respect to the velocity field. This allows us to
use fast Gauss-Newton based methods. We present quickly converging iterative
linear solvers using spectral preconditioners that render the overall
optimization efficient and scalable. Our method is embedded into the image
registration framework FAIR and, thus, supports the most commonly used
similarity measures and regularization functionals. We demonstrate the
potential of our new approach using several synthetic and real world test
problems with up to 14.7 million degrees of freedom.
Comment: code available at: https://github.com/C4IR/FAIR.m/tree/master/add-ons/LagLDDM
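As a rough illustration of the Lagrangian building block (not the FAIR/LagLDDMM code itself), the Python sketch below traces the characteristics of the transport equation backwards in time with a classical fourth-order Runge-Kutta method and evaluates the template image at the resulting points; the 1D setting, function names, and step counts are illustrative assumptions:

    import numpy as np

    def rk4_characteristics(x, velocity, t_span=(1.0, 0.0), n_steps=4):
        """Trace characteristics x(t) of dx/dt = v(x, t) from t_span[0] to t_span[1]
        with the classical fourth-order Runge-Kutta method (illustrative sketch)."""
        t0, t1 = t_span
        h = (t1 - t0) / n_steps
        t = t0
        for _ in range(n_steps):
            k1 = velocity(x, t)
            k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
            k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
            k4 = velocity(x + h * k3, t + h)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
            t = t + h
        return x

    def transport_solve(template, grid, velocity):
        """Semi-Lagrangian solve of the transport equation: the transformed image is the
        template evaluated at the foot of each characteristic (intensity preserving)."""
        feet = rk4_characteristics(grid, velocity)      # trace back to t = 0
        return np.interp(feet, grid, template)          # linear interpolation

    # toy 1D example with a stationary velocity field
    grid = np.linspace(0.0, 1.0, 101)
    template = np.exp(-200.0 * (grid - 0.3) ** 2)       # a bump at x = 0.3
    v = lambda x, t: 0.2 * np.ones_like(x)              # constant rightward flow
    transformed = transport_solve(template, grid, v)    # bump moves toward x = 0.5

Because the PDE solve reduces to ODE integration along characteristics plus interpolation, it avoids CFL-type step restrictions, which is one reason Lagrangian solvers are attractive for eliminating the PDE constraint.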
Stable Architectures for Deep Neural Networks
Deep neural networks have become invaluable tools for supervised machine
learning, e.g., classification of text or images. While often offering superior
results over traditional techniques and successfully expressing complicated
patterns in data, deep architectures are known to be challenging to design and
train such that they generalize well to new data. Important issues with deep
architectures are numerical instabilities in derivative-based learning
algorithms commonly called exploding or vanishing gradients. In this paper we
propose new forward propagation techniques inspired by systems of Ordinary
Differential Equations (ODE) that overcome this challenge and lead to
well-posed learning problems for arbitrarily deep networks.
The backbone of our approach is our interpretation of deep learning as a
parameter estimation problem of nonlinear dynamical systems. Given this
formulation, we analyze stability and well-posedness of deep learning and use
this new understanding to develop new network architectures. We relate the
exploding and vanishing gradient phenomenon to the stability of the discrete
ODE and present several strategies for stabilizing deep learning for very deep
networks. While our new architectures restrict the solution space, several
numerical experiments show their competitiveness with state-of-the-art
networks.
Comment: 23 pages, 7 figures
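To make the ODE view concrete, the sketch below builds a deep residual network as repeated forward-Euler steps y_{j+1} = y_j + h*tanh(K y_j + b) and constrains K to be antisymmetric (minus a small shift), one of the stabilization strategies associated with this line of work; the class name, step size, and shift value are illustrative assumptions rather than the paper's exact architecture:

    import torch
    import torch.nn as nn

    class AntisymmetricResidualLayer(nn.Module):
        """Forward-Euler step y <- y + h * tanh(K y + b) with K = W - W^T - gamma*I,
        so the eigenvalues of K have negative real part (a stability-motivated choice)."""
        def __init__(self, width, h=0.1, gamma=1e-2):
            super().__init__()
            self.W = nn.Parameter(torch.randn(width, width) / width ** 0.5)
            self.b = nn.Parameter(torch.zeros(width))
            self.h, self.gamma = h, gamma

        def forward(self, y):
            K = self.W - self.W.t() - self.gamma * torch.eye(self.W.shape[0], device=y.device)
            return y + self.h * torch.tanh(y @ K.t() + self.b)

    # a "very deep" network obtained by repeating small ODE steps
    depth, width = 100, 16
    net = nn.Sequential(*[AntisymmetricResidualLayer(width) for _ in range(depth)])
    y = torch.randn(8, width)
    print(net(y).shape)   # torch.Size([8, 16])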
A Multiscale Method for Model Order Reduction in PDE Parameter Estimation
Estimating parameters of Partial Differential Equations (PDEs) is of interest
in a number of applications such as geophysical and medical imaging. Parameter
estimation is commonly phrased as a PDE-constrained optimization problem that
can be solved iteratively using gradient-based optimization. A computational
bottleneck in such approaches is that the underlying PDEs need to be solved
numerous times before the model is reconstructed with sufficient accuracy. One
way to reduce this computational burden is by using Model Order Reduction (MOR)
techniques such as the Multiscale Finite Volume Method (MSFV).
In this paper, we apply the MSFV method to high-dimensional parameter
estimation problems. Given a finite volume discretization of the PDE on a fine
mesh, the MSFV method reduces the problem size by computing a
parameter-dependent projection onto a nested coarse mesh. A novelty in our work
is the integration of MSFV into a PDE-constrained optimization framework, which
updates the reduced space in each iteration. We also present a computationally
tractable way of differentiating the MOR solution that acknowledges the change
of basis. As we demonstrate in our numerical experiments, our method leads to
computational savings particularly for large-scale parameter estimation
problems and can benefit from parallelization.
Comment: 22 pages, 4 figures, 3 tables
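The sketch below shows the generic projection-based reduction pattern the abstract refers to: a fine-mesh system is projected onto a coarse space spanned by (here, simple aggregation-based) basis vectors and the misfit is evaluated with the reduced solution. In the MSFV setting the basis is parameter-dependent and is updated with the model, which is why the derivative must account for the change of basis; the operators and names below are toy assumptions, not the MSFV construction itself:

    import numpy as np

    def reduced_forward(A, q, P):
        """Project a fine-mesh linear system A u = q onto a coarse space spanned by the
        columns of P (e.g., multiscale basis functions), then prolong back to the fine mesh."""
        A_c = P.T @ A @ P                         # coarse (reduced) operator
        u_c = np.linalg.solve(A_c, P.T @ q)
        return P @ u_c                            # approximate fine-mesh solution

    def misfit(A, q, P, d_obs, observation):
        """Data misfit evaluated with the reduced solution; if A and P depend on the
        model m, the derivative with respect to m must include the change of basis."""
        u = reduced_forward(A, q, P)
        r = observation @ u - d_obs
        return 0.5 * float(r @ r)

    # toy example: 1D Poisson-like operator, piecewise-constant coarse basis
    n, nc = 32, 4
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    P = np.kron(np.eye(nc), np.ones((n // nc, 1)))   # aggregation-based prolongation
    q = np.ones(n)
    obs = np.eye(n)[::8]                             # sample every 8th cell
    print(misfit(A, q, P, d_obs=np.zeros(obs.shape[0]), observation=obs))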
Optimal Experimental Design for Constrained Inverse Problems
In this paper, we address the challenging problem of optimal experimental
design (OED) of constrained inverse problems. We consider two OED formulations
that allow reducing the experimental costs by minimizing the number of
measurements. The first formulation assumes a fine discretization of the design
parameter space and uses sparsity promoting regularization to obtain an
efficient design. The second formulation parameterizes the design and seeks
optimal placement for these measurements by solving a small-dimensional
optimization problem. We consider both problems in a Bayes risk as well as an
empirical Bayes risk minimization framework. For the unconstrained inverse
state problem, we exploit the closed form solution for the inner problem to
efficiently compute derivatives for the outer OED problem. The empirical
formulation does not require an explicit solution of the inverse problem and
therefore allows constraints to be integrated efficiently. A key contribution is an
efficient optimization method for solving the resulting, typically
high-dimensional, bilevel optimization problem using derivative-based methods.
To overcome the non-differentiability of active set methods for
inequality-constrained problems, we use a relaxed interior point method. To
address the growing computational complexity of empirical Bayes OED, we
parallelize the computation over the training models. Numerical examples and
illustrations from tomographic reconstruction, for various data sets and under
different constraints, demonstrate the impact of constraints on the optimal
design and highlight the importance of OED for constrained problems.
Comment: 19 pages, 8 figures
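As an illustration of the first, sparsity-promoting formulation in an empirical Bayes risk setting, the sketch below assigns a nonnegative weight to each candidate measurement, exploits the closed-form solution of an unconstrained inner problem, and adds an l1 penalty so that superfluous measurements are switched off. All names, dimensions, and the absence of constraints are simplifying assumptions:

    import numpy as np

    def reconstruction_operator(F, w, alpha):
        """Closed-form solution operator of the inner (unconstrained) inverse problem
        min_x ||diag(w)^{1/2} (F x - d)||^2 + alpha ||x||^2, i.e. x(w, d) = R(w) d."""
        n = F.shape[1]
        W = np.diag(w)
        return np.linalg.solve(F.T @ W @ F + alpha * np.eye(n), F.T @ W)

    def oed_objective(w, F, alpha, X_train, beta=0.1, noise_std=0.01, seed=0):
        """Empirical Bayes risk over training models plus an l1 penalty on the design
        weights w >= 0, which promotes designs with few active measurements."""
        rng = np.random.default_rng(seed)
        R = reconstruction_operator(F, w, alpha)
        risk = 0.0
        for x in X_train:                             # parallelizable over training models
            d = F @ x + noise_std * rng.standard_normal(F.shape[0])
            risk += 0.5 * np.sum((R @ d - x) ** 2)
        return risk / len(X_train) + beta * np.sum(np.abs(w))

    # toy setup: 20 candidate measurements, 10 unknowns, 5 training models
    rng = np.random.default_rng(1)
    F = rng.standard_normal((20, 10))
    X_train = [rng.standard_normal(10) for _ in range(5)]
    w = np.ones(20)                                   # start from the full design
    print(oed_objective(w, F, alpha=1e-2, X_train=X_train))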
LAP: a Linearize and Project Method for Solving Inverse Problems with Coupled Variables
Many inverse problems involve two or more sets of variables that represent
different physical quantities but are tightly coupled with each other. For
example, image super-resolution requires joint estimation of the image and
motion parameters from noisy measurements. Exploiting this structure is key for
efficiently solving these large-scale optimization problems, which are often
ill-conditioned.
In this paper, we present a new method called Linearize And Project (LAP)
that offers a flexible framework for solving inverse problems with coupled
variables. LAP is most promising for cases when the subproblem corresponding to
one of the variables is considerably easier to solve than the other. LAP is
based on a Gauss-Newton method, and thus after linearizing the residual, it
eliminates one block of variables through projection. Due to the linearization,
this block can be chosen freely. Further, LAP supports direct, iterative, and
hybrid regularization as well as constraints. Therefore LAP is attractive,
e.g., for ill-posed imaging problems. These traits differentiate LAP from
common alternatives for this type of problem such as variable projection
(VarPro) and block coordinate descent (BCD). Our numerical experiments compare
the performance of LAP to BCD and VarPro using three coupled problems whose
forward operators are linear with respect to one block and nonlinear for the
other set of variables.
Comment: 21 pages, 6 figures, 3 tables
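A minimal sketch of the linearize-and-project idea for an unconstrained, unregularized coupled problem is given below: the residual is linearized, one block of variables is eliminated by projecting onto the orthogonal complement of the range of its Jacobian, a reduced least-squares problem is solved for the other block, and the eliminated update is recovered by back-substitution. Constraints and hybrid regularization, which LAP also supports, are omitted, and the names are illustrative:

    import numpy as np

    def lap_step(r, J1, J2, reg1=1e-8):
        """One Gauss-Newton step of linearize-and-project flavor: eliminate block x1 by
        projecting the linearized residual onto the orthogonal complement of range(J1),
        solve a reduced problem for the x2 update, then back-substitute for x1."""
        m, n1 = J1.shape
        N1 = J1.T @ J1 + reg1 * np.eye(n1)            # normal operator of the eliminated block
        solveN1 = lambda B: np.linalg.solve(N1, B)
        # projector P = I - J1 N1^{-1} J1^T applied implicitly
        PJ2 = J2 - J1 @ solveN1(J1.T @ J2)
        Pr = r - J1 @ solveN1(J1.T @ r)
        d2, *_ = np.linalg.lstsq(PJ2, -Pr, rcond=None)  # reduced least-squares problem
        d1 = -solveN1(J1.T @ (r + J2 @ d2))             # back-substitution
        return d1, d2

    # toy coupled problem: r(x1, x2) = A x1 + tanh(G x2) - d, linear in x1, nonlinear in x2
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    G = rng.standard_normal((50, 5))
    d = A @ rng.standard_normal(20) + np.tanh(G @ rng.standard_normal(5))
    r = -d                     # residual at the zero iterate (x1 = 0, x2 = 0)
    J1 = A                     # Jacobian with respect to x1
    J2 = G                     # Jacobian of tanh(G x2) at x2 = 0, since sech^2(0) = 1
    d1, d2 = lap_step(r, J1, J2)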
Gauss-Newton Optimization for Phase Recovery from the Bispectrum
Phase recovery from the bispectrum is a central problem in speckle
interferometry which can be posed as an optimization problem minimizing a
weighted nonlinear least-squares objective function. We look at two different
formulations of the phase recovery problem from the literature, both of which
can be minimized with respect to either the recovered phase or the recovered
image. Previously, strategies for solving these formulations have been limited
to first-order optimization methods such as gradient descent or quasi-Newton
methods. This paper explores Gauss-Newton optimization schemes for the problem
of phase recovery from the bispectrum. We implement efficient Gauss-Newton
optimization schemes for all the formulations. For the two formulations that
optimize with respect to the recovered image, we also extend the approach to a
projected Gauss-Newton method to enforce element-wise lower and upper bounds on the
pixel intensities of the recovered image. We show that our efficient
Gauss-Newton schemes result in better image reconstructions with no or limited
additional computational cost compared to previously implemented first-order
optimization schemes for phase recovery from the bispectrum. MATLAB
implementations of all methods and simulations are made publicly available in
the BiBox repository on GitHub.
Comment: 13 pages, 4 figures, 2 tables
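The sketch below illustrates a generic projected Gauss-Newton step with element-wise bounds on the unknowns, using a projected backtracking line search; the toy weighted least-squares residual stands in for the bispectrum objective, and none of it is taken from the BiBox implementation:

    import numpy as np

    def projected_gauss_newton_step(x, residual, jacobian, lower, upper, weights=None,
                                    line_search=(1.0, 0.5, 10)):
        """One projected Gauss-Newton step for min_x 0.5 ||W^{1/2} r(x)||^2 with
        lower <= x <= upper: solve the weighted Gauss-Newton system for a step, then run
        a backtracking line search projected onto the box (a generic sketch)."""
        r, J = residual(x), jacobian(x)
        W = np.eye(len(r)) if weights is None else np.diag(weights)
        g = J.T @ W @ r
        H = J.T @ W @ J + 1e-8 * np.eye(len(x))
        dx = np.linalg.solve(H, -g)
        f0 = 0.5 * float(r @ W @ r)
        t, shrink, max_tries = line_search
        for _ in range(max_tries):                    # projected backtracking line search
            x_new = np.clip(x + t * dx, lower, upper)
            r_new = residual(x_new)
            if 0.5 * float(r_new @ W @ r_new) < f0:
                return x_new
            t *= shrink
        return x

    # toy weighted nonlinear least-squares with intensities constrained to [0, 1]
    A = np.random.default_rng(0).standard_normal((30, 10))
    d = np.abs(A @ np.full(10, 0.5))
    res = lambda x: np.abs(A @ x) - d
    jac = lambda x: np.sign(A @ x)[:, None] * A
    x = np.full(10, 0.6)
    x = projected_gauss_newton_step(x, res, jac, lower=0.0, upper=1.0)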
LeanResNet: A Low-cost Yet Effective Convolutional Residual Network
Convolutional Neural Networks (CNNs) filter the input data using spatial
convolution operators with compact stencils. Commonly, the convolution
operators couple features from all channels, which leads to immense
computational cost in the training of and prediction with CNNs. To improve the
efficiency of CNNs, we introduce lean convolution operators that reduce the
number of parameters and computational complexity, and can be used in a wide
range of existing CNNs. Here, we exemplify their use in residual networks
(ResNets), which have proven reliable and have been analyzed intensively in
recent years. In our experiments on three image classification problems, the
proposed LeanResNet yields results that are comparable to those of other
recently proposed reduced architectures with a similar number of parameters.
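One common way to reduce the channel coupling of a 3x3 convolution is to split it into a depthwise spatial stencil followed by a 1x1 pointwise coupling; the sketch below uses this split as an illustrative stand-in for a lean convolution operator inside a residual block. It is an assumption about the flavor of operator meant, not the paper's exact construction:

    import torch
    import torch.nn as nn

    class LeanConv(nn.Module):
        """A low-cost convolution: a depthwise 3x3 stencil (no channel coupling) followed
        by a 1x1 convolution that couples channels. Roughly 9*C + C*C parameters instead
        of 9*C*C for a full 3x3 convolution (illustrative, not LeanResNet's exact operator)."""
        def __init__(self, channels):
            super().__init__()
            self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                       groups=channels, bias=False)
            self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))

    class LeanResidualBlock(nn.Module):
        """Residual block y = x + K2(relu(norm(K1(x)))) built from lean convolutions."""
        def __init__(self, channels):
            super().__init__()
            self.conv1, self.conv2 = LeanConv(channels), LeanConv(channels)
            self.norm = nn.BatchNorm2d(channels)

        def forward(self, x):
            return x + self.conv2(torch.relu(self.norm(self.conv1(x))))

    block = LeanResidualBlock(32)
    print(block(torch.randn(2, 32, 16, 16)).shape)   # torch.Size([2, 32, 16, 16])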
Learning across scales - A multiscale method for Convolutional Neural Networks
In this work we establish the relation between optimal control and training
deep Convolutional Neural Networks (CNNs). We show that the forward propagation
in CNNs can be interpreted as a time-dependent nonlinear differential equation
and learning as controlling the parameters of the differential equation such
that the network approximates the data-label relation for given training data.
Using this continuous interpretation we derive two new methods to scale CNNs
with respect to two different dimensions. The first class of multiscale methods
connects low-resolution and high-resolution data through prolongation and
restriction of CNN parameters. We demonstrate that this enables classifying
high-resolution images using CNNs trained with low-resolution images and vice
versa and warm-starting the learning process. The second class of multiscale
methods connects shallow and deep networks and leads to new training strategies
that gradually increase the depth of the CNN while re-using parameters for
initialization.
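The second class of methods can be illustrated by a depth-prolongation step: each residual block is read as one forward-Euler step of an ODE, and a twice-as-deep network is warm-started by duplicating blocks and halving the step size so that it initially represents roughly the same trajectory. The sketch below is an illustrative assumption about this idea, not the paper's algorithm:

    import copy
    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        """Residual block read as one forward-Euler step of an ODE: y <- y + h*f(y; theta)."""
        def __init__(self, channels, h=1.0):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, 3, padding=1)
            self.h = h

        def forward(self, y):
            return y + self.h * torch.tanh(self.conv(y))

    def prolong_in_depth(blocks):
        """Warm-start a twice-as-deep network from a shallow one: duplicate each block
        (time step) and halve its step size h, re-using the trained parameters."""
        new_blocks = []
        for b in blocks:
            for _ in range(2):
                nb = copy.deepcopy(b)
                nb.h = b.h / 2.0
                new_blocks.append(nb)
        return nn.Sequential(*new_blocks)

    shallow = [ResBlock(8, h=1.0) for _ in range(3)]
    deep = prolong_in_depth(shallow)        # 6 blocks with h = 0.5 and re-used parameters
    x = torch.randn(1, 8, 12, 12)
    print(deep(x).shape)                    # torch.Size([1, 8, 12, 12])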
ADMM-SOFTMAX: An ADMM Approach for Multinomial Logistic Regression
We present ADMM-Softmax, an alternating direction method of multipliers
(ADMM) for solving multinomial logistic regression (MLR) problems. Our method
is geared toward supervised classification tasks with many examples and
features. It decouples the nonlinear optimization problem in MLR into three
steps that can be solved efficiently. In particular, each iteration of
ADMM-Softmax consists of a linear least-squares problem, a set of independent
small-scale smooth, convex problems, and a trivial dual variable update.
The solution of the least-squares problem can be accelerated by pre-computing a
factorization or preconditioner, and the smooth, convex problems are separable and
can easily be parallelized across examples. For two image
classification problems, we demonstrate that ADMM-Softmax leads to improved
generalization compared to a Newton-Krylov, a quasi-Newton, and a stochastic
gradient descent method.
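A stripped-down sketch of the three-step splitting described above is given below: a linear least-squares update for the weights (whose normal matrix can be factorized once), a set of independent smooth convex problems for the per-example scores (solved here with a few gradient steps rather than the inner solver the paper may use), and a trivial dual update. Regularization and the paper's actual hyperparameters are omitted:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def admm_softmax(X, Y, rho=1.0, n_iter=20, n_inner=5, lr=0.5):
        """Illustrative ADMM splitting for multinomial logistic regression with scores
        Z ~ X W: (1) least-squares W-update, (2) separable convex Z-update, (3) dual update."""
        n, d = X.shape
        k = Y.shape[1]
        W, Z, U = np.zeros((d, k)), np.zeros((n, k)), np.zeros((n, k))
        XtX = X.T @ X + 1e-8 * np.eye(d)     # factorize or precondition once in practice
        for _ in range(n_iter):
            # (1) least-squares step: argmin_W ||X W - (Z - U)||_F^2
            W = np.linalg.solve(XtX, X.T @ (Z - U))
            # (2) separable Z step: per-example cross-entropy plus proximity to X W + U
            V = X @ W + U
            Z = V.copy()
            for _ in range(n_inner):         # a few gradient steps (parallelizable over examples)
                Z -= lr * (softmax(Z) - Y + rho * (Z - V))
            # (3) trivial dual update
            U += X @ W - Z
        return W

    # toy 3-class problem
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    labels = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)
    Y = np.eye(3)[labels]
    W = admm_softmax(X, Y)
    pred = softmax(X @ W).argmax(axis=1)
    print("training accuracy:", (pred == labels).mean())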
