Physical temperature and the meaning of the q parameter in Tsallis statistics
We show that the function beta(E) derived from the density of states of a
constant heat capacity reservoir coupled to some system of interest is not
identical to the physically measurable (transitive) temperature. There are,
however, connections between the two quantities as well as with the Tsallis
parameter q. We exemplify these connections using the one-dimensional Ising
model in the "dynamical ensemble".
Comment: 12 pages, 2 figures
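The connection between beta(E), the physical temperature, and q can be illustrated with the standard finite-bath argument. The following is a minimal sketch (with k_B = 1) assuming a reservoir of constant heat capacity C; it is not necessarily the derivation or the sign convention for q used in the paper:

    % Constant heat capacity C: E_R = C T_R, hence S_R(E_R) = C \ln E_R + \mathrm{const.}
    \beta(E) = \frac{\partial S_R}{\partial E_R}\bigg|_{E_R = E_{\rm tot}-E}
             = \frac{C}{E_{\rm tot}-E}, \qquad
    p(E) \propto e^{S_R(E_{\rm tot}-E)} \propto \Bigl(1-\frac{E}{E_{\rm tot}}\Bigr)^{C}
         = \bigl[1-(1-q)\,\beta_0 E\bigr]^{\frac{1}{1-q}},
    \qquad \beta_0 = \frac{C}{E_{\rm tot}},\quad q = 1-\frac{1}{C}.

In this sketch beta(E) depends on the system energy and only reduces to a single transitive temperature in the infinite-bath limit (C to infinity, q to 1).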
A Lagrangian Gauss-Newton-Krylov Solver for Mass- and Intensity-Preserving Diffeomorphic Image Registration
We present an efficient solver for diffeomorphic image registration problems
in the framework of Large Deformations Diffeomorphic Metric Mappings (LDDMM).
We use an optimal control formulation, in which the velocity field of a
hyperbolic PDE needs to be found such that the distance between the final state
of the system (the transformed/transported template image) and the observation
(the reference image) is minimized. Our solver supports both stationary and
non-stationary (i.e., transient or time-dependent) velocity fields. As
transformation models, we consider both the transport equation (assuming
intensities are preserved during the deformation) and the continuity equation
(assuming mass-preservation).
We consider the reduced form of the optimal control problem and solve the
resulting unconstrained optimization problem using a discretize-then-optimize
approach. A key contribution is the elimination of the PDE constraint using a
Lagrangian hyperbolic PDE solver. Lagrangian methods rely on the concept of
characteristic curves that we approximate here using a fourth-order Runge-Kutta
method. We also present an efficient algorithm for computing the derivatives of
the final state of the system with respect to the velocity field. This allows us to
use fast Gauss-Newton based methods. We present quickly converging iterative
linear solvers using spectral preconditioners that render the overall
optimization efficient and scalable. Our method is embedded into the image
registration framework FAIR and, thus, supports the most commonly used
similarity measures and regularization functionals. We demonstrate the
potential of our new approach using several synthetic and real world test
problems with up to 14.7 million degrees of freedom.
Comment: code available at: https://github.com/C4IR/FAIR.m/tree/master/add-ons/LagLDDM
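As a rough illustration of the Lagrangian building block (not the FAIR/LagLDDMM code itself), the Python sketch below traces the characteristics of the transport equation backwards in time with a classical fourth-order Runge-Kutta method and evaluates the template image at the resulting points; the 1D setting, function names, and step counts are illustrative assumptions:

    import numpy as np

    def rk4_characteristics(x, velocity, t_span=(1.0, 0.0), n_steps=4):
        """Trace characteristics x(t) of dx/dt = v(x, t) from t_span[0] to t_span[1]
        with the classical fourth-order Runge-Kutta method (illustrative sketch)."""
        t0, t1 = t_span
        h = (t1 - t0) / n_steps
        t = t0
        for _ in range(n_steps):
            k1 = velocity(x, t)
            k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
            k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
            k4 = velocity(x + h * k3, t + h)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
            t = t + h
        return x

    def transport_solve(template, grid, velocity):
        """Semi-Lagrangian solve of the transport equation: the transformed image is the
        template evaluated at the foot of each characteristic (intensity preserving)."""
        feet = rk4_characteristics(grid, velocity)      # trace back to t = 0
        return np.interp(feet, grid, template)          # linear interpolation

    # toy 1D example with a stationary velocity field
    grid = np.linspace(0.0, 1.0, 101)
    template = np.exp(-200.0 * (grid - 0.3) ** 2)       # a bump at x = 0.3
    v = lambda x, t: 0.2 * np.ones_like(x)              # constant rightward flow
    transformed = transport_solve(template, grid, v)    # bump moves toward x = 0.5

Because the PDE solve reduces to ODE integration along characteristics plus interpolation, it avoids CFL-type step restrictions, which is one reason Lagrangian solvers are attractive for eliminating the PDE constraint.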
Stable Architectures for Deep Neural Networks
Deep neural networks have become invaluable tools for supervised machine
learning, e.g., classification of text or images. While often offering superior
results over traditional techniques and successfully expressing complicated
patterns in data, deep architectures are known to be challenging to design and
train such that they generalize well to new data. Important issues with deep
architectures are numerical instabilities in derivative-based learning
algorithms commonly called exploding or vanishing gradients. In this paper we
propose new forward propagation techniques inspired by systems of Ordinary
Differential Equations (ODE) that overcome this challenge and lead to
well-posed learning problems for arbitrarily deep networks.
The backbone of our approach is our interpretation of deep learning as a
parameter estimation problem of nonlinear dynamical systems. Given this
formulation, we analyze stability and well-posedness of deep learning and use
this new understanding to develop new network architectures. We relate the
exploding and vanishing gradient phenomenon to the stability of the discrete
ODE and present several strategies for stabilizing deep learning for very deep
networks. While our new architectures restrict the solution space, several
numerical experiments show their competitiveness with state-of-the-art
networks.
Comment: 23 pages, 7 figures
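To make the ODE view concrete, the sketch below builds a deep residual network as repeated forward-Euler steps y_{j+1} = y_j + h*tanh(K y_j + b) and constrains K to be antisymmetric (minus a small shift), one of the stabilization strategies associated with this line of work; the class name, step size, and shift value are illustrative assumptions rather than the paper's exact architecture:

    import torch
    import torch.nn as nn

    class AntisymmetricResidualLayer(nn.Module):
        """Forward-Euler step y <- y + h * tanh(K y + b) with K = W - W^T - gamma*I,
        so the eigenvalues of K have negative real part (a stability-motivated choice)."""
        def __init__(self, width, h=0.1, gamma=1e-2):
            super().__init__()
            self.W = nn.Parameter(torch.randn(width, width) / width ** 0.5)
            self.b = nn.Parameter(torch.zeros(width))
            self.h, self.gamma = h, gamma

        def forward(self, y):
            K = self.W - self.W.t() - self.gamma * torch.eye(self.W.shape[0], device=y.device)
            return y + self.h * torch.tanh(y @ K.t() + self.b)

    # a "very deep" network obtained by repeating small ODE steps
    depth, width = 100, 16
    net = nn.Sequential(*[AntisymmetricResidualLayer(width) for _ in range(depth)])
    y = torch.randn(8, width)
    print(net(y).shape)   # torch.Size([8, 16])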
A Multiscale Method for Model Order Reduction in PDE Parameter Estimation
Estimating parameters of Partial Differential Equations (PDEs) is of interest
in a number of applications such as geophysical and medical imaging. Parameter
estimation is commonly phrased as a PDE-constrained optimization problem that
can be solved iteratively using gradient-based optimization. A computational
bottleneck in such approaches is that the underlying PDEs need to be solved
numerous times before the model is reconstructed with sufficient accuracy. One
way to reduce this computational burden is by using Model Order Reduction (MOR)
techniques such as the Multiscale Finite Volume Method (MSFV).
In this paper, we apply the MSFV method to high-dimensional parameter
estimation problems. Given a finite volume discretization of the PDE on a fine
mesh, the MSFV method reduces the problem size by computing a
parameter-dependent projection onto a nested coarse mesh. A novelty in our work
is the integration of MSFV into a PDE-constrained optimization framework, which
updates the reduced space in each iteration. We also present a computationally
tractable way of differentiating the MOR solution that acknowledges the change
of basis. As we demonstrate in our numerical experiments, our method leads to
computational savings particularly for large-scale parameter estimation
problems and can benefit from parallelization.
Comment: 22 pages, 4 figures, 3 tables
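The sketch below shows the generic projection-based reduction pattern the abstract refers to: a fine-mesh system is projected onto a coarse space spanned by (here, simple aggregation-based) basis vectors and the misfit is evaluated with the reduced solution. In the MSFV setting the basis is parameter-dependent and is updated with the model, which is why the derivative must account for the change of basis; the operators and names below are toy assumptions, not the MSFV construction itself:

    import numpy as np

    def reduced_forward(A, q, P):
        """Project a fine-mesh linear system A u = q onto a coarse space spanned by the
        columns of P (e.g., multiscale basis functions), then prolong back to the fine mesh."""
        A_c = P.T @ A @ P                         # coarse (reduced) operator
        u_c = np.linalg.solve(A_c, P.T @ q)
        return P @ u_c                            # approximate fine-mesh solution

    def misfit(A, q, P, d_obs, observation):
        """Data misfit evaluated with the reduced solution; if A and P depend on the
        model m, the derivative with respect to m must include the change of basis."""
        u = reduced_forward(A, q, P)
        r = observation @ u - d_obs
        return 0.5 * float(r @ r)

    # toy example: 1D Poisson-like operator, piecewise-constant coarse basis
    n, nc = 32, 4
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    P = np.kron(np.eye(nc), np.ones((n // nc, 1)))   # aggregation-based prolongation
    q = np.ones(n)
    obs = np.eye(n)[::8]                             # sample every 8th cell
    print(misfit(A, q, P, d_obs=np.zeros(obs.shape[0]), observation=obs))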
Optimal Experimental Design for Constrained Inverse Problems
In this paper, we address the challenging problem of optimal experimental
design (OED) of constrained inverse problems. We consider two OED formulations
that allow reducing the experimental costs by minimizing the number of
measurements. The first formulation assumes a fine discretization of the design
parameter space and uses sparsity promoting regularization to obtain an
efficient design. The second formulation parameterizes the design and seeks
optimal placement for these measurements by solving a small-dimensional
optimization problem. We consider both problems in a Bayes risk as well as an
empirical Bayes risk minimization framework. For the unconstrained inverse
state problem, we exploit the closed form solution for the inner problem to
efficiently compute derivatives for the outer OED problem. The empirical
formulation does not require an explicit solution of the inverse problem and
therefore allows constraints to be integrated efficiently. A key contribution is an
efficient optimization method for solving the resulting, typically
high-dimensional, bilevel optimization problem using derivative-based methods.
To overcome the non-differentiability of active set methods for
inequality-constrained problems, we use a relaxed interior point method. To
address the growing computational complexity of empirical Bayes OED, we
parallelize the computation over the training models. Numerical examples and
illustrations from tomographic reconstruction, for various data sets and under
different constraints, demonstrate the impact of constraints on the optimal
design and highlight the importance of OED for constrained problems.
Comment: 19 pages, 8 figures
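As an illustration of the first, sparsity-promoting formulation in an empirical Bayes risk setting, the sketch below assigns a nonnegative weight to each candidate measurement, exploits the closed-form solution of an unconstrained inner problem, and adds an l1 penalty so that superfluous measurements are switched off. All names, dimensions, and the absence of constraints are simplifying assumptions:

    import numpy as np

    def reconstruction_operator(F, w, alpha):
        """Closed-form solution operator of the inner (unconstrained) inverse problem
        min_x ||diag(w)^{1/2} (F x - d)||^2 + alpha ||x||^2, i.e. x(w, d) = R(w) d."""
        n = F.shape[1]
        W = np.diag(w)
        return np.linalg.solve(F.T @ W @ F + alpha * np.eye(n), F.T @ W)

    def oed_objective(w, F, alpha, X_train, beta=0.1, noise_std=0.01, seed=0):
        """Empirical Bayes risk over training models plus an l1 penalty on the design
        weights w >= 0, which promotes designs with few active measurements."""
        rng = np.random.default_rng(seed)
        R = reconstruction_operator(F, w, alpha)
        risk = 0.0
        for x in X_train:                             # parallelizable over training models
            d = F @ x + noise_std * rng.standard_normal(F.shape[0])
            risk += 0.5 * np.sum((R @ d - x) ** 2)
        return risk / len(X_train) + beta * np.sum(np.abs(w))

    # toy setup: 20 candidate measurements, 10 unknowns, 5 training models
    rng = np.random.default_rng(1)
    F = rng.standard_normal((20, 10))
    X_train = [rng.standard_normal(10) for _ in range(5)]
    w = np.ones(20)                                   # start from the full design
    print(oed_objective(w, F, alpha=1e-2, X_train=X_train))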
LAP: a Linearize and Project Method for Solving Inverse Problems with Coupled Variables
Many inverse problems involve two or more sets of variables that represent
different physical quantities but are tightly coupled with each other. For
example, image super-resolution requires joint estimation of the image and
motion parameters from noisy measurements. Exploiting this structure is key for
efficiently solving these large-scale optimization problems, which are often
ill-conditioned.
In this paper, we present a new method called Linearize And Project (LAP)
that offers a flexible framework for solving inverse problems with coupled
variables. LAP is most promising for cases when the subproblem corresponding to
one of the variables is considerably easier to solve than the other. LAP is
based on a Gauss-Newton method, and thus after linearizing the residual, it
eliminates one block of variables through projection. Due to the linearization,
this block can be chosen freely. Further, LAP supports direct, iterative, and
hybrid regularization as well as constraints. Therefore LAP is attractive,
e.g., for ill-posed imaging problems. These traits differentiate LAP from
common alternatives for this type of problem such as variable projection
(VarPro) and block coordinate descent (BCD). Our numerical experiments compare
the performance of LAP to BCD and VarPro using three coupled problems whose
forward operators are linear with respect to one block and nonlinear for the
other set of variables.
Comment: 21 pages, 6 figures, 3 tables
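A minimal sketch of the linearize-and-project idea for an unconstrained, unregularized coupled problem is given below: the residual is linearized, one block of variables is eliminated by projecting onto the orthogonal complement of the range of its Jacobian, a reduced least-squares problem is solved for the other block, and the eliminated update is recovered by back-substitution. Constraints and hybrid regularization, which LAP also supports, are omitted, and the names are illustrative:

    import numpy as np

    def lap_step(r, J1, J2, reg1=1e-8):
        """One Gauss-Newton step of linearize-and-project flavor: eliminate block x1 by
        projecting the linearized residual onto the orthogonal complement of range(J1),
        solve a reduced problem for the x2 update, then back-substitute for x1."""
        m, n1 = J1.shape
        N1 = J1.T @ J1 + reg1 * np.eye(n1)            # normal operator of the eliminated block
        solveN1 = lambda B: np.linalg.solve(N1, B)
        # projector P = I - J1 N1^{-1} J1^T applied implicitly
        PJ2 = J2 - J1 @ solveN1(J1.T @ J2)
        Pr = r - J1 @ solveN1(J1.T @ r)
        d2, *_ = np.linalg.lstsq(PJ2, -Pr, rcond=None)  # reduced least-squares problem
        d1 = -solveN1(J1.T @ (r + J2 @ d2))             # back-substitution
        return d1, d2

    # toy coupled problem: r(x1, x2) = A x1 + tanh(G x2) - d, linear in x1, nonlinear in x2
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    G = rng.standard_normal((50, 5))
    d = A @ rng.standard_normal(20) + np.tanh(G @ rng.standard_normal(5))
    r = -d                     # residual at the zero iterate (x1 = 0, x2 = 0)
    J1 = A                     # Jacobian with respect to x1
    J2 = G                     # Jacobian of tanh(G x2) at x2 = 0, since sech^2(0) = 1
    d1, d2 = lap_step(r, J1, J2)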
Gauss-Newton Optimization for Phase Recovery from the Bispectrum
Phase recovery from the bispectrum is a central problem in speckle
interferometry which can be posed as an optimization problem minimizing a
weighted nonlinear least-squares objective function. We look at two different
formulations of the phase recovery problem from the literature, both of which
can be minimized with respect to either the recovered phase or the recovered
image. Previously, strategies for solving these formulations have been limited
to first-order optimization methods such as gradient descent or quasi-Newton
methods. This paper explores Gauss-Newton optimization schemes for the problem
of phase recovery from the bispectrum. We implement efficient Gauss-Newton
optimization schemes for all the formulations. For the two formulations that
optimize with respect to the recovered image, we also extend the approach to a
projected Gauss-Newton method to enforce element-wise lower and upper bounds on the
pixel intensities of the recovered image. We show that our efficient
Gauss-Newton schemes result in better image reconstructions with no or limited
additional computational cost compared to previously implemented first-order
optimization schemes for phase recovery from the bispectrum. MATLAB
implementations of all methods and simulations are made publicly available in
the BiBox repository on GitHub.
Comment: 13 pages, 4 figures, 2 tables
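The sketch below illustrates a generic projected Gauss-Newton step with element-wise bounds on the unknowns, using a projected backtracking line search; the toy weighted least-squares residual stands in for the bispectrum objective, and none of it is taken from the BiBox implementation:

    import numpy as np

    def projected_gauss_newton_step(x, residual, jacobian, lower, upper, weights=None,
                                    line_search=(1.0, 0.5, 10)):
        """One projected Gauss-Newton step for min_x 0.5 ||W^{1/2} r(x)||^2 with
        lower <= x <= upper: solve the weighted Gauss-Newton system for a step, then run
        a backtracking line search projected onto the box (a generic sketch)."""
        r, J = residual(x), jacobian(x)
        W = np.eye(len(r)) if weights is None else np.diag(weights)
        g = J.T @ W @ r
        H = J.T @ W @ J + 1e-8 * np.eye(len(x))
        dx = np.linalg.solve(H, -g)
        f0 = 0.5 * float(r @ W @ r)
        t, shrink, max_tries = line_search
        for _ in range(max_tries):                    # projected backtracking line search
            x_new = np.clip(x + t * dx, lower, upper)
            r_new = residual(x_new)
            if 0.5 * float(r_new @ W @ r_new) < f0:
                return x_new
            t *= shrink
        return x

    # toy weighted nonlinear least-squares with intensities constrained to [0, 1]
    A = np.random.default_rng(0).standard_normal((30, 10))
    d = np.abs(A @ np.full(10, 0.5))
    res = lambda x: np.abs(A @ x) - d
    jac = lambda x: np.sign(A @ x)[:, None] * A
    x = np.full(10, 0.6)
    x = projected_gauss_newton_step(x, res, jac, lower=0.0, upper=1.0)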
LeanResNet: A Low-cost Yet Effective Convolutional Residual Network
Convolutional Neural Networks (CNNs) filter the input data using spatial
convolution operators with compact stencils. Commonly, the convolution
operators couple features from all channels, which leads to immense
computational cost in the training of and prediction with CNNs. To improve the
efficiency of CNNs, we introduce lean convolution operators that reduce the
number of parameters and computational complexity, and can be used in a wide
range of existing CNNs. Here, we exemplify their use in residual networks
(ResNets), which have proven reliable and have been analyzed intensively in
recent years. In our experiments on three image classification problems, the
proposed LeanResNet yields results that are comparable to those of other
recently proposed reduced architectures with a similar number of parameters.
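One common way to reduce the channel coupling of a 3x3 convolution is to split it into a depthwise spatial stencil followed by a 1x1 pointwise coupling; the sketch below uses this split as an illustrative stand-in for a lean convolution operator inside a residual block. It is an assumption about the flavor of operator meant, not the paper's exact construction:

    import torch
    import torch.nn as nn

    class LeanConv(nn.Module):
        """A low-cost convolution: a depthwise 3x3 stencil (no channel coupling) followed
        by a 1x1 convolution that couples channels. Roughly 9*C + C*C parameters instead
        of 9*C*C for a full 3x3 convolution (illustrative, not LeanResNet's exact operator)."""
        def __init__(self, channels):
            super().__init__()
            self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                       groups=channels, bias=False)
            self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))

    class LeanResidualBlock(nn.Module):
        """Residual block y = x + K2(relu(norm(K1(x)))) built from lean convolutions."""
        def __init__(self, channels):
            super().__init__()
            self.conv1, self.conv2 = LeanConv(channels), LeanConv(channels)
            self.norm = nn.BatchNorm2d(channels)

        def forward(self, x):
            return x + self.conv2(torch.relu(self.norm(self.conv1(x))))

    block = LeanResidualBlock(32)
    print(block(torch.randn(2, 32, 16, 16)).shape)   # torch.Size([2, 32, 16, 16])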
Learning across scales - A multiscale method for Convolutional Neural Networks
In this work we establish the relation between optimal control and training
deep Convolutional Neural Networks (CNNs). We show that the forward propagation
in CNNs can be interpreted as a time-dependent nonlinear differential equation
and learning as controlling the parameters of the differential equation such
that the network approximates the data-label relation for given training data.
Using this continuous interpretation we derive two new methods to scale CNNs
with respect to two different dimensions. The first class of multiscale methods
connects low-resolution and high-resolution data through prolongation and
restriction of CNN parameters. We demonstrate that this enables classifying
high-resolution images using CNNs trained with low-resolution images and vice
versa and warm-starting the learning process. The second class of multiscale
methods connects shallow and deep networks and leads to new training strategies
that gradually increase the depth of the CNN while re-using parameters for
initialization.
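The second class of methods can be illustrated by a depth-prolongation step: each residual block is read as one forward-Euler step of an ODE, and a twice-as-deep network is warm-started by duplicating blocks and halving the step size so that it initially represents roughly the same trajectory. The sketch below is an illustrative assumption about this idea, not the paper's algorithm:

    import copy
    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        """Residual block read as one forward-Euler step of an ODE: y <- y + h*f(y; theta)."""
        def __init__(self, channels, h=1.0):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, 3, padding=1)
            self.h = h

        def forward(self, y):
            return y + self.h * torch.tanh(self.conv(y))

    def prolong_in_depth(blocks):
        """Warm-start a twice-as-deep network from a shallow one: duplicate each block
        (time step) and halve its step size h, re-using the trained parameters."""
        new_blocks = []
        for b in blocks:
            for _ in range(2):
                nb = copy.deepcopy(b)
                nb.h = b.h / 2.0
                new_blocks.append(nb)
        return nn.Sequential(*new_blocks)

    shallow = [ResBlock(8, h=1.0) for _ in range(3)]
    deep = prolong_in_depth(shallow)        # 6 blocks with h = 0.5 and re-used parameters
    x = torch.randn(1, 8, 12, 12)
    print(deep(x).shape)                    # torch.Size([1, 8, 12, 12])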
ADMM-SOFTMAX: An ADMM Approach for Multinomial Logistic Regression
We present ADMM-Softmax, an alternating direction method of multipliers
(ADMM) for solving multinomial logistic regression (MLR) problems. Our method
is geared toward supervised classification tasks with many examples and
features. It decouples the nonlinear optimization problem in MLR into three
steps that can be solved efficiently. In particular, each iteration of
ADMM-Softmax consists of a linear least-squares problem, a set of independent
small-scale smooth, convex problems, and a trivial dual variable update.
The solution of the least-squares problem can be accelerated by pre-computing a
factorization or preconditioner, and the smooth, convex problems are separable and
can easily be parallelized across examples. For two image
classification problems, we demonstrate that ADMM-Softmax leads to improved
generalization compared to a Newton-Krylov, a quasi-Newton, and a stochastic
gradient descent method.
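A stripped-down sketch of the three-step splitting described above is given below: a linear least-squares update for the weights (whose normal matrix can be factorized once), a set of independent smooth convex problems for the per-example scores (solved here with a few gradient steps rather than the inner solver the paper may use), and a trivial dual update. Regularization and the paper's actual hyperparameters are omitted:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def admm_softmax(X, Y, rho=1.0, n_iter=20, n_inner=5, lr=0.5):
        """Illustrative ADMM splitting for multinomial logistic regression with scores
        Z ~ X W: (1) least-squares W-update, (2) separable convex Z-update, (3) dual update."""
        n, d = X.shape
        k = Y.shape[1]
        W, Z, U = np.zeros((d, k)), np.zeros((n, k)), np.zeros((n, k))
        XtX = X.T @ X + 1e-8 * np.eye(d)     # factorize or precondition once in practice
        for _ in range(n_iter):
            # (1) least-squares step: argmin_W ||X W - (Z - U)||_F^2
            W = np.linalg.solve(XtX, X.T @ (Z - U))
            # (2) separable Z step: per-example cross-entropy plus proximity to X W + U
            V = X @ W + U
            Z = V.copy()
            for _ in range(n_inner):         # a few gradient steps (parallelizable over examples)
                Z -= lr * (softmax(Z) - Y + rho * (Z - V))
            # (3) trivial dual update
            U += X @ W - Z
        return W

    # toy 3-class problem
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    labels = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)
    Y = np.eye(3)[labels]
    W = admm_softmax(X, Y)
    pred = softmax(X @ W).argmax(axis=1)
    print("training accuracy:", (pred == labels).mean())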
