2,603 research outputs found
An inexact Newton-Krylov algorithm for constrained diffeomorphic image registration
We propose numerical algorithms for solving large deformation diffeomorphic
image registration problems. We formulate the nonrigid image registration
problem as a problem of optimal control. This leads to an infinite-dimensional
partial differential equation (PDE) constrained optimization problem.
The PDE constraint consists, in its simplest form, of a hyperbolic transport
equation for the evolution of the image intensity. The control variable is the
velocity field. Tikhonov regularization on the control ensures well-posedness.
We consider standard smoothness regularization based on - or
-seminorms. We augment this regularization scheme with a constraint on the
divergence of the velocity field rendering the deformation incompressible and
thus ensuring that the determinant of the deformation gradient is equal to one,
up to the numerical error.
We use a Fourier pseudospectral discretization in space and a Chebyshev
pseudospectral discretization in time. We use a preconditioned, globalized,
matrix-free, inexact Newton-Krylov method for numerical optimization. A
parameter continuation is designed to estimate an optimal regularization
parameter. Regularity is ensured by controlling the geometric properties of the
deformation field. Overall, we arrive at a black-box solver. We study spectral
properties of the Hessian, grid convergence, numerical accuracy, computational
efficiency, and deformation regularity of our scheme. We compare the designed
Newton-Krylov methods with a globalized preconditioned gradient descent. We
study the influence of a varying number of unknowns in time.
The reported results demonstrate excellent numerical accuracy, guaranteed
local deformation regularity, and computational efficiency with an optional
control on local mass conservation. The Newton-Krylov methods clearly
outperform the Picard method if high accuracy of the inversion is required.Comment: 32 pages; 10 figures; 9 table
Distributed-memory large deformation diffeomorphic 3D image registration
We present a parallel distributed-memory algorithm for large deformation
diffeomorphic registration of volumetric images that produces large isochoric
deformations (locally volume preserving). Image registration is a key
technology in medical image analysis. Our algorithm uses a partial differential
equation constrained optimal control formulation. Finding the optimal
deformation map requires the solution of a highly nonlinear problem that
involves pseudo-differential operators, biharmonic operators, and pure
advection operators both forward and back- ward in time. A key issue is the
time to solution, which poses the demand for efficient optimization methods as
well as an effective utilization of high performance computing resources. To
address this problem we use a preconditioned, inexact, Gauss-Newton- Krylov
solver. Our algorithm integrates several components: a spectral discretization
in space, a semi-Lagrangian formulation in time, analytic adjoints, different
regularization functionals (including volume-preserving ones), a spectral
preconditioner, a highly optimized distributed Fast Fourier Transform, and a
cubic interpolation scheme for the semi-Lagrangian time-stepping. We
demonstrate the scalability of our algorithm on images with resolution of up to
on the "Maverick" and "Stampede" systems at the Texas Advanced
Computing Center (TACC). The critical problem in the medical imaging
application domain is strong scaling, that is, solving registration problems of
a moderate size of ---a typical resolution for medical images. We are
able to solve the registration problem for images of this size in less than
five seconds on 64 x86 nodes of TACC's "Maverick" system.Comment: accepted for publication at SC16 in Salt Lake City, Utah, USA;
November 201
A Semi-Lagrangian two-level preconditioned Newton-Krylov solver for constrained diffeomorphic image registration
We propose an efficient numerical algorithm for the solution of diffeomorphic
image registration problems. We use a variational formulation constrained by a
partial differential equation (PDE), where the constraints are a scalar
transport equation.
We use a pseudospectral discretization in space and second-order accurate
semi-Lagrangian time stepping scheme for the transport equations. We solve for
a stationary velocity field using a preconditioned, globalized, matrix-free
Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner.
We consider two strategies for inverting the preconditioner on the coarse grid:
a nested preconditioned conjugate gradient method (exact solve) and a nested
Chebyshev iterative method (inexact solve) with a fixed number of iterations.
We test the performance of our solver in different synthetic and real-world
two-dimensional application scenarios. We study grid convergence and
computational efficiency of our new scheme. We compare the performance of our
solver against our initial implementation that uses the same spatial
discretization but a standard, explicit, second-order Runge-Kutta scheme for
the numerical time integration of the transport equations and a single-level
preconditioner. Our improved scheme delivers significant speedups over our
original implementation. As a highlight, we observe a 20 speedup for a
two dimensional, real world multi-subject medical image registration problem
FFT, FMM, or Multigrid? A comparative Study of State-Of-the-Art Poisson Solvers for Uniform and Nonuniform Grids in the Unit Cube
In this work, we benchmark and discuss the performance of the scalable
methods for the Poisson problem which are used widely in practice: the fast
Fourier transform (FFT), the fast multipole method (FMM), the geometric
multigrid (GMG), and algebraic multigrid (AMG). In total we compare five
different codes, three of which are developed in our group. Our FFT, GMG, and
FMM are parallel solvers that use high-order approximation schemes for Poisson
problems with continuous forcing functions (the source or right-hand side). We
examine and report results for weak scaling, strong scaling, and time to
solution for uniform and highly refined grids. We present results on the
Stampede system at the Texas Advanced Computing Center and on the Titan system
at the Oak Ridge National Laboratory. In our largest test case, we solved a
problem with 600 billion unknowns on 229,379 cores of Titan. Overall, all
methods scale quite well to these problem sizes. We have tested all of the
methods with different source functions (the right-hand side in the Poisson
problem). Our results indicate that FFT is the method of choice for smooth
source functions that require uniform resolution. However, FFT loses its
performance advantage when the source function has highly localized features
like internal sharp layers. FMM and GMG considerably outperform FFT for those
cases. The distinction between FMM and GMG is less pronounced and is sensitive
to the quality (from a performance point of view) of the underlying
implementations. The high-order accurate versions of GMG and FMM significantly
outperform their low-order accurate counterparts.Comment: 25 pages; accepted paper in SISC journa
Far-Field Compression for Fast Kernel Summation Methods in High Dimensions
We consider fast kernel summations in high dimensions: given a large set of
points in dimensions (with ) and a pair-potential function (the
{\em kernel} function), we compute a weighted sum of all pairwise kernel
interactions for each point in the set. Direct summation is equivalent to a
(dense) matrix-vector multiplication and scales quadratically with the number
of points. Fast kernel summation algorithms reduce this cost to log-linear or
linear complexity.
Treecodes and Fast Multipole Methods (FMMs) deliver tremendous speedups by
constructing approximate representations of interactions of points that are far
from each other. In algebraic terms, these representations correspond to
low-rank approximations of blocks of the overall interaction matrix. Existing
approaches require an excessive number of kernel evaluations with increasing
and number of points in the dataset.
To address this issue, we use a randomized algebraic approach in which we
first sample the rows of a block and then construct its approximate, low-rank
interpolative decomposition. We examine the feasibility of this approach
theoretically and experimentally. We provide a new theoretical result showing a
tighter bound on the reconstruction error from uniformly sampling rows than the
existing state-of-the-art. We demonstrate that our sampling approach is
competitive with existing (but prohibitively expensive) methods from the
literature. We also construct kernel matrices for the Laplacian, Gaussian, and
polynomial kernels -- all commonly used in physics and data analysis. We
explore the numerical properties of blocks of these matrices, and show that
they are amenable to our approach. Depending on the data set, our randomized
algorithm can successfully compute low rank approximations in high dimensions.
We report results for data sets with ambient dimensions from four to 1,000.Comment: 43 pages, 21 figure
- …
