linumpy.gpu.n4#

GPU N4 bias field correction.

Implements the Tustison 2010 N4 algorithm using the B-spline primitive in linumpy.gpu.bspline and a CuPy/NumPy-shared histogram sharpening routine.

Each fitting level loops over:

Compute the log-residual r = log(v) - log_bias on masked voxels.
Sharpen the residual histogram by Wiener-deconvolving it with a Gaussian PSF (Sled 1998 / Tustison 2010), producing a LUT mapping observed log-intensity to expected (unbiased) log-intensity.
Voxel-wise, compute the per-voxel log-bias update delta = log(v) - LUT(log(v) - log_bias).
Fit a tensor-product cubic B-spline to delta on a regular control grid, evaluate at full resolution, and add to log_bias.

The next fitting level doubles the number of control points per axis.

Memory budget (per N4 call):

~6 x volume_size x 4 bytes

i.e. ~12 GB for a (256, 1024, 1024) float32 volume.

Functions#

`sharpen_residual`(log_v, mask, *[, n_bins, fwhm_log, ...])	Return the per-voxel sharpened log-intensity (LUT-mapped).
`n4_correct_gpu`(vol[, mask, shrink_factor, ...])	GPU-accelerated N4 bias field correction.

Module Contents#

linumpy.gpu.n4.sharpen_residual(log_v, mask, *, n_bins=200, fwhm_log=0.15, wiener_noise=0.01, use_gpu=True, mask_weights=None)[source]#

Return the per-voxel sharpened log-intensity (LUT-mapped).

Implements the Sled/Tustison histogram sharpening: build the weighted log-intensity histogram restricted to mask, deconvolve it by a Gaussian PSF (Wiener-regularised), and return the LUT E[v_true | v_obs] evaluated at every voxel in log_v.

Parameters:

log_v (np.ndarray) – Log-intensity volume (any shape, float32).
mask (np.ndarray or None) – Boolean mask; only masked voxels contribute to the histogram. When None, all voxels are used.
n_bins (int) – Histogram bin count.
fwhm_log (float) – Full-width-half-maximum of the Gaussian PSF in log-intensity units. Controls how much sharpening is applied (smaller FWHM means less sharpening, since the deconvolution kernel is narrower). N4 default is approximately 0.15.
wiener_noise (float) – Wiener regularisation term. Larger values stabilise the deconvolution at the expense of sharpening.
use_gpu (bool) – Use CuPy when available.
mask_weights (np.ndarray, optional) – Float32 view of mask (mask.astype(float32)). When provided, skips the per-call cast – a full-volume allocation that the N4 fit loop otherwise repeats every iteration.

Returns:

Sharpened log-intensity, same shape and dtype as log_v. Outside the mask, the input log-intensity is returned unchanged.

Return type:

np.ndarray

linumpy.gpu.n4.n4_correct_gpu(vol, mask=None, *, shrink_factor=4, n_iterations=None, spline_distance_mm=10.0, voxel_size_mm=(1.0, 1.0, 1.0), n_bins=200, fwhm_log=0.15, wiener_noise=0.01, convergence_tol=0.001, use_gpu=True, out=None, bias_out=None)[source]#

GPU-accelerated N4 bias field correction.

Faithful CuPy/NumPy port of the Tustison 2010 N4 algorithm: at each fitting level, alternate Sled-style histogram sharpening and tensor cubic B-spline scattered-data fitting until convergence. The B-spline control mesh is fixed across levels (matching SimpleITK’s behaviour); n_iterations only controls per-level iteration counts and the residual is composed across levels.

Parameters mirror linumpy.intensity.bias_field.n4_correct() so the two backends are interchangeable. Extra knobs (n_bins, fwhm_log, wiener_noise) tune the sharpening histogram.

Parameters:

vol (np.ndarray) – Float32 input volume (Z, Y, X).
mask (np.ndarray or None) – Boolean tissue mask. None = full volume.
shrink_factor (int) – Isotropic spatial subsampling factor for the fit (>=1).
n_iterations (list of int or None) – Max iterations per fitting level. Length sets the number of levels. Default [20, 20, 20]. Fewer iterations than the SimpleITK CPU backend because the GPU PSDB residual update has no internal multilevel dampening, so each iteration has full effect; more than ~20 per level causes the bias field to absorb true tissue contrast (verified empirically on live OCT).
spline_distance_mm (float) – Approximate distance between B-spline control knots at level 0.
voxel_size_mm (3-tuple of float) – Voxel size (z, y, x) in mm.
n_bins (sharpening parameters) – See sharpen_residual().
fwhm_log (sharpening parameters) – See sharpen_residual().
wiener_noise (sharpening parameters) – See sharpen_residual().
convergence_tol (float) – Per-iteration convergence threshold on the relative L2 change of log_bias. Iterations stop early when the change drops below this value.
use_gpu (bool) – Use CuPy when available.
out (np.ndarray, optional) – Destination buffer for the corrected output (full vol.shape, float32). When provided, the streaming Z-tile loop writes results directly into this buffer instead of allocating a fresh array. May safely alias the input vol (the host buffer is not read after the initial H2D upload at function entry), saving a full-volume float32 allocation on large mosaics.
bias_out (np.ndarray, optional) – Destination buffer for the bias-field output, same shape and dtype constraints as out.

Returns:

corrected (np.ndarray) – Bias-corrected float32 volume (Z, Y, X), full resolution.
bias_field (np.ndarray) – Estimated multiplicative bias field, float32, full resolution.

Return type:

tuple[numpy.ndarray, numpy.ndarray]