linumpy.gpu.n4#

GPU N4 bias field correction.

Implements the Tustison 2010 N4 algorithm using the B-spline primitive in linumpy.gpu.bspline and a CuPy/NumPy-shared histogram sharpening routine.

Each fitting level loops over:

  1. Compute the log-residual r = log(v) - log_bias on masked voxels.

  2. Sharpen the residual histogram by Wiener-deconvolving it with a Gaussian PSF (Sled 1998 / Tustison 2010), producing a LUT mapping observed log-intensity to expected (unbiased) log-intensity.

  3. Voxel-wise, compute the per-voxel log-bias update delta = log(v) - LUT(log(v) - log_bias).

  4. Fit a tensor-product cubic B-spline to delta on a regular control grid, evaluate at full resolution, and add to log_bias.

The next fitting level doubles the number of control points per axis.

Memory budget (per N4 call):

~6 x volume_size x 4 bytes

i.e. ~12 GB for a (256, 1024, 1024) float32 volume.

Functions#

sharpen_residual(log_v, mask, *[, n_bins, fwhm_log, ...])

Return the per-voxel sharpened log-intensity (LUT-mapped).

n4_correct_gpu(vol[, mask, shrink_factor, ...])

GPU-accelerated N4 bias field correction.

Module Contents#

linumpy.gpu.n4.sharpen_residual(log_v, mask, *, n_bins=200, fwhm_log=0.15, wiener_noise=0.01, use_gpu=True, mask_weights=None)[source]#

Return the per-voxel sharpened log-intensity (LUT-mapped).

Implements the Sled/Tustison histogram sharpening: build the weighted log-intensity histogram restricted to mask, deconvolve it by a Gaussian PSF (Wiener-regularised), and return the LUT E[v_true | v_obs] evaluated at every voxel in log_v.

Parameters:
  • log_v (np.ndarray) – Log-intensity volume (any shape, float32).

  • mask (np.ndarray or None) – Boolean mask; only masked voxels contribute to the histogram. When None, all voxels are used.

  • n_bins (int) – Histogram bin count.

  • fwhm_log (float) – Full-width-half-maximum of the Gaussian PSF in log-intensity units. Controls how much sharpening is applied (smaller FWHM means less sharpening, since the deconvolution kernel is narrower). N4 default is approximately 0.15.

  • wiener_noise (float) – Wiener regularisation term. Larger values stabilise the deconvolution at the expense of sharpening.

  • use_gpu (bool) – Use CuPy when available.

  • mask_weights (np.ndarray, optional) – Float32 view of mask (mask.astype(float32)). When provided, skips the per-call cast – a full-volume allocation that the N4 fit loop otherwise repeats every iteration.

Returns:

Sharpened log-intensity, same shape and dtype as log_v. Outside the mask, the input log-intensity is returned unchanged.

Return type:

np.ndarray

linumpy.gpu.n4.n4_correct_gpu(vol, mask=None, *, shrink_factor=4, n_iterations=None, spline_distance_mm=10.0, voxel_size_mm=(1.0, 1.0, 1.0), n_bins=200, fwhm_log=0.15, wiener_noise=0.01, convergence_tol=0.001, use_gpu=True, out=None, bias_out=None)[source]#

GPU-accelerated N4 bias field correction.

Faithful CuPy/NumPy port of the Tustison 2010 N4 algorithm: at each fitting level, alternate Sled-style histogram sharpening and tensor cubic B-spline scattered-data fitting until convergence. The B-spline control mesh is fixed across levels (matching SimpleITK’s behaviour); n_iterations only controls per-level iteration counts and the residual is composed across levels.

Parameters mirror linumpy.intensity.bias_field.n4_correct() so the two backends are interchangeable. Extra knobs (n_bins, fwhm_log, wiener_noise) tune the sharpening histogram.

Parameters:
  • vol (np.ndarray) – Float32 input volume (Z, Y, X).

  • mask (np.ndarray or None) – Boolean tissue mask. None = full volume.

  • shrink_factor (int) – Isotropic spatial subsampling factor for the fit (>=1).

  • n_iterations (list of int or None) – Max iterations per fitting level. Length sets the number of levels. Default [20, 20, 20]. Fewer iterations than the SimpleITK CPU backend because the GPU PSDB residual update has no internal multilevel dampening, so each iteration has full effect; more than ~20 per level causes the bias field to absorb true tissue contrast (verified empirically on live OCT).

  • spline_distance_mm (float) – Approximate distance between B-spline control knots at level 0.

  • voxel_size_mm (3-tuple of float) – Voxel size (z, y, x) in mm.

  • n_bins (sharpening parameters) – See sharpen_residual().

  • fwhm_log (sharpening parameters) – See sharpen_residual().

  • wiener_noise (sharpening parameters) – See sharpen_residual().

  • convergence_tol (float) – Per-iteration convergence threshold on the relative L2 change of log_bias. Iterations stop early when the change drops below this value.

  • use_gpu (bool) – Use CuPy when available.

  • out (np.ndarray, optional) – Destination buffer for the corrected output (full vol.shape, float32). When provided, the streaming Z-tile loop writes results directly into this buffer instead of allocating a fresh array. May safely alias the input vol (the host buffer is not read after the initial H2D upload at function entry), saving a full-volume float32 allocation on large mosaics.

  • bias_out (np.ndarray, optional) – Destination buffer for the bias-field output, same shape and dtype constraints as out.

Returns:

  • corrected (np.ndarray) – Bias-corrected float32 volume (Z, Y, X), full resolution.

  • bias_field (np.ndarray) – Estimated multiplicative bias field, float32, full resolution.

Return type:

tuple[numpy.ndarray, numpy.ndarray]