histo_kit.tissue_seg.find_thr
Functions
EM_iter_hist(x, y, alpha, mu, sig, SW) | Perform one iteration of the Expectation-Maximization (EM) algorithm for a Gaussian Mixture Model (GMM) on binned data.
GaMRed_hist(x, y, K, draw, SW) | Estimate noise components using a Gaussian Mixture Model (GMM) fitted to histogram data.
find_thr(data, alpha, mi, sigma, idx, draw) | Determine the threshold separating informative and non-informative GMM components.
get_pixel_distribution(img) | Compute the pixel value distribution for each RGB channel of an image.
get_thr_image(img, thr_min=178.5, verbose=False) | Compute per-channel thresholds for an RGB image using the GaMRed algorithm.
gmm_init_dp_hist(x, y, K) | Compute initial parameters for a Gaussian Mixture Model using dynamic programming on binned data.
norm_pdf(x, mu, sigma) | Evaluate the probability density function (PDF) of a normal distribution.
otsuthresh(counts) | Compute Otsu's threshold and effectiveness metric for a histogram.
two_step_otsu(hist) | Compute a threshold using a two-step Otsu algorithm.
- histo_kit.tissue_seg.find_thr.EM_iter_hist(x, y, alpha, mu, sig, SW)
Perform one iteration of the Expectation-Maximization (EM) algorithm for a Gaussian Mixture Model (GMM) on binned data.
The function iteratively updates the mixture weights, means, and standard deviations of a GMM fitted to histogram data using the EM procedure until convergence or a maximum number of iterations is reached. It returns the updated parameters and the log-likelihood of the fitted model.
- Parameters:
x (ndarray of shape (N,)) – Bin centers of the histogram.
y (ndarray of shape (N,)) – Counts or frequencies for each bin.
alpha (ndarray of shape (K,)) – Initial weights of the Gaussian components (sum to 1).
mu (ndarray of shape (K,)) – Initial means of the Gaussian components.
sig (ndarray of shape (K,)) – Initial standard deviations of the Gaussian components.
SW (float) – Minimum allowed standard deviation (safeguard against too small variances).
- Returns:
alpha (ndarray of shape (K,)) – Updated weights of the Gaussian components.
mu (ndarray of shape (K,)) – Updated means of the Gaussian components, sorted in ascending order.
sig (ndarray of shape (K,)) – Updated standard deviations of the Gaussian components, sorted according to mu.
logL (float) – Log-likelihood of the histogram data under the fitted GMM.
Notes
The algorithm stops when the change in parameters is below a threshold (1e-6) or after 10,000 iterations.
Examples
>>> alpha, mu, sig, logL = EM_iter_hist(x, y, alpha_init, mu_init, sig_init, SW=0.01)
>>> print("Updated means:", mu)
>>> print("Log-likelihood:", logL)
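The update described above can be illustrated with a minimal NumPy sketch of a single EM step on binned data. This is not the package's implementation: the name em_step_binned is hypothetical, min_sig plays the role documented for SW (a floor on the standard deviations), and the convergence loop is replaced by a fixed number of steps for brevity.

```python
import numpy as np

def em_step_binned(x, y, alpha, mu, sig, min_sig):
    """One EM update for a 1-D GMM on histogram data (illustrative sketch)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    alpha, mu, sig = (np.asarray(a, dtype=float) for a in (alpha, mu, sig))
    # Weighted component densities at every bin center, shape (K, N).
    z = (x[None, :] - mu[:, None]) / sig[:, None]
    dens = alpha[:, None] * np.exp(-0.5 * z ** 2) / (sig[:, None] * np.sqrt(2 * np.pi))
    mix = np.maximum(dens.sum(axis=0), np.finfo(float).tiny)
    # E-step: responsibilities, then weight each bin by its count.
    w = (dens / mix) * y[None, :]
    nk = np.maximum(w.sum(axis=1), np.finfo(float).tiny)
    # M-step: count-weighted updates of weights, means, standard deviations.
    alpha_new = nk / y.sum()
    mu_new = (w @ x) / nk
    var_new = (w * (x[None, :] - mu_new[:, None]) ** 2).sum(axis=1) / nk
    sig_new = np.maximum(np.sqrt(var_new), min_sig)  # floor on sigma
    # Log-likelihood of the histogram under the parameters passed in.
    logL = float(np.sum(y * np.log(mix)))
    order = np.argsort(mu_new)  # return components sorted by mean
    return alpha_new[order], mu_new[order], sig_new[order], logL

# Synthetic two-component histogram on bin centers 0..255.
x = np.arange(256, dtype=float)
pdf = lambda m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
y = 1000 * pdf(60, 10) + 3000 * pdf(200, 15)

alpha, mu, sig = np.array([0.5, 0.5]), np.array([50.0, 180.0]), np.array([20.0, 20.0])
for _ in range(200):
    alpha, mu, sig, logL = em_step_binned(x, y, alpha, mu, sig, min_sig=0.5)
print("means:", mu, "weights:", alpha)
```

Weighting the responsibilities by the bin counts is what lets the step operate on a histogram instead of the raw samples.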
- histo_kit.tissue_seg.find_thr.GaMRed_hist(x, y, K, draw, SW)
Estimate noise components using a Gaussian Mixture Model (GMM) fitted to histogram data.
This function fits a Gaussian mixture model to histogrammed data using the Expectation-Maximization (EM) algorithm. It computes the Bayesian Information Criterion (BIC) for model evaluation and determines a threshold separating the mixture components, if applicable. The function returns the threshold, BIC value, and a dictionary of fitted model statistics.
- Parameters:
x (ndarray of shape (M,)) – Bin centers of the histogram (can be unsorted; will be sorted internally).
y (ndarray of shape (M,)) – Bin counts (frequency of observations for each bin).
K (int) – Number of Gaussian components to fit in the mixture model.
draw (bool) – Whether to print diagnostic information or draw plots.
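SW (object) – Settings structure or dictionary controlling the EM algorithm behavior (e.g., convergence criteria, maximum iterations). The exact format depends on the helper functions (e.g., EM_iter_hist()). Note that EM_iter_hist() documents SW as a float giving the minimum allowed standard deviation, and get_thr_image() passes SW=5.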
- Returns:
thr (float) – Threshold value separating mixture components along the x-axis. If no threshold can be determined, returns np.nan.
bic (float) – Bayesian Information Criterion (BIC) value for the fitted model.
stats (dict) – Dictionary containing model statistics and parameters:
- "thr": float, threshold value
- "alpha": ndarray, mixture component weights
- "mu": ndarray, component means
- "K": int, number of fitted components
- "sigma": ndarray, component standard deviations
- "logL": float, log-likelihood of the fitted model
- Raises:
ValueError – If the EM algorithm fails (e.g., returns invalid log-likelihood leading to infinite or zero BIC), prompting a rerun with different initialization.
Notes
For K == 1, no separation is possible, and the threshold is set below the minimum x value.
For K == 2, the threshold is computed directly between the two Gaussian components.
For K > 2, components are grouped into two clusters using sklearn.cluster.KMeans before determining the separating threshold.
References
MATLAB implementation: Michal Marczyk (Michal.Marczyk@polsl.pl). The algorithm is described in [1].
Examples
>>> thr_back = {"R": 0.6, "G": 0.6, "B": 0.6}
>>> thr, bic, stats = GaMRed_hist(x, y, K=2, draw=False, SW=my_settings)
>>> print(f"Threshold: {thr:.3f}, BIC: {bic:.2f}")
>>> print("Component means:", stats["mu"])
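For orientation, the BIC of a K-component one-dimensional GMM can be sketched with the textbook definition below. The helper gmm_bic is hypothetical, and the parameter count (K means, K standard deviations, K - 1 free weights) and sign convention are assumptions; the package may define BIC differently.

```python
import numpy as np

def gmm_bic(logL, K, n_obs):
    """Textbook BIC for a 1-D GMM: lower values indicate a better trade-off.

    Assumes 3K - 1 free parameters: K means, K standard deviations,
    and K - 1 independent mixture weights.
    """
    n_params = 3 * K - 1
    return -2.0 * logL + n_params * np.log(n_obs)

# For histogram data, n_obs would be the total count, e.g. y.sum().
bic_2 = gmm_bic(logL=-5120.0, K=2, n_obs=4000)
bic_3 = gmm_bic(logL=-5118.0, K=3, n_obs=4000)
print(bic_2, bic_3)
```

In this made-up comparison the small gain in log-likelihood does not offset the penalty for three extra parameters, so the two-component model has the lower BIC.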
- histo_kit.tissue_seg.find_thr.find_thr(data, alpha, mi, sigma, idx, draw)
Determine the threshold separating informative and non-informative GMM components.
The function evaluates the Gaussian Mixture Model (GMM) over a finely spaced grid, computes the combined densities of informative and non-informative components, and identifies the threshold where the absolute difference between these densities is minimized. Optionally, plots the component densities.
- Parameters:
data (ndarray) – Input data points used to determine the threshold range.
alpha (ndarray of shape (K,)) – Weights of the GMM components.
mi (ndarray of shape (K,)) – Means of the GMM components.
sigma (ndarray of shape (K,)) – Standard deviations of the GMM components.
idx (array_like of shape (K,)) – Boolean index indicating which components are considered informative (True) and non-informative (False).
draw (bool) – If True, plots the component densities, their difference, and the threshold.
- Returns:
thr – Threshold value along the data axis that best separates informative from non-informative components.
- Return type:
float
Notes
The function uses a high-resolution grid (1e7 points) for precise threshold determination.
If no valid index is found within the constraints, a ValueError is raised.
Examples
>>> idx = np.array([False, True])
>>> thr = find_thr(data, alpha, mu, sigma, idx, draw=True)
>>> print("Threshold value:", thr)
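The grid search described above can be sketched as follows. The function crossing_threshold is a hypothetical stand-in, not the package's code. It assumes the search is restricted to the interval between the mean locations of the two groups (the actual constraints are not documented here) and uses a coarser grid than 1e7 points to stay fast.

```python
import numpy as np

def crossing_threshold(data, alpha, mu, sigma, informative, n_grid=100_000):
    """Find where informative and non-informative densities are closest."""
    alpha, mu, sigma = (np.asarray(a, dtype=float) for a in (alpha, mu, sigma))
    informative = np.asarray(informative, dtype=bool)
    if informative.all() or not informative.any():
        raise ValueError("Need at least one component in each group.")
    grid = np.linspace(np.min(data), np.max(data), n_grid)
    # Weighted component densities on the grid, shape (K, n_grid).
    z = (grid[None, :] - mu[:, None]) / sigma[:, None]
    pdf = alpha[:, None] * np.exp(-0.5 * z ** 2) / (sigma[:, None] * np.sqrt(2 * np.pi))
    diff = np.abs(pdf[informative].sum(axis=0) - pdf[~informative].sum(axis=0))
    # Assumed constraint: only search between the two group centers.
    lo, hi = sorted((mu[informative].mean(), mu[~informative].mean()))
    inside = np.flatnonzero((grid >= lo) & (grid <= hi))
    if inside.size == 0:
        raise ValueError("No grid point satisfies the search constraints.")
    return float(grid[inside[np.argmin(diff[inside])]])

# Two equal components centered at 0 and 10 cross midway between them.
thr = crossing_threshold([-5.0, 15.0], [0.5, 0.5], [0.0, 10.0], [2.0, 2.0],
                         informative=[False, True])
print(thr)
```

Restricting the search matters because the two densities are also nearly equal far out in the tails, where both are close to zero.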
- histo_kit.tissue_seg.find_thr.get_pixel_distribution(img)
Compute the pixel value distribution for each RGB channel of an image.
This function calculates histograms of pixel intensities (0–255) separately for the Red, Green, and Blue channels of an input RGB image. Counts for near-white pixels (values 254 and 255) are set to zero to remove artificial or background artifacts.
- Parameters:
img (ndarray of shape (M, N, 3)) – Input RGB image as a NumPy array with values in the range [0, 255].
- Returns:
R (ndarray of shape (256,)) – Histogram of Red channel pixel intensities (with counts for 254 and 255 set to zero).
G (ndarray of shape (256,)) – Histogram of Green channel pixel intensities (with counts for 254 and 255 set to zero).
B (ndarray of shape (256,)) – Histogram of Blue channel pixel intensities (with counts for 254 and 255 set to zero).
Examples
>>> R, G, B = get_pixel_distribution(image)
>>> print("Red channel counts:", R)
>>> print("Green channel counts:", G)
>>> print("Blue channel counts:", B)
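A minimal sketch of the same idea using np.bincount is shown below. The name channel_histograms is hypothetical, and the sketch assumes integer-valued pixels in [0, 255]; it is not the package's implementation.

```python
import numpy as np

def channel_histograms(img):
    """Per-channel 256-bin histograms with the 254 and 255 bins zeroed."""
    img = np.asarray(img)
    hists = []
    for c in range(3):
        values = img[..., c].ravel().astype(np.int64)
        counts = np.bincount(values, minlength=256)[:256]
        counts[254:] = 0  # drop near-white pixels (values 254 and 255)
        hists.append(counts)
    return tuple(hists)

# Tiny 2x2 image: red channel holds 10, 10, 254, 255.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[..., 0] = [[10, 10], [254, 255]]
R, G, B = channel_histograms(image)
print(R[10], R[254], R[255], G[0])
```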
- histo_kit.tissue_seg.find_thr.get_thr_image(img, thr_min=178.5, verbose=False)
Compute per-channel thresholds for an RGB image using the GaMRed algorithm.
This function estimates thresholds for each color channel (Red, Green, Blue) of an RGB image using the Gaussian Mixture Reduction (GaMRed) method. If a computed threshold is lower than thr_min, the function falls back to Otsu’s method for that channel. The function also returns the histogram of pixel values for each channel.
- Parameters:
img (ndarray of shape (M, N, 3)) – Input RGB image with pixel values in the range [0, 255].
thr_min (float, optional) – Minimum allowable threshold. If a GaMRed threshold is below this value, Otsu’s method is used instead. Default is 0.7*255.
verbose (bool, optional) – If True, prints messages when Otsu’s method is used due to low thresholds. Default is False.
- Returns:
thr (dict) – Dictionary of thresholds for each color channel:
- "R": float, threshold for the Red channel
- "G": float, threshold for the Green channel
- "B": float, threshold for the Blue channel
R (ndarray of shape (256,)) – Histogram of Red channel pixel values.
G (ndarray of shape (256,)) – Histogram of Green channel pixel values.
B (ndarray of shape (256,)) – Histogram of Blue channel pixel values.
Notes
Uses get_pixel_distribution() to compute per-channel histograms.
Thresholds are initially estimated using GaMRed_hist().
If a threshold is below thr_min, the function uses two_step_otsu() as a fallback for robustness.
K=2 and SW=5 are fixed parameters for the GaMRed algorithm.
Examples
>>> thr, R, G, B = get_thr_image(image, thr_min=180, verbose=True)
>>> print("Thresholds:", thr)
>>> print("Red channel histogram:", R)
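The fallback rule in the notes can be sketched independently of the actual estimators. In the snippet below, threshold_with_fallback, primary, and fallback are hypothetical names; the two callables stand in for the GaMRed and two-step Otsu estimators. Treating a NaN primary result as "too low" is an assumption that the documentation does not confirm.

```python
import math

def threshold_with_fallback(hist, primary, fallback, thr_min=178.5, verbose=False):
    """Use the primary estimate unless it is below thr_min (or NaN)."""
    thr = primary(hist)
    if math.isnan(thr) or thr < thr_min:
        if verbose:
            print(f"Primary threshold {thr} is below {thr_min}; using fallback.")
        thr = fallback(hist)
    return thr

# Stand-in estimators that ignore the histogram and return fixed values.
kept = threshold_with_fallback(None, lambda h: 200.0, lambda h: 190.0)
replaced = threshold_with_fallback(None, lambda h: 100.0, lambda h: 190.0)
print(kept, replaced)
```

Applied per channel, this would produce the "R", "G", and "B" entries of the returned dictionary.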
- histo_kit.tissue_seg.find_thr.gmm_init_dp_hist(x, y, K)
Compute initial parameters for a Gaussian Mixture Model using dynamic programming on binned data.
This function estimates starting values for the mixture weights, means, and standard deviations of a GMM by partitioning the histogram (binned data) into K segments.
- Parameters:
x (ndarray of shape (N,)) – Bin centers of the histogram.
y (ndarray of shape (N,)) – Counts (frequencies) of each bin.
K (int) – Number of Gaussian components (partitions) to initialize.
- Returns:
alpha (ndarray of shape (K,)) – Initial weights of the Gaussian components, proportional to the sum of counts in each partition.
mu (ndarray of shape (K,)) – Initial means of the Gaussian components, computed as weighted averages of bin centers within each partition.
sigma (ndarray of shape (K,)) – Initial standard deviations of the Gaussian components, corrected for binned data using Sheppard’s correction.
Notes
Sheppard’s correction accounts for the variance inflation introduced by grouping data into bins of equal width. The correction term is s_corr = ((x[1] - x[0]) ** 2) / 12, i.e. h²/12 for bin width h.
Examples
>>> alpha_init, mu_init, sigma_init = gmm_init_dp_hist(x, y, K=3)
>>> print("Initial weights:", alpha_init)
>>> print("Initial means:", mu_init)
>>> print("Initial standard deviations:", sigma_init)
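Once the histogram has been split into K consecutive segments, the per-segment parameters follow from count-weighted moments. The sketch below skips the dynamic-programming search and takes the segment boundaries as given. The name partition_moments is hypothetical, and both subtracting s_corr from the binned variance (the conventional form of Sheppard's correction) and flooring the result at s_corr are assumptions about how the correction is applied.

```python
import numpy as np

def partition_moments(x, y, edges):
    """Weights, means, and corrected sigmas for consecutive bin segments.

    edges holds segment boundaries as bin indices, e.g. [0, 3, 6] for
    the two segments x[0:3] and x[3:6].
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    s_corr = (x[1] - x[0]) ** 2 / 12.0  # Sheppard's correction term
    alpha, mu, sigma = [], [], []
    for start, stop in zip(edges[:-1], edges[1:]):
        centers, counts = x[start:stop], y[start:stop]
        total = counts.sum()
        mean = (counts * centers).sum() / total
        var = (counts * (centers - mean) ** 2).sum() / total - s_corr
        alpha.append(total / y.sum())
        mu.append(mean)
        sigma.append(np.sqrt(max(var, s_corr)))  # assumed positive floor
    return np.array(alpha), np.array(mu), np.array(sigma)

alpha0, mu0, sigma0 = partition_moments(np.arange(6), [1, 2, 1, 1, 2, 1], [0, 3, 6])
print(alpha0, mu0, sigma0)
```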
- histo_kit.tissue_seg.find_thr.norm_pdf(x, mu, sigma)
Evaluate the probability density function (PDF) of a normal distribution.
Computes the PDF values of a normal (Gaussian) distribution with specified mean and standard deviation at the given input points.
- Parameters:
x (array_like) – Input values where the PDF is evaluated.
mu (float) – Mean of the normal distribution.
sigma (float) – Standard deviation of the normal distribution.
- Returns:
y – PDF values evaluated at x, returned as a 2D array with one row.
- Return type:
ndarray of shape (1, N)
Examples
>>> y = norm_pdf(np.array([0, 1, 2]), mu=0, sigma=1)
>>> print(y)
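An equivalent computation in plain NumPy, including the (1, N) output shape, might look like the sketch below. The name normal_pdf_row is hypothetical and this is not the package's code.

```python
import numpy as np

def normal_pdf_row(x, mu, sigma):
    """Normal PDF evaluated at x, returned as a single-row 2D array."""
    x = np.atleast_1d(np.asarray(x, dtype=float)).ravel()
    y = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return y.reshape(1, -1)

row = normal_pdf_row(np.array([0, 1, 2]), mu=0, sigma=1)
print(row.shape, row)
```

The single-row shape is convenient when stacking one row per mixture component.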
- histo_kit.tissue_seg.find_thr.otsuthresh(counts)
Compute Otsu’s threshold and effectiveness metric for a histogram.
This function implements Otsu’s method in Python, based on the MATLAB implementation. It calculates the threshold that maximizes the between-class variance for a histogram of pixel counts, along with an effectiveness metric indicating the separation quality.
- Parameters:
counts (array_like) – Histogram of pixel intensities (counts per bin).
- Returns:
t (float) – Normalized threshold value in the range [0, 1].
em (float) – Effectiveness metric of the threshold (ratio of between-class variance to total variance). Higher values indicate better separation.
Notes
Counts are converted to probabilities and cumulative sums are computed to calculate the between-class variance.
NaN values arising from division by zero are safely replaced with -inf.
Function is based on MATLAB’s implementation.
Examples
>>> t, em = otsuthresh(hist)
>>> print("Normalized Otsu threshold:", t)
>>> print("Effectiveness metric:", em)
References
Algorithm is described in [2]
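The computation outlined in the notes can be sketched in NumPy as follows. The function otsu_from_counts is a hypothetical stand-in and not the package's code. It assumes that ties in the between-class variance are resolved by averaging the tied bin indices, and it maps every non-finite value (not only NaN) to -inf.

```python
import numpy as np

def otsu_from_counts(counts):
    """Normalized Otsu threshold and effectiveness metric from a histogram."""
    counts = np.asarray(counts, dtype=float).ravel()
    n = counts.size
    p = counts / counts.sum()          # bin probabilities
    levels = np.arange(1, n + 1)       # 1-based bin levels
    omega = np.cumsum(p)               # cumulative class probability
    mu = np.cumsum(p * levels)         # cumulative first moment
    mu_t = mu[-1]                      # global mean level
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2[~np.isfinite(sigma_b2)] = -np.inf  # empty classes never win
    maxval = sigma_b2.max()
    if not np.isfinite(maxval):
        return 0.0, 0.0                # degenerate histogram (single level)
    idx = np.mean(np.flatnonzero(sigma_b2 == maxval))  # average tied bins
    total_var = np.sum(p * levels ** 2) - mu_t ** 2
    return float(idx / (n - 1)), float(maxval / total_var)

# Two equal spikes: any cut between them separates the classes perfectly.
hist = np.zeros(256)
hist[50] = hist[200] = 100
t, em = otsu_from_counts(hist)
print(t * 255, em)
```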
- histo_kit.tissue_seg.find_thr.two_step_otsu(hist)
Compute a threshold using a two-step Otsu algorithm.
This function applies a hierarchical, two-step version of Otsu’s method to determine a threshold from a histogram. The first Otsu threshold splits the histogram into a lower and an upper segment, and the second Otsu threshold refines the separation within the upper segment. This is useful for images with uneven lighting or bimodal intensity distributions.
- Parameters:
hist (ndarray of shape (256,)) – Histogram of pixel intensities (counts per bin).
- Returns:
thr – Computed threshold value in the range [0, 255].
- Return type:
float
Notes
Relies on otsuthresh() (assumed available) for standard Otsu threshold computation.
The second step focuses on the upper portion of the histogram to refine the threshold.
The final threshold is scaled to the 0–255 range and rounded to the nearest integer.
Examples
>>> thr = two_step_otsu(hist)
>>> print("Two-step Otsu threshold:", thr)
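One plausible reading of the two-step procedure is sketched below. The names _otsu_norm and two_step_otsu_sketch are hypothetical, and how the second threshold is mapped back onto the 0–255 axis is an assumption; the package may combine the two steps differently.

```python
import numpy as np

def _otsu_norm(counts):
    """Normalized Otsu threshold in [0, 1] (compact helper for this sketch)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    levels = np.arange(1, counts.size + 1)
    omega, mu = np.cumsum(p), np.cumsum(p * levels)
    with np.errstate(divide="ignore", invalid="ignore"):
        s = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    s[~np.isfinite(s)] = -np.inf
    return np.mean(np.flatnonzero(s == s.max())) / (counts.size - 1)

def two_step_otsu_sketch(hist):
    """Otsu on the full histogram, then again on the part above the first cut."""
    hist = np.asarray(hist, dtype=float)
    first = int(round(_otsu_norm(hist) * (hist.size - 1)))  # first cut, in bins
    upper = hist[first:]
    if upper.size < 2 or upper.sum() == 0:
        return float(first)  # nothing left to refine
    second = _otsu_norm(upper) * (upper.size - 1)  # offset within upper part
    return float(round(first + second))

# Dark mode at 40, two bright modes at 180 and 230.
hist = np.zeros(256)
hist[40], hist[180], hist[230] = 1000, 500, 500
thr = two_step_otsu_sketch(hist)
print(thr)
```

In this example the first cut separates the dark mode from the bright ones, and the second cut lands between the two bright modes.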