histo_kit.grand_qc.dataset

Classes

GrandQCDataset(region, bg, bbox_list[, ...])

PyTorch Dataset for extracting fixed-size patches from a region while applying padding to boundary areas.

class histo_kit.grand_qc.dataset.GrandQCDataset(region, bg, bbox_list, patch_size=512, overlap=0.7, pad_value=0, encoder='timm-efficientnet-b0', weights='imagenet')[source]

Bases: Dataset

PyTorch Dataset for extracting fixed-size patches from a region while applying padding to boundary areas. Each item also includes the corresponding background mask patch and metadata describing the location of the patch.

Parameters:
  • region (np.ndarray) – Source RGB region image from which patches will be extracted. Expected shape is (H, W, 3).

  • bg (np.ndarray) – Background mask associated with region, matching spatial dimensions (H, W).

  • bbox_list (list of tuples) – List of bounding boxes defining areas of interest. Each bounding box should be represented as (x_start, y_start, x_end, y_end).

  • patch_size (int, optional) – Target size (height and width) of the extracted patches (default is 512, the input size expected by the GrandQC model).

  • overlap (float, optional) – Fractional overlap between neighboring patches (default is 0.7).

  • pad_value (int, optional) – Value used to pad pixels when a patch extends beyond the region boundary; typically the background value (default is Artifact.BG_THR.value, which corresponds to 0).

  • encoder (str, optional) – Name of the encoder used for preprocessing, passed to segmentation_models_pytorch.encoders.get_preprocessing_fn (default is "timm-efficientnet-b0").

  • weights (str, optional) – Pre-trained weights to use with the encoder (default is "imagenet").
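The boundary-padding behavior described for pad_value can be sketched as follows. This is a minimal NumPy illustration, not the class's actual implementation; the helper name pad_patch and its logic are assumptions:

```python
import numpy as np

def pad_patch(region, y, x, patch_size=512, pad_value=0):
    """Extract a patch_size x patch_size patch at (y, x), padding
    any part that falls outside the region with pad_value."""
    h, w = region.shape[:2]
    patch = region[y:min(y + patch_size, h), x:min(x + patch_size, w)]
    pad_h = patch_size - patch.shape[0]
    pad_w = patch_size - patch.shape[1]
    if pad_h or pad_w:
        # Pad only on the bottom/right so patch coordinates stay valid.
        patch = np.pad(patch, ((0, pad_h), (0, pad_w), (0, 0)),
                       mode="constant", constant_values=pad_value)
    return patch

region = np.full((600, 600, 3), 255, dtype=np.uint8)
patch = pad_patch(region, 300, 300)  # extends 212 px past the edge
print(patch.shape)  # (512, 512, 3)
```

Padding on the trailing edges keeps (x_start, y_start) aligned with the original region coordinates, which is what allows predictions on padded patches to be mapped back.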

coords

Dictionary of patch coordinates with keys "x_start", "y_start", "x_end", "y_end".

Type:

dict

prep_fn

Preprocessing function for encoder normalization.

Type:

callable

patch_size

Final patch spatial size.

Type:

int

pad_value

Background padding value.

Type:

int

Notes

Items returned by __getitem__ are dictionaries, so downstream inference pipelines can access each patch together with its bounding-box metadata.
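How the coords attribute could be derived from a bounding box and the overlap fraction can be sketched as below. This is an assumed reconstruction; the stride computation inside GrandQCDataset may differ in rounding or edge handling:

```python
def patch_coords(bbox, patch_size=512, overlap=0.7):
    """Generate overlapping patch coordinates covering a bounding box.

    bbox is (x_start, y_start, x_end, y_end); the stride is derived
    from the fractional overlap between neighboring patches.
    """
    x0, y0, x1, y1 = bbox
    stride = max(1, int(patch_size * (1 - overlap)))
    coords = {"x_start": [], "y_start": [], "x_end": [], "y_end": []}
    for y in range(y0, y1, stride):
        for x in range(x0, x1, stride):
            coords["x_start"].append(x)
            coords["y_start"].append(y)
            coords["x_end"].append(x + patch_size)
            coords["y_end"].append(y + patch_size)
    return coords

coords = patch_coords((0, 0, 1024, 1024))
print(len(coords["x_start"]))  # 49 patches in a 7 x 7 grid
```

With overlap=0.7 the stride is roughly 30% of the patch size, so each pixel is covered by several patches; this redundancy is typical for tile-based segmentation inference.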

preprocess(img)[source]

Apply encoder-specific preprocessing and convert the image to a tensor.

Parameters:

img (np.ndarray) – Input patch of shape (patch_size, patch_size, 3).

Returns:

Preprocessed tensor suitable as input to the GrandQC model.

Return type:

torch.Tensor
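For "imagenet" weights, the preprocessing function returned by segmentation_models_pytorch.encoders.get_preprocessing_fn amounts to channel-wise mean/std normalization. A minimal NumPy sketch of the equivalent math follows; the channel statistics are the standard ImageNet values and are an assumption here, and the final conversion to a torch.Tensor is represented only by the HWC-to-CHW reordering:

```python
import numpy as np

# Standard ImageNet channel statistics (assumed to match the
# 'imagenet' preprocessing applied by the encoder).
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def preprocess_sketch(img):
    """Normalize an RGB uint8 patch and reorder to CHW layout,
    mirroring the math behind GrandQCDataset.preprocess before
    the tensor conversion."""
    x = img.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - MEAN) / STD                # channel-wise normalization
    return x.transpose(2, 0, 1)        # HWC -> CHW

patch = np.full((512, 512, 3), 128, dtype=np.uint8)
out = preprocess_sketch(patch)
print(out.shape)  # (3, 512, 512)
```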