Classify Tumors in Digital Pathology Images Using Cell Nucleus Morphology
This example shows how to classify whole slide image (WSI) patches as normal or cancerous using pathomics.
Digital pathology involves quantitatively analyzing digital microscopy images, or whole slide images (WSI), to diagnose cancer and other diseases. Deep learning can facilitate the segmentation, object detection, and classification of high-resolution WSIs. For example, the deep learning network used in the Classify Tumors in Multiresolution Blocked Images example classifies WSIs as containing normal or tumor tissue. However, deep learning results can be difficult to interpret because networks use abstract image features internal to their complex architecture. In contrast, pathomics focuses on training models using interpretable image features related to cell morphology, texture, and intensity, providing valuable insight into disease development and progression.
In this example, you use nuclear morphology features, such as size and circularity, to train a machine learning classification model that labels blocks of WSIs as normal or containing tumor cells. Additionally, this example addresses two common technical challenges in pathomics analyses:
- Managing large WSIs — WSIs can have sizes on the order of 200,000-by-100,000 pixels and often contain multiple resolution levels. This example uses blockedImage and blockedImageDatastore objects to process WSIs without loading the full images into system memory.
- Accurately labeling cell nuclei — This example uses a pretrained nucleus segmentation model from the Cellpose Library [1], [2]. You can configure Cellpose models in MATLAB® by using the cellpose object and its object functions.
This example requires the Medical Imaging Toolbox™ Interface for Cellpose Library. You can install the Medical Imaging Toolbox Interface for Cellpose Library from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. The Medical Imaging Toolbox Interface for Cellpose Library requires Deep Learning Toolbox™ and Computer Vision Toolbox™.
About Data Set
This example uses WSIs from the Camelyon16 challenge data set [3]. The data from this challenge contains a total of 400 WSIs of lymph nodes of breast cancer patients from two independent sources, separated into 270 training images and 130 test images. The WSIs are stored as TIF files in a striped format with an 11-level pyramid structure.
The training data set consists of 159 WSIs of normal lymph nodes and 111 WSIs of lymph nodes containing tumors. Ground truth coordinates of the tumor boundaries accompany the tumor images. The size of the training data set is approximately 451 GB.
The testing data set consists of 130 WSIs including a mix of normal and tumor tissue images. Tumor boundary coordinates are included for images with tumors. The size of the test data set is approximately 205 GB.
Download Data Set
First, create folders where you want to store the downloaded data on your machine.
dataDir = fullfile(tempdir,"Camelyon16");
trainingDir = fullfile(dataDir,"training");
testDir = fullfile(dataDir,"testing");
trainNormalDir = fullfile(trainingDir,"normal");
trainTumorDir = fullfile(trainingDir,"tumor");
trainAnnotationDir = fullfile(trainingDir,"lesion_annotations");
testImageDir = fullfile(testDir,"images");
testAnnotationDir = fullfile(testDir,"lesion_annotations");

if ~exist(dataDir,"dir")
    mkdir(trainingDir)
    mkdir(trainNormalDir)
    mkdir(trainTumorDir)
    mkdir(trainAnnotationDir)
    mkdir(testImageDir)
    mkdir(testAnnotationDir)
end
To download the data, go to the GigaDB website for the data set and navigate to the Files tab of the data table. Download individual files by clicking the name in the File Name column, and move them to the correct subfolders by following these steps. A scripted sketch for sorting the downloaded files appears after the list.
1. Download all TIF images with filenames starting with normal_, and store them in the folder specified by the trainNormalDir variable.
2. Download all TIF images with filenames starting with tumor_, and store them in the folder specified by the trainTumorDir variable.
3. Download all TIF images with filenames starting with test_, and store them in the folder specified by the testImageDir variable.
4. Download the lesion_annotations.zip file for the training data set, and extract the zipped files to the folder specified by the trainAnnotationDir variable. You can differentiate between the lesion annotation files for the training and testing data sets using the information in the Description column of the table.
5. Download the lesion_annotations.zip file for the testing data set, and extract the zipped files to the folder specified by the testAnnotationDir variable.
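If you download all of the files into a single folder first, you can sort them into the example subfolders from MATLAB. The following sketch is optional and makes two assumptions: the downloads are in a hypothetical folder, downloadDir, and you have renamed the two lesion_annotations.zip files to indicate whether they belong to the training or testing data set.

% Optional sketch under the assumptions described above
downloadDir = fullfile(tempdir,"Camelyon16_downloads");   % hypothetical download location
% Move the WSI TIF files into the subfolders by filename prefix
movefile(fullfile(downloadDir,"normal_*.tif"),trainNormalDir)
movefile(fullfile(downloadDir,"tumor_*.tif"),trainTumorDir)
movefile(fullfile(downloadDir,"test_*.tif"),testImageDir)
% Extract the renamed annotation archives (archive names are assumptions)
unzip(fullfile(downloadDir,"lesion_annotations_training.zip"),trainAnnotationDir)
unzip(fullfile(downloadDir,"lesion_annotations_testing.zip"),testAnnotationDir)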
Create blockedImage Objects to Manage Training Images
Get the filenames of the normal and tumor training images.
normalFileSet = matlab.io.datastore.FileSet(fullfile(trainNormalDir,"normal*"));
normalFilenames = normalFileSet.FileInfo.Filename;
tumorFileSet = matlab.io.datastore.FileSet(fullfile(trainTumorDir,"tumor*"));
tumorFilenames = tumorFileSet.FileInfo.Filename;
One of the training images of normal tissue, normal_144.tif, is missing important metadata required to deduce the physical extents of the image. Exclude this file.
normalFilenames(contains(normalFilenames,"normal_144")) = [];
Create two arrays of blockedImage objects, one for the normal images and one for the tumor images. Each blockedImage object points to the corresponding image file on disk.
normalImages = blockedImage(normalFilenames);
tumorImages = blockedImage(tumorFilenames);
Display WSI Thumbnails
To get a better understanding of the training data, display thumbnails of the normal images in the Image Browser app using the browseBlockedImages helper function. This helper function is attached to the example as a supporting file.
Note that the majority of each image contains white background, and that the tissue occupies only a small portion of the image. These characteristics are typical of WSIs.
browseBlockedImages(normalImages,8)
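The browseBlockedImages helper is provided as a supporting file and is not listed here. As a rough sketch only, under the assumption that the helper gathers one coarse resolution level per image, writes the thumbnails to a folder, and opens that folder in the Image Browser app, similar behavior might look like the following. The function name browseThumbnails is hypothetical and is not the attached helper.

% Hypothetical sketch of a thumbnail-browsing helper
function browseThumbnails(bims,level)
    % Write one coarse-level thumbnail per blocked image to a temporary folder
    thumbDir = fullfile(tempdir,"wsiThumbnails");
    if ~isfolder(thumbDir)
        mkdir(thumbDir)
    end
    for k = 1:numel(bims)
        thumb = gather(bims(k),Level=level);   % read the requested level into memory
        imwrite(thumb,fullfile(thumbDir,sprintf("thumb_%03d.png",k)))
    end
    % Open the thumbnail folder in the Image Browser app
    imageBrowser(thumbDir)
end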
Preprocess Training Images
Align Spatial Extents of Blocked Images
When you work with multiple resolution levels of a multiresolution image, the spatial extents of each level must match. If the spatial extents are aligned, then you can extract information, such as masks, at coarse resolution levels and correctly apply them to matching locations at finer resolution levels. For more information, see the Set Up Spatial Referencing for Blocked Images example.
Set the spatial referencing for all training data by using the setSpatialReferencingForCamelyon16 helper function. This function is attached to the example as a supporting file. The setSpatialReferencingForCamelyon16 function sets the WorldStart and WorldEnd properties of each blockedImage object using the spatial referencing information from the TIF file metadata.
normalImages = setSpatialReferencingForCamelyon16(normalImages);
tumorImages = setSpatialReferencingForCamelyon16(tumorImages);
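The attached helper derives the world extents of each level from the TIF metadata and is not reproduced here. As an illustration of the general idea only, the following hypothetical sketch aligns every level of each blockedImage to the pixel extents of its finest level, which is enough for coarse-level masks to map to the correct locations at finer levels; it is not the attached helper.

% Hypothetical sketch; aligns all resolution levels of each blockedImage
% to the world extents of the finest level (level 1)
function bims = alignLevelsToFinest(bims)
    for k = 1:numel(bims)
        finestSize = bims(k).Size(1,:);   % finest-level size in each dimension
        bims(k).WorldStart = repmat(0.5,bims(k).NumLevels,numel(finestSize));
        bims(k).WorldEnd = repmat(finestSize + 0.5,bims(k).NumLevels,1);
    end
end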
Create Tissue Masks for Normal Images
The majority of a typical WSI consists of background pixels. To process WSI data efficiently, you can create a region of interest (ROI) mask from a coarse resolution level, and then limit computations at finer resolution levels to regions within the ROI. For more information, see the Process Blocked Images Efficiently Using Mask example.
This example uses a coarse resolution level to create masks for the training images of normal tissue.
maskLevel = 8;
Specify a target location for the tissue masks.
trainNormalMaskDir = fullfile(trainingDir,"normal_mask_level"+num2str(maskLevel));
To create the tissue masks, apply a threshold that excludes light background regions, and use imclose to fill holes in the thresholded image. Apply the threshold in a block-wise manner to all normal training images by using the apply function. Save the results to disk by specifying the OutputLocation name-value argument. If the masks already exist in the specified folder, load them instead.
if ~isfolder(trainNormalMaskDir)
    normalMasks = apply(normalImages, ...
        @(bs)imclose(im2gray(bs.Data)<150, ones(5)), ...
        BlockSize=[512 512], ...
        Level=maskLevel, ...
        OutputLocation=trainNormalMaskDir);
    save(fullfile(trainNormalMaskDir,"normalMasks.mat"),"normalMasks")
else
    load(fullfile(trainNormalMaskDir,"normalMasks.mat"),"normalMasks");
end
Display Tissue Mask Thumbnails
Display thumbnails of the normal tissue mask images in the Image Browser app using the browseBlockedImages helper function. This helper function is attached to the example as a supporting file. Specify the resolution level as 1 because the tissue masks have only one level.
browseBlockedImages(normalMasks,1)
Display a single WSI with its tissue mask as an overlay. Before displaying the image, extract the same resolution level used to create the mask.
sampleNormalImage = gather(normalImages(1),Level=maskLevel);
imageshow(sampleNormalImage,OverlayData=normalMasks(1));
Create Tumor Masks for Tumor Images
The Camelyon16 data set provides ground truth annotations of the tumors for the tumor images. Each annotation consists of hand-drawn tumor boundary coordinates saved as an XML file.
Display Tumor Boundaries
Read the ground truth boundary coordinates and display them using the showCamelyon16TumorAnnotations helper function. This helper function is attached to the example as a supporting file. Notice that normal regions (shown with a green boundary) can occur inside tumor regions.
idxSampleTumorImage = 64;
tumorImage = tumorImages(idxSampleTumorImage);
showCamelyon16TumorAnnotations(tumorImage,trainAnnotationDir);
xlim([77810 83602])
ylim([139971 145763])
Convert Polygon Coordinates to Binary Blocked Images
Specify a target location for the tumor masks.
trainTumorMaskDir = fullfile(trainingDir,"tumor_mask_level"+num2str(maskLevel));
Create a tumor mask for each tumor image using the createMaskForCamelyon16TumorTissue helper function. This helper function is attached to the example as a supporting file. For each image, the function performs these operations:

1. Reads the (x, y) boundary coordinates for all ROIs in the annotated XML file.
2. Separates the boundary coordinates for tumor and normal tissue ROIs into corresponding cell arrays.
3. Converts the cell arrays of boundary coordinates to a binary blocked image using the polyToBlockedImage function, as illustrated in the sketch after this list. In the binary image, the ROI indicates tumor pixels and the background indicates normal tissue pixels. Pixels that are within both tumor and normal tissue ROIs are classified as background.
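As a point of reference, a minimal sketch of the polyToBlockedImage step for one tumor image is shown below. The variables tumorXY and normalXY are hypothetical cell arrays of [x y] boundary vertices that have already been scaled to the pixel grid of the mask resolution level; the attached helper handles reading the XML files and scaling the coordinates.

% Minimal sketch under the assumptions above; not the attached helper
maskSize = tumorImages(1).Size(maskLevel,1:2);                    % mask size in rows and columns
roiPositions = [tumorXY; normalXY];                               % tumor ROIs first, then normal ROIs
roiLabels = [ones(numel(tumorXY),1); zeros(numel(normalXY),1)];   % label 0 maps normal ROIs to background
tumorMaskSketch = polyToBlockedImage(roiPositions,roiLabels,maskSize, ...
    BlockSize=[512 512]);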
If the masks already exist in the specified folder, load them instead.
if ~isfolder(trainTumorMaskDir)
    mkdir(trainTumorMaskDir)
    tumorMasks = createMaskForCamelyon16TumorTissue( ...
        tumorImages,trainAnnotationDir,trainTumorMaskDir,maskLevel);
    save(fullfile(trainTumorMaskDir,"tumorMasks.mat"),"tumorMasks")
else
    load(fullfile(trainTumorMaskDir,"tumorMasks.mat"),"tumorMasks");
end
Display Tumor Mask Thumbnails
Display thumbnails of the tumor mask images in the Image Browser app using the browseBlockedImages helper function. This helper function is attached to the example as a supporting file. Specify the resolution level as 1 because the tumor masks have only one level.
browseBlockedImages(tumorMasks,1)
Select Blocks for Training
In this example, you train a statistical model that classifies image blocks, or tiles, extracted from WSIs as normal or containing tumor cells. To train the model, select a subset of blocks from each of the normal and tumor images.
Specify the size and resolution level for the image tiles. For this example, create image tiles at the finest resolution level, at which individual cell nuclei are visible.
networkBlockSize = [512 512];
trainingLevel = 1;
Create sets of normal and tumor blocks using the selectBlockLocations function. This function selects blocks that are within the specified mask area. You can refine the number of selected blocks for each class by specifying the BlockOffsets and InclusionThreshold name-value arguments. Consider these two factors when calling the selectBlockLocations function:

- Using more data helps to generalize the classifier during training and ensures a good class balance in the selected block set. Increase the number of selected blocks by decreasing the values of the BlockOffsets and InclusionThreshold name-value arguments.
- Using more blocks requires more training time or more powerful hardware. Decrease the number of selected blocks by increasing the values of the BlockOffsets and InclusionThreshold name-value arguments.
Select blocks within the normal images using the tissue masks. This example specifies values of BlockOffsets and InclusionThreshold that result in a relatively small number of blocks.
normalOffsetFactor = 28;
normalInclusionThreshold = 0.99;
blsNormalData = selectBlockLocations(normalImages, ...
    BlockSize=networkBlockSize(1:2), ...
    BlockOffsets=round(networkBlockSize(1:2)*normalOffsetFactor), ...
    Levels=trainingLevel, ...
    Masks=normalMasks, ...
    InclusionThreshold=normalInclusionThreshold, ...
    ExcludeIncompleteBlocks=true);
Select blocks within the tumor images using the tumor masks. To ensure a balanced data set, specify a lower offset value and inclusion threshold for tumor images to include a larger proportion of tumor blocks. This is necessary because the tissue masks are larger and cover a greater total number of blocks than the tumor masks.
tumorOffsetFactor = 7;
tumorInclusionThreshold = 0.95;
blsTumorData = selectBlockLocations(tumorImages, ...
    BlockSize=networkBlockSize(1:2), ...
    BlockOffsets=round(networkBlockSize(1:2)*tumorOffsetFactor), ...
    Levels=trainingLevel, ...
    Masks=tumorMasks, ...
    InclusionThreshold=tumorInclusionThreshold, ...
    ExcludeIncompleteBlocks=true);
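As an optional check of class balance before training, compare the number of blocks selected for each class. The blockLocationSet object stores one element of ImageNumber per selected block.

% Optional check: count the selected blocks for each class
numNormalBlocks = numel(blsNormalData.ImageNumber)
numTumorBlocks = numel(blsTumorData.ImageNumber)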
Create Blocked Image Datastores for Training
Create blockedImageDatastore objects to manage the collection of blocks in the blsNormalData and blsTumorData block location sets.
bimds_normal = blockedImageDatastore(normalImages,"BlockLocationSet",blsNormalData);
bimds_tumor = blockedImageDatastore(tumorImages,"BlockLocationSet",blsTumorData);
Configure Cellpose Model
Create a cellpose object for the "nuclei" pretrained model. Use the canUseGPU function to check whether a supported GPU and Parallel Computing Toolbox™ are available. By default, a cellpose object processes images on a GPU if one is available. If a GPU is not available, enable OpenVINO acceleration for best performance on the CPU.
if canUseGPU
    cp = cellpose(Model="nuclei");
else
    cp = cellpose(Model="nuclei",Acceleration="openvino");
end
Display Nuclei Masks for Representative Tissue Blocks
To get a better understanding of the nuclei segmentation masks predicted by Cellpose, segment representative normal and tumor blocks and display their nuclei labels as an overlay.
First, read one block from each of the training datastores.
sampleNormalBlock = read(bimds_normal);
sampleNormalBlock = sampleNormalBlock{1};
sampleTumorBlock = read(bimds_tumor);
sampleTumorBlock = sampleTumorBlock{1};
Predict nuclei masks for the sample blocks using the segmentNuclei helper function, which is defined at the end of this example. The helper function performs basic preprocessing and segments the nuclei using the segmentCells2D function.
sampleLabelsNormal = segmentNuclei(sampleNormalBlock,cp);
sampleLabelsTumor = segmentNuclei(sampleTumorBlock,cp);
Display the overlays side-by-side. The nuclei in the normal tissue block appear smaller and closer together compared to the tumor block.
normalOverlay = labeloverlay(sampleNormalBlock,sampleLabelsNormal);
tumorOverlay = labeloverlay(sampleTumorBlock,sampleLabelsTumor);
imshowpair(normalOverlay,tumorOverlay,"montage")
title("Normal (left) and Tumor (right)")
Calculate Nuclear Morphology Image Features for Training Images
Segment and calculate image features for all of the training image blocks by using the calculateNuclearFeatures helper function. The helper function is defined at the end of this example, and performs these steps for each block in an input blockedImageDatastore:

1. Preprocesses the image block, and segments all cell nuclei using the "nuclei" Cellpose model by using the segmentCells2D function.
2. For each nucleus, calculates the area, major axis length, minor axis length, equivalent diameter, eccentricity, and circularity by using the regionprops function. The parameters used in this example have been selected based on visual differences between normal and tumor cells, and they represent a subset of potential pathomics biomarkers to explore.
3. Calculates the mean value of each parameter within the block, averaging the results across individual nuclei. Additionally, calculates the number of nuclei within the block.
4. Returns a structure containing a field for each parameter and a row for each block in the datastore.
Process the normal and tumor training image blocks. This can take several minutes to an hour depending on your system configuration.
statsTrainingNormal = calculateNuclearFeatures(bimds_normal,cp);
statsTrainingTumor = calculateNuclearFeatures(bimds_tumor,cp);
Visualize Nuclear Features for Training Images
Combine the output statistics for all training images. Create a vector of strings that labels each training image block as "normal" or "tumor".
statsTraining = [statsTrainingTumor, statsTrainingNormal];
labels = strings(length(statsTraining),1);
labels(:) = "normal";
labels(1:length(statsTrainingTumor)) = "tumor";
Visualize each nuclear morphology feature alongside its label.
The first bar in the visualization shows the ground truth classification, with tumor blocks in yellow and normal tissue blocks in blue. The remaining bars show each of the nuclear morphology features. Note that the colorbars are independently scaled within each variable. Within each feature, there is visual separation in the distribution of values for tumor versus normal blocks. Nuclear area, major axis length, minor axis length, and equivalent diameter are all greater for tumor blocks, consistent with larger cell nuclei. Nuclear count is higher for normal cells, consistent with decreased cell density within tumors. There is a more subtle difference in eccentricity and circularity, consistent with more circular nuclei in normal cells and elongated nuclei in tumor cells. These quantitative differences in nuclear morphology are consistent with visual differences observed for the representative blocks in the training datastores.
featureNames = fieldnames(statsTraining);
f = length(featureNames);
statsTraining = struct2table(statsTraining);
statsTraining = table2array(statsTraining);
labelsTrainLogical = labels == "tumor";

figure(Position=[0 0 1500 500]);
tiledlayout(1,f + 1)
nexttile
imagesc(labelsTrainLogical)
xticklabels({})
yticklabels({})
ylabel("Malignancy",Interpreter="none")
for i = 1:f
    nexttile
    imagesc(statsTraining(:,i))
    xticklabels({})
    yticklabels({})
    ylabel(featureNames{i},Interpreter="none")
end
Train Classifier
Train a classification model that classifies blocks as normal or containing tumor cells. Statistics and Machine Learning Toolbox™ provides several model types. You can use the fitcauto function to try several models with different hyperparameters. The function returns the best model for your data, and prints the hyperparameters to pass to the training function for the selected model type. This example uses a model selected using this syntax.
model = fitcauto(statsTraining,labels);
Based on the output of fitcauto for this training data set, train a neural network classification model using the fitcnet function. This example uses a network with sigmoid activations and three fully connected layers with 4, 8, and 292 outputs, respectively.
model = fitcnet(statsTraining,labels,Activations="sigmoid", ...
    Standardize=true,Lambda=0.000045945,LayerSizes=[4 8 292]);
Evaluate Classifier Accuracy
Create blockedImage Objects to Manage Test Images
Get the filenames of the test images, and create an array of blockedImage objects that manage the test images. Each blockedImage object points to the corresponding image file on disk. Then, set the spatial referencing for all testing data by using the setSpatialReferencingForCamelyon16 helper function. This function is attached to the example as a supporting file. The setSpatialReferencingForCamelyon16 function sets the WorldStart and WorldEnd properties of each blockedImage object using the spatial referencing information from the TIF file metadata.
testFileSet = matlab.io.datastore.FileSet(testImageDir);
testImages = blockedImage(testFileSet);
testImages = setSpatialReferencingForCamelyon16(testImages);
Create Tissue Masks for Test Images
Similar to the training data set, create binary tissue masks that include tissue and exclude light background areas using a coarse resolution level. Create tissue masks for all test images, including those with tumors.
To create the tissue masks, apply a threshold that excludes light background regions, and use imclose to fill holes in the thresholded image. Apply the threshold in a block-wise manner to all test images by using the apply function. Save the results to disk by specifying the OutputLocation name-value argument. If the masks already exist in the specified folder, load them instead.
testTissueMaskDir = fullfile(testDir,"test_tissue_mask_level"+num2str(maskLevel));

if ~isfolder(testTissueMaskDir)
    testTissueMasks = apply(testImages, ...
        @(bs)imclose(im2gray(bs.Data)<150, ones(5)), ...
        BlockSize=[512 512], ...
        Level=maskLevel, ...
        OutputLocation=testTissueMaskDir);
    save(fullfile(testTissueMaskDir,"testTissueMasks.mat"),"testTissueMasks")
else
    load(fullfile(testTissueMaskDir,"testTissueMasks.mat"),"testTissueMasks");
end
Create Tumor Masks for Test Images
Similar to the training data set, create binary tumor masks using the ground truth annotations of the tumors provided with the Camelyon16 data set. The test data set includes both normal and tumor images.
Specify a target location for the tumor masks.
testTumorMaskDir = fullfile(testDir,"test_tumor_mask_level"+num2str(maskLevel));
Create the tumor masks using the createMaskForCamelyon16TumorTissue helper function. This helper function is attached to the example as a supporting file. If the masks already exist in the specified folder, load them instead. Note that the function returns warnings for test images that do not include annotation files, because the test data set includes some images with only normal tissue.
if ~isfolder(testTumorMaskDir)
    mkdir(testTumorMaskDir)
    testTumorMasks = createMaskForCamelyon16TumorTissue(testImages, ...
        testAnnotationDir,testTumorMaskDir,maskLevel);
    save(fullfile(testTumorMaskDir,"testTumorMasks.mat"),"testTumorMasks")
else
    load(fullfile(testTumorMaskDir,"testTumorMasks.mat"),"testTumorMasks");
end
Create Normal Tissue Masks for Test Images
The test data set includes both tumor and normal images. To isolate regions that contain only normal tissue, create a third set of masks, testNormalMasks, that includes non-tumor tissue regions. Create the masks by using the apply function, which processes the masks block-by-block with operations defined by the subtractMasks helper function. The subtractMasks function calculates the Boolean subtraction of the tumor mask from the tissue mask.
for i = 1:numel(testTissueMasks)
    testNormalMasks(i) = apply(testTissueMasks(i),@subtractMasks,ExtraImages=testTumorMasks(i));
end

function normalMask = subtractMasks(bsTissue,bwTumor)
    bwTissue = bsTissue.Data;
    % Resize the tumor block to ensure it is the same size as the tissue block
    bwTumor = imresize(bwTumor,size(bwTissue),"nearest");
    normalMask = bwTissue & ~bwTumor;
end
Select Blocks for Testing
Similar to the training data, create sets of normal and tumor blocks using the selectBlockLocations function.
Use the tumor masks to select blocks within the test images that contain tumor tissue. This example specifies values of BlockOffsets and InclusionThreshold that result in a relatively small number of blocks.
testTumorOffsetFactor = 10;
testTumorInclusionThreshold = 0.95;
blsTestTumorData = selectBlockLocations(testImages, ...
    BlockSize=networkBlockSize(1:2), ...
    BlockOffsets=round(networkBlockSize(1:2)*testTumorOffsetFactor), ...
    Levels=trainingLevel, ...
    Masks=testTumorMasks, ...
    InclusionThreshold=testTumorInclusionThreshold, ...
    ExcludeIncompleteBlocks=true);
Use the normal masks to select blocks within the test images that contain normal tissue. This example specifies values of BlockOffsets and InclusionThreshold that result in a relatively small number of blocks.
testNormalOffsetFactor = 32;
testNormalInclusionThreshold = 0.99;
blsTestNormalData = selectBlockLocations(testImages, ...
    BlockSize=networkBlockSize(1:2), ...
    BlockOffsets=round(networkBlockSize(1:2)*testNormalOffsetFactor), ...
    Levels=trainingLevel, ...
    Masks=testNormalMasks, ...
    InclusionThreshold=testNormalInclusionThreshold, ...
    ExcludeIncompleteBlocks=true);
Create Blocked Image Datastores for Testing
Create blockedImageDatastore objects to manage the blocks in the blsTestTumorData and blsTestNormalData block location sets.
bimds_testTumor = blockedImageDatastore(testImages,"BlockLocationSet",blsTestTumorData);
bimds_testNormal = blockedImageDatastore(testImages,"BlockLocationSet",blsTestNormalData);
Calculate Nuclear Morphology Features for Test Images
Similar to the process used for the training data, segment and calculate image features for all of the testing image blocks by using the calculateNuclearFeatures helper function. The helper function is defined at the end of this example. This can take several minutes to an hour depending on your system configuration.
statsTestNormal = calculateNuclearFeatures(bimds_testNormal,cp);
statsTestTumor = calculateNuclearFeatures(bimds_testTumor,cp);
Combine the output statistics for all testing image blocks.
StatsTest = [statsTestTumor, statsTestNormal];
Classify Test Data Using Trained Classifier
Use the trained classifier model to predict whether each testing image block contains only normal tissue or some tumor tissue. First, convert the output statistics from a structure to a numeric array, which is the format expected by the predict (Statistics and Machine Learning Toolbox) function. Then, use the predict function to make predictions for the testing image blocks.
StatsTest = struct2table(StatsTest);
StatsTest = table2array(StatsTest);
predictedLabelsTest = predict(model,StatsTest);
Display Confusion Matrix
To serve as known ground truth labels for the confusion matrix, create a vector of strings that labels each testing image block as "normal" or "tumor".
labelsTest = strings(length(StatsTest),1);
labelsTest(:) = "normal";
labelsTest(1:length(statsTestTumor)) = "tumor";
Calculate the confusion matrix from the known ground truth labels and the predicted labels.
confusionMatrix = confusionmat(labelsTest,predictedLabelsTest)
confusionMatrix = 2×2
494 61
51 512
Calculate the overall accuracy of the predicted labels from the confusion matrix.
accuracy = sum(diag(confusionMatrix))/sum(confusionMatrix,"all")
accuracy = 0.8998
Display a normalized confusion matrix chart. Normalize the values by row, such that the cells sum to 100% across each row. For example, the top row shows the percentage of truly normal blocks that the model correctly predicted as normal (left) and the percentage of normal blocks that the model incorrectly predicted as containing tumor cells (right).
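As a quick check, you can compute the row-normalized percentages directly from confusionMatrix. The rows follow the sorted class order, normal and then tumor.

% Row-normalized percentages; each row sums to 100
rowPercentages = 100*confusionMatrix./sum(confusionMatrix,2)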
figure
confusionchart(labelsTest,string(predictedLabelsTest),Normalization="row-normalized")
Helper Functions
The segmentNuclei helper function segments the nuclei in an input image block, sampleBlock, using the Cellpose model specified by cp. The function preprocesses the input image for segmentation by converting it to grayscale, rescaling its intensity values from the input range [0, 255] to the range [0, 1], and inverting the image. The function assumes an average nucleus diameter of 20 pixels for the model to segment.
function labels = segmentNuclei(sampleBlock,cp)
    img = im2gray(sampleBlock);
    gmin = 0;
    gmax = 255;
    img = rescale(img,InputMin=gmin,InputMax=gmax);
    imgc = imcomplement(img);

    averageNucleusDiameter = 20;
    labels = segmentCells2D(cp,imgc, ...
        ImageCellDiameter=averageNucleusDiameter, ...
        NormalizeInput=false);
end
The calculateNuclearFeatures helper function segments nuclei and calculates nuclear image features for all image blocks in a blockedImageDatastore using the Cellpose model cp. The helper function performs these steps for each block in the datastore:

1. Preprocesses the image block, and segments all cell nuclei using the "nuclei" Cellpose model by using the segmentCells2D function.
2. For each nucleus, calculates the centroid, area, major axis length, minor axis length, equivalent diameter, eccentricity, and circularity by using the regionprops function. The parameters used in this example were selected based on visual differences between normal and tumor cells, and represent a subset of potential pathomics biomarkers to explore.
3. Calculates the mean value of each parameter within the block, averaging the results across individual nuclei. Additionally, calculates the number of nuclei within the block.
4. Returns a structure containing a field for each parameter, in which each row corresponds to a block in the datastore.
function stats = calculateNuclearFeatures(bimds,cp)
    % Specify parameters for the Cellpose model
    averageNucleusDiameter = 20;
    gmin = 0;
    gmax = 255;

    % Reset the datastore
    reset(bimds)

    % For each block in the datastore
    idx = 1;
    while hasdata(bimds)
        % Get block
        block_i = read(bimds);

        % Preprocess block for segmentation
        img = im2gray(block_i{1});
        img = rescale(img,InputMin=gmin,InputMax=gmax);
        imgc = imcomplement(img);

        % Segment nuclei
        labels = segmentCells2D(cp,imgc, ...
            ImageCellDiameter=averageNucleusDiameter, ...
            NormalizeInput=false);

        % Calculate morphological parameters for each nucleus in the block
        rstats = regionprops(labels,["Centroid","Area","MajorAxisLength","MinorAxisLength", ...
            "EquivDiameter","Eccentricity","Circularity"]);

        % Calculate the mean value of each parameter within the block, and
        % save the values in a new row of the output structure stats
        stats(idx).Area = mean([rstats.Area]);
        stats(idx).MajorAxisLength = mean([rstats.MajorAxisLength]);
        stats(idx).MinorAxisLength = mean([rstats.MinorAxisLength]);
        stats(idx).EquivDiameter = mean([rstats.EquivDiameter]);
        stats(idx).Eccentricity = mean([rstats.Eccentricity]);
        stats(idx).Circularity = mean([rstats.Circularity]);
        stats(idx).Count = numel(rstats);

        idx = idx + 1;
    end
end
References
[1] Stringer, Carsen, Tim Wang, Michalis Michaelos, and Marius Pachitariu. “Cellpose: A Generalist Algorithm for Cellular Segmentation.” Nature Methods 18, no. 1 (January 2021): 100–106. https://doi.org/10.1038/s41592-020-01018-x.
[2] Pachitariu, Marius, and Carsen Stringer. “Cellpose 2.0: How to Train Your Own Model.” Nature Methods 19, no. 12 (December 2022): 1634–41. https://doi.org/10.1038/s41592-022-01663-4.
[3] Ehteshami Bejnordi, Babak, Mitko Veta, Paul Johannes van Diest, Bram van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen A. W. M. van der Laak, et al. “Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer.” JAMA 318, no. 22 (December 12, 2017): 2199–2210. https://doi.org/10.1001/jama.2017.14585.
See Also
blockedImage | blockedImageDatastore | blockLocationSet | cellpose | segmentCells2D | fitcnet (Statistics and Machine Learning Toolbox) | confusionchart (Statistics and Machine Learning Toolbox)
Topics
- Preprocess Multiresolution Images for Training Classification Network
- Classify Tumors in Multiresolution Blocked Images
- Detect Nuclei in Large Whole Slide Images Using Cellpose
- Getting Started with Cellpose
- Set Up Spatial Referencing for Blocked Images
- Process Blocked Images Efficiently Using Mask