Main Content

Classify Tumors in Digital Pathology Images Using Cell Nucleus Morphology

Since R2025a

This example shows how to classify whole slide image (WSI) patches as normal or cancerous using pathomics.

Digital pathology involves quantitatively analyzing digital microscopy images, or whole slide images (WSI), to diagnose cancer and other diseases. Deep learning can facilitate the segmentation, object detection, and classification of high-resolution WSIs. For example, the deep learning network used in the Classify Tumors in Multiresolution Blocked Images example classifies WSIs as containing normal or tumor tissue. However, deep learning results can be difficult to interpret because networks use abstract image features internal to their complex architecture. In contrast, pathomics focuses on training models using interpretable image features related to cell morphology, texture, and intensity, providing valuable insight into disease development and progression.

In this example, you use nuclear morphology features, such as size and circularity, to train a machine learning classification model that labels blocks of WSIs as normal or containing tumor cells. Additionally, this example addresses two common technical challenges in pathomics analyses:

  • Managing large WSIs — WSIs can have sizes on the order of 200,000-by-100,000 pixels, and often contain multiple resolution levels. This example uses blockedImage and blockedImageDatastore objects to process WSIs without loading the full images into system memory.

  • Accurately labeling cell nuclei — This example uses a pretrained nucleus segmentation model from the Cellpose Library [1] [2]. You can configure Cellpose models in MATLAB® by using the cellpose object and its object functions.

This example requires the Medical Imaging Toolbox™ Interface for Cellpose Library. You can install the Medical Imaging Toolbox Interface for Cellpose Library from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. The Medical Imaging Toolbox Interface for Cellpose Library requires Deep Learning Toolbox™ and Computer Vision Toolbox™.

About Data Set

This example uses WSIs from the Camelyon16 challenge data set [3]. The data from this challenge contains a total of 400 WSIs of lymph nodes of breast cancer patients from two independent sources, separated into 270 training images and 130 test images. The WSIs are stored as TIF files in a stripped format with an 11-level pyramid structure.

  • The training data set consists of 159 WSIs of normal lymph nodes and 111 WSIs of lymph nodes containing tumors. Ground truth coordinates of the tumor boundaries accompany the tumor images. The size of the training data set is approximately 451 GB.

  • The testing data set consists of 130 WSIs including a mix of normal and tumor tissue images. Tumor boundary coordinates are included for images with tumors. The size of the test data set is approximately 205 GB.

Download Data Set

First, create folders where you want to store the downloaded data on your machine.

dataDir = fullfile(tempdir,"Camelyon16");

trainingDir = fullfile(dataDir,"training");
testDir = fullfile(dataDir,"testing");

trainNormalDir = fullfile(trainingDir,"normal");
trainTumorDir = fullfile(trainingDir,"tumor");
trainAnnotationDir = fullfile(trainingDir,"lesion_annotations");
testImageDir = fullfile(testDir,"images");
testAnnotationDir = fullfile(testDir,"lesion_annotations");
if ~exist(dataDir,"dir")
    mkdir(trainingDir)
    mkdir(trainNormalDir)
    mkdir(trainTumorDir)
    mkdir(trainAnnotationDir)
    mkdir(testImageDir)
    mkdir(testAnnotationDir)
end

To download the data, go to the GigaDB website for the data set, and, in the table, navigate to the Files tab. Download individual files by clicking on the name in the File Name column, and move them to the correct subfolders by following these steps:

  1. Download all TIF images with filenames starting with normal_, and store them in the folder specified by the trainNormalDir variable.

  2. Download all TIF images with filenames starting with tumor_, and store them in the folder specified by the trainTumorDir variable.

  3. Download all TIF images with filenames starting with test_, and store them in the folder specified by the testImageDir variable.

  4. Download the lesion_annotations.zip file for the training data set, and extract the zipped files to the folder specified by the trainAnnotationDir variable. You can differentiate between the lesions file for the training and testing data set using the information in the Description column of the table.

  5. Download the lesion_annotations.zip file for the testing data set of CAMELYON16, and extract the zipped files to the folder specified by the testAnnotationDir variable.

Create blockedImage Objects to Manage Training Images

Get the filenames of the normal and tumor training images.

normalFileSet = matlab.io.datastore.FileSet(fullfile(trainNormalDir,"normal*"));
normalFilenames = normalFileSet.FileInfo.Filename;
tumorFileSet = matlab.io.datastore.FileSet(fullfile(trainTumorDir,"tumor*"));
tumorFilenames = tumorFileSet.FileInfo.Filename;

One of the training images of normal tissue, normal_144.tif, is missing important metadata required to deduce the physical extents of the image. Exclude this file.

normalFilenames(contains(normalFilenames,"normal_144")) = [];

Create two arrays of blockedImage objects, one for normal images and one for the tumor images. Each blockedImage object points to the corresponding image file on disk.

normalImages = blockedImage(normalFilenames);
tumorImages = blockedImage(tumorFilenames);

Display WSI Thumbnails

To get a better understanding of the training data, display thumbnails of the normal images in the Image Browser app using the browseBlockedImages helper function. This helper function is attached to the example as a supporting file.

Note that the majority of each image contains white background, and that the tissue occupies only a small portion of the image. These characteristics are typical of WSIs.

browseBlockedImages(normalImages,8)

Image Browser app window showing thumbnails of the normal whole slide images

Preprocess Training Images

Align Spatial Extents of Blocked Images

When you work with multiple resolution levels of a multiresolution image, the spatial extents of each level must match. If the spatial extents are aligned, then you can extract information, such as masks, at coarse resolution levels and correctly apply them to matching locations at finer resolution levels. For more information, see the Set Up Spatial Referencing for Blocked Images example.

Set the spatial referencing for all training data by using the setSpatialReferencingForCamelyon16 helper function. This function is attached to the example as a supporting file. The setSpatialReferencingForCamelyon16 function sets the WorldStart and WorldEnd properties of each blockedImage object using the spatial referencing information from the TIF file metadata.

normalImages = setSpatialReferencingForCamelyon16(normalImages);
tumorImages = setSpatialReferencingForCamelyon16(tumorImages);

Create Tissue Masks for Normal Images

The majority of a typical WSI consists of background pixels. To process WSI data efficiently, you can create a region of interest (ROI) mask from a coarse resolution level, and then limit computations at finer resolution levels to regions within the ROI. For more information, see the Process Blocked Images Efficiently Using Mask example.

This example uses a coarse resolution level to create masks for the training images of normal tissue.

maskLevel = 8;

Specify a target location for the tissue masks.

trainNormalMaskDir = fullfile(trainingDir,"normal_mask_level"+num2str(maskLevel));

To create the tissue masks, apply a threshold that excludes light background regions, and use imclose to fill holes in the thresholded image. Apply the threshold in a block-wise manner to all normal training images using the apply function. Save the results to disk by specifying the OutputLocation name-value argument. If the masks already exist in the specified folder, load them instead.

if ~isfolder(trainNormalMaskDir)
    normalMasks = apply(normalImages, @(bs)imclose(im2gray(bs.Data)<150, ones(5)), ...
        BlockSize=[512 512],...
        Level=maskLevel, ...
        OutputLocation=trainNormalMaskDir);
    save(fullfile(trainNormalMaskDir,"normalMasks.mat"),"normalMasks")
else
    load(fullfile(trainNormalMaskDir,"normalMasks.mat"),"normalMasks");
end

Display Tissue Mask Thumbnails

Display thumbnails of the normal tissue mask images in the Image Browser app using the browseBlockedImages helper function. This helper function is attached to the example as a supporting file. Specify the resolution level as 1 because the tissue masks have only one level.

browseBlockedImages(normalMasks,1)

Image Browser app window showing thumbnails of the tissue masks for the normal whole slide images

Display a single WSI with its tissue mask as an overlay. Before displaying the image, extract the same resolution level used to create the mask.

sampleNormalImage = gather(normalImages(1),Level=maskLevel);
imageshow(sampleNormalImage,OverlayData=normalMasks(1));

Create Tumor Masks for Tumor Images

The Camelyon16 data set provides ground truth annotations of the tumors for the tumor images. Each annotation consists of hand-drawn tumor boundary coordinates saved as an XML file.

Display Tumor Boundaries

Read the ground truth boundary coordinates and display them using the showCamelyon16TumorAnnotations helper function. This helper function is attached to the example as a supporting file. Notice that normal regions (shown with a green boundary) can occur inside tumor regions.

idxSampleTumorImage = 64;
tumorImage = tumorImages(idxSampleTumorImage);
showCamelyon16TumorAnnotations(tumorImage,trainAnnotationDir);
xlim([77810 83602])
ylim([139971 145763])

Zoomed in view of tumor region showing tumor boundary in red and normal regions inside the tumor in green

Convert Polygon Coordinates to Binary Blocked Images

Specify a target location for the tumor masks.

trainTumorMaskDir = fullfile(trainingDir,"tumor_mask_level"+num2str(maskLevel));

Create a tumor mask for each tumor image using the createMaskForCamelyon16TumorTissue helper function. This helper function is attached to the example as a supporting file. For each image, the function performs these operations.

  • Reads the (x, y) boundary coordinates for all ROIs in the annotated XML file.

  • Separates the boundary coordinates for tumor and normal tissue ROIs into corresponding cell arrays.

  • Converts the cell arrays of boundary coordinates to a binary blocked image using the polyToBlockedImage function. In the binary image, the ROI indicates tumor pixels and the background indicates normal tissue pixels. Pixels that are within both tumor and normal tissue ROIs are classified as background.

If the masks already exist in the specified folder, load them instead.

if ~isfolder(trainTumorMaskDir)
    mkdir(trainTumorMaskDir)
    tumorMasks = createMaskForCamelyon16TumorTissue( ...
        tumorImages,trainAnnotationDir,trainTumorMaskDir,maskLevel);
    save(fullfile(trainTumorMaskDir,"tumorMasks.mat"),"tumorMasks")
else
    load(fullfile(trainTumorMaskDir,"tumorMasks.mat"),"tumorMasks");
end

Display Tumor Mask Thumbnails

Display thumbnails of the tumor mask images in the Image Browser app using the browseBlockedImages helper function. This helper function is attached to the example as a supporting file. Specify the resolution level as 1 because the tissue masks have only one level.

browseBlockedImages(tumorMasks,1)

Image Browser app window showing thumbnails of tumor masks

Select Blocks for Training

In this example, you train a statistical model that classifies image blocks, or tiles, extracted from WSIs as normal or containing tumor cells. To train the model, select a subset of blocks from each of the normal and tumor images.

Specify the size and resolution level for the image tiles. For this example, create image tiles at the finest resolution level, at which individual cell nuclei are visible.

networkBlockSize = [512 512];
trainingLevel = 1;

Create sets of normal and tumor blocks using the selectBlockLocations function. This function selects blocks that are within the specified mask area. You can refine the number of selected blocks for each class by specifying the BlockOffsets and InclusionThreshold name-value arguments. Consider these two factors when calling the selectBlockLocations function.

  • Using more data helps to generalize the classifier during training and ensures a good class balance in the selected block set. Increase the number of selected blocks by decreasing the values of the BlockOffsets and InclusionThreshold name-value arguments.

  • Using more blocks requires more training time or more powerful hardware. Decrease the number of selected blocks by increasing the values of the BlockOffsets and InclusionThreshold name-value arguments.

Select blocks within the normal images using the tissue masks. This example specifies values of BlockOffsets and InclusionThreshold that result in a relatively small number of blocks.

normalOffsetFactor = 28;
normalInclusionThreshold = 0.99;

blsNormalData = selectBlockLocations(normalImages, ...
    BlockSize=networkBlockSize(1:2), ...
    BlockOffsets=round(networkBlockSize(1:2)*normalOffsetFactor), ...
    Levels=trainingLevel, ...
    Masks=normalMasks, ...
    InclusionThreshold=normalInclusionThreshold, ...
    ExcludeIncompleteBlocks=true);

Select blocks within the tumor images using the tumor masks. To ensure a balanced data set, specify a lower offset value and inclusion threshold for tumor images to include a larger proportion of tumor blocks. This is necessary because the tissue masks are larger and cover a greater total number of blocks than the tumor masks.

tumorOffsetFactor = 7;
tumorInclusionThreshold = 0.95;

blsTumorData = selectBlockLocations(tumorImages, ...
    BlockSize=networkBlockSize(1:2), ...
    BlockOffsets=round(networkBlockSize(1:2)*tumorOffsetFactor), ...
    Levels=trainingLevel, ...
    Masks=tumorMasks, ...
    InclusionThreshold=tumorInclusionThreshold, ...
    ExcludeIncompleteBlocks=true);

Create Blocked Image Datastores for Training

Create blockedImageDatastore objects to manage the collection of blocks in the blsNormalData and blsTumorData block location sets.

bimds_normal = blockedImageDatastore(normalImages,"BlockLocationSet",blsNormalData);
bimds_tumor = blockedImageDatastore(tumorImages,"BlockLocationSet",blsTumorData);

Configure Cellpose Model

Create a cellpose object for the "nuclei" pretrained model. Use the canUseGPU function to check whether a supported GPU and Parallel Computing Toolbox™ are available. By default, a cellpose object processes images on a GPU if one is available. If a GPU is not available, enable OpenVINO acceleration for best performance on the CPU.

if canUseGPU
    cp = cellpose(Model="nuclei");
else
    cp = cellpose(Model="nuclei",Acceleration="openvino");
end

Display Nuclei Masks for Representative Tissue Blocks

To get a better understanding of the nuclei segmentation masks predicted by Cellpose, segment representative normal and tumor blocks and display their nuclei labels as an overlay.

First, read one block from each of the training datastores.

sampleNormalBlock = read(bimds_normal);
sampleNormalBlock = sampleNormalBlock{1};

sampleTumorBlock = read(bimds_tumor);
sampleTumorBlock = sampleTumorBlock{1};

Predict nuclei masks for the sample blocks using the segmentNuclei helper function, which is defined at the end of this example. The helper function performs basic preprocessing, and segments the nuclei using the segmentCells2D function.

sampleLabelsNormal = segmentNuclei(sampleNormalBlock,cp);
sampleLabelsTumor = segmentNuclei(sampleTumorBlock,cp);

Display the overlays side-by-side. The nuclei in the normal tissue block appear smaller and closer together compared to the tumor block.

normalOverlay = labeloverlay(sampleNormalBlock,sampleLabelsNormal);
tumorOverlay = labeloverlay(sampleTumorBlock,sampleLabelsTumor);
imshowpair(normalOverlay,tumorOverlay,"montage")
title("Normal (left) and Tumor (right)")

Calculate Nuclear Morphology Image Features for Training Images

Segment and calculate image features for all of the training image blocks by using the calculateNuclearFeatures helper function. The helper function is defined at the end of this example, and performs these steps for each block in an input blockedImageDatastore:

  • Preprocesses the image block, and segments all cell nuclei using the "nuclei" Cellpose model by using the segmentCells2D function.

  • For each nucleus, calculates the area, major axis length, minor axis length, equivalent diameter, eccentricity, and circularity by using the regionprops function. The parameters used in this example have been selected based on visual differences between normal and tumor cells, and they represent a subset of potential pathomics biomarkers to explore.

  • Calculates the mean value for each parameter within the block, averaging the results across individual nuclei. Additionally, calculates the number of nuclei within the block.

  • Returns a structure containing a field for each parameter and a row for each block in the datastore.

Process the normal and tumor training image blocks. This can take several minutes to an hour depending on your system configuration.

statsTrainingNormal = calculateNuclearFeatures(bimds_normal,cp);
statsTrainingTumor = calculateNuclearFeatures(bimds_tumor,cp);

Visualize Nuclear Features for Training Images

Combine the output statistics for all training images. Create a vector of strings that labels each training image block as "normal" or "tumor".

statsTraining = [statsTrainingTumor, statsTrainingNormal];
labels = strings(length(statsTraining),1);
labels(:) = "normal";
labels(1:length(statsTrainingTumor)) = "tumor";

Visualize each nuclear morphology feature alongside its label.

The first bar in the visualization shows the ground truth classification, with tumor blocks in yellow and normal tissue blocks in blue. The remaining bars show each of the nuclear morphology features. Note that the colorbars are independently scaled within each variable. Within each feature, there is visual separation in the distribution of values for tumor versus normal blocks. Nuclear area, major axis length, minor axis length, and equivalent diameter are all greater for tumor blocks, consistent with larger cell nuclei. Nuclear count is higher for normal cells, consistent with decreased cell density within tumors. There is a more subtle difference in eccentricity and circularity, consistent with more circular nuclei in normal cells and elongated nuclei in tumor cells. These quantitative differences in nuclear morphology are consistent with visual differences observed for the representative blocks in the training datastores.

featureNames = fieldnames(statsTraining);
f = length(featureNames);
statsTraining = struct2table(statsTraining);
statsTraining = table2array(statsTraining);

labelsTrainLogical = labels == "tumor";

figure(Position=[0 0 1500 500]);
tiledlayout(1,f + 1)
nexttile
imagesc(labelsTrainLogical)
xticklabels({})
yticklabels({})
ylabel("Malignancy",Interpreter="none")
for i = 1:f
    nexttile
    imagesc(statsTraining(:,i))
    xticklabels({})
    yticklabels({})
    ylabel(featureNames{i},Interpreter="none")
end

Train Classifier

Train a classification model that classifies blocks as normal or containing tumor cells. Statistics and Machine Learning Toolbox™ provides several model types. You can use the fitcauto function to try several models with different hyperparameters. The function returns the best model for your data, and prints the hyperparameters to pass to the training function for the selected model type. This example uses a model selected using this syntax.

model = fitcauto(statsTraining,labels);

Based on the output of fitcauto for this training data set, train a neural network classification model using the fitcnet function. This example uses a network with sigmoid activations and three fully connected layers with 4, 8, and 292 outputs, respectively.

model = fitcnet(statsTraining,labels,Activations="sigmoid", ...
    Standardize=true,Lambda=0.000045945,LayerSizes=[4 8 292]);

Evaluate Classifier Accuracy

Create blockedImage Objects to Manage Test Images

Get the filenames of the test images. Then, create an array of blockedImage objects that manage the test images. Each blockedImage object points to the corresponding image file on disk. Then, set the spatial referencing for all testing data by using the setSpatialReferencingForCamelyon16 helper function. This function is attached to the example as a supporting file. The setSpatialReferencingForCamelyon16 function sets the WorldStart and WorldEnd properties of each blockedImage object using the spatial referencing information from the TIF file metadata.

testFileSet = matlab.io.datastore.FileSet(testImageDir);
testImages = blockedImage(testFileSet);
testImages = setSpatialReferencingForCamelyon16(testImages);

Create Tissue Masks for Test Images

Similar to the training data set, create binary tissue masks that include tissue and exclude light background areas using a coarse resolution level. Create tissue masks for all test images, including those with tumors.

To create the tissue masks, apply a threshold that excludes light background regions, and use imclose to fill holes in the thresholded image. Apply the threshold in a block-wise manner to all normal training images by using the apply function. Save the results to disk by specifying the OutputLocation name-value argument. If the masks already exist in the specified folder, load them instead.

testTissueMaskDir = fullfile(testDir,"test_tissue_mask_level"+num2str(maskLevel));

if ~isfolder(testTissueMaskDir)
    testTissueMasks = apply(testImages, @(bs)imclose(im2gray(bs.Data)<150, ones(5)), ...
        BlockSize=[512 512], ...
        Level=maskLevel, ...
        OutputLocation=testTissueMaskDir);
    save(fullfile(testTissueMaskDir,"testTissueMasks.mat"),"testTissueMasks")
else
    load(fullfile(testTissueMaskDir,"testTissueMasks.mat"),"testTissueMasks");
end

Create Tumor Masks for Test Images

Similar to the training data set, create binary tumor masks using the ground truth annotations of the tumors provided with the Camelyon 16 data set. The test data set includes both normal and tumor images.

Specify a target location for the tumor masks.

testTumorMaskDir = fullfile(testDir,"test_tumor_mask_level"+num2str(maskLevel));

Create the tumor masks using the createMaskForCamelyon16TumorTissue helper function. This helper function is attached to the example as a supporting file. If the masks already exist in the specified folder, load them instead. Note that the function returns warnings for test images that do not include annotation files because the test data set includes some images with only normal tissue.

if ~isfolder(testTumorMaskDir)
    mkdir(testTumorMaskDir)
    testTumorMasks = createMaskForCamelyon16TumorTissue(testImages, ...
        testAnnotationDir,testTumorMaskDir,maskLevel);    
    save(fullfile(testTumorMaskDir,"testTumorMasks.mat"),"testTumorMasks")
else
    load(fullfile(testTumorMaskDir,"testTumorMasks.mat"),"testTumorMasks");
end

Create Normal Tissue Masks for Test Images

The test data set includes both tumor and normal images. To isolate regions with only normal tissue, create a third set of masks, testNormalMasks, that includes non-tumor tissue regions. Create the masks by using the apply function, which processes the masks block-by-block with operations defined by the subtractMasks helper function. The subtractMasks function calculates the Boolean subtraction of the tumor mask from the tissue mask.

for i = 1:numel(testTissueMasks)
    testNormalMasks(i) = apply(testTissueMasks(i),@subtractMasks,ExtraImages=testTumorMasks(i));
end

function normalMask = subtractMasks(bsTissue,bwTumor)
bwTissue = bsTissue.Data;
% Resize the tumor block to ensure it is the same size as the tissue block
bwTumor = imresize(bwTumor,size(bwTissue),"nearest");
normalMask = bwTissue&(~bwTumor);
end

Select Blocks for Testing

Similar to the training data, create sets of normal and tumor blocks using the selectBlockLocations function.

Use the tumor masks to select blocks within the test images that contain tumor tissue. This example specifies values of BlockOffsets and InclusionThreshold that result in a relatively small number of blocks.

testTumorOffsetFactor = 10;
testTumorInclusionThreshold = 0.95;

blsTestTumorData = selectBlockLocations(testImages, ...
    BlockSize=networkBlockSize(1:2), ...
    BlockOffsets=round(networkBlockSize(1:2)*testTumorOffsetFactor), ...
    Levels=trainingLevel, ...
    Masks=testTumorMasks, ...
    InclusionThreshold=testTumorInclusionThreshold, ...
    ExcludeIncompleteBlocks=true);

Use the normal masks to select blocks within the test images that contain normal tissue. This example specifies values of BlockOffsets and InclusionThreshold that result in a relatively small number of blocks.

testNormalOffsetFactor = 32;
testNormalInclusionThreshold = 0.99;

blsTestNormalData = selectBlockLocations(testImages, ...
    BlockSize=networkBlockSize(1:2), ...
    BlockOffsets=round(networkBlockSize(1:2)*testNormalOffsetFactor), ...
    Levels=trainingLevel, ...
    Masks=testNormalMasks, ...
    InclusionThreshold=testNormalInclusionThreshold, ...
    ExcludeIncompleteBlocks=true);

Create Blocked Image Datastores for Testing

Create blockedImageDatastore objects to manage the blocks in the blsTestTumorData and blsTestNormalData block location sets.

bimds_testTumor = blockedImageDatastore(testImages,"BlockLocationSet",blsTestTumorData);
bimds_testNormal = blockedImageDatastore(testImages,"BlockLocationSet",blsTestNormalData);

Calculate Nuclear Morphology Features for Test Images

Similar to the process used for the training data, segment and calculate image features for all of the testing image blocks by using the calculateNuclearFeatures helper function. The helper function is defined at the end of this example. This can take several minutes to an hour depending on your system configuration.

statsTestNormal = calculateNuclearFeatures(bimds_testNormal,cp);
statsTestTumor = calculateNuclearFeatures(bimds_testTumor,cp);

Combine the output statistics for all testing image blocks.

StatsTest = [statsTestTumor, statsTestNormal];

Classify Test Data Using Trained Classifier

Use the trained classifier model to predict whether each testing image block contains only normal or some tumor tissue. First, convert the output statistics from a table to an array, which is the format expected by the predict (Statistics and Machine Learning Toolbox) function. Then, use the predict function to make predictions for the testing image blocks.

StatsTest = struct2table(StatsTest);
StatsTest = table2array(StatsTest);

predictedLabelsTest = predict(model,StatsTest);

Display Confusion Matrix

To serve as known ground truth labels for the confusion matrix, create a vector of strings that labels each testing image block as "normal" or "tumor".

labelsTest = strings(length(StatsTest),1);
labelsTest(:) = "normal";
labelsTest(1:length(statsTestTumor)) = "tumor";

Calculate the confusion matrix from the known ground truth labels and the predicted labels.

confusionMatrix = confusionmat(labelsTest,predictedLabelsTest)
confusionMatrix = 2×2

   494    61
    51   512

Calculate the overall accuracy of the predicted labels from the confusion matrix.

accuracy = sum(diag(confusionMatrix))/sum(confusionMatrix,"all")
accuracy = 
0.8998

Display a normalized confusion matrix chart. Normalize values by row, such that cells sum to 100% across each row. For example, the top row shows the percentage of actually normal blocks that the model correctly predicted to be normal on the left, and the percentage of normal blocks the model incorrectly predicted to contain tumor cells on the right.

figure
confusionchart(labelsTest,string(predictedLabelsTest),Normalization="row-normalized")

Helper Functions

The segmentNuclei helper function segments the nuclei in an input image block, sampleBlock, using the Cellpose model specified by cp. The function preprocesses the input image for segmentation, including converting it to grayscale, rescaling its intensities to the range [0, 255], and inverting the image. The function assumes an average diameter for the model to segment of 20 pixels.

function labels = segmentNuclei(sampleBlock,cp)
img = im2gray(sampleBlock);
gmin = 0;
gmax = 255;
img = rescale(img,InputMin=gmin,InputMax=gmax);
imgc = imcomplement(img);

averageNucleusDiameter = 20;
labels = segmentCells2D(cp,imgc,ImageCellDiameter=averageNucleusDiameter,NormalizeInput=false);
end

The calculateNuclearFeatures helper function segments nuclei and calculates nuclear image features for all image blocks in a blockedImagedatastore using the Cellpose model cp. The helper function performs these steps for each block in the datastore:

  • Preprocesses the image block, and segments all cell nuclei using the "nuclei" Cellpose model by using the segmentCells2D function.

  • For each nucleus, calculates the centroid, area, major axis length, minor axis length, equivalent diameter, eccentricity, and circularity by using the regionprops function. The parameters used in this example were selected based on visual differences between normal and tumor cells, and represent a subset of potential pathomics biomarkers to explore.

  • Calculates the mean value for each parameter within the block, averaging the results across individual nuclei. Additionally, calculate the number of nuclei within the block.

  • Returns a structure containing a field for each parameter, in which each row corresponds to a block in the datastore.

function stats = calculateNuclearFeatures(bimds,cp)

% Specify parameters for cellpose model
averageNucleusDiameter = 20;
gmin = 0;
gmax = 255;

% Reset the datastore
reset(bimds)

% For each block in the datastore
idx = 1;
while hasdata(bimds)

    % Get block
    block_i = read(bimds);

    % Preprocess block for segmentation
    img = im2gray(block_i{1});
    img = rescale(img,InputMin=gmin,InputMax=gmax);
    imgc = imcomplement(img);

    % Segment nuclei
    labels = segmentCells2D(cp,imgc, ...
        ImageCellDiameter=averageNucleusDiameter, ...
        NormalizeInput=false);

    % Calculate morphological parameters for each nucleus in the block
    rstats = regionprops(labels,["Centroid","Area","MajorAxisLength","MinorAxisLength", ...
        "EquivDiameter","Eccentricity","Circularity"]);

    % Calculate the mean value for each parameter within the block, and
    % save them in a new row of the output structure stats
    stats(idx).Area = mean([rstats.Area]);
    stats(idx).MajorAxisLength = mean([rstats.MajorAxisLength]);
    stats(idx).MinorAxisLength = mean([rstats.MinorAxisLength]);
    stats(idx).EquivDiameter = mean([rstats.EquivDiameter]);
    stats(idx).Eccentricity = mean([rstats.Eccentricity]);
    stats(idx).Circularity = mean([rstats.Circularity]);
    stats(idx).Count = numel(rstats);

    idx = idx + 1;
end

end

References

[1] Stringer, Carsen, Tim Wang, Michalis Michaelos, and Marius Pachitariu. “Cellpose: A Generalist Algorithm for Cellular Segmentation.” Nature Methods 18, no. 1 (January 2021): 100–106. https://doi.org/10.1038/s41592-020-01018-x.

[2] Pachitariu, Marius, and Carsen Stringer. “Cellpose 2.0: How to Train Your Own Model.” Nature Methods 19, no. 12 (December 2022): 1634–41. https://doi.org/10.1038/s41592-022-01663-4.

[3] Ehteshami Bejnordi, Babak, Mitko Veta, Paul Johannes van Diest, Bram van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen A. W. M. van der Laak, et al. “Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer.” JAMA 318, no. 22 (December 12, 2017): 2199–2210. https://doi.org/10.1001/jama.2017.14585.

See Also

| | | | | (Statistics and Machine Learning Toolbox) | (Statistics and Machine Learning Toolbox)

Topics

External Websites