imsegsam

Perform automatic full image segmentation using Segment Anything Model 2 (SAM 2)

Since R2024b

Description

Use the imsegsam function to automatically segment an entire image or all of the objects inside a region of interest (ROI) using the Segment Anything Model 2 (SAM 2) or Segment Anything Model (SAM). The function samples a regular grid of points on an image and returns a set of predicted masks for each point, which enables the model to produce multiple masks for each object and its subregions. You can customize various segmentation settings based on your application, such as the ROI in which to segment objects, the size range of objects to segment, and the confidence score threshold with which to filter mask predictions.

Note

To use any of the SAM 2 models, this functionality requires the Image Processing Toolbox™ Model for Segment Anything Model 2 add-on. To use the base SAM model, it requires the Image Processing Toolbox Model for Segment Anything Model add-on.

[masks,scores] = imsegsam(I) automatically segments all objects in an image, I, using the Segment Anything Model 2 (SAM 2) and returns the masks and the prediction confidence scores for each segmented object.

example

[masks,scores] = imsegsam(I,Name=Value) specifies options using one or more name-value arguments. For example, PointGridSize=[64 64] specifies the number of grid points that the imsegsam function samples along the x- and y- directions of the input image as 64 each.

example

Examples


Load an image into the workspace.

I = imread("pears.png");
imshow(I)

Automatically segment the full image using the Segment Anything Model 2 (SAM 2).

[masks,scores] = imsegsam(I);
Loading Large variant of the SegmentAnythingModel-2.
Loading of SegmentAnythingModel-2 Complete.

Display the masks output, which is a connected component structure.

masks
masks = struct with fields:
    Connectivity: 8
       ImageSize: [486 732]
      NumObjects: 45
    PixelIdxList: {1×45 cell}

Convert the masks to a label matrix format using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image, with the smallest object masks on top, using the labeloverlay function.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay,[])

Load an image into the workspace.

I = imread("pears.png");
imshow(I)

Specify an ROI.

roiPosition = [50 100 350 350];
roi = drawrectangle(Position=roiPosition);

roiMask = createMask(roi);

Segment objects within the ROI using SAM 2.

masks = imsegsam(I,PointGridMask=roiMask);

Convert the masks to a label matrix format using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image, with the smallest object masks on top, using the labeloverlay function.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay)

Load an image into the workspace.

I = imread("visionteam.jpg");
imshow(I)

Segment Image Using SAM

Segment the entire image by using the Segment Anything Model (SAM). Reduce the number of segmented objects by increasing the MinObjectArea name-value argument to 3000. Reduce the number of false positive objects by increasing the ScoreThreshold name-value argument to 0.8. Display the progress of the segmentation by specifying the Verbose name-value argument as true.

[masks,scores] = imsegsam(I,ModelName="sam-base",MinObjectArea=3000,ScoreThreshold=0.8,Verbose=true);
Loading SegmentAnythingModel.
Loading SegmentAnythingModel Complete.

Segmenting using Segment Anything Model.
---------------------------------------------
Processing crop 1/1. 
Processed 1024/1024 point prompts.

Display Masks in Order of Decreasing Mask Area

Convert the masks to a label matrix format by using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image by using the labeloverlay function. By default, the masks are displayed in order of decreasing area, and the smallest masks are on the top of the overlay.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay)

Display Masks in Order of Increasing Mask Area

Reverse the order of the masks so that the masks are sorted in order of increasing mask area. The masks are contained in the PixelIdxList field of the masks structure.

numObjects = masks.NumObjects;
masks.PixelIdxList = masks.PixelIdxList(numObjects:-1:1);

Convert the masks to a label matrix format by using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image by using the labeloverlay function. The masks are displayed in order of increasing area, and the largest masks are on the top of the overlay.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay)

Load an image into the workspace.

I = imread("DogTrio.jpg");
imshow(I)

Automatically segment the full image using the Segment Anything Model 2 (SAM 2). To reduce the number of segmented objects, specify the MinObjectArea name-value argument as 5500. Specify the ScoreThreshold name-value argument as 0.65, and the Verbose name-value argument as false.

[masks,scores] = imsegsam(I,ModelName="sam2-baseplus",MinObjectArea=5500,ScoreThreshold=0.65,Verbose=false);

Convert masks, a connected component structure, to a stack of binary masks, maskStack.

maskStack = false(masks.ImageSize(1),masks.ImageSize(2),masks.NumObjects);
for idx = 1:masks.NumObjects
    mask = false(masks.ImageSize(1),masks.ImageSize(2));
    mask(masks.PixelIdxList{idx}) = true;
    maskStack(:,:,masks.NumObjects-idx+1) = mask;
end

Display the masks with white outlines overlaid on the image, with the smallest object masks on top, using the insertObjectMask (Computer Vision Toolbox) function.

overlayedImg = insertObjectMask(I,maskStack,"MaskColor",lines(masks.NumObjects),"LineColor","white");
imshow(overlayedImg)

Input Arguments


Image to segment, specified as one of these values.

Image Type | Data Format
Grayscale image | 2-D matrix of size H-by-W.
RGB image | 3-D array of size H-by-W-by-3.

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: imsegsam(I,PointGridSize=[64 64]) specifies the number of grid points that the imsegsam function samples along the x- and y-directions of the input image as 64 each.

Since R2026a

SAM variant, specified as one of these values.

SAM Variant | Description
"sam2-large" | Selects the large SAM 2 model trained on the Segment Anything Video (SA-V) data set. This model is the largest, slowest, and most accurate SAM 2 model, suitable for high-accuracy applications such as medical imaging and video post-production. It requires the most computational resources, including a GPU with 40 to 80 GB of VRAM.
"sam2-baseplus" | Selects the base SAM 2 model trained on the SA-V data set. It is suitable for general applications such as land cover classification from geospatial imagery. It requires fewer computational resources than the large SAM 2 model, including a GPU with 6 to 32 GB of VRAM.
"sam2-small" | Selects the small SAM 2 model trained on the SA-V data set. This model balances size, speed, and accuracy, and is suitable for efficiently segmenting objects across video frames. Use this model for real-time or near-real-time video frame analysis on mid-range GPUs or edge devices.
"sam2-tiny" | Selects the tiny SAM 2 model trained on the SA-V data set. This model is smaller, faster, and less accurate than the other SAM 2 models. Because of its fast inference, you can use this model with a standard CPU or in mobile applications.
"sam-base" | Selects the base SAM ViT-B model trained on the Segment Anything 1 Billion (SA-1B) data set. Use SAM instead of SAM 2 if you require compatibility with legacy systems, have hardware constraints, or need to reproduce results achieved using the base SAM model.

Note

The SAM 2 models require the Image Processing Toolbox Model for Segment Anything Model 2 add-on. The base SAM model requires the Image Processing Toolbox Model for Segment Anything Model add-on.

Data Types: char | string

Point grid size along the x- and y-directions of the image, specified as a 1-by-2 vector. The imsegsam function uses the grid points sampled along each direction as visual prompts for the model.

Increase the PointGridSize value for a more precise segmentation at the cost of additional processing time.

Tip

Use a higher value if your image contains small, densely packed objects relative to the image size. For example, if the PointGridSize value is [32 32] and your input image is 1024-by-1024 pixels in size, there are 32 pixels between each grid point. If the smallest object to segment is smaller than 32-by-32 pixels in size, increase the PointGridSize value to sample more grid points and ensure that imsegsam segments the smallest objects.
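The sizing rule in the tip above can be checked with simple arithmetic. This sketch uses hypothetical image and object sizes (the variable names here are illustrative, not arguments of imsegsam):

```matlab
% Hypothetical values for illustration: a 1024-by-1024 image with a
% 32-by-32 point grid.
imageSize = [1024 1024];      % [height width] of the input image
pointGridSize = [32 32];      % points sampled along the x- and y-directions

% Approximate spacing, in pixels, between neighboring grid points.
spacing = imageSize ./ pointGridSize;   % [32 32]

% If the smallest object of interest is smaller than this spacing,
% densify the grid so that at least one point lands on each object.
minObjectSize = [20 20];                % hypothetical smallest object
if any(minObjectSize < spacing)
    pointGridSize = pointGridSize * 2;  % for example, densify to 64-by-64
end
```

You would then pass the densified value to imsegsam, for example imsegsam(I,PointGridSize=pointGridSize).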

ROI to segment, specified as an H-by-W logical matrix, where H and W are the height and width of the input image, respectively. The ROI consists of pixels in PointGridMask with value true. The imsegsam function segments objects that are fully or partially inside the ROI. Segmenting objects within an ROI can help decrease processing time and improve object localization compared to segmenting a full image.

By default, all pixels in PointGridMask are true and the ROI includes all image pixels.

Data Types: logical
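If you prefer not to draw the ROI interactively (as with drawrectangle in the earlier example), you can construct the logical mask directly. The image size and ROI bounds in this sketch are hypothetical:

```matlab
% Build a PointGridMask programmatically. Sizes are hypothetical and
% should match your input image.
H = 486; W = 732;                 % image height and width
roiMask = false(H,W);             % start with an all-false mask
roiMask(100:449,50:399) = true;   % mark a 350-by-350 rectangular ROI

% Restrict segmentation to the ROI:
% masks = imsegsam(I,PointGridMask=roiMask);
```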

Number of crop levels, specified as a positive integer. For each level n, the function splits the image into cropped, zoomed-in point grids of size 2^(n-1)-by-2^(n-1).

To improve the quality of smaller masks, increase the number of crop levels.

Point batch size, specified as a positive integer. The batch size is the number of point prompts that the function batches and processes together. Increase the batch size to improve processing speed at the expense of higher memory usage.

Point grid downscale factor at each crop level, specified as a positive integer. For a crop level, n, the imsegsam function scales down the PointGridSize value by a factor of DF^(n-1), where DF is the downscale factor. If you specify NumCropLevels as a value greater than 1, you can specify a higher PointGridDownscaleFactor value to decrease computation time.
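The interaction between the crop level and the downscale factor can be seen by computing the effective grid at each level. The values below are hypothetical:

```matlab
% Effective point grid at each crop level, assuming a hypothetical
% 64-by-64 base grid, 3 crop levels, and a downscale factor of 2.
pointGridSize = [64 64];
numCropLevels = 3;
DF = 2;                                  % PointGridDownscaleFactor

for n = 1:numCropLevels
    effectiveGrid = pointGridSize ./ DF^(n-1);
    fprintf("Crop level %d: %d-by-%d point grid\n", ...
        n,effectiveGrid(1),effectiveGrid(2));
end
```

With these values, the grid shrinks from 64-by-64 at level 1 to 32-by-32 at level 2 and 16-by-16 at level 3, so the zoomed-in crops are probed with proportionally fewer points.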

Confidence score threshold, specified as a numeric scalar in the range [0, 1]. The imsegsam function filters out predictions with confidence scores less than the threshold value. Increase this value to reduce the number of false positives, at the possible expense of missing some true positives.

Overlap threshold, specified as a numeric scalar in the range [0, 1]. When the overlap proportion between two object segmentations is above this value, the function removes the overlapping segmentation with the lower confidence score. Decrease the threshold to reduce the number of overlapping segmentations. However, decreasing the threshold too much can eliminate segmentations with only minor overlap in the image.

Minimum object area to segment, in pixels, specified as a nonnegative numeric scalar. The function discards object segmentations with fewer than the specified number of pixels, which can reduce computation time.

Maximum object area to segment, in pixels, specified as a positive number. The function discards object segmentations with more than the specified number of pixels, which can reduce computation time. To reduce computation time, set this value to the largest known object area for the objects being detected in the image. The default value is 0.95*size(I,1)*size(I,2).

Hardware resource on which to process images with the network, specified as one of the execution environment options in this table.

ExecutionEnvironment | Description
"auto" | Use a GPU if available. Otherwise, use the CPU. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®-enabled NVIDIA® GPU. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
"gpu" | Use the GPU. Using a GPU requires Parallel Computing Toolbox and a CUDA-enabled NVIDIA GPU. If Parallel Computing Toolbox or a suitable GPU is not available, then the function returns an error. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
"cpu" | Use the CPU.

Visible progress display, specified as a numeric or logical 1 (true) or 0 (false).

Output Arguments


Object masks, returned as a structure with these fields.

Field | Description
Connectivity | Connectivity of the objects.
ImageSize | Size of the binary image.
NumObjects | Number of objects in the binary image.
PixelIdxList | Linear indices of pixels in each object. The PixelIdxList field is a 1-by-NumObjects cell array. The k-th element in the cell array is a numeric vector that contains the linear indices of pixels in the k-th object. The elements of the cell array are sorted in order of decreasing mask area.
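You can recover any single object's binary mask from the structure by indexing into PixelIdxList. The structure in this sketch is hand-built for illustration; imsegsam returns one in the same format:

```matlab
% Hand-built connected component structure in the same format that
% imsegsam returns (values are illustrative only).
masks.Connectivity = 8;
masks.ImageSize = [4 5];
masks.NumObjects = 2;
masks.PixelIdxList = {[1;2;5;6],[12;16]};   % linear indices per object

% Extract the k-th object as a binary mask.
k = 1;
objMask = false(masks.ImageSize);
objMask(masks.PixelIdxList{k}) = true;
```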

Prediction scores for the segmentation, returned as an N-by-1 numeric vector, where N is the number of objects detected in the input image.

Tips

  • For best model performance, use an image with a data range of [0, 255], such as one with a uint8 data type. If your input image has a larger data range, rescale your image to the range [0, 1] by using the rescale function and then convert the image to the uint8 data type by using the im2uint8 function.

  • To visualize object masks, you can display the masks as a label matrix or a stack of binary masks.
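The data-range tip above can be sketched as follows. The 16-bit input image here is synthetic; with Image Processing Toolbox you can use im2uint8 instead of the manual scaling:

```matlab
% Bring a wide-data-range image into the [0, 255] range expected by the
% model. rescale maps the data to [0, 1]; multiplying by 255 and casting
% to uint8 completes the conversion.
I16 = uint16(randi(65535,[100 100]));    % synthetic 16-bit image
I8 = uint8(255*rescale(double(I16)));    % data range is now [0, 255]
```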

References

[1] Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment Anything," April 5, 2023. https://doi.org/10.48550/arXiv.2304.02643.

[2] Ravi, Nikhila, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, et al. “SAM 2: Segment Anything in Images and Videos.” arXiv, October 28, 2024. https://doi.org/10.48550/arXiv.2408.00714.

Version History

Introduced in R2024b
