imsegsam

Perform automatic full image segmentation using Segment Anything Model 2 (SAM 2)

Since R2024b

Description

Use the imsegsam function to automatically segment an entire image or all of the objects inside a region of interest (ROI) using the Segment Anything Model 2 (SAM 2) or Segment Anything Model (SAM). The function samples a regular grid of points on an image and returns a set of predicted masks for each point, which enables the model to produce multiple masks for each object and its subregions. You can customize various segmentation settings based on your application, such as the ROI in which to segment objects, the size range of objects to segment, and the confidence score threshold with which to filter mask predictions.

Note

To use any of the SAM 2 models, this functionality requires the Image Processing Toolbox™ Model for Segment Anything Model 2 add-on. To use the base SAM model, it requires the Image Processing Toolbox Model for Segment Anything Model add-on.

[masks,scores] = imsegsam(I) automatically segments all objects in an image, I, using the Segment Anything Model 2 (SAM 2) and returns the masks and the prediction confidence scores for each segmented object.

example

[masks,scores] = imsegsam(I,Name=Value) specifies options using one or more name-value arguments. For example, PointGridSize=[64 64] specifies the number of grid points that the imsegsam function samples along the x- and y- directions of the input image as 64 each.

example

Examples


Load an image into the workspace.

I = imread("pears.png");
imshow(I)

Automatically segment the full image using the Segment Anything Model 2 (SAM 2).

[masks,scores] = imsegsam(I);
Loading Large variant of the SegmentAnythingModel-2.
Loading of SegmentAnythingModel-2 Complete.

Display the masks output, which is a connected component structure.

masks
masks = struct with fields:
    Connectivity: 8
       ImageSize: [486 732]
      NumObjects: 45
    PixelIdxList: {1×45 cell}

Convert the masks to a label matrix format using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image, with the smallest object masks on top, using the labeloverlay function.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay,[])

Load an image into the workspace.

I = imread("pears.png");
imshow(I)

Specify an ROI.

roiPosition = [50 100 350 350];
roi = drawrectangle(Position=roiPosition);

roiMask = createMask(roi);

Segment objects within the ROI using SAM 2.

masks = imsegsam(I,PointGridMask=roiMask);

Convert the masks to a label matrix format using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image, with the smallest object masks on top, using the labeloverlay function.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay)

Load an image into the workspace.

I = imread("visionteam.jpg");
imshow(I)

Segment Image Using SAM

Segment the entire image by using the Segment Anything Model (SAM). Reduce the number of segmented objects by increasing the MinObjectArea name-value argument to 3000. Reduce the number of false positive objects by increasing the ScoreThreshold name-value argument to 0.8. Display the progress of the segmentation by specifying the Verbose name-value argument as true.

[masks,scores] = imsegsam(I,ModelName="sam-base",MinObjectArea=3000,ScoreThreshold=0.8,Verbose=true);
Loading SegmentAnythingModel.
Loading SegmentAnythingModel Complete.

Segmenting using Segment Anything Model.
---------------------------------------------
Processing crop 1/1. 
Processed 1024/1024 point prompts.

Display Masks in Order of Decreasing Mask Area

Convert the masks to a label matrix format by using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image by using the labeloverlay function. By default, the masks are displayed in order of decreasing area, and the smallest masks are on the top of the overlay.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay)

Display Masks in Order of Increasing Mask Area

Reverse the order of the masks so that the masks are sorted in order of increasing mask area. The masks are contained in the PixelIdxList field of the masks structure.

numObjects = masks.NumObjects;
masks.PixelIdxList = masks.PixelIdxList(numObjects:-1:1);

Convert the masks to a label matrix format by using the labelmatrix function.

labelMatrix = labelmatrix(masks);

Display the masks overlaid on the image by using the labeloverlay function. The masks are displayed in order of increasing area, and the largest masks are on the top of the overlay.

maskOverlay = labeloverlay(I,labelMatrix);
imshow(maskOverlay)

Load an image into the workspace.

I = imread("DogTrio.jpg");
imshow(I)

Automatically segment the full image using the Segment Anything Model 2 (SAM 2). To reduce the number of segmented objects, specify the MinObjectArea name-value argument as 5500. Specify the ScoreThreshold name-value argument as 0.65, and the Verbose name-value argument as false.

[masks,scores] = imsegsam(I,ModelName="sam2-baseplus",MinObjectArea=5500,ScoreThreshold=0.65,Verbose=false);

Convert masks, a connected component structure, to a stack of binary masks, maskStack.

maskStack = false(masks.ImageSize(1),masks.ImageSize(2),masks.NumObjects);
for idx = 1:masks.NumObjects
    mask = false(masks.ImageSize(1),masks.ImageSize(2));
    mask(masks.PixelIdxList{idx}) = true;
    maskStack(:,:,masks.NumObjects-idx+1) = mask;
end

Display the masks with white outlines overlaid on the image, with the smallest object masks on top, using the insertObjectMask (Computer Vision Toolbox) function.

overlayedImg = insertObjectMask(I,maskStack,"MaskColor",lines(masks.NumObjects),"LineColor","white");
imshow(overlayedImg)

Input Arguments


Image to segment, specified as one of these values.

Image Type | Data Format
Grayscale image | 2-D matrix of size H-by-W.
RGB image | 3-D array of size H-by-W-by-3.

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: imsegsam(I,PointGridSize=[64 64]) specifies the number of grid points that the imsegsam function samples along the x- and y-directions of the input image as 64 each.

Since R2026a

SAM variant, specified as one of these values.

SAM Variant | Description
"sam2-large" | Selects the large SAM 2 model trained on the Segment Anything Video (SA-V) data set. This model is the largest, slowest, and most accurate SAM 2 model, suitable for high-accuracy applications such as medical imaging and video post-production. It requires the most computational resources, including a GPU with 40 to 80 GB of VRAM.
"sam2-baseplus" | Selects the base SAM 2 model trained on the SA-V data set. It is suitable for general applications such as land cover classification from geospatial imagery. It requires fewer computational resources than the large SAM 2 model, including a GPU with 6 to 32 GB of VRAM.
"sam2-small" | Selects the small SAM 2 model trained on the SA-V data set. This model balances size, speed, and accuracy, and is suitable for efficiently segmenting objects across video frames. Use this model for real-time or near-real-time video frame analysis on mid-range GPUs or edge devices.
"sam2-tiny" | Selects the tiny SAM 2 model trained on the SA-V data set. This model is smaller, faster, and less accurate than the other SAM 2 models. Because of its fast inference, you can use this model with a standard CPU or in mobile applications.
"sam-base" | Selects the base SAM ViT-B model trained on the Segment Anything 1 Billion (SA-1B) data set. Use SAM instead of SAM 2 if you require compatibility with legacy systems, have hardware constraints, or need to reproduce results achieved using the base SAM model.

Note

The SAM 2 models require the Image Processing Toolbox Model for Segment Anything Model 2 add-on. The base SAM model requires the Image Processing Toolbox Model for Segment Anything Model add-on.

Data Types: char | string

Point grid size along the x- and y-directions of the image, specified as a 1-by-2 vector. The imsegsam function uses the grid points sampled along each direction as visual prompts for the model.

Increase the PointGridSize value for a more precise segmentation at the cost of additional processing time.

Tip

Use a higher value if your image contains small, densely packed objects relative to the image size. For example, if the PointGridSize value is [32 32] and your input image is 1024-by-1024 pixels in size, there are 32 pixels between each grid point. If the smallest object to segment is smaller than 32-by-32 pixels in size, increase the PointGridSize value to sample more grid points and ensure that imsegsam segments the smallest objects.
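The sizing rule in the tip above can be checked with simple arithmetic. This sketch uses hypothetical image and object sizes (the variable names here are illustrative, not arguments of imsegsam):

```matlab
% Hypothetical values for illustration: a 1024-by-1024 image with a
% 32-by-32 point grid.
imageSize = [1024 1024];      % [height width] of the input image
pointGridSize = [32 32];      % points sampled along the x- and y-directions

% Approximate spacing, in pixels, between neighboring grid points.
spacing = imageSize ./ pointGridSize;   % [32 32]

% If the smallest object of interest is smaller than this spacing,
% densify the grid so that at least one point lands on each object.
minObjectSize = [20 20];                % hypothetical smallest object
if any(minObjectSize < spacing)
    pointGridSize = pointGridSize * 2;  % for example, densify to 64-by-64
end
```

You would then pass the densified value to imsegsam, for example imsegsam(I,PointGridSize=pointGridSize).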

ROI to segment, specified as an H-by-W logical matrix, where H and W are the height and width of the input image, respectively. The ROI consists of pixels in PointGridMask with value true. The imsegsam function segments objects that are fully or partially inside the ROI. Segmenting objects within an ROI can help decrease processing time and improve object localization compared to segmenting a full image.

By default, all pixels in PointGridMask are true and the ROI includes all image pixels.

Data Types: logical
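If you prefer not to draw the ROI interactively (as with drawrectangle in the earlier example), you can construct the logical mask directly. The image size and ROI bounds in this sketch are hypothetical:

```matlab
% Build a PointGridMask programmatically. Sizes are hypothetical and
% should match your input image.
H = 486; W = 732;                 % image height and width
roiMask = false(H,W);             % start with an all-false mask
roiMask(100:449,50:399) = true;   % mark a 350-by-350 rectangular ROI

% Restrict segmentation to the ROI:
% masks = imsegsam(I,PointGridMask=roiMask);
```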

Number of crop levels, specified as a positive integer. For each level n, the function splits the image into cropped, zoomed-in point grids of size 2^(n-1)-by-2^(n-1).

To improve the quality of smaller masks, increase the number of crop levels.

Point batch size, specified as a positive integer. The batch size is the number of point prompts that the function batches and processes together. Increase the batch size to improve processing speed at the expense of higher memory usage.

Point grid downscale factor at each crop level, specified as a positive integer. For a crop level, n, the imsegsam function scales down the PointGridSize value by a factor of DF^(n-1), where DF is the downscale factor. If you specify NumCropLevels as a value greater than 1, you can specify a higher PointGridDownscaleFactor value to decrease computation time.
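The interaction between the crop level and the downscale factor can be seen by computing the effective grid at each level. The values below are hypothetical:

```matlab
% Effective point grid at each crop level, assuming a hypothetical
% 64-by-64 base grid, 3 crop levels, and a downscale factor of 2.
pointGridSize = [64 64];
numCropLevels = 3;
DF = 2;                                  % PointGridDownscaleFactor

for n = 1:numCropLevels
    effectiveGrid = pointGridSize ./ DF^(n-1);
    fprintf("Crop level %d: %d-by-%d point grid\n", ...
        n,effectiveGrid(1),effectiveGrid(2));
end
```

With these values, the grid shrinks from 64-by-64 at level 1 to 32-by-32 at level 2 and 16-by-16 at level 3, so the zoomed-in crops are probed with proportionally fewer points.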

Confidence score threshold, specified as a numeric scalar in the range [0, 1]. The imsegsam function filters out predictions with confidence scores less than the threshold value. Increase this value to reduce the number of false positives, at the possible expense of missing some true positives.

Overlap threshold, specified as a numeric scalar in the range [0, 1]. When the overlap proportion between two object segmentations is above this value, the function removes the overlapping segmentation with the lower confidence score. Decrease the threshold to reduce the number of overlapping segmentations. However, decreasing the threshold too much can eliminate segmentations with only minor overlap in the image.

Minimum object area to segment, in pixels, specified as a nonnegative numeric scalar. The function discards object segmentations with fewer than the specified number of pixels, which can reduce computation time.

Maximum object area to segment, in pixels, specified as a positive number. The function discards object segmentations with more than the specified number of pixels, which can reduce computation time. To reduce computation time, set this value to the largest known object area for the objects being detected in the image. The default value is 0.95*size(I,1)*size(I,2).

Hardware resource on which to process images with the network, specified as one of the execution environment options in this table.

ExecutionEnvironment | Description
"auto" | Use a GPU if available. Otherwise, use the CPU. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®-enabled NVIDIA® GPU. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
"gpu" | Use the GPU. Using a GPU requires Parallel Computing Toolbox and a CUDA-enabled NVIDIA GPU. If Parallel Computing Toolbox or a suitable GPU is not available, then the function returns an error. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
"cpu" | Use the CPU.

Visible progress display, specified as a numeric or logical 1 (true) or 0 (false).

Output Arguments


Object masks, returned as a structure with these fields.

Field | Description
Connectivity | Connectivity of the objects.
ImageSize | Size of the binary image.
NumObjects | Number of objects in the binary image.
PixelIdxList | Linear indices of pixels in each object. The PixelIdxList field is a 1-by-NumObjects cell array. The k-th element in the cell array is a numeric vector that contains the linear indices of pixels in the k-th object. The elements of the cell array are sorted in order of decreasing mask area.
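You can recover any single object's binary mask from the structure by indexing into PixelIdxList. The structure in this sketch is hand-built for illustration; imsegsam returns one in the same format:

```matlab
% Hand-built connected component structure in the same format that
% imsegsam returns (values are illustrative only).
masks.Connectivity = 8;
masks.ImageSize = [4 5];
masks.NumObjects = 2;
masks.PixelIdxList = {[1;2;5;6],[12;16]};   % linear indices per object

% Extract the k-th object as a binary mask.
k = 1;
objMask = false(masks.ImageSize);
objMask(masks.PixelIdxList{k}) = true;
```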

Prediction scores for the segmentation, returned as an N-by-1 numeric vector, where N is the number of objects detected in the input image.

Tips

  • For best model performance, use an image with a data range of [0, 255], such as one with a uint8 data type. If your input image has a larger data range, rescale your image to the range [0, 1] by using the rescale function and then convert the image to the uint8 data type by using the im2uint8 function.

  • To visualize object masks, you can display the masks as a label matrix or a stack of binary masks.
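The data-range tip above can be sketched as follows. The 16-bit input image here is synthetic; with Image Processing Toolbox you can use im2uint8 instead of the manual scaling:

```matlab
% Bring a wide-data-range image into the [0, 255] range expected by the
% model. rescale maps the data to [0, 1]; multiplying by 255 and casting
% to uint8 completes the conversion.
I16 = uint16(randi(65535,[100 100]));    % synthetic 16-bit image
I8 = uint8(255*rescale(double(I16)));    % data range is now [0, 255]
```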

References

[1] Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment Anything," April 5, 2023. https://doi.org/10.48550/arXiv.2304.02643.

[2] Ravi, Nikhila, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, et al. “SAM 2: Segment Anything in Images and Videos.” arXiv, October 28, 2024. https://doi.org/10.48550/arXiv.2408.00714.

Version History

Introduced in R2024b
