Instance Segmentation

Label ground truth and perform instance segmentation using pretrained AI models like SOLOv2, Mask R-CNN, and SAM, or train custom networks with transfer learning

Instance segmentation tools in Computer Vision Toolbox™ enable you to detect, classify, and segment individual objects within an image, even when objects overlap. You can start by creating labeled ground truth using the Image Labeler and Video Labeler apps, which support interactive and AI-assisted annotation of object instances with polygon or rectangle ROIs. For more information, see Label Objects Using Polygons for Instance Segmentation.
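Polygon labels are usually rasterized into one binary mask per object instance before training or evaluation. The following is a minimal sketch, assuming each polygon is available as an M-by-2 matrix of [x y] vertices (x is the column, y is the row); the image size, polygon coordinates, and variable names are hypothetical. It uses `poly2mask` to build an H-by-W-by-N logical array with one plane per instance.

```matlab
% Hypothetical image size and polygon labels, one M-by-2 [x y] matrix per
% instance (for example, polygon ROIs exported from the Image Labeler app).
imageHeight = 100;
imageWidth  = 120;
polygons = {[20 10; 80 10; 80 60; 20 60], ...   % rectangle-shaped instance
            [90 30; 110 30; 100 90]};            % triangle-shaped instance

% Rasterize each polygon into its own plane of an H-by-W-by-N logical array.
numInstances = numel(polygons);
masks = false(imageHeight, imageWidth, numInstances);
for k = 1:numInstances
    xy = polygons{k};
    masks(:,:,k) = poly2mask(xy(:,1), xy(:,2), imageHeight, imageWidth);
end

% Number of pixels covered by each instance mask (N-by-1 vector).
pixelCounts = squeeze(sum(masks, [1 2]));
```

Keeping instances in separate planes, rather than in a single label image, preserves overlapping objects, which is the key difference between instance and semantic segmentation data.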

The toolbox provides pretrained instance segmentation networks such as SOLOv2 and Mask R-CNN. You can use these models directly for inference or adapt them to specific applications through transfer learning. For more information, see Get Started with Instance Segmentation Using Deep Learning and Get Started with SOLOv2 for Instance Segmentation. For class-agnostic instance segmentation, the toolbox supports the Segment Anything Model (SAM) through the imsegsam function and the segmentAnythingModel object.
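A minimal inference sketch follows, assuming Computer Vision Toolbox is installed together with the pretrained SOLOv2 and SAM 2 model support packages, and that the sample image visionteam.jpg is on the MATLAB path (substitute any RGB image). The pretrained network name "light-resnet18-coco" is assumed to be one of the COCO-trained SOLOv2 variants; check the solov2 reference page for the names available in your release. The first part performs class-aware instance segmentation; the second performs class-agnostic segmentation with imsegsam, which is available since R2024b.

```matlab
% Read a sample RGB image (assumed to be on the path; substitute your own).
I = imread("visionteam.jpg");

% Class-aware instance segmentation with a COCO-pretrained SOLOv2 network.
% To use Mask R-CNN instead, which also returns bounding boxes:
%   net = maskrcnn("resnet50-coco");
%   [masks, labels, scores, boxes] = segmentObjects(net, I);
net = solov2("light-resnet18-coco");
[masks, labels, scores] = segmentObjects(net, I);

% masks is an H-by-W-by-N logical array with one plane per detected instance.
overlay = insertObjectMask(I, masks);
figure
imshow(overlay)
title("SOLOv2: " + numel(labels) + " instances")

% Class-agnostic segmentation of the whole image with SAM 2.
% samMasks is returned as a connected-component structure.
[samMasks, samScores] = imsegsam(I);
samLabelMatrix = labelmatrix(samMasks);
figure
imshow(labeloverlay(I, samLabelMatrix))
title("SAM 2: " + samMasks.NumObjects + " regions")
```

SOLOv2 and Mask R-CNN assign a class label to each mask, whereas imsegsam only proposes object regions; the label matrix is a convenient way to visualize those regions, although overlapping regions are merged in that view.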

To prepare training data, the toolbox provides datastores for managing and organizing data sets, along with functions for data augmentation and preprocessing. For more information, see Postprocess Exported Labels for Instance Segmentation Training.
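Augmentation for instance segmentation must transform the image, bounding boxes, and masks consistently. The sketch below assumes each training observation is a 1-by-4 cell array {image, boxes, labels, masks}, which is the layout the instance segmentation training functions are expected to consume (verify against trainSOLOV2 or trainMaskRCNN for your release). It applies a random left-right reflection using randomAffine2d, imwarp, and bboxwarp. The augmentData function name and the synthetic sample are hypothetical.

```matlab
% Hypothetical synthetic training sample in the assumed
% {image, boxes, labels, masks} layout.
I = zeros(100, 120, 3, "uint8");
masks = false(100, 120, 1);
masks(20:60, 30:70, 1) = true;
boxes = [30 20 41 41];              % [x y width height] enclosing the mask
labels = categorical("part");
sample = {I, boxes, labels, masks};

augmented = augmentData(sample);

% In a training pipeline, apply the same function lazily to a datastore:
%   trainDS = transform(trainDS, @augmentData);

function out = augmentData(in)
% Randomly reflect the image, boxes, and masks left-right together.
I = in{1}; boxes = in{2}; labels = in{3}; masks = in{4};
tform = randomAffine2d(XReflection=true);
rout = affineOutputView(size(I), tform);
I = imwarp(I, tform, OutputView=rout);
% Nearest-neighbor interpolation keeps the masks strictly binary.
masks = imwarp(masks, tform, "nearest", OutputView=rout);
% Drop boxes (and their labels and masks) that mostly leave the image.
[boxes, idx] = bboxwarp(boxes, tform, rout, OverlapThreshold=0.25);
out = {I, boxes, labels(idx), masks(:,:,idx)};
end
```

Using the same tform and output view for every element of the observation is what keeps each mask aligned with its box and label after augmentation.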

After you generate predictions using pretrained or custom models, you can evaluate them against ground truth to measure segmentation accuracy, object-level precision, and performance across different object sizes. These metrics help assess the quality of both mask predictions and bounding box localization. For more information, see evaluateInstanceSegmentation.
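The sketch below evaluates predictions against ground truth with evaluateInstanceSegmentation. To stay self-contained, it uses hypothetical, hand-built masks for a single 64-by-64 image wrapped in arrayDatastore objects; in practice, the results datastore typically comes from calling segmentObjects on a datastore of test images. It assumes that each ground truth read returns {masks, labels} and each prediction read returns {masks, labels, scores}; check the evaluateInstanceSegmentation reference page for the exact layout in your release. The area ranges passed to metricsByArea are an assumed COCO-style small, medium, and large split.

```matlab
% Hypothetical ground truth for one 64-by-64 image: two labeled instances.
gtMasks = false(64, 64, 2);
gtMasks(5:25, 5:25, 1) = true;
gtMasks(35:60, 30:60, 2) = true;
gtLabels = categorical(["car"; "person"]);

% Hypothetical predictions: slightly offset masks with confidence scores.
predMasks = false(64, 64, 2);
predMasks(6:26, 5:25, 1) = true;
predMasks(35:60, 32:60, 2) = true;
predLabels = categorical(["car"; "person"]);
predScores = [0.9; 0.8];

% With OutputType="same", one read returns the whole cell row for one image.
dsTruth   = arrayDatastore({gtMasks, gtLabels}, OutputType="same");
dsResults = arrayDatastore({predMasks, predLabels, predScores}, OutputType="same");

% Match predicted masks to ground truth masks at an overlap threshold of 0.5.
metrics = evaluateInstanceSegmentation(dsResults, dsTruth, 0.5);

% Data set and per-class summaries (property names may vary by release).
disp(metrics.DatasetMetrics)
disp(metrics.ClassMetrics)

% Break results down by mask area in pixels (assumed COCO-style ranges).
areaRanges = [0 32^2; 32^2 96^2; 96^2 Inf];
areaMetrics = metricsByArea(metrics, areaRanges);
disp(areaMetrics)
```

Both hypothetical instances here cover fewer than 32^2 pixels, so only the first area range contains objects; with real data, the per-range breakdown shows whether a model struggles with small objects.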

The toolbox also supports 6-DoF object pose estimation using instance segmentation through the Pose Mask R-CNN framework, which estimates the 3-D position and orientation of each detected object. For more information, see Perform 6-DoF Pose Estimation for Bin Picking Using Deep Learning.

Instance segmentation using SOLOv2. Left: a segmented and labeled road scene from a modified sample RGB image from the CamVid data set. Right: a segmented image of PVC pipe connectors.

Apps

Image Labeler	Label images for computer vision applications
Video Labeler	Label video for computer vision applications

Functions


Pretrained Instance Segmentation Networks

SOLOv2

`solov2`	Segment objects using SOLOv2 instance segmentation network (Since R2023b)
`segmentObjects`	Segment objects using SOLOv2 instance segmentation (Since R2023b)

Mask R-CNN

`maskrcnn`	Detect objects using Mask R-CNN instance segmentation (Since R2021b)
`segmentObjects`	Segment objects using Mask R-CNN instance segmentation (Since R2021b)

Segment Anything Model (SAM)

`imsegsam`	Perform automatic full-image segmentation using Segment Anything Model 2 (SAM 2) (Since R2024b)
`segmentAnythingModel`	Pretrained Segment Anything Model 2 (SAM 2) for image segmentation (Since R2024a)

Train Custom Instance Segmentation Networks

Load Training Data

`boxLabelDatastore`	Datastore for bounding box label data
`groundTruth`	Ground truth label data
`imageDatastore`	Datastore for image data
`combine`	Combine data from multiple datastores

Train Instance Segmentation Networks

`trainSOLOV2`	Train SOLOv2 network to perform instance segmentation (Since R2023b)
`trainMaskRCNN`	Train Mask R-CNN network to perform instance segmentation (Since R2022a)

Augment and Preprocess Training Data

`poly2mask`	Convert region of interest (ROI) polygon to region mask
`bwboundaries`	Trace object boundaries in binary image
`balanceBoxLabels`	Balance bounding box labels for object detection
`bboxcrop`	Crop bounding boxes
`bboxerase`	Remove bounding boxes
`bboxresize`	Resize bounding boxes
`bboxwarp`	Apply geometric transformation to bounding boxes
`bbox2points`	Convert rectangle to corner points list
`imwarp`	Apply geometric transformation to image
`imcrop`	Crop image
`imresize`	Resize image
`randomAffine2d`	Create randomized 2-D affine transformation
`centerCropWindow2d`	Create rectangular center cropping window
`randomWindow2d`	Randomly select rectangular region in image

Evaluate Predicted Results

`evaluateInstanceSegmentation`	Evaluate instance segmentation data set against ground truth (Since R2022b)
`instanceSegmentationMetrics`	Instance segmentation quality metrics (Since R2022b)
`metricsByArea`	Evaluate instance segmentation across object mask size ranges (Since R2023b)

Visualize Results

`insertObjectMask`	Insert masks in image or video stream
`insertObjectAnnotation`	Annotate truecolor or grayscale image or video
`insertShape`	Insert shapes in image or video
`insertText`	Insert text in image or video
`showShape`	Display shapes on image, video, or point cloud

Perform Pose Estimation Using Instance Segmentation

`posemaskrcnn`	Predict object pose using Pose Mask R-CNN pose estimation (Since R2024a)
`predictPose`	Estimate object pose using Pose Mask R-CNN deep learning network (Since R2024a)
`trainPoseMaskRCNN`	Train Pose Mask R-CNN network to perform pose estimation (Since R2024a)

Topics

Get Started

Get Started with Instance Segmentation Using Deep Learning
Segment objects using an instance segmentation model such as SOLOv2 or Mask R-CNN.
Get Started with SOLOv2 for Instance Segmentation
Perform multiclass instance segmentation using SOLOv2 and deep learning.
Getting Started with Mask R-CNN for Instance Segmentation
Perform multiclass instance segmentation using Mask R-CNN and deep learning.
Get Started with Segment Anything Model for Image Segmentation
Perform interactive image segmentation using Segment Anything Model 2 (SAM 2) and deep learning.

Create Ground Truth for Instance Segmentation

Label Objects Using Polygons for Instance Segmentation
Label ground truth objects using polygons for instance segmentation.
Postprocess Exported Labels for Instance Segmentation Training
Postprocess exported ground truth labels and create training datastore for training instance segmentation networks such as SOLOv2 or Mask R-CNN.

Prepare Training Data for Instance Segmentation

Create Instance Segmentation Training Data From Ground Truth
Create instance segmentation training data from a groundTruth object.
Get Started with Image Preprocessing and Augmentation for Deep Learning
Preprocess data for deep learning applications with deterministic operations such as resizing, or augment training data with randomized operations such as random cropping.
Datastores for Deep Learning (Deep Learning Toolbox)
Learn how to use datastores in deep learning applications.

Featured Examples

New

Automate Ground Truth Polygon Labeling Using Grounded SAM Model

Combine Grounding DINO and the Segment Anything Model 2 (SAM 2) to automatically produce polygon labels using the Video Labeler app.

Since R2026a

New

Automate Ground Truth Labeling for Instance Segmentation

Create an automation algorithm to automatically label data for instance segmentation using a pretrained SOLOv2 network in the Video Labeler app.

Since R2026a

New

Automatically Search and Label Video Frames Using VLMs

Automatically search and detect objects based on natural language text queries using vision-language models (VLMs).

Since R2026a

Perform Instance Segmentation Using SOLOv2

Segment object instances of randomly rotated machine parts in a bin using a deep learning SOLOv2 network.

Perform Instance Segmentation Using Mask R-CNN

Segment individual instances of people and cars using a multiclass mask region-based convolutional neural network (R-CNN).

Automatically Label Ground Truth Using Segment Anything Model

Produce pixel labels for semantic segmentation using the Segment Anything Model (SAM) in the Image Labeler app. SAM is an automatic segmentation technique that you can use either to segment object regions with just a few clicks or to automatically segment the entire image and instantly create labels for selected regions. In this example, you interactively label pixels for semantic segmentation using both approaches.

Since R2024b

Segment Objects in Interactive ROI Using Segment Anything Model

Perform interactive segmentation of an object in a selected region of interest (ROI) of an image using the Segment Anything Model (SAM).

Perform 6-DoF Pose Estimation for Bin Picking Using Deep Learning

Perform six degrees-of-freedom (6-DoF) pose estimation by estimating the 3-D position and orientation of machine parts in a bin using RGB-D images and a deep learning network.
