Multiple Object Tracking Tutorial

This example shows how to perform automatic detection and motion-based tracking of moving objects in a video using the multiObjectTracker System object™.

Moving object detection and motion-based tracking are important components of automated driver assistance systems such as adaptive cruise control, automatic emergency braking, and autonomous driving. You can divide motion-based object tracking into two parts:

Detecting moving objects in each frame.
Tracking the moving objects from frame to frame.

Use a pretrained aggregate channel features (ACF) vehicle detector to detect the moving objects.

Then, use the multiObjectTracker object to track the moving objects from frame to frame. The multiObjectTracker object is responsible for:

Assigning detections to tracks.
Initializing new tracks based on unassigned detections.
Confirming tracks if they have at least M assigned detections in N frames.
Updating existing tracks based on assigned detections.
Coasting (predicting) existing unassigned tracks.
Deleting tracks if they have remained unassigned (coasted) for too long.

In this example, you track vehicles in the frame of the camera, measuring vehicle positions in pixels and time in frame counts. You estimate the motion of each track using a Kalman filter. The filter predicts the pixel location of the track in each frame, and determines the likelihood of each detection being assigned to each track. To initialize the filter that you design, use the FilterInitializationFcn property of the multiObjectTracker.

For more information, see Multiple Object Tracking.

Set Up Vehicle Detector and Video Objects

Create an ACF vehicle detector, pretrained with unoccluded images from the front and rear sides of vehicles.

detector = vehicleDetectorACF("front-rear-view");

Create objects to read and display the video frames.

vReader = VideoReader("05_highway_lanechange_25s.mp4");
trackPlayer = vision.VideoPlayer(Position=[700 400 700 400]);

Create Multi-Object Tracker

Create a multiObjectTracker, specifying these properties:

FilterInitializationFcn — Function that specifies the motion model and measurement model for the Kalman filter. In this example, because you expect the vehicles to have a constant velocity, specify the helperInitDemoFilter function, which configures a linear Kalman filter to track the vehicle motion. For more information, see the Supporting Functions section.
AssignmentThreshold — Maximum normalized distance from a track at which the tracker can assign a detection to that track. If detections that should be assigned to tracks remain unassigned, increase this value. If detections are assigned to tracks that are too far away, decrease this value. For this example, specify a threshold of 30.
DeletionThreshold — Number of updates for which the tracker maintains a track without a detection before deletion. In this example, specify a value of 15 frames. Because the video has 20 frames per second, the tracker deletes tracks that go 0.75 seconds without an assigned detection.
ConfirmationThreshold — Number of detections a track must receive and the number of updates in which it must receive them for confirmation. The tracker initializes a track with every unassigned detection. Because some of these detections might be false, all tracks are initially 'Tentative'. To confirm a track, it must be detected in at least M out of N frames. The choice of M and N depends on the visibility of the objects. This example assumes a visibility of 3 out of 5 frames.

tracker = multiObjectTracker(...
FilterInitializationFcn=@helperInitDemoFilter, ...
AssignmentThreshold=30, ...
DeletionThreshold=15, ...
ConfirmationThreshold=[3 5] ... 
);
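
The M-out-of-N confirmation rule and the deletion timing described above can be illustrated in a few lines of base MATLAB. This is a minimal sketch of the logic as the text describes it, not the tracker's internal implementation. The assignment history hits and all variable names are hypothetical.

```matlab
% Hypothetical assignment history for one tentative track:
% 1 = a detection was assigned in that update, 0 = none was assigned.
hits = [1 0 1 0 1 1];

M = 3;  % required number of assigned detections
N = 5;  % size of the sliding window, in updates

% Check the M-out-of-N rule after each update.
confirmedAt = 0;
for k = 1:numel(hits)
    window = hits(max(1,k-N+1):k);   % last N updates (fewer at the start)
    if sum(window) >= M
        confirmedAt = k;
        break
    end
end

% Deletion timing: 15 missed updates at 20 frames per second.
deletionThreshold = 15;
frameRate = 20;
coastTime = deletionThreshold/frameRate;   % seconds
```

With this history, the third assigned detection arrives at update 5, so the rule is first satisfied there, and coastTime evaluates to 0.75 seconds.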

Detect and Track Objects

Use a loop to run the video clip, detect moving objects in the video, and track them across video frames using these steps in each iteration:

Obtain the bounding boxes for all vehicles in the frame using the pretrained ACF vehicle detector.
Discard bounding boxes with a confidence score of 5 or lower, and calculate centroids for the rest.
Create a cell array of objectDetection objects, using the centroids of the detected bounding boxes as the measurements and the current frameCount as the time input.
Obtain the confirmed tracks, based on the new detections, by using the tracker function.
Display the tracking results for each frame.
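
The filtering and centroid steps can be tried in isolation on made-up detector output. The sketch below assumes the [x y width height] bounding box layout and the score threshold of 5 used in the loop. The box and score values are hypothetical.

```matlab
% Made-up detector output: each row of bboxes is [x y width height] in pixels.
bboxes = [100 50 40 20;
          300 80 61 31;
          500 90 30 30];
scores = [12; 3; 48];

% Keep only the boxes whose score exceeds the threshold.
keep = scores > 5;
bboxes = bboxes(keep,:);

% Centroid = top-left corner plus half the width and half the height.
centroids = [bboxes(:,1)+floor(bboxes(:,3)/2), ...
             bboxes(:,2)+floor(bboxes(:,4)/2)];
```

Here the second box is discarded, and the remaining centroids are [120 60] and [515 105].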

When creating the objectDetection cell array, specify these properties for each objectDetection object:

MeasurementNoise — The object detection measurements are noisy. To model that, this example defines a measurement noise covariance of 100. This means that the variance of the measurements is 100 square pixels, which corresponds to a standard deviation of 10 pixels, in both the x- and y-directions.
ObjectClassID — Object class identifier, specified in this example as 1. The tracker treats all subsequent detections with this ID as known objects. If a detection with this ID cannot be assigned to an existing track, then the tracker immediately confirms a new track from it.
ObjectAttributes — The detected bounding boxes that get passed to the track display are added to this argument.
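
As a quick check of the units, the relationship between the measurement noise covariance and the per-axis standard deviation can be computed directly. The sketch assumes the scalar value is expanded to a multiple of the 2-by-2 identity matrix, which is consistent with how helperInitDemoFilter indexes MeasurementNoise(1,1) and MeasurementNoise(2,2).

```matlab
% Scalar covariance from the example, expanded for a 2-D measurement [x;y].
measurementNoise = 100;
R = measurementNoise*eye(2);   % assumed expansion to a diagonal covariance

% Standard deviation per axis, in pixels.
sigma = sqrt(diag(R));
```

Both entries of sigma are 10, so a covariance of 100 corresponds to a 10-pixel standard deviation in each direction.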

% Measurement noise for vehicle detections 
measurementNoise = 100;

frameCount = 1;
while hasFrame(vReader)
    % Read the next video frame
    frame = readFrame(vReader);

    % Detect vehicles in the frame and retain bounding boxes with a confidence score greater than 5
    [bboxes,scores] = detect(detector,frame);
    bboxes = bboxes(scores>5,:);

    % Calculate the centroids of the bounding boxes
    centroids = [bboxes(:,1)+floor(bboxes(:,3)/2) bboxes(:,2)+floor(bboxes(:,4)/2)];

    % Formulate the detections as a list of objectDetection objects
    numDetections = size(centroids,1);
    detections = cell(numDetections,1);
    for i = 1:numDetections
        detections{i} = objectDetection(frameCount,centroids(i,:)', ...
            MeasurementNoise=measurementNoise, ...
            ObjectAttributes=struct(BoundingBox=bboxes(i,:)),ObjectClassID=1);
    end
    
    % Update tracks based on detections
    confirmedTracks = tracker(detections,frameCount);

    % Display tracking results and increase frame count by 1    
    displayTrackingResults(trackPlayer,confirmedTracks,frame);
    frameCount = frameCount + 1; 
end

Conclusion and Next Steps

In this example, you created a motion-based system for detecting and tracking multiple moving objects. Try using a different video to see if you can detect and track objects. Try modifying the parameters of the multiObjectTracker.

The tracking in this example was based solely on motion, with the assumption that all objects move in a straight line with constant speed. When the motion of an object significantly deviates from this model, the example can produce tracking errors. Notice the mistake in tracking partially occluded vehicles when the ego vehicle changes lanes.

You can reduce the likelihood of tracking errors by using a more complex motion model, such as constant acceleration or constant turn. You can also try defining a different tracking filter, such as trackingEKF or trackingUKF.
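
To give a sense of what a constant acceleration model changes, the sketch below builds its state transition matrix by hand in base MATLAB. The state ordering [x;vx;ax;y;vy;ay] and the sample state values are assumptions made for illustration. The sketch does not show how any particular tracking filter is configured.

```matlab
dt = 1;  % one frame per update, as in this example

% Per-axis constant-acceleration block for the state [p; v; a].
A1 = [1 dt dt^2/2;
      0 1  dt;
      0 0  1];

% 2-D model with the assumed state ordering [x; vx; ax; y; vy; ay].
A = blkdiag(A1,A1);

% Hypothetical state: accelerating in x, constant velocity in y.
state = [100; 2; 1; 50; -1; 0];
nextState = A*state;
```

Compared with the constant velocity model, the position update gains the dt^2/2 acceleration term and the velocity itself changes from frame to frame.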

Supporting Functions

Define a Kalman Filter

When defining a tracking filter for the motion in this example, helperInitDemoFilter follows these steps:

Step 1: Define the motion model and state

In this example, use a constant velocity model in a 2-D rectangular frame.

The state is [x;vx;y;vy].
The state transition model matrix is A = [1 dt 0 0; 0 1 0 0; 0 0 1 dt; 0 0 0 1].
Assume that dt = 1.
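
A short worked example shows what this transition matrix does to a state. The track position and velocity below are hypothetical values chosen for illustration.

```matlab
dt = 1;
A = [1 dt 0 0;
     0 1  0 0;
     0 0  1 dt;
     0 0  0 1];

% Hypothetical track at pixel (320,240), moving 4 pixels per frame in x
% and -1 pixel per frame in y.
state = [320; 4; 240; -1];

% Predict one frame ahead, then three frames ahead.
oneFrame = A*state;
threeFrames = A^3*state;
```

Each multiplication by A advances the position by one frame of velocity and leaves the velocity unchanged, so the track moves to (324,239) after one frame and (332,237) after three.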

Step 2: Define the process noise

The process noise represents the parts of the process that are not taken into account in the model. For example, in a constant velocity model, the acceleration is neglected.
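
One common way to turn neglected acceleration into a process noise covariance is to treat the acceleration as random and constant over each step. The sketch below follows that approach. It is an assumption for illustration, not a description of how the filter in this example sets its process noise, and the value of sigmaA is hypothetical.

```matlab
dt = 1;
sigmaA = 2;   % hypothetical acceleration standard deviation, pixels/frame^2

% Effect of a constant acceleration over one step on [position; velocity].
G = [dt^2/2; dt];

% Per-axis process noise covariance, then the 2-D version for [x;vx;y;vy].
Q1 = sigmaA^2*(G*G');
Q = blkdiag(Q1,Q1);
```

A larger sigmaA tells the filter to trust the constant velocity prediction less, so the estimates follow the measurements more closely.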

Step 3: Define the measurement model

In this example, only the position ([x;y]) is measured. So, the measurement model is H = [1 0 0 0; 0 0 1 0].

Note: To preconfigure these parameters, define the 'MotionModel' property as '2D Constant Velocity'.
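
The measurement model also determines how far a detection is from a track. The sketch below computes the predicted measurement and a squared Mahalanobis distance between a detection and that prediction, which is the idea behind a normalized distance threshold such as AssignmentThreshold. The exact distance the tracker computes may include additional terms. All numeric values are hypothetical.

```matlab
H = [1 0 0 0;
     0 0 1 0];

% Hypothetical predicted state and covariance for one track.
state = [324; 4; 239; -1];
P = diag([100 400 100 400]);
R = 100*eye(2);              % measurement noise covariance

% Predicted measurement and a hypothetical detection centroid.
zPred = H*state;
z = [330; 247];

% Innovation and its covariance.
y = z - zPred;
S = H*P*H' + R;

% Squared Mahalanobis distance between the detection and the prediction.
d2 = y'*(S\y);
```

In this case the detection is 10 pixels from the predicted position, and d2 evaluates to 0.5 because the combined uncertainty S is large relative to that offset.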

Step 4: Initialize the state vector based on the sensor measurement

In this example, because the measurement is [x;y] and the state is [x;vx;y;vy], initializing the state vector is straightforward. Because there is no measurement of the velocity, initialize the vx and vy components to 0.

Step 5: Define an initial state covariance

In this example, only positions are measured directly. Hence, define the initial state covariance for the position components to be the same as the corresponding measurement noise values. Because there are no direct measurements for velocity, define the covariance for the velocity components to have a larger value.

Step 6: Create the correct filter

In this example, all the models are linear, so use trackingKF as the tracking filter.

function filter = helperInitDemoFilter(detection)
    % Initialize a Kalman filter for this example.
    
    % Define the initial state.
    state = [detection.Measurement(1); 0; detection.Measurement(2); 0];
    
    % Define the initial state covariance.
    stateCov = diag([detection.MeasurementNoise(1,1) ...
                    detection.MeasurementNoise(1,1)*100 ...
                    detection.MeasurementNoise(2,2) ...
                    detection.MeasurementNoise(2,2)*100]);
    
    % Create the tracking filter.
    filter = trackingKF('MotionModel','2D Constant Velocity', ...    
        'State',state, ...
        'StateCovariance',stateCov, ... 
        'MeasurementNoise',detection.MeasurementNoise);
end
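
To see how a filter initialized this way behaves, one predict-and-correct cycle of a linear Kalman filter can be written out by hand in base MATLAB. This is a sketch of the standard equations under simplifying assumptions, not the internals of trackingKF: the process noise is set to zero to keep the numbers simple, and the two detection positions are hypothetical.

```matlab
dt = 1;
A = [1 dt 0 0; 0 1 0 0; 0 0 1 dt; 0 0 0 1];
H = [1 0 0 0; 0 0 1 0];
R = 100*eye(2);
Q = zeros(4);                 % process noise omitted (simplifying assumption)

% Initialize from a first detection at (320,240), as in helperInitDemoFilter.
x = [320; 0; 240; 0];
P = diag([100 100*100 100 100*100]);

% Predict to the next frame.
x = A*x;
P = A*P*A' + Q;

% Correct with a second detection at (324,239).
z = [324; 239];
S = H*P*H' + R;               % innovation covariance
K = P*H'/S;                   % Kalman gain
x = x + K*(z - H*x);
P = (eye(4) - K*H)*P;
```

Because the initial velocity covariance is large, the second detection pulls the velocity estimate from 0 to about 3.9 pixels per frame in x, close to the 4-pixel displacement between the two detections.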

Display Tracking Results

The displayTrackingResults function draws a bounding box and label ID for each track on the video frame. It then displays the frame in the video player.

function displayTrackingResults(videoPlayer,confirmedTracks,frame)
    if ~isempty(confirmedTracks)
        % Display the objects. If an object has not been detected
        % in this frame, display its predicted bounding box.
        numRelTr = numel(confirmedTracks);
        boxes = zeros(numRelTr,4);
        ids = zeros(numRelTr,1, 'int32');
        predictedTrackInds = zeros(numRelTr,1);
        for tr = 1:numRelTr
            % Get bounding boxes.
            boxes(tr,:) = confirmedTracks(tr).ObjectAttributes.BoundingBox;

            % Get IDs.
            ids(tr) = confirmedTracks(tr).TrackID;

            if confirmedTracks(tr).IsCoasted
                predictedTrackInds(tr) = tr;
            end
        end

        predictedTrackInds = predictedTrackInds(predictedTrackInds > 0);

        % Create labels for objects that display the predicted rather
        % than the actual location.
        labels = cellstr(int2str(ids));

        isPredicted = cell(size(labels));
        isPredicted(predictedTrackInds) = {' predicted'};
        labels = strcat(labels,isPredicted);

        % Draw the objects on the frame.
        frame = insertObjectAnnotation(frame,"rectangle",boxes,labels);
    end

    % Display the annotated frame.
    videoPlayer.step(frame);
end