

Explain object detection network predictions using D-RISE

Since R2024a


    scoreMap = drise(detector,I) returns a saliency map for the specified image I and object detection network detector. The function calculates the saliency map by using the detector randomized input sampling for explanation (D-RISE) algorithm. This function requires Deep Learning Toolbox™ Verification Library and Computer Vision Toolbox™.


    scoreMap = drise(customDetection,I) specifies a custom detection function.


    scoreMap = drise(___,bboxIn,labelIn) also specifies the bounding boxes bboxIn and labels labelIn corresponding to the detections you want to explain.


    [scoreMap,bboxOut,scores,labelOut] = drise(detector,I) also returns the bounding boxes bboxOut, detection scores scores, and labels labelOut of the detections made by the object detection network.

    ___ = drise(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of input and output arguments from the previous syntaxes.




    Load a YOLO v2 object detector trained to detect vehicles.

    s = load("yolov2VehicleDetector.mat");
    detector = s.detector;

    Read in a test image. This image comes from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona. The image is used with permission.

    img = imread("testCar.png");
    img = im2single(img);

    Detect vehicles in the test image by using the trained YOLO v2 detector. Pass the test image and the detector as input to the detect function. The detect function returns the bounding boxes, the detection scores, and the labels. Annotate the image with the detections and display it.

    [bboxes,scores,labels] = detect(detector,img);
    annotatedImage = insertObjectAnnotation(img,"rectangle",bboxes,scores);
    figure
    imshow(annotatedImage)


    Use the drise function to create saliency maps explaining the detections made by the YOLO v2 object detector.

    scoreMap = drise(detector,img);

    Plot the saliency map over the image. Areas highlighted in red are more significant in the detection than areas highlighted in blue.

    tiledlayout(1,2)
    for i = 1:2
        nexttile
        annotatedImage = insertObjectAnnotation(img,"rectangle",bboxes(i,:),scores(i));
        imshow(annotatedImage)
        hold on
        imagesc(scoreMap(:,:,i),AlphaData=0.5)
        title("DRISE Map: Detection " + i)
        hold off
    end
    colormap jet


    Load a YOLO v2 object detector pretrained to detect vehicles.

    s = load('yolov2VehicleDetector.mat');
    detector = s.detector;

    Read in a test image. This image comes from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona. The image is used with permission.

    img = imread("testCar.png");
    img = im2single(img);

    Specify the target detections you want to understand.

    targetBbox = [125 64 116 85];
    targetLabel = 1;

    Use the drise function and the target bounding boxes and labels to create saliency maps explaining the detections made by the YOLO v2 object detector.

    scoreMap = drise(detector,img,targetBbox,targetLabel);

    Plot the saliency map over the image. Areas highlighted in red are more significant in the detection than areas highlighted in blue.

    figure
    annotatedImage = insertObjectAnnotation(img,"rectangle",targetBbox,"vehicle");
    imshow(annotatedImage)
    hold on
    imagesc(scoreMap,AlphaData=0.5)
    title("DRISE Map")
    hold off
    colormap jet


    Load a YOLO v2 object detector pretrained to detect vehicles.

    s = load('yolov2VehicleDetector.mat');
    detector = s.detector;

    Read in a test image. This image comes from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona. The image is used with permission.

    img = imread("testCar.png");
    img = im2single(img);

    Detect vehicles in the test image by using the trained YOLO v2 detector. Pass the test image and the detector as input to the detect function. The detect function returns the bounding boxes, the detection scores, and the labels. Annotate the image with the detections and display it.

    [bboxes,scores,labels] = detect(detector,img);
    annotatedImage = insertObjectAnnotation(img,"rectangle",bboxes,scores);
    figure
    imshow(annotatedImage)


    Use the drise function to create saliency maps explaining the detections made by the YOLO v2 object detector. To increase the number of mask images that the function uses to generate the saliency maps, set the number of samples to 16,384. Use a mask resolution of 8-by-8, a mask probability of 0.85, and a mini-batch size of 256. With the increase in the number of samples, the drise function takes longer to run. To track the progress, enable verbose output.

    scoreMap = drise(detector,img, ...
        NumSamples=16384, ...
        MaskResolution=[8 8], ...
        MaskProbability=0.85, ...
        MiniBatchSize=256, ...
        Verbose=true);

    Computing target detections...Explaining 2 detections.
    Number of mini-batches to process: 64
    ..........   ..........   ..........   ..........   .......... (50 mini-batches)
    ..........   ....                                              (64 mini-batches)
    Total time = 101.3secs.

    Plot the saliency map over the image. Areas highlighted in red are more significant in the detection than areas highlighted in blue.

    tiledlayout(1,2)
    for i = 1:2
        nexttile
        annotatedImage = insertObjectAnnotation(img,"rectangle",bboxes(i,:),scores(i));
        imshow(annotatedImage)
        hold on
        imagesc(scoreMap(:,:,i),AlphaData=0.5)
        title("DRISE Map: Detection " + i)
        hold off
    end
    colormap jet


    Load a YOLO v2 object detector pretrained to detect vehicles.

    s = load("yolov2VehicleDetector.mat");
    detector = s.detector;

    Read in a test image. This image comes from the Caltech Cars 1999 and 2001 data sets, created by Pietro Perona. The image is used with permission.

    img = imread("testCar.png");
    img = im2single(img);

    You can create saliency maps for an object detector that you call using a function handle. The function handle must take exactly one input argument, which is the image, and return exactly three output arguments: the bounding boxes, the class probabilities, and the objectness score.

    Modify the YOLO v2 detector to create a custom detector. You can use the function handle input to specify additional name-value arguments to the detect method. Return all detected bounding boxes by setting SelectStrongest to false.

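    function [bboxes,classProbs,objectness] = customDetector(detector,img)
    % Return all detections together with their class probabilities and objectness scores.
    [bboxes,~,~,intermediates] = detect(detector,img,SelectStrongest=false);
    if isa(intermediates,"cell")
        % Batch input: extract the fields for each image separately.
        classProbs = cellfun(@(x)getFields(x,"ClassProbabilities"), ...
            intermediates,UniformOutput=false);
        objectness = cellfun(@(x)getFields(x,"ObjectnessScores"), ...
            intermediates,UniformOutput=false);
    else
        classProbs = intermediates.ClassProbabilities;
        objectness = intermediates.ObjectnessScores;
    end
    end

    function z = getFields(x,fieldName)
    % Return the requested field, or an empty array if there are no detections.
    if ~isempty(x)
        z = x.(fieldName);
    else
        z = [];
    end
    end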

    Specify target detections to understand. For a function handle input, you must specify a numeric value corresponding to the index of the class label.

    targetBbox = [125 64 116 85];
    targetLabel = 1;

    Generate the saliency map.

    scoreMap = drise(@(img)customDetector(detector,img),img,targetBbox,targetLabel);

    Plot the results.

    figure
    annotatedImage = insertObjectAnnotation(img,"rectangle",targetBbox,"vehicle");
    imshow(annotatedImage)
    hold on
    imagesc(scoreMap,AlphaData=0.5)
    title("DRISE Map: Custom Detector")
    hold off
    colormap jet


    Input Arguments


    Object detection network, specified as a yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), or yoloxObjectDetector (Computer Vision Toolbox) object.

    Input image, specified as a real-valued H-by-W-by-C array, where H, W, and C are the height, width, and channel size of the image, respectively.

    The image must be a real, nonsparse grayscale or RGB image.

    The channel size in each image must be equal to the network input channel size. For example, C must be 1 for a grayscale image and 3 for an RGB image.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Input bounding boxes, specified as a real-valued M-by-4 matrix, where M is the number of detections. Specify each bounding box as a four-element row vector in the form [x y width height], where:

    • x is the x-coordinate of the upper-left corner of the bounding box.

    • y is the y-coordinate of the upper-left corner of the bounding box.

    • width is the width of the bounding box.

    • height is the height of the bounding box.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
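    As a minimal sketch of the [x y width height] convention, the following code converts the target box used in the examples on this page to corner form and marks the region it covers in a matrix. The 240-by-320 size is hypothetical, and the inclusive pixel range (x to x+width-1) is an assumed convention for illustration, not a statement about how drise handles box edges internally.

```matlab
% Bounding box in [x y width height] form (same values as targetBbox above).
bbox = [125 64 116 85];
x = bbox(1); y = bbox(2); w = bbox(3); h = bbox(4);

% Corner form [xMin yMin xMax yMax], assuming the box spans w columns and
% h rows starting at the upper-left corner (x,y).
corners = [x, y, x+w-1, y+h-1];

% Rows correspond to y (vertical) and columns to x (horizontal).
map = zeros(240,320);              % hypothetical 240-by-320 map
map(y:y+h-1, x:x+w-1) = 1;         % mark the pixels covered by the box
```

    Note that x indexes columns and y indexes rows, so the order is swapped when you index into a matrix.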

    Input labels, specified as an integer-valued column vector, a categorical array, or a string array. This input must have a size of M-by-1, where M is the number of detections. When you specify a function handle input, you must specify this input as an integer-valued column vector.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | string | categorical

    Custom detection function, specified as a function handle. The custom detection function must take a single input image and return three outputs:

    • Bounding boxes, returned as an M-by-4 array of positive, real numbers where M is the number of detections.

    • Class probabilities, returned as an M-by-C array of positive, real numbers corresponding to the class probabilities for each detection, where C is the number of classes. If your detector does not return class probabilities, then return an empty array, [], instead.

    • Objectness score, returned as an M-by-1 vector corresponding to the probability that an object is present in each detection. If your detector does not return objectness scores, then return an empty array, [], instead.

    Use this input to specify additional options for the detect function, to use other built-in detectors such as an ssdObjectDetector, or to use detectors from other frameworks.

    If the function takes a batch of images as input, then the output must be an N-by-1 cell array, where N is the number of images. Each element of the cell array must include the bounding boxes, class probabilities, and objectness score for the corresponding image.


    For YOLO v2, v3, v4, and X detectors, the class probabilities and objectness are returned by the fourth output argument from the detect function. For more information, see yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), and yoloxObjectDetector (Computer Vision Toolbox).

    Data Types: function_handle
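    To illustrate why the three outputs matter, the following sketch computes a detection similarity in the style described in the D-RISE paper [1]: the intersection over union of the boxes, multiplied by the cosine similarity of the class probabilities and by the objectness score. All values are hypothetical, and whether drise computes the score in exactly this way is an assumption.

```matlab
% Target detection (hypothetical one-hot class vector for class 1 of 2).
targetBox = [125 64 116 85];
targetProbs = [1 0];

% Detections returned for one masked image (hypothetical values).
boxes = [120 60 120 90; 10 10 40 30];
classProbs = [0.9 0.1; 0.2 0.8];
objectness = [0.8; 0.6];

% Intersection over union between the target box and each detection.
x1 = max(targetBox(1),boxes(:,1));
y1 = max(targetBox(2),boxes(:,2));
x2 = min(targetBox(1)+targetBox(3),boxes(:,1)+boxes(:,3));
y2 = min(targetBox(2)+targetBox(4),boxes(:,2)+boxes(:,4));
interArea = max(0,x2-x1).*max(0,y2-y1);
unionArea = targetBox(3)*targetBox(4) + boxes(:,3).*boxes(:,4) - interArea;
iou = interArea./unionArea;

% Cosine similarity between the class probability vectors.
cosSim = (classProbs*targetProbs')./(vecnorm(classProbs,2,2)*norm(targetProbs));

% Per-detection similarity, and the weight of this mask (best match).
similarity = iou.*cosSim.*objectness;
maskWeight = max(similarity);
```

    In this sketch, a detector that returns no class probabilities or objectness scores would simply drop the corresponding factor from the product.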

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: Threshold=0.75,Verbose=true sets the detection threshold to 0.75 and enables verbose output.

    Detection threshold, specified as a scalar in the range [0, 1]. The software removes detections whose scores are lower than this value. The default value is 0.5 when you specify detector as a yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), or yolov4ObjectDetector (Computer Vision Toolbox) object. The default value is 0.25 when you specify detector as a yoloxObjectDetector (Computer Vision Toolbox) object.

    This argument applies only if your function syntax does not include the customDetection input.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
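    The filtering rule can be sketched with hypothetical scores and boxes. This only illustrates the rule stated above (detections with scores lower than the threshold are removed); it is not the internal implementation.

```matlab
% Hypothetical detections: one bounding box and one score per row.
bboxes = [125 64 116 85; 10 10 40 30; 200 50 60 45];
scores = [0.91; 0.42; 0.67];
threshold = 0.5;

% Keep only the detections whose score is not lower than the threshold.
keep = scores >= threshold;
bboxesKept = bboxes(keep,:);
scoresKept = scores(keep);
```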

    Number of samples, specified as a positive integer. This value specifies the number of mask images that the function uses to generate the saliency map. A larger number of samples typically yields a less noisy saliency map but requires more computation time.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Mask resolution, specified as a positive integer or a two-element row vector of positive integers. If you specify a single positive integer k, then the function uses a mask with resolution [k k].

    The function uses bilinear interpolation to upscale the mask to the size of the image. A small mask resolution returns a masked image with fewer but larger occluded regions. A large mask resolution returns a masked image with more but smaller occluded regions.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Mask probability, specified as a scalar in the range [0, 1].

    Each pixel in the mask is randomly populated with either 0 or 1, where the probability of 1 is set by the mask probability value. A mask pixel with a value of 1 leaves the corresponding image region unmasked, so a mask probability of 1 means that none of the image is occluded.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical
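    The interaction between the mask resolution and the mask probability can be sketched in base MATLAB as follows. This is a simplified illustration, not the internal implementation of drise: the image is random data with a hypothetical size, the variable names are made up for this sketch, and refinements such as the random mask shift used in the RISE family of algorithms are omitted.

```matlab
rng(0)                              % fix the seed so the sketch is repeatable
imageSize = [224 224];              % hypothetical image height and width
maskResolution = [8 8];
maskProbability = 0.85;

% Coarse binary grid: each cell is 1 (not masked) with probability
% maskProbability and 0 (masked) otherwise.
coarseMask = double(rand(maskResolution) < maskProbability);

% Upscale the grid to the image size with bilinear interpolation.
[xq,yq] = meshgrid(linspace(1,maskResolution(2),imageSize(2)), ...
                   linspace(1,maskResolution(1),imageSize(1)));
mask = interp2(coarseMask,xq,yq,"linear");

% Apply the soft mask to every channel of a hypothetical RGB image.
img = rand([imageSize 3],"single");
maskedImg = img .* single(mask);
```

    With an 8-by-8 grid, each cell covers a large patch of the image. A finer grid such as [32 32] would produce more, smaller occluded regions.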

    Size of the mini-batch, specified as a positive integer.

    The mini-batch size specifies the number of masked images that are passed to the detector at a time. Larger mini-batch sizes lead to faster computation, at the cost of more memory.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
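    As a worked example, the verbose output earlier on this page reports 64 mini-batches for 16,384 samples and a mini-batch size of 256. The sketch below reproduces that count. Rounding up when the number of samples is not a multiple of the mini-batch size is an assumption.

```matlab
numSamples = 16384;
miniBatchSize = 256;

% Number of mini-batches needed to process all masked images.
numMiniBatches = ceil(numSamples/miniBatchSize);

% Index range of the masked images in the last mini-batch.
lastBatchIdx = (numMiniBatches-1)*miniBatchSize+1 : numSamples;
```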

    Option to enable verbose output, specified as a numeric or logical 1 (true) or 0 (false). When you set this input to 1 (true), the function displays the progress of the D-RISE algorithm by indicating which mini-batch the function is processing and the total number of mini-batches. The function also displays the total computation time.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical

    Output Arguments


    Saliency map, returned as a numeric matrix or numeric array. Areas in the map with higher positive values correspond to regions of input data that contribute positively to the detection.

    If the image has multiple detections, scoreMap is returned as a 3-D array, and the ith page, scoreMap(:,:,i), corresponds to the saliency map for the ith detection.

    Data Types: double
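    The following sketch shows how to index a multi-detection saliency map. It uses a random array as a stand-in for the output of drise, so the sizes and values are hypothetical.

```matlab
rng(0)                              % fix the seed so the sketch is repeatable
scoreMap = rand(240,320,2);         % stand-in for a map with two detections

numDetections = size(scoreMap,3);   % one page per detection

% Saliency map for the second detection.
i = 2;
mapI = scoreMap(:,:,i);

% Location of the most salient pixel for that detection.
[peakValue,linearIdx] = max(mapI(:));
[peakRow,peakCol] = ind2sub(size(mapI),linearIdx);
```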

    Location of objects detected within the input image, returned as an M-by-4 matrix. M is the number of bounding boxes in the image.

    Each row of bboxOut contains a four-element vector of the form [x y width height]. This vector specifies the upper-left corner and size of that corresponding bounding box in pixels.

    Detection confidence scores, returned as an M-by-1 vector. M is the number of bounding boxes in an image. A higher score indicates higher confidence in the detection.

    Labels for bounding boxes, returned as an M-by-1 categorical array. M is the number of bounding boxes in the image.


    [1] Petsiuk, Vitali, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, and Kate Saenko. “Black-Box Explanation of Object Detectors via Saliency Maps.” Preprint, submitted June 10, 2021.

    Version History

    Introduced in R2024a


