Hand Pose Estimation Using HRNet Deep Learning

This example shows how to detect keypoints in a human hand and estimate hand pose using the HRNet deep learning network.

Overview

Hand pose estimation detects and estimates the 2D pose and configuration of a human hand from an image or a video. It identifies the position and orientation of the hand joints, such as the locations of fingertips, knuckles, and the palm. The applications of hand pose estimation include virtual and augmented reality, human-computer interaction, sign language recognition, gesture-based interfaces, robotics, and medical diagnosis.

This example uses a High-Resolution Net (HRNet) [1] deep learning network to detect keypoints in a human hand. To learn more about the HRNet deep learning network, see Getting Started with HRNet.

In this example, you:

  • Estimate hand pose using a pretrained HRNet object keypoint detector.

  • Configure the pretrained HRNet keypoint detector, using a transfer learning approach, to detect keypoints in a human hand image.

Download Pretrained Network

Download a pretrained hand pose keypoint detector by using the helperDownloadHandPoseKeypointDetector helper function. If you want to train the keypoint detector with a new set of data, set the doTraining variable to true.

doTraining = false;
downloadFolder = tempdir;
pretrainedKeypointDetector = helperDownloadHandPoseKeypointDetector(downloadFolder);
Downloading pretrained hand pose keypoint detector (102 MB)...

Detect Hand Keypoints

Read a test image into the workspace.

I = imread("handPose.jpg");

Specify the bounding box location of the hand region in the form [x y w h]. The x and y values specify the upper-left corner of the bounding box. w specifies the width of the box, which is its length along the x-axis. h specifies the height of the box, which is its length along the y-axis.

Alternatively, you can obtain bounding box locations by using object detectors, such as yolov3ObjectDetector or yolov4ObjectDetector, trained to detect the object of interest.

handBoundingBoxes = [188 123 246 205];

Use the pretrained keypoint detector to detect the hand keypoints in the image.

[keypoints,scores,visibility] = detect(pretrainedKeypointDetector,I,handBoundingBoxes);

Visualize the detected keypoints. The output image shows the detected hand keypoints as yellow dots and the keypoint connections as green lines.

outputImg = insertObjectKeypoints(I,keypoints, ...
    Connections = pretrainedKeypointDetector.KeypointConnections, ...
    ConnectionColor="green", ...
    KeypointColor="yellow",KeypointSize=3,LineWidth=3);
outputImg = insertShape(outputImg,rectangle=handBoundingBoxes);
figure
imshow(outputImg)

Figure contains an axes object. The axes object contains an object of type image.

The remainder of this example shows how to configure the pretrained object keypoint detector using a transfer learning approach, and train an HRNet deep learning network on a hand pose data set.

Load Training Data

To illustrate the training procedure, this example uses a labeled data set that contains 2500 images from the Large-Scale Multiview Hand Pose Dataset [2]. Each image in the data set contains a human hand with 21 annotated keypoints.

Download and load the hand pose data set.

dataset = helperDownloadHandPoseDataset(downloadFolder);
Downloading hand pose dataset (98 MB)...
data = load(dataset);
handPoseDataset = data.handPoseDataset;

The hand pose data table contains three columns. The first, second, and third columns contain the image filenames, keypoint locations, and hand bounding boxes, respectively. Each keypoint entry is an N-by-2 matrix, where N is the number of keypoints in the hand. Each image contains only one hand, which is one object, so each row represents one object in an image. If a custom data set contains more than one object in an image, create a row of data for each object in that image. For reference, a sketch of such a table appears at the end of this section.

% Display first few rows of the data set.
handPoseDataset(1:4,:)
ans=4×3 table
          imageFilename             keypoints         boundingBoxes   
    __________________________    _____________    ___________________

    {'0_webcam_1_data_1.jpg' }    {21×2 double}    {[185 156 249 211]}
    {'0_webcam_1_data_2.jpg' }    {21×2 double}    {[222 163 239 247]}
    {'0_webcam_1_data_4.jpg' }    {21×2 double}    {[268 143 231 204]}
    {'0_webcam_1_data_11.jpg'}    {21×2 double}    {[205 157 253 184]}

% Add the full data path to the locally stored hand pose data folder.
handPoseDataset.imageFilename = fullfile(downloadFolder,"2DHandPoseDataAndGroundTruth","2DHandPoseImages",handPoseDataset.imageFilename);
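
As a reference for preparing your own data, this is a minimal sketch of how an image that contains two hands would appear as two rows of such a table. The file name, keypoint values, and bounding boxes below are hypothetical and for illustration only.

% Hypothetical sketch: one row per hand (object), each with a 21-by-2
% keypoint matrix and an [x y w h] bounding box.
customData = table( ...
    {'myImage.jpg';'myImage.jpg'}, ...
    {rand(21,2)*100; rand(21,2)*100}, ...
    {[50 60 120 140]; [300 80 110 150]}, ...
    VariableNames=["imageFilename","keypoints","boundingBoxes"]);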

Configure Keypoint Detector

Use the hrnetObjectKeypointDetector function to configure an existing pretrained network for custom keypoint classes.

Read the keypoint class names and keypoint connection information using the helperHandPoseDatasetKeypointNames and helperKeypointConnection helper functions. keypointClasses contains the categorical class labels for every hand keypoint. keypointConnections contains the connectivity information between pairs of keypoints.

keypointClasses = helperHandPoseDatasetKeypointNames;
keypointConnections = helperKeypointConnection;

Create an hrnetObjectKeypointDetector object, and configure it to detect keypoints in a human hand. The pretrained HRNet deep learning network provided by the Computer Vision Toolbox™ Model for Object Keypoint Detection has been trained on the COCO keypoint data set for keypoint detection in humans.

handPoseKeypointDetector = hrnetObjectKeypointDetector("human-full-body-w32",keypointClasses,KeypointConnection=keypointConnections);

Prepare Data for Training

Use imageDatastore to create ImageDatastore objects for loading the image data.

handPoseImds = imageDatastore(handPoseDataset.imageFilename);

Use arrayDatastore to create ArrayDatastore objects for loading the ground truth keypoint location data.

handPoseArrds = arrayDatastore(handPoseDataset(:,2));

Use boxLabelDatastore to create boxLabelDatastore objects for loading the bounding box locations.

handPoseBlds = boxLabelDatastore(handPoseDataset(:,3));

Combine the image, array, and box label datastores.

handPoseCds = combine(handPoseImds,handPoseArrds,handPoseBlds);

The HRNet deep learning network has been trained on image patches that contain only one object in each image. Use the transform function and the helperPreprocessCropped helper function to preprocess the images in the datastore. Use the functions to crop image patches that contain the object of interest and rescale the keypoints to the new image size. Then, store the preprocessed data by using the writeall function. The function stores the image patches as JPEG files and the hand keypoint data as a MAT file.

% Define the input size and number of keypoints to process.
inputSize = handPoseKeypointDetector.InputSize;
numKeypoints = size(handPoseKeypointDetector.KeyPointClasses,1);

% Preprocess and store all the data.
imagesPatchHandPoseData = transform(handPoseCds,@(data)helperPreprocessCropped(data,inputSize,numKeypoints));
imagesPatchDataLocation = fullfile(downloadFolder,"imagesPatchHandPoseData");
writeall(imagesPatchHandPoseData,imagesPatchDataLocation,"WriteFcn",@helperDataStoretWriteFcn,FolderLayout="flatten");

Load the data. Create an ImageDatastore object for the image patches and a FileDatastore object for the keypoints.

handPosePatchImds = imageDatastore(fullfile(imagesPatchDataLocation,"imagePatches"));
handPoseKptfileds = fileDatastore(fullfile(imagesPatchDataLocation,"Keypoints"),"ReadFcn",@load,FileExtensions=".mat");

Split the data set into training, validation, and test sets. Select 80% of the data for training, 10% for validation, and the rest for testing the trained detector.

rng(0);
numFiles = numel(handPosePatchImds.Files);
shuffledIndices = randperm(numFiles);

numTrain = round(0.8*numFiles);
trainingIdx = shuffledIndices(1:numTrain);

numVal = round(0.10*numFiles);
valIdx = shuffledIndices(numTrain+1:numTrain+numVal);

testIdx = shuffledIndices(numTrain+numVal+1:end);

Create ImageDatastore objects for the training, validation, and test sets.

trainingImages = handPosePatchImds.Files(trainingIdx);
valImages = handPosePatchImds.Files(valIdx);
testImages = handPosePatchImds.Files(testIdx);
imdsTrain = imageDatastore(trainingImages);
imdsValidation = imageDatastore(valImages);
imdsTest = imageDatastore(testImages);

Create FileDatastore objects for training, validation, and test sets.

trainingKeypoints = handPoseKptfileds.Files(trainingIdx);
valKeypoints = handPoseKptfileds.Files(valIdx);
testKeypoints = handPoseKptfileds.Files(testIdx);
fdsTrain = fileDatastore(trainingKeypoints,"ReadFcn",@load,FileExtensions=".mat");
fdsValidation = fileDatastore(valKeypoints,"ReadFcn",@load,FileExtensions=".mat");
fdsTest = fileDatastore(testKeypoints,"ReadFcn",@load,FileExtensions=".mat");

Create CombinedDatastore objects for the training, validation, and test sets by combining the respective image datastore and file datastore of each set.

trainingData = combine(imdsTrain,fdsTrain);
validationData = combine(imdsValidation,fdsValidation);
testData = combine(imdsTest,fdsTest);

Visualize the data set. Render the ground truth keypoints in yellow and the keypoint connections in green.

data = read(trainingData);
I = data{1};
keypoints = data{2}.keypoint;
Iout = insertObjectKeypoints(I,keypoints, ...
    Connections=keypointConnections, ...
    ConnectionColor="green", ...
    KeypointColor="yellow",KeypointSize=3,LineWidth=3);
figure
imshow(Iout)

Figure contains an axes object. The axes object contains an object of type image.

Train HRNet Object Keypoint Detector

Use the handPoseKeypointDetector object and the minibatchqueue (Deep Learning Toolbox) function to train the HRNet deep learning network on the hand pose data set with a mini-batch size of 8. Decrease the mini-batch size if you run out of memory during training. Create mini-batch queues for the training and validation data. The minibatchqueue function automatically detects whether a GPU is available and uses it by default. If you do not have a compatible GPU, or you prefer to train on a CPU, specify the OutputEnvironment name-value argument as "cpu" when you call the minibatchqueue function, as shown in the optional variant after the following code.

miniBatchSize = 8;
mbqTrain = minibatchqueue(trainingData,3, ...
        MiniBatchSize=miniBatchSize, ...
        MiniBatchFcn=@(images,keypoints)helperCreateBatchData(images,keypoints,handPoseKeypointDetector), ...
        MiniBatchFormat=["SSCB","SSCB","SSCB"]);

mbqValidation = minibatchqueue(validationData,3, ...
        MiniBatchSize=miniBatchSize, ...
        MiniBatchFcn=@(images,keypoints)helperCreateBatchData(images,keypoints,handPoseKeypointDetector), ...
        MiniBatchFormat=["SSCB","SSCB","SSCB"]);

Specify these training options.

  • Set the number of epochs to 10. For larger data sets, you might need to train for more epochs.

  • Set the learning rate to 0.001.

numEpochs = 10;
initialLearnRate = 0.001;

Initialize the velocity, averageGrad, and averageSqGrad parameters for Adam optimization.

velocity = [];
averageGrad = [];
averageSqGrad = [];

To monitor training progress, calculate the total number of iterations.

numObservationsTrain = numel(imdsTrain.Files);
numIterationsPerEpoch = floor(numObservationsTrain/miniBatchSize);
numIterations = numEpochs*numIterationsPerEpoch;

Initialize the trainingProgressMonitor (Deep Learning Toolbox) object to create a training progress plotter. You must create the object close to when you start the training loop, because the timer starts when you create the monitor object.

Train the HRNet hand pose keypoint detector on the hand pose data by using a custom training loop. Observe the training progress plot to monitor the training. The custom training loop performs these operations:

  • Reset and shuffle the training minibatchqueue at the start of each epoch, and read mini-batches of data from it until no more data is available.

  • Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function. The modelGradients function, listed as a supporting function, returns the gradients of the loss with respect to the learnable parameters in the network, the corresponding mini-batch loss, and the state of the current batch.

  • Update the detector parameters using the adamupdate (Deep Learning Toolbox) function.

  • Update the state of non-learnable parameters of the detector.

  • Update the training progress plot.

if doTraining
    monitor = trainingProgressMonitor( ...
        Metrics=["TrainingLoss","ValidationLoss"], ...
        Info=["Epoch","Iteration","LearningRate"], ...
        XLabel="Iteration");
    groupSubPlot(monitor,"Loss",["TrainingLoss","ValidationLoss"])
    iteration = 0;
    monitor.Status = "Running";
    
    % Custom training loop.
    for epoch = 1:numEpochs

        reset(mbqTrain)
        shuffle(mbqTrain)

        % Piecewise learning rate schedule: drop the learning rate by a factor
        % of 10 at epoch 7 and by a factor of 100 at epoch 10.
        if epoch >= 10
            currentLR = initialLearnRate/100;
        elseif epoch >= 7
            currentLR = initialLearnRate/10;
        else
            currentLR = initialLearnRate;
        end

        while(hasdata(mbqTrain) && ~monitor.Stop)
            iteration = iteration + 1;

            [XTrain,YTrain,WTrain] = next(mbqTrain);

            % Calculate modelGradients using the dlfeval function.
            [gradients,trainingLoss,dlYPred,state] = dlfeval(@modelGradients,handPoseKeypointDetector,XTrain,YTrain,WTrain);

            % Update the state of the non-learnable parameters.
            handPoseKeypointDetector.State = state;

            % Update the network parameters using the ADAM optimizer.
            [handPoseKeypointDetector.Learnables,averageGrad,averageSqGrad] = adamupdate(handPoseKeypointDetector.Learnables,...
                gradients,averageGrad,averageSqGrad,iteration,currentLR);

            % Calculate the validation loss.
            validationLoss = [];
            reset(mbqValidation)
            while (hasdata(mbqValidation))
                [XVal,YVal,WVal] = next(mbqValidation);
                dlValPred = forward(handPoseKeypointDetector,XVal);
                valLoss = helperCalculateLoss(dlValPred,WVal,YVal);
                validationLoss = [validationLoss; valLoss];
            end
            validationLoss = mean(validationLoss);

            updateInfo(monitor, ...
                LearningRate=currentLR, ...
                Epoch=string(epoch) + " of " + string(numEpochs), ...
                Iteration=string(iteration) + " of " + string(numIterations));

            recordMetrics(monitor,iteration, ...
                TrainingLoss=trainingLoss, ...
                ValidationLoss=validationLoss)
            monitor.Progress = 100*iteration/numIterations;
        end
    end
else
    handPoseKeypointDetector = pretrainedKeypointDetector;
end

Evaluate Object Keypoint Detector

Evaluate the hand keypoint detection using the percentage of correct keypoints (PCK) metric [3]. The PCK metric measures the percentage of estimated keypoints that fall within a certain normalized distance of the ground truth keypoints. To compute the PCK metric, set a threshold that determines whether a predicted keypoint counts as correct. Typical threshold values for comparing the predicted and ground truth keypoints range from 0.1 to 0.3. If the normalized distance between a predicted keypoint and its ground truth keypoint is within the threshold, the prediction counts as correct.

To compute the PCK metric, calculate the Euclidean distance between the predicted keypoints and the ground truth keypoints, and then normalize the value by a specified distance. In the case of hand keypoint detection, use the distance between the middle point and the lowest point of the middle finger as the normalization factor.
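
As a small numeric illustration of the metric (the keypoint values and normalization factor below are made up):

% Illustrative only: PCK for two hypothetical keypoints.
predToy = [10 10; 30 42];        % predicted keypoint locations
gtToy   = [12 11; 30 40];        % ground truth keypoint locations
normFactorToy = 20;              % e.g., length of a reference finger segment
distToy = sqrt(sum((predToy - gtToy).^2,2))./normFactorToy;
pckToy  = mean(distToy < 0.3)    % both keypoints lie within 0.3*20 = 6 pixels, so PCK = 1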

testDataPCK = [];
reset(testData)

while testData.hasdata
    data = read(testData);
    I = data{1};
    keypoint = data{2}.keypoint;
    [height, width] = size(I,[1 2]);
    bbox = [1 1 width height];
    % Distance between the middle point and the lowest point of the middle finger.
    normalizationFactor = sqrt((keypoint(5,1)-keypoint(6,1))^2 + (keypoint(5,2)-keypoint(6,2))^2);
    threshold = 0.3;
    predictedKeypoints = detect(handPoseKeypointDetector,I,bbox);
    pck = helperCalculatePCK(predictedKeypoints,keypoint,normalizationFactor,threshold);
    testDataPCK = [testDataPCK;pck];
end

PCK = mean(testDataPCK);
disp("Average PCK on the hand pose test dataset is: " + PCK);
Average PCK on the hand pose test dataset is: 0.94438

A PCK score of approximately 0.944 on the test data implies that about 94.4% of the keypoints have been identified correctly. To improve the results, you can add more data to the data set or use data augmentation. To customize this example for your own data, you might need to reduce the learning rate if the validation loss remains constant and the model does not converge.
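
For example, this is a minimal sketch of photometric data augmentation, which leaves the keypoint locations valid because the image geometry does not change. The transform and the variable augmentedTrainingData are illustrative additions, not part of the original training pipeline.

% Illustrative sketch: randomly jitter the brightness of each training image.
augmentedTrainingData = transform(trainingData, ...
    @(data) {jitterColorHSV(data{1},Brightness=0.2), data{2}});
% To train with augmentation, pass augmentedTrainingData to minibatchqueue
% in place of trainingData.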

Supporting Functions

modelGradients — Calculate gradients for mini-batch input data.

function [gradients,loss,dlYPredOut,state] = modelGradients(detector,dlX,dlY,dlW)  
% Loss and gradient calculation during the forward pass
[dlYPredOut,state] = forward(detector,dlX);
loss = helperCalculateLoss(dlYPredOut,dlW,dlY);
gradients = dlgradient(loss,detector.Learnables);
end

helperCalculateLoss — Calculate the mean squared error (MSE) loss.

function loss = helperCalculateLoss(dlYPred,dlW,dlY)
outputSize = size(dlYPred,[1 2]);
dlW = logical(dlW);
dlY = reshape(dlY.*dlW,size(dlY,1),size(dlY,2),[]);
dlY = dlarray(dlY,"SSB");
dlYPred = reshape(dlYPred.*dlW,size(dlYPred,1),size(dlYPred,2),[]);
dlYPred = dlarray(dlYPred,"SSB");
loss = mse(dlYPred,dlY);
loss = loss/(outputSize(1)*outputSize(2));
end

Data Processing Helper Functions

helperCreateBatchData — Create mini-batches of data for the minibatchqueue function.

function [X, Y, W] = helperCreateBatchData(images,keypoints,handPoseKeypointDetector)
% Returns the images combined along the batch dimension as X. It also returns the 
% generated heat map and its weights combined along the batch dimension as Y and W respectively.
inputSize = handPoseKeypointDetector.InputSize;
outputSize = [inputSize(1)/4 inputSize(2)/4];
numKeypoints = size(handPoseKeypointDetector.KeyPointClasses,1);
miniBatchSize = size(images,1);
X = zeros(inputSize(1),inputSize(2),inputSize(3),miniBatchSize,"single");
Y = zeros(outputSize(1),outputSize(2),numKeypoints,miniBatchSize,"single");
W = zeros(outputSize(1),outputSize(2),numKeypoints,miniBatchSize,"single");

for k = 1:miniBatchSize
    I = images{k};
    keypoint = keypoints{k}.keypoint;
    X(:,:,:,k) = single(rescale(I));
    [heatmaps,weights] = helperGenerateHeatmap(single(keypoint),inputSize,outputSize);
    Y(:,:,:,k) = single(heatmaps);
    W(:,:,:,k) = repmat(permute(weights,[2 3 1]),outputSize(1:2));
end
end

helperPreprocessCropped — Crop the input images based on their bounding boxes, and transform their corresponding keypoints based on the cropped image coordinates.

function preprocessedData = helperPreprocessCropped(trainingData,inputSize,numKeypoints)
preprocessedData = cell(size(trainingData));
I = trainingData{1};
keypoint = trainingData{2}.keypoints{1};
bbox = trainingData{3};
[center,scale] = helperBoxToCenterScale(bbox,inputSize(1),inputSize(2));
trans = helperGetAffineTransform(center,scale,inputSize,false);
ImageAugmented = imwarp(I,trans,"linear",OutputView=imref2d([inputSize(1) inputSize(2)]), ...
    FillValues=0);
preprocessedData{1} = ImageAugmented;
for i = 1:numKeypoints
    keypoint(i,1:2) = affineTransform(keypoint(i,1:2),trans);
end
preprocessedData{2} = keypoint;
preprocessedData{3} = trainingData{3};
preprocessedData{4} = trainingData{4};
end

helperDataStoretWriteFcn — Write the preprocessed datastore to files.

function helperDataStoretWriteFcn(data,writeInfo,~)
name = erase(writeInfo.SuggestedOutputName,writeInfo.Location);
name = erase(name,".jpg");
imageFolder = fullfile(writeInfo.Location,"imagePatches");
if ~exist(imageFolder,"dir")
    mkdir(imageFolder)
end
imwrite(data{1},fullfile(imageFolder,name + ".jpeg"))
keypoint = data{2};
keypointFolder = fullfile(writeInfo.Location,"Keypoints");
if ~exist(keypointFolder,"dir")
    mkdir(keypointFolder)
end
save(fullfile(keypointFolder,name + ".mat"),"keypoint")
end

helperGenerateHeatmap — HRNet-based keypoint detection for training typically follows a top-down approach. Before training the model, convert ground truth keypoints to heatmaps to enable proper regression. The size of the heatmap must correspond to the output size of the HRNet deep learning network.

function [heatmaps,weights] = helperGenerateHeatmap(keypoints,inputSize,outputSize)
heatmapSize = [outputSize(2) outputSize(1)];
sigma = 3;
featStride = [inputSize(2) inputSize(1)]./heatmapSize;
numKeypoints = size(keypoints,1);
heatmaps = zeros([heatmapSize(2) heatmapSize(1) numKeypoints]);

if size(keypoints,2) == 2
    weights = ones(numKeypoints,1);
else
    weights = keypoints(:,3);
end
tmpSize = sigma*3;
for k = 1:numKeypoints
    muX = round(keypoints(k,1)/featStride(1) + 0.5);
    muY = round(keypoints(k,2)/featStride(2) + 0.5);
    upperLeft = [floor(muX - tmpSize) floor(muY - tmpSize)];
    bottomRight = [floor(muX + tmpSize + 1),floor(muY + tmpSize + 1)];
    if (upperLeft(1) >= heatmapSize(1) || upperLeft(2) >= heatmapSize(2) || ...
            bottomRight(1) <  0 ||  bottomRight(2) < 0)
        weights(k) = 0;
        continue
    end
    sizeRegion = 2*tmpSize + 1;
    [x,y] = meshgrid(1:sizeRegion,1:sizeRegion);
    x0 = floor(sizeRegion/2);
    y0 = x0;
    g = exp(-((x - x0).^2 + (y - y0).^2) ./ (2*(sigma^2)));
    gx = [max(0, -upperLeft(1)) min(bottomRight(1),heatmapSize(1))-upperLeft(1)-1] + 1;
    gy = [max(0, -upperLeft(2)) min(bottomRight(2),heatmapSize(2))-upperLeft(2)-1] + 1;
    imgx = [max(0, upperLeft(1)) min(bottomRight(1),heatmapSize(1))-1] + 1;
    imgy = [max(0, upperLeft(2)) min(bottomRight(2),heatmapSize(2))-1] + 1;
    if weights(k) > 0.5
        heatmaps(imgy(1):imgy(2),imgx(1):imgx(2),k) = g(gy(1):gy(2),gx(1):gx(2));
    end
end
end
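
As a quick sanity check, you can call this helper from the main body of the example on a single made-up keypoint. The input and output sizes below are illustrative; in this example, the output size is one quarter of the network input size in each spatial dimension.

% Illustrative only: one keypoint at pixel (100,120) in a 256-by-192 input,
% mapped onto a 64-by-48 heatmap with a Gaussian peak near the mapped location.
[hmCheck,wCheck] = helperGenerateHeatmap(single([100 120]),[256 192 3],[64 48]);
size(hmCheck)   % ans = [64 48], one heatmap per keypoint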

helperBoxToCenterScale — Convert bounding box format from [x y w h] to center and scale. The center is the coordinates of the bounding box center, and the scale is the bounding box width and height normalized by a scale factor.

function [center,scale] = helperBoxToCenterScale(box,modelImageHeight,modelImageWidth)
    boxWidth = box(:,3); 
    boxHeight = box(:,4);
    center(1) = box(:,1) + floor(boxWidth/2);
    center(2) = box(:,2) + floor(boxHeight/2);
    aspectRatio = modelImageWidth*1.0/modelImageHeight;
    % Pixel standard deviation is 200.0, which serves as the normalization
    % factor to calculate bounding box scales.
    % https://github.com/leoxiaobin/deep-high-resolution-net.pytorch/blob/master/demo/demo.py#L180
    pixelStd = 200;

    if boxWidth > aspectRatio*boxHeight
        boxHeight = boxWidth*1.0/aspectRatio;
    elseif boxWidth < aspectRatio*boxHeight
        boxWidth = boxHeight*aspectRatio;
    end
    scale = double([boxWidth*1.0/pixelStd boxHeight*1.0/pixelStd]);
    if(center(1) ~= -1)
        scale = scale*1.25;
    end
end
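
For example, calling this helper from the main body of the example on the test-image bounding box used earlier, with an assumed network input size of 256-by-192 (check handPoseKeypointDetector.InputSize for the actual value), gives:

% Illustrative only: convert an [x y w h] box to center and scale.
[centerCheck,scaleCheck] = helperBoxToCenterScale([188 123 246 205],256,192)
% centerCheck is [311 225]. scaleCheck is approximately [1.54 2.05]: the box is
% padded to the 192:256 aspect ratio, divided by 200, and enlarged by 1.25.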

helperGetAffineTransform — Calculate the affine transform based on the center and scale of the image.

function transformMatrix = helperGetAffineTransform(center,scale,outputHeatMapSize,invAffineTransform)
% center: Center of the bounding box [x y].
% scale: Scale of the bounding box, normalized by the scale factor, [width height].
% outputHeatMapSize: Size of the destination heatmaps.
% invAffineTransform (logical): Option to invert the affine transform direction
% (false: src->dst, true: dst->src).

% shift (0-100%): Shift translation ratio with regard to the width and height.
shift = [0 0];

% pixelStd is 200 as per https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py.
scaleTmp = scale*200.0;
srcWidth = scaleTmp(1);
dstHeight = outputHeatMapSize(1);
dstWidth = outputHeatMapSize(2);

srcPoint = [1 srcWidth*-0.5];
dstDir = double([1 dstWidth*-0.5]);

src = zeros(3,2);
dst = zeros(3,2);
src(1,:) = center + scaleTmp.*shift;
src(2,:) = center + srcPoint + scaleTmp.*shift;
dst(1,:) = [dstWidth*0.5 dstHeight*0.5];
dst(2,:) = [dstWidth*0.5 dstHeight*0.5] + dstDir;

src(3,:) = helperGetThirdPoint(src(1,:),src(2,:));
dst(3,:) = helperGetThirdPoint(dst(1,:),dst(2,:));

if invAffineTransform
    transformMatrix = fitgeotform2d(dst,src,"affine");
else
    transformMatrix = fitgeotform2d(src,dst,"affine");
end
end

helperGetThirdPoint — To calculate the affine matrix, you must have three pairs of points. This function obtains the third point, given 2D points a and b. The function defines the third point by rotating the vector a - b by 90 degrees anticlockwise, using point b as the rotation center.

function thirdPoint =  helperGetThirdPoint(a,b)
% Args:
%     a: point(x,y)
%     b: point(x,y)
% Returns:
%     The third point.
direction = a - b;
thirdPoint = b + [-direction(2)-1 direction(1)+1];
end
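
affineTransform — Apply the affine transform to a keypoint location and return the transformed [x y] coordinates.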

function newJoint = affineTransform(keypoint,trans)
    newJoint = [keypoint(1) keypoint(2)  1];
    newJoint = trans.A*newJoint';
    newJoint = newJoint(1:2);
end

Utility Functions

helperDownloadHandPoseKeypointDetector — Download the pretrained hand pose keypoint detector.

function keypointDetector = helperDownloadHandPoseKeypointDetector(downloadFolder)
pretrainedURL = "https://ssd.mathworks.com/supportfiles/vision/data/hrnet2DHandPose.zip";
pretrainedFolder = fullfile(downloadFolder,"pretrainedNetwork");
if ~exist(pretrainedFolder,"dir")
    mkdir(pretrainedFolder)
end
pretrainedDetectorZip = fullfile(pretrainedFolder,"hrnet2DHandPose.zip");
if ~exist(pretrainedDetectorZip,"file")
    disp("Downloading pretrained hand pose keypoint detector (102 MB)...")
    websave(pretrainedDetectorZip,pretrainedURL);
end
unzip(pretrainedDetectorZip,pretrainedFolder)
pretrainedDetector = fullfile(pretrainedFolder,"hrnet2DHandPose.mat");
keypointDetector = load(pretrainedDetector).handPoseKeypointDetector;
end

helperDownloadHandPoseDataset — Download the hand pose data set and ground truth labels.

function dataset = helperDownloadHandPoseDataset(downloadFolder)
dataFilename = "2DHandPoseDataAndGroundTruth.zip";
dataAndImageUrl = "https://ssd.mathworks.com/supportfiles/vision/data/2DHandPose/" + dataFilename;
zipFile = fullfile(downloadFolder,dataFilename);
if ~exist(zipFile,"file")
    disp("Downloading hand pose dataset (98 MB)...")
    websave(zipFile,dataAndImageUrl);
end
unzip(zipFile,downloadFolder)
dataset = fullfile(downloadFolder,"2DHandPoseDataAndGroundTruth","2DHandPoseGroundTruth.mat");
end

helperHandPoseDatasetKeypointNames — Returns the class names of the 21 hand keypoints.

function classes = helperHandPoseDatasetKeypointNames()
classes = ["forefinger3","forefinger4","forefinger2","forefinger1", ...
    "middleFinger3","middleFinger4","middleFinger2","middleFinger1", ...
    "pinkyFinger3","pinkyFinger4","pinkyFinger2","pinkyFinger1", ...
    "ringFinger3","ringFinger4","ringFinger2","ringFinger1", ...
    "thumb3","thumb4","thumb2","thumb1","wrist"]';
end

helperKeypointConnection — Returns the pairs of keypoint connections between the 21 hand keypoints.

function connection = helperKeypointConnection()  
connection = [4 3; 3 1; 1 2; 8 7; 7 5; 5 6 ; 16 15; 15 13; 13 14; 12 11; 11 9; 9 10; 20 19; 19 17; 17 18; 2 21; 6 21; 14 21; 10 21; 18 21];
end

helperCalculatePCK — Calculate the PCK of each predicted keypoint and corresponding ground truth keypoint.

function pckcurrent = helperCalculatePCK(pred,groundtruth,normalizationFactor,threshold)
assert(size(pred,1) == size(groundtruth,1) && size(pred,2) == size(groundtruth,2) && size(pred,3) == size(groundtruth,3))
pckcurrent = [];
for imgidx = 1:size(pred,3)
    pck = mean(sqrt((pred(:,1,imgidx)-groundtruth(:,1,imgidx)).^2+(pred(:,2,imgidx)-groundtruth(:,2,imgidx)).^2)./normalizationFactor<threshold);
    pckcurrent = [pckcurrent pck];
end
pckcurrent = mean(pckcurrent);
end

References

[1] Sun, Ke, Bin Xiao, Dong Liu, and Jingdong Wang. “Deep High-Resolution Representation Learning for Human Pose Estimation.” In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5686–96. Long Beach, CA, USA: IEEE, 2019. https://doi.org/10.1109/CVPR.2019.00584.

[2] Gomez-Donoso, Francisco, Sergio Orts-Escolano, and Miguel Cazorla. "Large-Scale Multiview 3D Hand Pose Dataset." Image and Vision Computing 81 (2019): 25–33. https://doi.org/10.1016/j.imavis.2018.12.001.

[3] Yang, Yi, and Deva Ramanan. "Articulated Human Detection with Flexible Mixtures of Parts." IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 12 (December 2013): 2878–90. https://doi.org/10.1109/TPAMI.2012.261.