# Run MATLAB Image Processing Algorithms on Raspberry Pi and NVIDIA Jetson

By Jim Brock and Murat Belge, MathWorks

Thanks to low-cost hardware platforms such as Raspberry Pi™, it is now easier than ever to prototype image processing algorithms on hardware. Most image processing algorithms are computationally intensive, and it can be challenging to run them on an embedded platform with acceptable frame rates. While Raspberry Pi is sufficient for running simple image processing algorithms, large images and complex algorithms are best run on more powerful hardware such as NVIDIA® Jetson.

Using a chroma key effect as an example, this article describes a simple workflow for deploying a MATLAB® image processing algorithm to embedded hardware. We'll generate C code from the algorithm with MATLAB Coder™, and then use the Run on Hardware utility to prototype the algorithm on a Raspberry Pi board. Finally, we'll move the algorithm to an NVIDIA Jetson TX2 platform to achieve real-time performance.

## The Chroma Keying Algorithm

Widely used in TV weather reports, movie production, and photo editing applications, chroma keying is a video processing technique in which a foreground subject is shot against a solid color background, such as a green screen, that is later replaced by a different scene (Figure 1).

The chroma keying algorithm compares each pixel in the image with a reference color representing the solid background color. If the color of the pixel is close enough to the reference color, the pixel is replaced with the corresponding pixel from a pre-selected scene image. Mathematically, the chroma keying algorithm can be formulated as the following:

$P_{final}(j,k)=m(j,k)*P_{original}(j,k)+(1-m(j,k))*P_{scene}(j,k)$

where $$P_{final}(j,k)$$ represents the final pixel value at location $$(j,k)$$ after chroma keying, $$P_{original}(j,k)$$ is the pixel value corresponding to the original image, $$P_{scene}(j,k)$$ is the pixel value representing the scene that replaces the solid background color, and $$m(j,k) \in [0,1]$$ is a mask value. The mask value $$m(j,k)$$ should be 1 for foreground pixels and 0 for background pixels. A mask value between 0 and 1 provides a smooth transition from background to foreground.
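As a quick numerical check of the blending formula, the sketch below blends one foreground intensity with one scene intensity for three mask values. The pixel values and variable names are made up for illustration and do not come from the article's images.

```matlab
% Illustrative values only (hypothetical, not from the article's images).
Poriginal = 200;          % foreground pixel intensity
Pscene    = 50;           % replacement scene pixel intensity
m         = [0 0.25 1];   % background, transition, and foreground mask values

% P_final = m*P_original + (1-m)*P_scene, evaluated for each mask value
Pfinal = m.*Poriginal + (1 - m).*Pscene;

disp(Pfinal)              % values: 50, 87.5, 200
```

With a mask of 0 the scene pixel is used unchanged, with a mask of 1 the original pixel is kept, and the intermediate mask value yields a weighted mix of the two.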

The mask value at each pixel is usually computed in the YCbCr color space instead of the usual RGB color space. The Y component of the YCbCr image represents the luminance component and determines how light or dark the image is. Cb and Cr components represent the chroma components that can be used to measure similarity to a reference color. Measuring color similarity using only the Cb and Cr components of the image makes the algorithm robust to variations in luminance values in light and dark areas of the solid background color.
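To illustrate why chroma is a useful similarity measure, the following sketch converts two hypothetical green pixels to YCbCr. The second pixel is the first with the same offset added to every RGB channel, which is a simple, idealized model of a brightness change rather than a model of real lighting. The conversion is written out with the BT.601 coefficients documented for rgb2ycbcr so that the sketch needs no toolbox; the pixel values and the helper name toYCbCr are assumptions.

```matlab
% BT.601 conversion for R, G, B in [0,255], as documented for rgb2ycbcr.
T = [ 65.481 128.553  24.966
     -37.797 -74.203 112.000
     112.000 -93.786 -18.214];
offset  = [16; 128; 128];
toYCbCr = @(rgb) offset + T*rgb(:)/255;   % returns [Y; Cb; Cr]

shadowGreen = [50 150 50];        % hypothetical darker background pixel
litGreen    = shadowGreen + 40;   % same pixel, 40 added to every channel

a = toYCbCr(shadowGreen);
b = toYCbCr(litGreen);

fprintf('Y:  %.2f vs %.2f\n', a(1), b(1));   % luminance differs
fprintf('Cb: %.2f vs %.2f\n', a(2), b(2));   % chroma is unchanged
fprintf('Cr: %.2f vs %.2f\n', a(3), b(3));   % chroma is unchanged
```

The Cb and Cr rows of the conversion matrix each sum to zero, so a uniform offset changes only Y. Real shadows and highlights are not exactly uniform offsets, which is one reason the algorithm still needs thresholds.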

To measure the similarity of a pixel color to a reference color, we use the squared Euclidean distance in chroma space:

$d^2 (j,k)=(Cb(j,k)-Cb_{ref})^2+(Cr(j,k)-Cr_{ref})^2$

Finally, we compute the mask value at location $$(j,k)$$ in the image using the following formula:

$m(j,k)=\left\{\begin{matrix} 1 & \text{if } d(j,k)>t_2 \\ 0 & \text{if } d(j,k)\le t_1 \\ \frac{d^2 (j,k)-t_1^2}{t_2^2-t_1^2} & \text{if } t_1<d(j,k)\le t_2 \end{matrix}\right.$

Where $$t_{1}$$ and $$t_{2}$$ with $$t_{2} > t_{1}$$ represent threshold values to be determined.
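The mask formula can also be read as a linear ramp in the squared distance that is clamped to the range [0,1]. The sketch below evaluates it in vectorized form for five hypothetical squared distances; the values of d2, t1, and t2 are chosen only to make the arithmetic easy to follow.

```matlab
% Squared chroma distances for five hypothetical pixels, and two thresholds.
d2 = [1 4 10 16 25];
t1 = 2;
t2 = 4;

% Linear ramp in d^2 between t1^2 and t2^2 ...
m = (d2 - t1^2) ./ (t2^2 - t1^2);
% ... clamped so that it is 0 below t1 and 1 above t2.
m = min(max(m, 0), 1);

disp(m)   % values: 0, 0, 0.5, 1, 1
```

The clamp reproduces the three cases of the formula because the ramp expression is negative when the distance is below the lower threshold and greater than 1 when it is above the upper threshold.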

## MATLAB Implementation

Here is the MATLAB implementation of the chroma keying algorithm.

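    function Pfinal = chromaKey(P, Pscene, refColorYCbCr, t1, t2)
    % Replace pixels of P that are close to the reference color with Pscene.

    % Chroma components of the reference color
    Cbref = double(refColorYCbCr(1,1,2));
    Crref = double(refColorYCbCr(1,1,3));

    % Chroma components of the input image
    PYCbCr = rgb2ycbcr(P);
    Cb = double(PYCbCr(:,:,2));
    Cr = double(PYCbCr(:,:,3));

    % Squared Euclidean distance to the reference color in chroma space
    d = (Cb - Cbref).^2 + (Cr - Crref).^2;

    % Square the thresholds so they can be compared with squared distances
    t1 = t1^2;
    t2 = t2^2;

    % Mask: 1 for foreground, 0 for background, linear ramp in between
    m = zeros([size(d,1) size(d,2)]);
    for j = 1:size(m,1)
        for k = 1:size(m,2)
            if d(j,k) > t2
                m(j,k) = 1;
            elseif d(j,k) > t1
                m(j,k) = (d(j,k) - t1) / (t2 - t1);
            end
        end
    end

    % Smooth the mask, replicate it across the color channels, and blend
    m = repmat(imgaussfilt(m,0.8), [1 1 3]);
    Pfinal = uint8(double(P).*m + double(Pscene).*(1-m));
    end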


In MATLAB, an RGB image captured from a camera is typically an N-by-M-by-3 array of type uint8. Because uint8 arithmetic saturates and cannot represent fractional values, we convert the image data to double before performing mathematical operations. To avoid abrupt transitions from background to foreground, we apply a Gaussian filter to the computed mask.
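The need for the conversion to double can be seen in a small example of MATLAB integer arithmetic. The values below are arbitrary and serve only to show the saturation behavior of uint8.

```matlab
a = uint8(200);
b = uint8(100);

sumInt  = a + b;                    % uint8 result saturates at 255
sumDbl  = double(a) + double(b);    % double result is 300

diffInt = b - a;                    % uint8 result saturates at 0
diffDbl = double(b) - double(a);    % double result is -100

fprintf('uint8:  %d and %d\n', sumInt, diffInt);
fprintf('double: %d and %d\n', sumDbl, diffDbl);
```

In the chroma keying function, the differences from the reference chroma can be negative and the mask weights are fractional, so both would be distorted if the computation stayed in uint8.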

## Determining Reference Color and Thresholds

A chroma keying algorithm requires a reference color and thresholds. Using the camera interface in MATLAB Support Package for Raspberry Pi, we capture images of the actual scene. We can then empirically determine the approximate reference color for the background and the approximate threshold values.

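    r = raspi;                 % connect to the Raspberry Pi
    cam = cameraboard(r);      % camera board attached to that Raspberry Pi
    for k = 1:10
        img = snapshot(cam);   % img holds the most recent frame
    end
    image(img);                % display the frame for inspection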


The snapshot function returns the image captured from the Raspberry Pi camera. After displaying that image in a MATLAB figure, we use the Data Cursor tool in the plot to identify the background color (Figure 2).
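Reading a single pixel with the Data Cursor is sensitive to sensor noise. One possible alternative, which is not the approach used in the article, is to average a patch of pixels known to belong to the background. The sketch below uses a synthetic noisy green image in place of a captured frame; the patch location, the noise level, and the variable names are assumptions.

```matlab
% Synthetic stand-in for a captured frame: green background plus noise.
% (A real frame would come from snapshot(cam).)
rng(0);                                            % reproducible noise
img = zeros(120, 160, 3, 'uint8');
img(:,:,1) = uint8(93  + randi([-5 5], 120, 160));
img(:,:,2) = uint8(177 + randi([-5 5], 120, 160));
img(:,:,3) = uint8(21  + randi([-5 5], 120, 160));

% Hypothetical patch assumed to contain only background pixels.
rows = 10:40;
cols = 20:60;
bgPatch = double(img(rows, cols, :));

% Average over rows and columns, leaving one value per color channel.
refColorRGB = uint8(mean(mean(bgPatch, 1), 2));    % 1-by-1-by-3 uint8

disp(squeeze(refColorRGB)')                        % close to [93 177 21]
```

The result has the same 1-by-1-by-3 uint8 layout that rgb2ycbcr accepts, so it could be converted to YCbCr in the same way as a hand-picked reference color.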

To determine the thresholds, we run the algorithm in a loop and adjust the threshold values:

    refColorRGB = zeros([1,1,3],'uint8');
    refColorRGB(1,1,:) = uint8([93 177 21]);   % empirically chosen background color
    refColorYCbCr = rgb2ycbcr(refColorRGB);
    t1 = 28;
    t2 = 29;
    scene = data.bg;   % replacement scene; the struct data is not defined in this listing
    % Main loop
    for k = 1:1000
        img = snapshot(cam);
        imgFinal = chromaKey(img, scene, refColorYCbCr, t1, t2);
        figure(1),image(img);
        figure(2),image(imgFinal);
        drawnow;
    end


When we run the code, we see the foreground subject displayed against the scene we selected (Figure 3).

## Deploying the Chroma Keying Algorithm to Raspberry Pi

Before deploying the code, we need to write a loop around the chroma keying algorithm to capture images from a camera and display them on a monitor attached to the Raspberry Pi:

    function chromaKeyApp()
    %Chroma keying example for Raspberry Pi hardware.
    %#codegen
    % Copyright 2017 The MathWorks, Inc.
    w = matlab.raspi.webcam(0,[1280,720]);
    d = matlab.raspi.SDLVideoDisplay;

    refColorYCbCr = zeros([1,1,3],'uint8');
    refColorYCbCr(1,1,:) = uint8([0 76 98]);
    scene = imrotate(data.bg,90);
    % Main loop
    for k = 1:60
        img = snapshot(w);
        img = chromaKey(img, scene, refColorYCbCr, 28, 29);
        displayImage(d,img);
    end
    release(w);
    release(d);
    end


matlab.raspi.webcam and matlab.raspi.SDLVideoDisplay are System objects™ in the Run on Hardware utility that facilitate use of the camera and the Raspberry Pi display in a deployment workflow. To compile and run the code, we execute the following command:

    runOnHardware(r,'chromaKeyApp')


The function runOnHardware creates a MATLAB Coder configuration for Raspberry Pi hardware, generates code for the chromaKeyApp function, and deploys it. To run the algorithm at a reasonable frame rate, the image size can be reduced to 640x480 or 320x240.
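To get a rough sense of how much work a smaller image saves, the sketch below counts the pixels per frame at the three resolutions mentioned above. It assumes that the per-frame cost grows roughly in proportion to the pixel count, which is plausible for a per-pixel algorithm like this one but is not measured in this article.

```matlab
% Resolutions mentioned in the text, as [width height].
sizes = [1280 720; 640 480; 320 240];

pixels = prod(sizes, 2);         % pixels per frame
ratio  = pixels(1) ./ pixels;    % reduction relative to 1280x720

for k = 1:size(sizes, 1)
    fprintf('%4dx%-4d %7d pixels (%gx fewer)\n', ...
            sizes(k,1), sizes(k,2), pixels(k), ratio(k));
end
```

Dropping from 1280x720 to 640x480 leaves one third of the pixels, and 320x240 leaves one twelfth.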

## Generating GPU Code

The algorithm is working on the Raspberry Pi, but it is not achieving the real-time performance we're looking for. To accelerate the algorithm, we will use GPU Coder™ to deploy it to the NVIDIA Jetson platform. We need to generate GPU code to take advantage of the inherent parallelism in the algorithm. First, we write a wrapper main function that uses OpenCV to access a USB camera connected to the NVIDIA Jetson. This function will marshal video frames from the camera to our chromaKey algorithm and then display the output on the screen.

When generating GPU code, we first create a GPU Coder configuration object, set the GPU parameters to target the NVIDIA Jetson platform, and include our custom main function. We will not compile the code on the MATLAB host computer, because we are generating code specifically for the NVIDIA Jetson board. We will create a script to set up the GPU Coder configuration, input example data, and generate source code for our application.

    % Create GPU Coder configuration for Jetson TX2
    cfg = coder.gpuConfig('exe');
    cfg.GpuConfig.MallocMode = 'Unified';
    cfg.GpuConfig.ComputeCapability = '6.2';
    cfg.GenCodeOnly = 1;
    cfg.CustomSource = 'main_webcam.cu';

    % Create sample inputs
    % (fg and bg, the example foreground and scene images passed to codegen
    % below, are not defined in this listing)
    refColorRGB = [70 130 85]; % RGB light green
    tmpColor = zeros([1,1,3],'uint8');
    tmpColor(1,1,:) = uint8(refColorRGB);
    refColor = rgb2ycbcr(tmpColor);
    threshold1 = 14;
    threshold2 = 20;

    % Generate CUDA code for chromaKey
    codegen -config cfg -args {fg,bg,refColor,threshold1,threshold2} chromaKey


We then run the script in MATLAB to generate CUDA code for the chromaKey algorithm.

## Deploying a Green Screen Algorithm to NVIDIA Jetson

To deploy the generated code to the NVIDIA Jetson, we need to package all the required files into the codegen directory using the following MATLAB commands:

    % Prepare files for transfer to NVIDIA Jetson TX2
    copyfile('Scenery.jpg','codegen/exe/chromaKey/');
    copyfile('main_webcam.cu','codegen/exe/chromaKey/');
    copyfile(fullfile(matlabroot,'extern','include','tmwtypes.h'),'codegen/exe/chromaKey/');
    copyfile('buildAndRun.sh','codegen/exe/chromaKey/');


The next step is to copy the entire generated codegen folder from the host machine to the NVIDIA Jetson board. After the files have been transferred, we sign in to the NVIDIA Jetson directly to build and run the application.

Once logged in to the NVIDIA Jetson, we run the jetson_clocks.sh script provided by NVIDIA to maximize the performance of the board, change to the codegen directory containing the generated source code we just transferred, and execute the compile command shown below.

    $> sudo ./jetson_clocks.sh
    $> cd codegen/exe/chromaKey
    $> nvcc -o chromaKey *.cu -rdc=true -arch sm_62 -O3 `pkg-config --cflags --libs opencv` -lcudart

Once the executable (chromaKey) has been built, the application is run with a USB-connected webcam on the NVIDIA Jetson board with the following command. The frames-per-second rate will be displayed on the output.

    $> ./chromaKey 1


Figure 4 shows the output from the NVIDIA Jetson board's USB camera before and after the green screen effect.

## Comparing Raspberry Pi and NVIDIA Jetson Performance

The greater parallel processing power of the GPU on the NVIDIA Jetson significantly improves the algorithm's performance. The Raspberry Pi achieved approximately 1 frame per second, while the NVIDIA Jetson achieved more than 20 frames per second for an image size of 1280x720. This is a more than twentyfold speedup, gained without making any modifications or optimizations to our algorithm. We could improve performance even more by optimizing the MATLAB algorithm for more efficient GPU code generation.
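While prototyping in MATLAB, a frames-per-second figure of this kind can be estimated by timing a processing loop with tic and toc. The sketch below is a minimal illustration on the host computer, not the measurement method used on the boards: the frame is synthetic, and the inversion inside the loop is only a placeholder for the per-frame processing.

```matlab
% Synthetic 1280x720 frame; on hardware this would come from the camera.
frame = zeros(720, 1280, 3, 'uint8');
numFrames = 20;

tStart = tic;
for k = 1:numFrames
    out = 255 - frame;    % placeholder for the per-frame processing
end
elapsed = toc(tStart);    % total time in seconds

fps = numFrames / elapsed;
fprintf('%.1f frames per second\n', fps);
```

At 20 frames per second the whole pipeline, including capture and display, has a budget of 50 ms per frame, so a host-side timing like this one covers only part of what the deployed application has to do.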

## Summary

In this example we saw how to rapidly generate code for a MATLAB algorithm and deploy it to embedded hardware like the Raspberry Pi. We quickly determined that our algorithm was working correctly and needed to be parallelized. Using MATLAB and GPU Coder, we generated a highly parallel implementation of the algorithm and deployed it to an NVIDIA Jetson board, achieving a significant performance improvement.

Published 2018