Designing and deploying deep learning and computer vision applications to embedded CPU and GPU platforms is challenging because of resource constraints inherent in embedded devices. A MATLAB® based workflow facilitates the design of these applications, and automatically generated C or CUDA® code can be deployed on boards like the Jetson TX2 and DRIVE PX and achieve very fast inference.
The presentation illustrates how MATLAB supports all major phases of this workflow. Starting with algorithm design, the algorithm may employ deep learning networks augmented with traditional computer vision techniques and can be tested and verified within MATLAB. Next, these networks are trained using GPU and parallel computing support for MATLAB either on the desktop, cluster, or the cloud. Finally, GPU Coder™ generates portable and optimized C/C++ and/or CUDA® code from the MATLAB algorithm, which is then cross-compiled and deployed to CPUs and/or Tegra® board. Benchmarks show that performance of the auto-generated CUDA code is ~5x faster than TensorFlow® and ~2X faster than MXNet.