How to Develop and Deploy a Neural Network for MCUs in 4 Steps
Learn about a framework for designing neural networks for deployment onto MCUs, presented by MathWorks. Tailored for engineers and developers, this concise tutorial delivers a rapid overview of integrating neural network capabilities into microcontrollers, making it a practical starting point for your embedded AI projects.
Kickstart your journey with a brief walkthrough on designing and training a neural network. From importing your initial data to evaluating your model’s accuracy, you’ll learn the essentials to get your neural network up and running in no time.
Discover the power of Bayesian optimization for hyperparameter tuning with MATLAB®, a method to fine-tune your network’s performance.
Facing the memory constraints of microcontrollers? This tutorial highlights network compression techniques such as quantization, pruning, and data type conversion. Learn how to make your neural network compact and efficient, ready for embedded use.
Concluding with a brief introduction to Embedded Coder® for generating C/C++ code, this tutorial ensures you’re well-equipped to deploy your neural network model onto microcontrollers.
This fast-paced tutorial is your first step toward mastering AI deployment on MCUs. Embark on your own AI adventure and bring neural networks to your embedded devices today. For further exploration and resources, feel free to reach out to embedded-ai@mathworks.com.
Published: 22 May 2024
Hello, and welcome to a video by MathWorks, the developer of mathematical computing software. In this video, we'll unveil a framework for designing and deploying neural networks tailored to microcontrollers. Step one, design and train a neural network. When you have access to high-quality proprietary data, building a network from scratch is the way to go. We'll take you through our design process, highlighting how we crafted a network that fits our unique data set.
First, we imported our data set. Then we assembled the network. Once the structure is defined, we must train the network. To do so, we split the data set, set up training options, and started training. The next step is to determine the prediction accuracy. If we want to edit the network, we can analyze it and, if desired, change some training options to achieve the desired results. No matter the origin of your network, it will need fine-tuning, also known as hyperparameter tuning, to achieve optimal performance.
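As a rough illustration of that workflow, here is a minimal MATLAB sketch. It assumes an image classification data set stored with one subfolder per class; the folder name, image size, network layers, and training options are placeholders rather than the values used in the video.

```matlab
% Minimal design-and-train sketch (assumes images stored in one
% subfolder per class under "dataFolder"; sizes and options are placeholders).
imds = imageDatastore("dataFolder", ...
    IncludeSubfolders=true, LabelSource="foldernames");

% Split the data set into training and validation portions.
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.8, "randomized");

% Assemble a small convolutional network.
numClasses = numel(categories(imds.Labels));
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3, 8, Padding="same")
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, Stride=2)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

% Set up training options and train.
options = trainingOptions("adam", MaxEpochs=10, ...
    ValidationData=imdsVal, Verbose=false, Plots="training-progress");
net = trainNetwork(imdsTrain, layers, options);

% Determine prediction accuracy, then use analyzeNetwork(net) and adjust
% the layers or training options if the results are not good enough.
predicted = classify(net, imdsVal);
accuracy = mean(predicted == imdsVal.Labels);
```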
Step two, Bayesian optimization. This is a process of hyperparameter tuning, a critical step in enhancing your network's performance. For a deeper understanding, simply search "deep learning tuning MATLAB" and click on the following link. In this example, the bayesopt function minimizes the cross-validation loss over two hyperparameters: the nearest-neighborhood size and the distance metric.
One can also use Bayesian optimization to find the combination of hyperparameters that minimizes a custom metric function. These hyperparameters can be options of the training algorithm as well as parameters of the network architecture itself, and the objective function returns the classification error on a randomly chosen test set. Now that the network has been optimized, we need to ensure it is compact enough for our hardware before deployment.
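As a sketch of what this can look like in code, the following tunes a k-nearest-neighbor classifier with bayesopt, mirroring the kind of example mentioned above. It assumes a feature matrix X and a label vector Y are already in the workspace; the variable ranges, distance metrics, and evaluation budget are illustrative only.

```matlab
% Define the search space: an integer neighborhood size and a
% categorical distance metric (ranges are placeholders).
num = optimizableVariable("NumNeighbors", [1 30], Type="integer");
dst = optimizableVariable("Distance", ...
    {'euclidean','cityblock','chebychev','minkowski'}, Type="categorical");

% Custom objective: 5-fold cross-validated classification error of a
% kNN model trained with the candidate hyperparameters.
objFcn = @(params) kfoldLoss(fitcknn(X, Y, ...
    NumNeighbors=params.NumNeighbors, ...
    Distance=char(params.Distance), ...
    CrossVal="on", KFold=5));

% Run Bayesian optimization and retrieve the best hyperparameter set.
results = bayesopt(objFcn, [num dst], ...
    MaxObjectiveEvaluations=30, ...
    AcquisitionFunctionName="expected-improvement-plus");
bestParams = bestPoint(results);
```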
Step three, network compression. Compressing your network is crucial for deployment on resource-constrained hardware, and can be done in multiple ways, including quantization, pruning, projection, and data type conversion. For more information, search "deep learning compression MATLAB" and click the highlighted link.
Even small trained neural networks require a considerable amount of memory and require hardware that can perform floating-point arithmetic. These restrictions can inhibit deployment of deep learning capabilities to low-power microcontrollers. Using the Deep Learning Toolbox Model Quantization Library support package, you can quantize a network to use 8-bit scaled integer data types. This can be done with the dlquantizer function or the Deep Network Quantizer app.
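A minimal sketch of that quantization workflow is shown below, assuming a trained network net plus calibration and validation datastores already exist; calibrationData and validationData are placeholder names.

```matlab
% Create a quantization object for the trained network (execution
% environment is an assumption; pick the one matching your target flow).
quantObj = dlquantizer(net, ExecutionEnvironment="MATLAB");

% Calibrate: collect dynamic ranges of weights, biases, and activations.
calibrate(quantObj, calibrationData);

% Quantize the network to 8-bit scaled integer data types.
qNet = quantize(quantObj);

% Optionally check how quantization affects prediction accuracy.
valResults = validate(quantObj, validationData);
```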
Another method to reduce the size of a deep neural network is Taylor pruning. By using the taylorPrunableNetwork function, you can reduce the overall network size and increase the inference speed. Network pruning is a powerful model compression tool that helps identify redundancies that can be removed with little impact on the final network output. Pruning is particularly useful in transfer learning, where the network is often over-parameterized.
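The pruning workflow is iterative: convert the trained network, accumulate Taylor importance scores over minibatches, and repeatedly remove the lowest-scoring filters. The condensed sketch below assumes a trained dlnetwork net and a minibatchqueue mbq over the training data; the iteration counts, loss function, and MaxToPrune value are placeholders, and fine-tuning after pruning is omitted.

```matlab
% Convert the trained network into a prunable representation.
prunableNet = taylorPrunableNetwork(net);

for pruningIteration = 1:10
    while hasdata(mbq)
        [X, T] = next(mbq);
        % Accumulate Taylor importance scores from activations and gradients.
        [loss, pruningActivations, pruningGradients, state] = ...
            dlfeval(@pruningLoss, prunableNet, X, T);
        prunableNet.State = state;
        prunableNet = updateScore(prunableNet, pruningActivations, pruningGradients);
    end
    reset(mbq);
    % Remove the lowest-scoring filters, a few per pruning iteration.
    prunableNet = updatePrunables(prunableNet, MaxToPrune=8);
end

% Convert back to a regular dlnetwork, then fine-tune to recover accuracy.
prunedNet = dlnetwork(prunableNet);

function [loss, pruningActivations, pruningGradients, state] = pruningLoss(net, X, T)
    % Forward pass returns the pruning activations needed for scoring.
    [Y, state, pruningActivations] = forward(net, X);
    loss = crossentropy(Y, T);
    pruningGradients = dlgradient(loss, pruningActivations);
end
```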
Step four, code generation using Embedded Coder. Embedded Coder is an invaluable tool for engineers looking to convert their work to C code. To find out more, search "Embedded Coder" and click on the following link. Embedded Coder generates readable, customizable, and fast C and C++ code for use on embedded devices. It supports industry standards such as ISO 26262 and IEC 61508.
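As a rough sketch of the command-line workflow for generating plain C code from a network, the example below assumes the compressed network was saved to compressedNet.mat and accepts a 28x28x1 single-precision image; the file name, function name, and input size are placeholders.

```matlab
% --- predictDigit.m: entry-point function saved as its own file ---
function out = predictDigit(in)
    % Load the network once and reuse it across calls in generated code.
    persistent net
    if isempty(net)
        net = coder.loadDeepLearningNetwork('compressedNet.mat');
    end
    out = predict(net, in);
end

% --- code generation script, run separately ---
cfg = coder.config('lib', 'ecoder', true);      % Embedded Coder static library
cfg.TargetLang = 'C';
cfg.DeepLearningConfig = coder.DeepLearningConfig('none');  % plain C, no vendor libraries
codegen -config cfg predictDigit -args {ones(28,28,1,'single')} -report
```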
In summary, Embedded Coder allows an engineer to generate efficient and reliable C or C++ code without having to hand-code the Simulink model or neural network after the steps previously mentioned in this video. You just learned about a framework for using MATLAB to design, refine, and deploy a neural network onto microcontrollers. Make sure to look at the resources highlighted in this video, and if you're interested in using this framework and have further questions, contact us at embedded-ai@mathworks.com.