Workflow for Deploying a Neural Network to an STM32
This overview provides a high-level explanation of Edge AI and the benefits it can bring to developers in the form of virtual sensors. Server-based neural networks require input/output devices to maintain constant communication. In situations where network connectivity or low latency cannot be guaranteed, relying on server-based neural networks becomes unsafe. This applies to applications such as cars, robots, or machines on a factory floor. Therefore, neural networks that run on the device itself become a necessity.
Published: 29 Aug 2023
Hello, and welcome to this workflow demo video by MathWorks. As companies and developers embrace neural networks, Edge AI is quickly becoming a topic of increasing interest. Deployment on embedded chips, like STM32 microcontrollers, allows data to be processed locally on the device, making the system more secure and efficient. However, there are many challenges in deploying neural networks, such as limited onboard resources, deployment complexity, and the verification of deployed optimized networks.
In this video, we will show how someone without deep knowledge of AI or embedded systems can deploy a neural network to an STM32 device in an efficient and straightforward manner. The workflow highlighted today shows the deployment of an LSTM network representing a virtual sensor. This network is imported from TensorFlow and deployed to an STM32 Nucleo-F767ZI for state-of-charge estimation in a battery management system.
A virtual sensor is software that, given the available information, computes what a physical sensor would otherwise measure. It learns to interpret the relationships between different variables and the observed readings from other instruments. The benefits of virtual sensors are that they can be placed anywhere in a system, do not add any weight to a device, and are much cheaper than their physical counterparts.
You might wonder: why use AI over conventional estimation methods, like an extended Kalman filter? The answer lies in the accuracy and efficiency that neural networks offer. Unlike traditional methods, AI can handle complex patterns and adapt in real time, making it ideal for a wide range of applications.
There are also challenges in creating a robust state-of-charge estimation algorithm, which stem from behavior that depends non-linearly on temperature, battery health, and state of charge. Traditional approaches usually require precise parameters and knowledge of the battery composition and physical response. Neural networks are a data-driven approach that requires minimal knowledge of the battery or its non-linear behavior.
To illustrate the benefit of using a general version of this workflow, let's take a look at a customer use case. Mercedes-Benz recently used MATLAB and Simulink to establish a new workflow for deploying virtual sensors, such as those that simulate the functionality of a piston pressure sensor. These sensors are based on deep learning networks designed to run on resource-limited ECU microcontrollers. This automated workflow replaced a manual workflow that was slower and relied on a trial-and-error approach. By leveraging this workflow for virtual sensor development, Mercedes-Benz met its CPU, memory, and performance requirements. A flexible process was established, and their development speed increased by 600%.
Before I get into the details of the workflow shown in this video, let me first provide a quick overview of deploying a neural network to an STM32 device. The first step is designing and training a neural network, leveraging AI development techniques to optimize its performance. This can be done in MATLAB and Simulink, or the network can be imported from an open-source format using a deep learning converter.
The network then undergoes compression using techniques such as projection, Taylor pruning, and quantization. This is done to reduce its size and make it suitable for deployment on the STM32 device. Once the network is compressed, it can be simulated and tested to check for any loss in prediction accuracy.
After testing, the network undergoes code generation using either our Embedded Coder tool or X-CUBE-AI from STMicroelectronics. In this video, we will be using Embedded Coder for code generation. The generated code is then deployed to an STM32 via a MathWorks-provided hardware support package, or with X-CUBE-AI. For the rest of this video, you will see a workflow detailing the incorporation of a TensorFlow network into a Simulink model for Processor-in-the-Loop testing.
I will begin by addressing one of the challenges of developing for an STM32 embedded device: deployment complexity. We can start in the model settings of the Simulink model. Here, I can choose which board I plan to deploy to, and all of the hardware settings are already preconfigured for me. This is possible thanks to hardware support packages provided by MathWorks, which allow me to easily deploy my software to an STM32.
You can download and read more about our hardware support package at mathworks.com/hardware-support/stm32. With this hardware support package, we can leverage the full potential of STM32 boards, enabling efficient development and testing of our applications.
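To make this concrete, here is a minimal sketch of how that board selection could also be made programmatically once the support package is installed. The model name and the exact board string are placeholders assumed for illustration; use the board names that your installed support package actually offers.

% Select the target hardware in the model's configuration parameters.
% The model name and board string below are placeholders.
mdl = 'soc_estimation_model';
open_system(mdl);
set_param(mdl, 'HardwareBoard', 'STM32 Nucleo F767ZI');  % pick a board listed by the support package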
Now that the model is configured for deployment to our desired hardware, the next step is preparing the neural network for deployment. This step is crucial for optimizing the network's performance and efficiency. We begin by importing our trained neural network, implemented in TensorFlow, into MATLAB. Thanks to the deep learning conversion tool for TensorFlow provided by MathWorks, this is a fairly straightforward step, requiring only the function importTensorFlowNetwork.
With minimal work, one can import an entire network or just its layers. If a layer is not natively supported, a custom layer will be generated for it. Then, using the Deep Network Designer app from Deep Learning Toolbox, this imported network can be edited if needed.
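As a rough sketch, assuming the TensorFlow model was exported in the SavedModel format and the converter support package is installed, the import could look like the following. The folder name is a placeholder, and importing as a dlnetwork is an assumption made here so the network can be compressed in the next step.

% Import the trained TensorFlow SavedModel into MATLAB. The folder name is
% hypothetical; unsupported layers are generated as custom layers.
net = importTensorFlowNetwork('soc_tf_savedmodel', 'TargetNetwork', 'dlnetwork');

% Optionally inspect or edit the imported layers interactively.
deepNetworkDesigner(net)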
Once our network is imported, it must be compressed. Compressing the neural network offers significant benefits, such as reducing the memory footprint, improving inference speed, and enabling deployment on resource-constrained devices. In this case, a technique called projection pruning is employed to compress the model. This can be done with the function compressNetworkUsingProjection from the Deep Learning Toolbox.
Network projection approaches the problem of compression by analyzing neural correlations and then carefully introducing projective operations that reduce the number of learnable parameters in the network, while retaining the important neural relations that preserve accuracy and expressivity. This compression method reduces the network size while retaining its performance and accuracy. To ensure that performance and accuracy have not been affected, we can compare the results of the original and compressed networks.
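As a sketch of this step, assuming net is the imported dlnetwork, mbqTrain is a minibatchqueue that yields representative training sequences, and XTest and YTest are held-out test data (all placeholders, not the exact data used in the video), the compression and a simple accuracy check might look like this.

% Compress the imported network using projection. mbqTrain is assumed to
% yield representative training sequences; it is used only to estimate the
% neuron covariances needed for the projection.
[netCompressed, info] = compressNetworkUsingProjection(net, mbqTrain, ...
    'ExplainedVarianceGoal', 0.95);              % keep 95% of the explained variance
fprintf('Learnables reduced by %.1f%%\n', 100*info.LearnablesReduction);

% Compare predictions of the original and compressed networks on held-out
% data (XTest is assumed to be a formatted dlarray, YTest a numeric array).
YOrig = extractdata(predict(net, XTest));
YComp = extractdata(predict(netCompressed, XTest));
rmseOrig = sqrt(mean((YOrig - YTest).^2, 'all'));
rmseComp = sqrt(mean((YComp - YTest).^2, 'all'));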
Once it has been confirmed that the quality of the network has not been degraded, we can see how it performs on the processor. This can be done with Processor-in-the-Loop, or PIL, testing. However, before this test can occur, the model must be configured for code generation. This is because Processor-in-the-Loop testing generates source code for real-time testing of applications.
Any part of a model designated for deployment to our hardware will be turned into C code. The automatic generation of the source code is handled by our Embedded Coder product. There is a wide array of options for code generation available, but this model will be configured for library-free C code generation. I also want to mention that the generated code is available as plain text and can be included in the project or IDE of your choice.
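For illustration, a minimal sketch of that configuration from the MATLAB command line might look like the following. The model name is a placeholder, and the parameter names mirror the Configuration Parameters dialog and may differ slightly between releases, so treat them as assumptions.

% Configure the model for library-free C code generation with Embedded Coder.
mdl = 'soc_estimation_model';                      % placeholder model name
set_param(mdl, 'SystemTargetFile', 'ert.tlc');     % Embedded Coder target
set_param(mdl, 'TargetLang', 'C');                 % generate plain C code
set_param(mdl, 'DLTargetLibrary', 'None');         % no vendor deep learning library
set_param(mdl, 'CodeReplacementLibrary', 'None');  % no code replacement library
slbuild(mdl);                                      % generate the source code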
Now, with the model configured for library-free code generation, we can do Processor-in-the-Loop testing. Using an array of pregenerated inputs, I compared the predictions of the compressed network to those of the original network. With this, we can see that the network has maintained its accuracy and performance when deployed to the Cortex-M processor on the STM32 device.
Finally, we can perform Processor-in-the-Loop testing of our neural network within a Simulink model that contains our battery model. In our battery management system's closed-loop controller, we can now place our compressed network in the SOC Estimation Model Reference block. This block is configured for Processor-in-the-Loop testing, which means that only its code will be deployed to the STM32. By running the Processor-in-the-Loop test, we can see the predicted values for the state of charge of our battery model.
This differs from our previous test in that the input values to the neural network are generated rather than predetermined, to better simulate what would happen in the physical system. This model allows for comparison between a neural network, an extended Kalman filter, and Coulomb counting. Currently, however, only the neural network is deployed, due to resource constraints.
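To give a feel for this step, here is a minimal sketch that sets the Model block to PIL mode and runs the closed-loop simulation. The model and block names are placeholders assumed for illustration, not the exact names used in the video.

% Run the closed-loop battery model with only the SOC estimation network
% executing on the STM32 as a PIL block.
bmsModel = 'BatteryManagementSystem';              % placeholder model name
socBlock = [bmsModel '/SOC Estimation'];           % placeholder Model block path

set_param(socBlock, 'SimulationMode', 'Processor-in-the-loop (PIL)');
out = sim(bmsModel);   % plant and controller run on the host; the network runs on the board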
And there you have it: a comprehensive workflow for deploying neural networks to STM32 devices using software from MathWorks. Let's recap the stages we've showcased and their significance in achieving efficient deployment of neural networks. We began by preparing our imported network in MATLAB, leveraging its powerful tools and capabilities for deep learning. Next, our network was compressed for optimal performance on the STM32 Nucleo device.
Finally, we utilized Embedded Coder to generate C code from our compressed network. This step not only enabled us to convert the model into source code, but also facilitated seamless integration with the STM32 hardware when paired with a hardware support package. By leveraging the power of the STM32 platform and the robust capabilities of MathWorks software, we were able to accelerate the deployment of neural networks and bring the benefits of AI to STM32 devices.
Now, it's your turn to explore further. Whether you're an engineer, a software developer, or in academia, we encourage you to dive deeper into this workflow. Implement it in your own projects, experiment with different neural network architectures, and unlock the potential of Edge AI. Should you require any assistance along the way, do not hesitate to reach out to us.
MathWorks is here to support you. Whether you need technical guidance, training, or consulting services, we have the expertise to help you succeed. To learn more, visit our website at mathworks.com/st, or contact us directly via the information displayed on the screen. Together, let's unleash the full potential of Edge AI.