
Fitting AI Models for Embedded Deployment

AI is no longer limited to powerful computing environments such as GPUs or high-end CPUs, and is often integrated into systems with limited resources, such as patient monitors, diagnostic systems in vehicles, and manufacturing equipment. Fitting AI onto hardware with limited memory and power requires deliberate trade-offs between model size, accuracy, inference speed, and power consumption, and that process is still challenging in many frameworks for AI development.

Optimizing AI models for limited hardware generally proceeds in these three steps:

  • Model Selection: Identify less complex models and neural networks that still achieve the required accuracy
  • Size Reduction: Tune the hyperparameters to generate a more compact model or prune the neural network
  • Quantization: Further reduce size by quantizing model parameters

Additionally, especially for signal and text problems, feature extraction and selection result in more compact models. This talk demonstrates model compression techniques in MATLAB® and Simulink® by fitting a machine learning model and pruning a convolutional network for an intelligent hearing aid.

Published: 25 May 2022

Welcome to the session on Fitting AI for Embedded Deployment. My name is Bernhard Suhm, and I'm the Product Marketing Manager for the Statistics and Machine Learning Toolbox, working with both you, our customers, and development, to make it easier to bring AI into embedded applications.

And my name is Emelie and I'm an Application Engineer at MathWorks focusing on our AI tools and workflows.

Emelie, didn't you mention some recent customer interaction that you wanted to share with me?

That's right. So a customer just left me a voicemail. And let me play that for you.

Hey, MathWorks. I need some help. As I mentioned in our last meeting, we are looking at adding intelligent alerts on our agricultural machines. And now my colleagues from the Data Science Department have given me new AI models for anomaly detection. They want me to put these AI models on the hardware platform but they are way too large. I mean, models are great. Very impressive accuracy and all. But I just don't see how we will be able to deploy them without making any big changes to our hardware setup. And changing the setup is just not possible. We spent a lot of time certifying the control units in the setup they are right now. So it is not on the roadmap to do any changes to that. Can we talk later this week?

So this call was very timely, because in this talk, you'll learn a process for compressing the size of your AI models so that they do fit on resource-limited hardware like this customer has.

That's actually a story we hear from many industries, Emelie. And there is a range of hardware, depending on the industry, in use for deploying AI: microcontrollers in heavy machinery and cars, PLCs on manufacturing floors, FPGAs in wireless base stations and routers, and low-power chips in wearable devices and hearing aids. And you'll hear a lot more about that last one, since we chose it for our demo. In all these applications, embedded AI is the only option.

But let's take a step back and see where edge or embedded comes into play in the hardware landscape. So on the left-hand side, at the very high end considering memory footprint and compute speed, cloud servers offer many gigabytes of memory and teraflops of compute. Going down the arrow here, we see desktop CPUs and GPUs, which still provide a couple of gigabytes and gigaflops. At the bottom, we have embedded hardware, which may have as little as one kilobyte of on-chip memory and just be in the megaflops range for compute.

Today's talk focuses on the low end of the spectrum, and it's structured as follows. Next, we'll explain what edge or embedded AI means and what its challenges are. We also describe a high-level workflow for model compression that applies to both machine learning and deep neural nets. After that, the two main sections of this talk provide more details on what model compression entails and the tools available in MATLAB and Simulink. We'll do so for machine learning first and then deep learning, in the context of building an intelligent hearing aid. Emelie, I think some in the audience might wonder what edge AI is. Can you explain?

Well, just as in the case of the customer who called me, it's typical that it's a data scientist who develops the models. And in many use cases, they can put the models in the cloud where memory and compute are almost limitless. But with edge AI, the models live on limited hardware where computation has to happen locally. Sometimes the embedded software engineers are given these models to implement and they'll quickly see that the models won't fit. So they go back to the data scientists and inform them about that. And they'll have to decide who owns making the AI model smaller. This ownership differs from company to company, but it's definitely helpful to bridge these two worlds.

As the user you talked to, Emelie, was mentioning, adding hardware just to accommodate large AI models is highly undesirable because it adds cost in mass production and may delay time to market for things like recertifying hardware. So the challenges of fitting AI are these two: first, building performant models that are small, and second, knowing how to do it. Next, Emelie will introduce a structured way of developing an AI model with hardware constraints in mind.

All right. So let me take you through the workflow. It starts with fully understanding the limitations of the hardware the AI model will be deployed to. This includes understanding the constraints on memory and compute power. The step after that is to select a model that fits the complexity of the problem but has the smallest size. Once the model is selected, you simplify that model by reducing features, tuning the size-related hyperparameters, and removing weights if you have a neural network.

It's an iterative process. So you'll go through this workflow for many different models to find which one is smallest with the highest accuracy. When the model is as simplified as it can get, you simplify the representation of the parameters to a fixed-point format, commonly known as quantization. Now you should be ready to deploy and integrate the model on the limited hardware. We'll dig more deeply into these three steps in the talk. And now I'll let Bernhard show you the compression steps for classical machine learning.

Great. It is well known that you can achieve higher accuracy with more complex models. While classic machine learning models don't get as huge as deep neural nets, even classic machine learning can become too big for resource-constrained hardware. Decision trees and linear models tend to be small, whereas kernel SVMs and ensembles can grow rather big, since their size scales with the number of learners in the ensemble or the number of support vectors. Not quite as huge as deep neural nets, but still. Shallow neural nets with just a few layers, though, can be quite accurate while very compact in size. So they are an attractive alternative.

So unless your data allows you to get away with one of these simple models, you'll have to work on making a more complex model fit on your hardware. That's why our size gauge here is still in the red at this point. So let's look at what the steps mean. What tools does MATLAB have for model compression? The first step is size-aware selection of your initial models, and you can do that interactively within our Learner apps, our visual model-building environment.

To simplify the model, the second step, you can perform automated feature selection, also inside our Learner apps, while more advanced model size optimization needs to be performed on the command line. Our demo will apply Bayesian optimization to optimize accuracy while observing a size constraint. Finally, the third step, quantizing model parameters, takes the representation from 64-bit double to single, half, or even fixed-point.

In MATLAB, Fixed-Point Designer allows you to do so interactively. Or, if you integrated your model into Simulink using our native machine learning blocks, you can do it with a simple mouse click. So hopefully now, the size gauge is in the green and you're ready to deploy. Let's see this in action in a demo. But first, let's understand what we are doing. We are building a model to classify the acoustic environment for an intelligent hearing aid. Consider the two environments shown. Would you want your hearing aid to pick up any sounds in both? Clearly, in the conversation with colleagues, you'd rather just hear them, while sounds from behind distract you, whereas in the forest, you want to hear everything around you.

To empower the hearing aid to adjust its behavior accordingly, we'll build a model that classifies the acoustic environment as one where you listen to just what's in front of you versus one where you want to listen all around. And the hearing aid can switch its mode of operation accordingly. Chips on hearing aids are quite small, in some cases with less than 1 kilobyte of memory. For our demo, we'll target 50 kilobytes. The data set we are using distinguishes 15 types of scenes, including forest and cafe, but also beach, office, and others, and contains up to 300 samples each. So next, we'll go into the actual demo.

To get us started, we already loaded the data: 1,200 samples in the training set. With machine learning, you can get away with fairly limited data, in contrast to deep learning. Before we dive into the specifics, let's look at what compression accomplishes on this task. This chart shows the average size across all models after the various stages in the model compression workflow. Quite an impressive reduction. Now let's see how we got there. To get a sense of the data, we can interactively apply k-means with a clustering Live Task. It turns out the optimal number of clusters among the 15 scenes is two. So that provides support for reframing the task as a two-class problem.
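
Programmatically, the same check could look like this minimal sketch (assuming the training features are in a matrix XTrain; this is an illustration, not the exact demo code):

    % Evaluate k-means solutions for 2 to 15 clusters using the silhouette criterion
    eva = evalclusters(XTrain, 'kmeans', 'silhouette', 'KList', 2:15);
    optimalK = eva.OptimalK;          % comes out as 2 on this data
    idx = kmeans(XTrain, optimalK);   % cluster assignment for each training sample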

To save time, I'm skipping training the initial model distinguishing 15 scenes. Instead, I loaded a session where I previously trained a bunch of two-class models. Let's try a linear support vector machine. It lands in the middle of the pack, while the leaders are currently a linear and a k-nearest neighbor model, and the tree-based models have significantly lower accuracy. We are at step one of the model compression workflow now: select initial models, being aware of their size. To determine the size of these initial models, we export them and look up the space that the parameters occupy. It turns out the linear SVM, at more than 500 kilobytes, is in the middle of the pack. But that's still 10 times larger than our target hardware can afford.
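
Looking up the space a model occupies can be done with whos after exporting it from the Learner app; a small sketch with an assumed variable name:

    % Export the trained model from the app to the workspace first, then:
    info = whos('trainedSVM');
    fprintf('Model size: %.0f KB\n', info.bytes / 1024);   % the linear SVM came in above 500 KB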

So let's get to work making it smaller. Next we proceed to step two of the workflow, simplify the model. First, we reduce the overall number of model parameters by selecting a subset of features. The next step is tuning hyperparameters for model size. You could tinker with size-relevant hyperparameters manually. Instead, we'll apply Bayesian optimization with a size constraint using the bayesopt function, which needs an objective function, and we call that optimizeSVM for this example.
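
A minimal sketch of what such a size-constrained optimization could look like, assuming training data XTrain and YTrain and an assumed budget of 500 support vectors (an illustration, not the exact demo code):

    % Hyperparameters to tune, searched on a log scale
    vars = [optimizableVariable('box',  [1e-3, 1e3], 'Transform', 'log'), ...
            optimizableVariable('scale',[1e-3, 1e3], 'Transform', 'log')];

    % Minimize cross-validated loss subject to one coupled size constraint
    results = bayesopt(@(t) optimizeSVM(t, XTrain, YTrain, 500), vars, ...
        'NumCoupledConstraints', 1, 'MaxObjectiveEvaluations', 30);

    function [loss, sizeConstraint] = optimizeSVM(t, X, Y, maxSupportVectors)
        % Train a Gaussian-kernel SVM with the candidate hyperparameters
        mdl = fitcsvm(X, Y, 'KernelFunction', 'gaussian', ...
            'BoxConstraint', t.box, 'KernelScale', t.scale);
        % Objective: 5-fold cross-validated misclassification loss
        loss = kfoldLoss(crossval(mdl, 'KFold', 5));
        % Coupled constraint: feasible (negative) when the number of support
        % vectors, our proxy for model size, stays below the budget
        sizeConstraint = size(mdl.SupportVectors, 1) - maxSupportVectors;
    end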

Given data and specific hyperparameters, this function trains a kernel support vector machine. And we use the number of support vectors as an estimate of its size. The optimization does its job, as you can see in these charts, and we end up with a model that does meet our size target and is very accurate. That's great. Now we can integrate this model into the larger system, alongside the other parts of the intelligent hearing aid. Simulink enables you to represent different components of a complex system and generate production-ready code that runs on target hardware.

For this demo, our Simulink model only feeds pre-recorded test audio, via feature extraction, to the classifier, which is represented by this native machine learning Simulink block. Other popular types of machine learning models are supported as well. As the third and final step in the model compression workflow, we apply parameter quantization. Open the machine learning block, and you can switch the data type from double to lower precision, like single, with a simple button click. Let's verify that this quantized model still classifies our test sample correctly. And it does. Audio from a cafe is still classified such that the hearing aid listens directionally in front of you. Of course, for your production model, you will want to verify accuracy more systematically on an actual test set.
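
The feature extraction step itself isn't detailed in the talk; as a rough illustration, the MATLAB-side equivalent could be built with Audio Toolbox along these lines (fs, audioIn, and the chosen features are assumptions, not the demo's actual configuration):

    % Hedged sketch: compute per-frame audio features and pool them into one
    % feature vector per recording
    afe = audioFeatureExtractor('SampleRate', fs, ...
        'mfcc', true, 'spectralCentroid', true, 'spectralRolloffPoint', true);
    frameFeatures = extract(afe, audioIn);   % one row per analysis frame
    featureVector = mean(frameFeatures, 1);  % summarize the whole recording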

Let's revisit the impact of model compression. Applying all four steps on this task reduced model size by over 95%, or a factor of 20, and we met the aggressive target even with the average across models, whereas the best model was significantly smaller, at less than 10 kilobytes. Regarding accuracy, you will typically incur a modest loss with each compression method. On this task, we lost little accuracy overall because task simplification significantly increased accuracy. You can find the details in our handout. Next, Emelie will walk you through compressing deep neural networks.

Yeah, so let's start with model selection. AI research has had the tendency to not care too much about the size of the models being developed, and they can therefore be quite bloated. In this chart, we have accuracy on the y-axis and the relative prediction time on the x-axis. The size of the bubbles represents the memory footprint of that specific model. So as we can see here in the chart, a network can have the same accuracy as another network while being both faster and having a smaller footprint.

Just look at MobileNet-v2 and VGG-16 here. What we see is that a smaller, more efficient architecture can be just as performant as a larger architecture. The size gauge Bernhard introduced is still pointing at red for deep learning. So let's look at the other steps on how to make a network smaller. Step two in compressing deep neural networks is pruning. Pruning is a structured process where we evaluate the importance of the weights in a trained network to find the weights that are less important, so that we can completely remove them. For convolutional neural networks, we remove entire filters, which you can think of as groups of weights.

After the pruning process is done, we retrain the new pruned model with the data set that was used to train the original network. The pruned and retrained network will have a smaller memory footprint but also faster inference, because there will be fewer mathematical operations to perform. So after pruning, our gauge is now closer to pointing at small. For the third step for deep learning, we can use the Deep Network Quantizer app. And after quantization, we should have a small enough model for deployment.

So let's move on to the deep learning demo. We'll take the same data set as Bernhard showed, but instead of having a binary problem, we'll classify 10 different classes. So we have a more difficult problem than before, and that calls for a more complex model, which is why we will be choosing deep learning. For the three steps that I explained earlier, I will in this demo show you how to select a model in the Deep Network Designer app, how to use Taylor pruning on that model to be clever about how we remove the less important filters, and finally, how to quantize the network from 32-bit to 8-bit. Now let's dive into the demo.

We'll start with step one, selecting a model. The data that we're going to use is spectrogram data, so that we can play around with different models that are made for images. I'm going to open up an interactive tool called Deep Network Designer. We can see here that we have a couple of pretrained networks, but I'm going to import our own network, called trainedNet. Inside Deep Network Designer, we can see the structure of the network. We can see that it starts with an image input layer. We can also see that it's a classification problem and that we have 34 layers in our network.
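
Programmatically, opening the app with the network and inspecting it might look like this (the file and variable names are assumptions):

    load('trainedNet.mat', 'trainedNet');  % pretrained 34-layer CNN (assumed file name)
    deepNetworkDesigner(trainedNet)        % inspect the architecture interactively
    analyzeNetwork(trainedNet)             % layer-by-layer sizes and learnable parameters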

Now let's look at the accuracy of this initial model. We're going to use that accuracy as a baseline when we later prune and quantize the network. The accuracy is at 82.7%. We can quickly look at the confusion chart to see if there are any classes that we have challenges with. Beach and home seem to be the classes that are challenging. For example, misclassifying home as office, which might have been correct the last couple of years, but I don't believe that's represented in this data.
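
Computing that baseline follows the usual evaluation pattern; a sketch, assuming the test spectrograms are in an imageDatastore called imdsTest and trainedNet is a standard classification network:

    % Baseline evaluation before pruning and quantization
    YPred = classify(trainedNet, imdsTest);
    accuracy = mean(YPred == imdsTest.Labels)   % about 0.827 in the demo
    confusionchart(imdsTest.Labels, YPred)      % spot problem classes such as beach and home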

And now let's go into step two, pruning of the network. We're going to configure some pruning options, set up the Taylor prunable network, and then start the pruning loop. We're going to fast-forward here in the beginning, because in the first few steps no filters are pruned yet. But then, when the pruning gets started, we can look at the blue line, which is the accuracy, and see that as we prune more and more filters in the convolutional neural network, the accuracy actually goes down.
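
For orientation, here is a heavily condensed sketch of such a Taylor pruning loop with Deep Learning Toolbox, assuming trainedNet is already a dlnetwork ending in softmax, mbqTrain is a minibatchqueue over the training spectrograms, and the loop bound and MaxToPrune value are made-up numbers; the full workflow also includes fine-tuning, learn-rate handling, and stopping criteria:

    % Wrap the trained network so filter importance (Taylor scores) can be tracked
    prunableNet = taylorPrunableNetwork(trainedNet);
    maxToPrune = 8;                                   % filters removed per iteration (assumed)

    for iteration = 1:numPruningIterations            % assumed loop bound
        [X, T] = next(mbqTrain);                      % mini-batch of spectrograms and targets
        [loss, pruningActivations, pruningGradients] = ...
            dlfeval(@modelLossPruning, prunableNet, X, T);
        prunableNet = updateScore(prunableNet, pruningActivations, pruningGradients);
        prunableNet = updatePrunables(prunableNet, 'MaxToPrune', maxToPrune);
    end
    prunedNet = dlnetwork(prunableNet);               % convert back for retraining

    function [loss, pruningActivations, pruningGradients] = modelLossPruning(net, X, T)
        [Y, ~, pruningActivations] = forward(net, X); % third output: activations of prunable filters
        loss = crossentropy(Y, T);
        pruningGradients = dlgradient(loss, pruningActivations);
    end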

When the pruning is done, we retrain the network with the entire data set to regain some of the accuracy that we lost. Let's visualize how many filters we actually removed. You can see here that convolutional layers one, three, and seven had the largest removal of filters, or the heaviest pruning, you could say. Now let's look at the accuracy that we got from retraining the pruned model. We can see that the accuracy is now at 81.3%, instead of the 82.7% we had before. We can also look at the confusion matrix again, because pruning can unevenly affect the classes and introduce a bias.

So let's see if there are any classes that have taken a toll due to the pruning. It seems like beach and home are still challenging, but cafe/restaurant is the class that has decreased the most. And we're actually starting to misclassify it as grocery store, which is something that we would need to look at more closely and try to remedy if we were to use this model. In the network metrics, we can take a closer look at how much we ended up pruning the network and its effect on memory size.

I mean, the results are very promising. Just look at that large decrease in the number of learnable parameters and the consequent decrease in memory. We started with a memory footprint of 4.5 megabytes for the original network, and after the pruning, that number is at 3.6 megabytes. So all in all, the network memory decreased by 20%, whilst the accuracy only decreased by 1.8%. That's impressive. And now let's take this pruned network and move it into step three, which is quantizing it. I'm going to open up another interactive tool called Deep Network Quantizer. I'm going to select the pruned network that we have in the MATLAB workspace.

We have here selected the data store that we want to use for calibration. And the app displays the computed statistics, such as min and max values, for the weights, the biases, and the activations of each layer in the network. If we look at the validation results, we can see that the memory taken by the learnable parameters decreased by 75%, from the pruned size we started with to the pruned and quantized size, whilst the accuracy actually slightly increased. This leaves us with a total accuracy decrease of only 1.5% for the whole compression. We can end by generating code for the neural network. And Bernhard will talk more about this later.
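
The Deep Network Quantizer app has a programmatic counterpart in the dlquantizer workflow; a sketch, assuming the pruned network and the calibration and validation datastores are already in the workspace:

    % Quantize learnable parameters and activations to int8
    quantObj   = dlquantizer(prunedNet, 'ExecutionEnvironment', 'GPU');  % or 'CPU'/'FPGA'
    calResults = calibrate(quantObj, calibrationDS);   % collect min/max ranges per layer
    valResults = validate(quantObj, validationDS);     % accuracy and memory after quantization
    valResults.MetricResults                           % compare against the floating-point baseline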

So let's recap the final results from the compression. The memory footprint went from 4.48 megabytes with the initial model all the way down to 0.89 megabytes with the pruned and quantized model. That is a size reduction by a factor of five, and all of this happened with an accuracy loss of only 1.5%.

In closing, let's discuss how you convert high-level AI code to the low-level code that can run on your hardware targets. With MathWorks, code generation bridges that gap. You only need one code base in MATLAB or Simulink, and code generation automatically translates that to low-level, production-ready code that can execute directly on your hardware. Here you see the top vendors for each class of hardware, from embedded GPUs on the high end for autonomous vehicles, down to microcontrollers for low-power embedded devices.
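
For the machine learning model from the first demo, that step boils down to saving the trained model for code generation, wrapping prediction in an entry-point function, and calling codegen; a sketch with assumed names (compactMdl, sceneSVM, predictScene, numFeatures):

    % Save the compact trained model so the generated C code can load it
    saveLearnerForCoder(compactMdl, 'sceneSVM');

    % Entry-point function, in its own file predictScene.m
    function label = predictScene(features)  %#codegen
        mdl = loadLearnerForCoder('sceneSVM');
        label = predict(mdl, features);
    end

    % Generate a C static library for the entry point
    codegen predictScene -args {zeros(1, numFeatures)} -config:lib
    % (for the deep network, codegen with a coder.DeepLearningConfig target is the analogous step)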

So what are the conclusions of this talk? The main conclusion is that you can indeed fit many types of AI onto resource-constrained hardware, be it limited in memory, power, or inference speed. The demos illustrated that we have interactive tools that can make the compression process easier. And after that process, the model should fit just fine. The workflow is similar for both machine learning and deep learning; just the tools you need are different. In the poll we are about to open, you can share which constraint is most challenging for you.

As we are now transitioning to Q&A, here are a couple of examples of how our customers have deployed intelligent systems to the edge, and some links to documentation and videos to help you get started. You can find these in our handout. So if you remember one thing from this presentation, we want it to be this: smaller AI models can be better than large ones. Thank you for your attention.
