Instead of designing and training an architecture completely from scratch, you can build on what already exists with transfer learning. This chapter looks at how to modify an existing architecture and then retrain it to accomplish your specific task.
How Does Transfer Learning Work?
A simple description of how neural networks operate is useful for explaining how transfer learning works with image-based architectures. A trained network looks for patterns in data, or in this case, patterns in images. The early layers detect primitive features such as blobs, edges, and colors. As you progress through the network, those features are combined into more complex ones, which are ultimately combined into the final patterns that can be labeled.
At first, it might seem that a network trained to recognize something like flowers in images would not be useful for finding patterns in sensor data. However, with a time-frequency representation such as a spectrogram or scalogram, a signal can be preprocessed into an image. Primitive features like blobs, colors, loops, and lines exist in virtually all images, even time-frequency images of signals.
With transfer learning, you can take advantage of a pretrained network’s ability to recognize those primitive features and just replace the last few layers in the network that combine those features and do the final classification.
In general, training this network should be much faster and require much less data than starting from scratch since it only needs to learn how to combine features to recognize the larger patterns you’re looking for.
In this example, transfer learning is used to recognize and label hand motions using simple hardware. This example starts with a pretrained GoogLeNet network and retrains it to recognize high-five patterns in three-axis acceleration data.
The MATLAB Support Package for Arduino Hardware is used to read acceleration data from the MPU-9250 through the Arduino. It only takes three lines of code to connect to the Arduino, instantiate an MPU9250 object, and read the accelerometer.
Read accelerometer at 50 Hz and display
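A minimal sketch of those three lines, assuming the MATLAB Support Package for Arduino Hardware and its MPU-9250 sensor interface (board and wiring details will vary):

```matlab
% Hypothetical sketch: connect, create the sensor object, read once.
a = arduino();                   % connect to the attached Arduino board
imu = mpu9250(a);                % create an MPU-9250 sensor object
accel = readAcceleration(imu)    % one [x y z] acceleration sample
```

Calling readAcceleration in a loop (here at roughly 50 Hz) streams samples for display or recording.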
The three-axis acceleration data is converted into a color image with the red, green, and blue channels of the image representing the scalogram of the x, y, and z acceleration axes.
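One way to build that image is to compute a scalogram per axis with the continuous wavelet transform and stack the results as color channels. A hedged sketch, assuming the Wavelet Toolbox, a 50 Hz sample rate, and recorded signals ax, ay, and az (the chapter does not specify these exact function calls):

```matlab
fs = 50;  % assumed sample rate of the recorded acceleration
% One scalogram per axis, rescaled to [0,1] and resized to GoogLeNet's
% 224x224 input; cat stacks them as the red, green, and blue channels.
toChannel = @(sig) imresize(rescale(abs(cwt(sig, fs))), [224 224]);
img = cat(3, toChannel(ax), toChannel(ay), toChannel(az));
```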
To train a network to recognize high fives, you need to provide it with labeled training data. In this case, the training data is multiple images of high fives and multiple images of other arm motions.
For this example, 200 labeled training images were collected by measuring the real motions of a person's arm. The training data was then scrubbed: images containing outliers or other motions that would corrupt the training were removed.
Replacing the end layers was done using the Deep Network Designer app. Only two layers at the end of the GoogLeNet network need to be replaced. These are the fully connected layer, which combines the primitive features into the specific patterns, and the output layer, which assigns a label.
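The same edit can also be sketched programmatically as an alternative to the app. This is an assumption: it uses GoogLeNet's default layer names and a two-class (high five vs. other) output, which the chapter does not spell out:

```matlab
lgraph = layerGraph(googlenet);  % requires the GoogLeNet support package
% Replace the final fully connected layer and the classification output.
newFC  = fullyConnectedLayer(2, 'Name', 'fc_highfive');
newOut = classificationLayer('Name', 'out_highfive');
lgraph = replaceLayer(lgraph, 'loss3-classifier', newFC);
lgraph = replaceLayer(lgraph, 'output', newOut);
```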
The training data is imported, and 20% of the images are set aside for validation. Once the network is trained, the next step is to test it on a larger set of data; in this case, that meant trying out the high-five classifier on real arm motions.
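The import and 80/20 split might look like the following sketch. The folder layout, training options, and the 'scalograms' path are illustrative, and lgraph stands for the edited network exported from Deep Network Designer:

```matlab
% Hypothetical layout: one subfolder per label under 'scalograms'.
imds = imageDatastore('scalograms', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
[trainImds, valImds] = splitEachLabel(imds, 0.8, 'randomized');
opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 10, ...
    'ValidationData', valImds);
trainedNet = trainNetwork(trainImds, lgraph, opts);
```

A small initial learning rate is a common choice in transfer learning so the pretrained weights are only gently adjusted.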