How to Use BERT Models for Natural Language Processing (NLP) in MATLAB
Learn how to apply BERT models (transformer-based deep learning models) to natural language processing (NLP) tasks such as sentiment analysis, text classification, summarization, and translation. This demonstration shows how to use Text Analytics Toolbox™ and Deep Learning Toolbox™ in MATLAB® to fine-tune a pretrained BERT model for a text classification task. You'll see MATLAB code that shows how to start with a pretrained BERT model, add layers to it, train the model for the new task, and validate and test the final model.
Published: 9 Jan 2024
This video will demonstrate how to use BERT models in MATLAB for natural language processing tasks. BERT is an acronym for Bidirectional Encoder Representations from Transformers. It is a deep neural network model that uses the transformer architecture to learn contextual relations between words or subwords. It is also a general-purpose language representation, trained on a large corpus of text gathered from Wikipedia articles and e-books in the BooksCorpus dataset.
You can take a pretrained BERT model and fine-tune it for specific tasks. Since you only need to fine-tune the model, training takes less time and yields higher accuracy, even with a smaller training data set.
This video will cover the process of fine-tuning BERT. The general workflow is to take the pretrained model and add layers to it, as sketched below. The model is then trained on data specific to the task at hand. As a side note, training only the additional layers is the most popular approach to fine-tuning BERT because of its ease and speed of training. Other approaches include training the entire BERT model starting from its pretrained weights, or gradually unfreezing BERT layers during training.
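As a rough illustration of the "add layers" step, the snippet below sketches a classification head that takes the pretrained encoder's output for the [CLS] token and passes it through a fully connected layer and a softmax. The bert.model call, the parameters structure, and the data layout are assumptions based on the transformer-models repository rather than verbatim code from the example, and the fc weights are assumed to be initialized elsewhere.

```matlab
function Y = classifierModel(X, parameters)
    % Contextual token embeddings from the pretrained BERT encoder
    % (assumes bert.model from the transformer-models repository).
    Z = bert.model(X, parameters.bert);

    % Keep only the embedding of the first ([CLS]) token for classification
    % (assumes tokens run along the second dimension).
    Z = Z(:, 1, :);

    % New, trainable classification layers added on top of BERT.
    Z = fullyconnect(Z, parameters.fc.Weights, parameters.fc.Bias, DataFormat="CTB");
    Y = softmax(Z, DataFormat="CB");
end
```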
The MATLAB implementation of BERT is hosted on the MathWorks GitHub repository for transformer models. It includes multiple examples, including the one you'll see shortly, called Fine-Tune BERT, which trains the entire BERT model, along with additional layers, to classify factory reports and diagnose failures. The example called Classify Text Data Using BERT trains only the additional layers. Please also take note of the software requirements for each of these examples.
First, download and unzip this repository into your working directory. Load the pretrained BERT model in MATLAB, which includes a tokenizer and parameters. Next, load the factory reports data from this CSV file, which contains descriptions of failure events and a categorical label for each event. This data will be used to build a model that classifies failure events based on their descriptions. Next, encode the text data using the BERT model's tokenizer and add the tokens to the training data table.
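A minimal sketch of these setup steps might look like the following. It assumes the transformer-models repository is on the MATLAB path and provides a bert function that returns the tokenizer and encoder parameters, and that the CSV file has Description and Category columns; the exact names may differ in the repository example.

```matlab
% Load the pretrained BERT model (tokenizer + encoder parameters).
mdl = bert;
tokenizer = mdl.Tokenizer;

% Load the factory reports data: free-text descriptions and categorical labels.
data = readtable("factoryReports.csv", TextType="string");
data.Category = categorical(data.Category);

% Encode each description as a sequence of BERT token indices
% and keep the tokens alongside the rest of the table.
data.Tokens = encode(tokenizer, data.Description);
```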
The data is divided into training and validation sets. Use a word cloud to get a better idea of the contents of the training texts. To prepare the data for training, create array datastores for the training tokens and labels, and combine them into a single datastore. Then divide the data into mini-batches using minibatchqueue, a convenient way to manage mini-batches of data when training with custom training loops. Here, you can initialize the model parameters, specify training options, and use a custom training loop that makes a number of passes through the data.
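The data preparation described above could look roughly like this. arrayDatastore, combine, and minibatchqueue are standard Deep Learning Toolbox functions, while preprocessMiniBatch is a hypothetical helper that pads the token sequences and one-hot encodes the labels.

```matlab
% Hold out 10% of the data for validation, stratified by category.
cvp = cvpartition(data.Category, Holdout=0.1);
dataTrain      = data(training(cvp), :);
dataValidation = data(test(cvp), :);

% Inspect the training text with a word cloud.
figure
wordcloud(dataTrain.Description);
title("Training Data")

% Combine tokens and labels into a single datastore for training.
dsXTrain = arrayDatastore(dataTrain.Tokens, OutputType="same");
dsTTrain = arrayDatastore(dataTrain.Category);
dsTrain  = combine(dsXTrain, dsTTrain);

% Manage mini-batches for the custom training loop.
% preprocessMiniBatch (hypothetical) pads sequences and one-hot encodes labels.
mbq = minibatchqueue(dsTrain, 2, ...
    MiniBatchSize=32, ...
    MiniBatchFcn=@preprocessMiniBatch);
```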
During each pass, a mini-batch is processed, the loss and its gradients with respect to the learnable parameters are computed, and those parameters are updated. Note that modelGradients is a local function in this live script that contains the additional layers added to the BERT model. Now you have everything you need to start training. Once the training finishes, move on to validating the model. First, the validation data set is preprocessed in the same way as the training set.
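A skeleton of such a custom training loop is shown below, assuming modelGradients is the local function mentioned above (for example, computing a cross-entropy loss through a head like the one sketched earlier) and that parameters, numEpochs, and learnRate have already been initialized.

```matlab
trailingAvg   = [];
trailingAvgSq = [];
iteration = 0;

for epoch = 1:numEpochs
    shuffle(mbq);
    while hasdata(mbq)
        iteration = iteration + 1;
        [X, T] = next(mbq);

        % Evaluate the loss and its gradients with respect to the learnables.
        [loss, gradients] = dlfeval(@modelGradients, X, T, parameters);

        % Update the learnable parameters with the Adam optimizer.
        [parameters, trailingAvg, trailingAvgSq] = adamupdate( ...
            parameters, gradients, trailingAvg, trailingAvgSq, ...
            iteration, learnRate);
    end
end
```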
You can then evaluate the model's performance by plotting a confusion chart of its predictions on the validation data against the true values. Finally, take the model, test it against new failure reports, and get predicted labels for each report.
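Validation and prediction could then look like the sketch below, where modelPredictions is a hypothetical helper that batches the token sequences, runs them through the fine-tuned model, and returns categorical labels; the new report strings are just example inputs.

```matlab
classNames = categories(data.Category);

% Predicted vs. true labels on the validation set.
YValidation = modelPredictions(parameters, dataValidation.Tokens, classNames);
figure
confusionchart(dataValidation.Category, YValidation)

% Classify new failure reports.
reportsNew = [
    "Coolant is pooling underneath sorter."
    "Sorter blows fuses at start up."
    "There are some very loud rattling sounds coming from the assembler."];
YNew = modelPredictions(parameters, encode(tokenizer, reportsNew), classNames)
```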
BERT is a powerful model that can be used for transfer learning on a variety of natural language processing tasks. Be sure to check out BERT and other transformers at the MathWorks GitHub repository, and learn more on the Text Analytics Toolbox web page and in the documentation.