Import pre-trained word embeddings (GloVe, Skipgram, etc.) in Deep Neural Network models.

I was going through this page to learn how to classify text using word embeddings and an LSTM. The page covers training the word embeddings within the LSTM architecture, but it does not discuss how to import word embeddings that were trained externally, such as GloVe (Global Vectors) or word2vec models, which already provide large-scale pre-trained embeddings. Any ideas on how I can use pre-trained word embeddings in the LSTM architecture?

Accepted Answer

Liliana Agapito de Sousa Medina on 28 Nov 2018
You can use a pre-trained embedding model to initialize the Weights property of the wordEmbeddingLayer. For example:
% Import your pretrained word embedding model of choice
emb = readWordEmbedding('existingEmbeddingModel.vec');
embDim = emb.Dimension;
numWords = numel(emb.Vocabulary);
% Initialize the word embedding layer
embLayer = wordEmbeddingLayer(embDim, numWords);
embLayer.Weights = word2vec(emb, emb.Vocabulary)';
% If you want to keep the original weights "frozen", uncomment the following line
% embLayer.WeightLearnRateFactor = 0
The wordEmbeddingLayer with initialized Weights can then be placed in the network before lstmLayer.
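As a minimal sketch of that placement, the layer array below puts the embedding layer between a sequence input layer and the LSTM layer. The values of numHiddenUnits and numClasses are illustrative assumptions, not part of the original answer, and the placeholder embLayer is only created if the one from the snippet above is not already in the workspace.

```matlab
% Use the pretrained embLayer from the snippet above if it exists;
% otherwise create a placeholder layer (sizes are illustrative only).
if ~exist('embLayer','var')
    embLayer = wordEmbeddingLayer(300,10000);
end

numHiddenUnits = 80;   % assumed value, tune for your data
numClasses = 4;        % assumed value, set to your number of labels

layers = [
    sequenceInputLayer(1)                          % one word index per time step
    embLayer                                       % maps each index to its word vector
    lstmLayer(numHiddenUnits,'OutputMode','last')  % keep only the final time step
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```

The input size of the sequence input layer is 1 because each time step carries a single word index rather than a word vector; the embedding layer performs the lookup.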
Also note that training documents should be mapped according to the vocabulary of the pre-trained embedding model, before passing to the net for training, for example:
enc = wordEncoding(tokenizedDocument(emb.Vocabulary,'TokenizeMethod','none'));
XTrain = doc2sequence(enc,documentsTrain,'Length',75);
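The same mapping can be tried on a toy vocabulary to see what the network receives. This is only a sketch: vocab stands in for emb.Vocabulary, and the two documents are made up for illustration. Because row i of the layer weights must describe word i of the encoding, it may be worth confirming that the encoding keeps the vocabulary order.

```matlab
% Toy vocabulary standing in for emb.Vocabulary (hypothetical data).
vocab = ["the" "cat" "sat"];
enc = wordEncoding(tokenizedDocument(vocab,'TokenizeMethod','none'));

% Check that the encoding lists the words in the same order as the vocabulary.
sameOrder = isequal(enc.Vocabulary(:), vocab(:));
disp(sameOrder)

% Two made-up documents; the second contains words outside the vocabulary.
docs = tokenizedDocument(["the cat sat"; "the dog sat down"]);

% Convert each document to a fixed-length sequence of word indices.
X = doc2sequence(enc,docs,'Length',5);
disp(X{1})
disp(X{2})
```

Inspecting X{2} shows how words missing from the pretrained vocabulary and padding are represented in the index sequences.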

More Answers (2)

CoderTargaryn on 28 Nov 2018
Hi, many thanks for your answer. After posting my question, I read through the MATLAB documentation online and found that this is possible in the way you suggested.

koosha salehi on 24 Oct 2020
Hi,
  • I am using the Stanford GloVe data set and want to design a deep network with an LSTM. I use wordEmbeddingLayer, but it doesn't work. I think the sequence input layer is causing the problem. Can anyone help me?
  • I also need a small labeled corpus and its equivalent vectors in GloVe format.
Has anyone done this before?

Release

R2018b
