Does MATLAB Text Analytics Toolbox contain the fastText pre-trained word vectors from Common Crawl (600B tokens)?
Tesha Babka
on 30 Sep 2018
Answered: MathWorks Text Analytics Toolbox Team
on 16 Apr 2020
Does MATLAB Text Analytics Toolbox contain the fastText pre-trained word vectors from Common Crawl (600B tokens)? It seems to have pre-trained vectors from a 16-billion-token data set from fastText. I am wondering whether it also has the Common Crawl (600B tokens) pre-trained word vectors, or whether there is a MATLAB script to add them to the MATLAB word embedding model. The vectors are here: https://fasttext.cc/docs/en/english-vectors.html
Actually, just a MATLAB script that reads the Common Crawl 600B file and outputs the vectors and the strings of the corresponding words would be sufficient.
Thanks in advance!
Answers (1)
MathWorks Text Analytics Toolbox Team
on 16 Apr 2020
Text Analytics Toolbox does not itself include this embedding, but you can download it (as you said in your question) and load it into MATLAB as follows:
>> emb = readWordEmbedding("crawl-300d-2M.vec.zip")
emb =
wordEmbedding with properties:
Dimension: 300
Vocabulary: [1×1999995 string]
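If you only need the raw words and vectors, as the question asks, the text file can also be parsed with base MATLAB functions. The following is a minimal sketch, not a toolbox method. It assumes the usual fastText .vec layout (a header line with the word count and dimension, then one line per word followed by its numbers) and that no word contains whitespace. The file name demo.vec and the variable names are made up for illustration, and the tiny demo file stands in for the unzipped download.

```matlab
% Sketch: parse a fastText-style .vec text file without the toolbox.
% Assumed format: header line "<numWords> <dimension>", then one line per
% word: the word followed by <dimension> space-separated numbers.

% Write a tiny demo file so the sketch runs without the full download.
% (demo.vec is a hypothetical stand-in for the unzipped .vec file.)
demoFile = fullfile(tempdir, 'demo.vec');
fid = fopen(demoFile, 'w', 'n', 'UTF-8');
fprintf(fid, '3 4\n');
fprintf(fid, 'the 0.1 0.2 0.3 0.4\n');
fprintf(fid, 'of -0.5 0.6 -0.7 0.8\n');
fprintf(fid, 'and 1 2 3 4\n');
fclose(fid);

% Read the header to get the vector dimension.
fid = fopen(demoFile, 'r', 'n', 'UTF-8');
header = sscanf(fgetl(fid), '%d %d');
dim = header(2);

% Read the rest: one text column (the word) and dim numeric columns.
fmt = ['%s' repmat(' %f', 1, dim)];
cols = textscan(fid, fmt);
fclose(fid);

words = string(cols{1});     % numWords-by-1 string array
vectors = [cols{2:end}];     % numWords-by-dim double matrix
```

Row k of vectors then corresponds to words(k). For the full file, note that roughly 2 million words times 300 dimensions stored as doubles needs about 4.8 GB of memory, so reading only part of the file or converting to single may be necessary.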