Problem with using readtable:

Ilter Onat Korkmaz on 18 Apr 2021
Answered: Aghamarsh Varanasi on 21 Apr 2021
Here is the code:
opts = detectImportOptions("C:\Users\Onat\Desktop\392\vocab.txt");
opts.VariableTypes=["string", "double"];
opts.LineEnding = ["\n"];
vocab = readtable('C:\Users\Onat\Desktop\392\vocab.txt',opts);
I'm working on an NLP application in which I need the vocabulary and the frequency of each word in it. Naturally, the corpus contains tokens such as a lone quotation mark ("). MATLAB seems to treat this character as special: in the output below, everything after the quotation mark, including the frequencies and the following words, ends up inside a single string. Can anyone help with this issue?
The contents of vocab.txt:
... (excerpt, not the beginning of the file)
and 699333
in 603607
" 538122
to 504540
a 476836
was 304423
... (continues)
The output is:
... (excerpt, not the beginning of the output)
"and" 6.9933e+05
"in" 6.0361e+05
" 538122↵to 504540↵a 476836↵was 304423↵The 246510↵- 229901↵is 225721↵for 198733↵)

Accepted Answer

Aghamarsh Varanasi on 21 Apr 2021
Hi,
Double quotes in a text file mark quoted fields, so that a string can contain any character (even delimiters). When the importer reaches the lone " in your file, it starts a quoted field and keeps reading the following lines into it, which is why you are seeing this behavior. You can control how each column is read by passing the 'Format' name-value argument to readtable.
Example:
vocab = readtable('data.txt', 'Format', '%s %f');
This would import the data as expected.
For more info on how strings are imported from text documents, refer to this Community post.
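The point of the '%s %f' format is that %s reads each whitespace-separated token as plain text, so a lone " does not open a quoted field (in textscan, %q is the specifier that gives double quotes that meaning). The sketch below applies the same format string with textscan, which uses the same conversion specifiers as the 'Format' option of readtable. It assumes a space-separated file with no header line; the sample lines are copied from the question, and the temporary file and variable names (fname, C, T) are illustrative only.

```matlab
% Hypothetical sample file built from the lines shown in the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'and 699333\nin 603607\n" 538122\nto 504540\n');
fclose(fid);

% %s reads a whitespace-delimited token as text (quotes are not special);
% %f reads the frequency as a double.
fid = fopen(fname, 'r');
C = textscan(fid, '%s %f');
fclose(fid);
delete(fname);

% Assemble the two columns into a table with descriptive names.
T = table(string(C{1}), C{2}, 'VariableNames', {'Word', 'Freq'});
disp(T)
```

On this sample, the third row should hold the word " with frequency 538122 instead of swallowing the lines that follow it.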

More Answers (1)

Cris LaPierre on 21 Apr 2021
I would take a different approach: use readlines and string manipulation to build the table.
str = readlines("vocab.txt")
str = 6×1 string array
    "and 699333"
    "in 603607"
    "" 538122"
    "to 504540"
    "a 476836"
    "was 304423"

T = array2table(split(str),'VariableNames',["Word","Freq"]);
T.Freq = str2double(T.Freq)
T = 6×2 table
    Word        Freq
    _____    __________
    "and"    6.9933e+05
    "in"     6.0361e+05
    """      5.3812e+05
    "to"     5.0454e+05
    "a"      4.7684e+05
    "was"    3.0442e+05
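One caveat with the split step: split requires every line to produce the same number of pieces, so a blank line in the file would make it fail. A minimal sketch of a slightly more defensive variant, assuming R2020b or later (for readlines and its EmptyLineRule option) and a space-separated file; the sample data comes from the question, with a blank line added on purpose, and the temporary file and variable names are illustrative only.

```matlab
% Hypothetical sample file from the question, with a blank line added at
% the end to show why empty lines need to be skipped.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'and 699333\nin 603607\n" 538122\nto 504540\n\n');
fclose(fid);

% readlines returns each line verbatim, so the lone " is kept as text.
% EmptyLineRule="skip" drops blank lines before they reach split.
str = readlines(fname, "EmptyLineRule", "skip");
delete(fname);

% strip removes leading/trailing whitespace; split then breaks each line
% at whitespace into an N-by-2 string array (word, count).
parts = split(strip(str));
T = table(parts(:, 1), str2double(parts(:, 2)), ...
    'VariableNames', {'Word', 'Freq'});
disp(T)
```

This stays close to the array2table version above; building the table directly from the two columns just avoids converting Freq in a second step.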
