How to turn a .txt file into a useful table
This seems like it should be exceedingly simple, but I haven't found anything here or anywhere else that addresses it. I have a text file delimited by periods that should be very easy to import using the readtable function, but readtable seems to import every column as character arrays. I've tried using format strings, but I get errors. I would include my code, but it's simply one line, one function: readtable(filepath).
Trying to include a format string gets me:
"Unable to read the entire file. You may need to specify
a different format, delimiter, or number of header
lines.
Note: readtable detected the following parameters:
'HeaderLines', 0, 'ReadVariableNames', true
Error in redditAnalysis (line 4)
allData = readtable('C:\Users\John\Desktop\ChildrensNeurobio\MATLABproject\redditPractice\all.txt', 'Delimiter', '.', 'Format', '%f%f%f%f%f%s');
"
Any idea how to get the columns I need into a useful numeric vector format?
EDIT: the first few lines of the file...
rank.page.upvotes.comments.age.subreddit
1.1.40400.1283.3.OldSchoolCool
2.1.19200.906.4.funny
3.1.31800.1709.5.politics
4.1.40300.780.5.bestof
5.1.5844.1277.3.soccer
6.1.30200.256.5.aww
Answers (2)
Sailesh Sidhwani
on 30 Aug 2017
To achieve your workflow, you should also pass file import options to the readtable() function along with the file. These options define how the file is read into MATLAB. You can also set the variable names, variable types, and delimiter in these import options. To learn more about import options, see the documentation for detectImportOptions.
See the following steps to achieve your workflow. "abc.txt" is the subset of your file from your question.
opts = detectImportOptions('abc.txt')
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'rank_page_upvotes_comments_age_subreddit'}
VariableTypes: {'char'}
SelectedVariableNames: {'rank_page_upvotes_comments_age_subreddit'}
VariableOptions: Show all 1 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Now change the delimiter and variable names to match your file (variable types can be changed in the same way, using setvartype).
opts.Delimiter = {'.'};
opts.VariableNames= {'rank','page','upvotes','comments','age','subreddit'}
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'.'}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'rank', 'page', 'upvotes' ... and 3 more}
VariableTypes: {'char', 'char', 'char' ... and 3 more}
SelectedVariableNames: {'rank', 'page', 'upvotes' ... and 3 more}
VariableOptions: Show all 6 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Now pass this "opts" as the file import options to "readtable":
readtable('abc.txt',opts)
ans =
6×6 table
    rank    page    upvotes    comments    age       subreddit
    ____    ____    _______    ________    ___    _______________
    '1'     '1'     '40400'    '1283'      '3'    'OldSchoolCool'
    '2'     '1'     '19200'    '906'       '4'    'funny'
    '3'     '1'     '31800'    '1709'      '5'    'politics'
    '4'     '1'     '40300'    '780'       '5'    'bestof'
    '5'     '1'     '5844'     '1277'      '3'    'soccer'
    '6'     '1'     '30200'    '256'       '5'    'aww'
1 comment
Jeremy Hughes
on 31 Aug 2017
Edited: Jeremy Hughes
on 31 Aug 2017
You can also set the types with:
>> opts = setvartype(opts,1:5,'double');
See my full answer for a slightly better approach.
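For anyone who wants to try the setvartype step end to end, here is a minimal sketch. It assumes the six sample rows from the question are written to a temporary file (the temp file name and the variable sampleFile are only for illustration, they are not from the original post), and that detectImportOptions is given the '.' delimiter so that it finds six columns.

```matlab
% Write the sample rows from the question to a temporary file.
% (sampleFile is an illustrative name, not part of the original post.)
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', ...
    'rank.page.upvotes.comments.age.subreddit', ...
    '1.1.40400.1283.3.OldSchoolCool', ...
    '2.1.19200.906.4.funny', ...
    '3.1.31800.1709.5.politics', ...
    '4.1.40300.780.5.bestof', ...
    '5.1.5844.1277.3.soccer', ...
    '6.1.30200.256.5.aww');
fclose(fid);

% Detect the import options, telling MATLAB that '.' separates the fields.
opts = detectImportOptions(sampleFile, 'Delimiter', '.');
opts.VariableNames = {'rank','page','upvotes','comments','age','subreddit'};

% Force the first five columns to double; the sixth is left as detected.
opts = setvartype(opts, 1:5, 'double');

t = readtable(sampleFile, opts);
delete(sampleFile);   % clean up the temporary file

% The numeric columns can now be used directly in arithmetic.
meanUpvotes = mean(t.upvotes, 'omitnan');
```

With the types set this way, t.upvotes and the other four numeric columns come back as double vectors instead of cell arrays of character vectors, which is what the question was after.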
Jeremy Hughes
on 31 Aug 2017
Edited: Jeremy Hughes
on 31 Aug 2017
Hi,
This is actually pretty simple:
>> opts = detectImportOptions('abc.txt','Delimiter','.')
>> opts.VariableNames= {'rank','page','upvotes','comments','age','subreddit'}
>> t = readtable('abc.txt',opts);
Without import options, readtable uses a slightly different reading method that scans for numbers and thus pulls the '.' (i.e., the decimal point) along for the ride. Without the 'Delimiter' parameter, detectImportOptions will not choose '.' as the delimiter, since it assumes the character is being used as a decimal separator.
Hope this helps,
Jeremy
1 comment
Jeremy Hughes
on 31 Aug 2017
And if the variable names are already there in the file, you might not need that second line.
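A quick way to check whether the names were picked up is to inspect the options before reading. The sketch below assumes the same six sample rows from the question written to a temporary file (sampleFile and expectedNames are illustrative names, not from the thread), and only assigns the names when detection did not already produce them.

```matlab
% Recreate the sample file from the question in a temporary location.
% (sampleFile and expectedNames are illustrative names.)
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', ...
    'rank.page.upvotes.comments.age.subreddit', ...
    '1.1.40400.1283.3.OldSchoolCool', ...
    '2.1.19200.906.4.funny', ...
    '3.1.31800.1709.5.politics');
fclose(fid);

opts = detectImportOptions(sampleFile, 'Delimiter', '.');

% Show what was detected before committing to a read.
disp(opts.VariableNames)
disp(opts.VariableTypes)

% Only override the names if the header row was not used for them.
expectedNames = {'rank','page','upvotes','comments','age','subreddit'};
if ~isequal(opts.VariableNames, expectedNames)
    opts.VariableNames = expectedNames;
end

t = readtable(sampleFile, opts);
delete(sampleFile);   % clean up the temporary file
```

If the displayed types are not what you want, setvartype can be applied to opts before the readtable call.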