Auto Detect different file types?
Hello,
I am trying to edit a program so that it can automatically detect different types of text files. Currently, I am using two different programs to open and report the separate text files, using the following bits of code:
Program 1:
filespec=[fpath char(fnameALL(2))]; TD=filespec;
delimiter='[\t]';comment='';quotes='';options='numeric';
[TDdata, ~]= readtext(filespec, delimiter, comment, quotes, options);
pVelocity = TDdata(6,62)*100; pCadence = (TDdata(6,21)+TDdata(6,48))/2;
pStride = TDdata(6,65)*100; pStepWidth = TDdata(6,68)*100;
pGSR=(pCadence/60)/(pVelocity/100);
pRTO = (TDdata(6,39)/TDdata(6,33))*100; pLTO = (TDdata(6,12)/TDdata(6,9))*100;
pRSS = (TDdata(6,30)/TDdata(6,33))*100; pLSS = (TDdata(6,57)/TDdata(6,9))*100;
pRSTEP = TDdata(6,42)*100; pLSTEP = TDdata(6,15)*100;
pROTO = (TDdata(6,36)/TDdata(6,33))*100; pLOTO = (TDdata(6,60)/TDdata(6,9))*100;
ToeOff = [pRTO pLTO];
filespec=[fpath char(fnameALL(3))]; TD=filespec;
delimiter='[\t]';comment='';quotes='';options='numeric';
[TDalldata, result]= readtext(filespec, delimiter, comment, quotes, options);
Num_trialstd=(length(TDalldata(1,:))-1)/68;
Program 2:
[fname fpath]=uigetfile('*.txt','Please select the _td file');
conditionid=input('Enter the condition (no spaces): ','s');
cd(fpath);
[dataALL,results]=readtext(fname,';','','','numeric');
[row, col]=find(dataALL(:,3)>0);
data=dataALL(row:length(dataALL),:);
What I am wondering is whether there is a function I am unaware of that could automatically distinguish between these text files.
If what I'm asking is unclear, I can provide clarification.
Thank you.
Accepted Answer
_
on 24 Jan 2022
Since the only difference between the way the two file types are read with readtext() is the delimiter, you can try different delimiters until you find one that works. With those two files you posted, I found that readtext() returns all NaNs if you use the wrong delimiter, so I'm using that as the condition that determines whether the file was read correctly or not. (If you have any file that returns all NaNs which needs to be considered valid, then you'd have to use a different condition.)
The following code loops over a set of files and for each file tries readtext() with each different delimiter (in this case just '[\t]' and ';' but the code will work for any number of delimiters) until one gives something that's not all NaNs. Then, for the next file, the delimiter that worked is tried first.
my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:numel(my_files)
    fprintf('preparing to read file %s:\n',my_files{i});
    tried_delimiters = false(1,n_delimiters);
    success = false;
    while any(~tried_delimiters)
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        success = true;
        break
    end
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
        continue
    end
    if delimiter_idx == 1
        % do file type 1 stuff
    else
        % do file type 2 stuff
    end
end
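As a possible complement to the trial-and-error loop above, the delimiter could also be guessed before calling readtext() by counting how often each candidate character appears in the first few lines of the file. This is only a sketch: guess_delimiter, its arguments, and the default of 10 lines are hypothetical names and values chosen for illustration, and it assumes each candidate is a single character (so the tab would be passed as char(9) and mapped back to '[\t]' for readtext()).

```matlab
function delim = guess_delimiter(filename, candidates, n_lines)
% GUESS_DELIMITER  Hypothetical helper (not part of readtext): return the
% candidate character that occurs most often in the first n_lines lines
% of a text file, or '' if none of the candidates occurs at all.
%
% Example call (candidates is a char vector of single characters):
%   delim = guess_delimiter('some_file.txt', [char(9) ';']);
    if nargin < 3
        n_lines = 10;                     % assumed default, adjust as needed
    end
    fid = fopen(filename, 'r');
    if fid == -1
        error('Could not open file %s', filename);
    end
    cleanup = onCleanup(@() fclose(fid)); % close the file on any exit path
    counts = zeros(1, numel(candidates));
    for k = 1:n_lines
        line = fgetl(fid);
        if ~ischar(line)                  % fgetl returns -1 at end of file
            break
        end
        for c = 1:numel(candidates)
            counts(c) = counts(c) + sum(line == candidates(c));
        end
    end
    [max_count, best] = max(counts);
    if max_count == 0
        delim = '';
    else
        delim = candidates(best);
    end
end
```

A guess like this does not replace the all-NaN check; it would only decide which delimiter to try first, and a file whose header lines happen to contain the other character could still mislead it.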
6 Comments
_
on 1 Feb 2022
When you say you've "read these files into a cell array", I assume that means you've modified the code I posted so that when it finds a delimiter that works, it stores the data variable in a cell array with one cell per file. (Which seems like a reasonable thing to do.) Is that what you mean?
If that's the case, and now you want to know how to figure out from the contents of each cell whether it was a type 1 or type 2 file, then you'd have to be able to distinguish between the two file types based on what comes from readtext() for each file type. readtext() returns a matrix, so you'd have to know something about the size of possible matrices returned by readtext() in each case or the possible locations of the NaNs in the matrix, etc. I have no idea about the range of possibilities for what those files could contain, so I wouldn't be able to put any conditions on the matrices from readtext() in order to distinguish one type from another. But you may know more about what the possibilities are for those file types and hence what the matrices from readtext() should look like, so you may be able to come up with some condition to distinguish the two types.
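To illustrate what such a condition might look like, here is a minimal sketch that classifies a matrix by its column count. It rests on an assumption that is not confirmed anywhere in this thread: that type 1 matrices are always at least min_cols_type1 columns wide (for example 68, since Program 1 indexes up to column 68) and that type 2 matrices are always narrower. The function name classify_by_shape and its threshold argument are hypothetical.

```matlab
function file_type = classify_by_shape(data, min_cols_type1)
% CLASSIFY_BY_SHAPE  Hypothetical helper: guess the file type from the
% shape of the numeric matrix returned by readtext().
%   0 = unreadable (empty or all NaN)
%   1 = at least min_cols_type1 columns (ASSUMED to mean file type 1)
%   2 = fewer columns                   (ASSUMED to mean file type 2)
    if isempty(data) || all(isnan(data(:)))
        file_type = 0;
    elseif size(data, 2) >= min_cols_type1
        file_type = 1;
    else
        file_type = 2;
    end
end
```

Whether a column threshold separates the two file types reliably depends entirely on what the files can contain, so this would need checking against real examples of both types.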
However, I think it may be easier to just keep track of each file's type when it is successfully read, rather than trying to go back after the fact and figure it out from the matrices you end up with. That would look something like this minor modification to the code above:
my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
N = numel(my_files);
file_type = zeros(1,N);
file_data = cell(1,N);
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:N
    fprintf('preparing to read file %s:\n',my_files{i});
    tried_delimiters = false(1,n_delimiters);
    success = false;
    while any(~tried_delimiters)
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        file_type(i) = delimiter_idx;
        file_data{i} = data;
        success = true;
        break
    end
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
    end
end
Then you could run through your subsequent operations with the data from the files like this:
for i = 1:N
    if file_type(i) == 1
        % do file type 1 stuff with file_data{i}
    elseif file_type(i) == 2
        % do file type 2 stuff with file_data{i}
    end
end
I'm not sure if that answers your question. If not, let me know.