Selective columns from multiple text folders

Question

0 votes

Hey guys,

I would really appreciate your help here.

I have a dataset containing 400 folders with one text file inside each. Each text file has 13 columns and I want only one of them. In the end, I want a text file with 400 columns. So far I have it working for one folder (with the help of the importer), but I don't know how to implement the loop. The folders and the text files (same name) are ordered like this:

m0000000

m0000001

m0000002

...

m0000399

Here is the code so far ...

filename = 'D:\work\1ST EXP\4.2\m0000000\m0000000.TXT';
delimiter = '\t';
formatSpec = '%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter,  'ReturnOnError', false);
fclose(fileID);
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
    raw(1:length(dataArray{col}),col) = dataArray{col};
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
rawData = dataArray{1};
for row=1:size(rawData, 1);
  
    regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
    try
        result = regexp(rawData{row}, regexstr, 'names');
        numbers = result.numbers;
        
      
        invalidThousandsSeparator = false;
        if any(numbers==',');
            thousandsRegExp = '^\d+?(\,\d{3})*\.{0,1}\d*$';
            if isempty(regexp(numbers, thousandsRegExp, 'once'));
                numbers = NaN;
                invalidThousandsSeparator = true;
            end
        end
      
        if ~invalidThousandsSeparator;
            numbers = textscan(strrep(numbers, ',', ''), '%f');
            numericData(row, 1) = numbers{1};
            raw{row, 1} = numbers{1};
        end
    catch me
    end
end
%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
%% Allocate imported array to column variable names
Rho = cell2mat(raw(:, 1));
%clearvars filename delimiter formatSpec fileID dataArray ans raw col numericData rawData row regexstr result numbers invalidThousandsSeparator thousandsRegExp me R;
diary merged.txt
Rho 

2 comments

ANKUR KUMAR on 10 Mar 2021

Could you please attach one of the text files? It would help us to help you.

Afshin sadeghi on 10 Mar 2021

m0000000.TXT

Thanks a lot for answering.

I need the Rho columns.

Cheers.


Answer 1

Stephen23 on 11 Mar 2021

Edited: Stephen23 on 11 Mar 2021

1 vote

m0000000.txt

Here is one simple and efficient approach (untested, but should get you started):

P = 'D:\work\1ST EXP\4.2';
V = 0:399;
N = numel(V);
C = cell(1,N);
for k = 1:N
    F = sprintf('m%07d',V(k));
    F = fullfile(P,F,sprintf('%s.txt',F));
    T = readtable(F,'VariableNamingRule','preserve');
    C{k} = T.Rho;
end
M = [C{:}] % only if all files have the same number of rows.

You can then save matrix M, e.g.:

writematrix(M,'myfile.txt')
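The concatenation M = [C{:}] above works only when every file yields the same number of rows. If that does not hold, one option is to pad the shorter columns with NaN. This is a minimal sketch, assuming NaN padding is acceptable for the merged file; the cell array C here is made-up stand-in data, not values from the real files:

```matlab
% Stand-in for the per-file Rho columns collected in the loop (hypothetical data)
C = {[1;2;3], [4;5], [6;7;8;9]};

% Length of the longest column determines the number of rows
R = max(cellfun(@numel, C));

% Preallocate with NaN so that missing entries stay NaN
M = NaN(R, numel(C));
for k = 1:numel(C)
    M(1:numel(C{k}), k) = C{k};
end
```

The padded matrix can then be written out the same way as before.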

Just out of interest, why do you need that complex string parsing and thousands-separator handling? The sample file does not include any thousands separators that I can see:

T = readtable('m0000000.txt','VariableNamingRule','preserve')
T = 465x17 table
    C1(xm)    C1(ym)    C1(zm)    C2(xm)    C2(ym)    C2(zm)    P1(xm)    P1(ym)    P1(zm)    P2(xm)    P2(ym)    P2(zm)    Rho    I    U    D    Time
    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ___    _    _    _    ____

       0         0         1         0         0         2         0         0         26        0         0         -0.3    147.22     -44.88    7.63    01/21/2021 01:00:03
       0         0         1         0         0         3         0         0         27        0         0         0.84    146.98      123.9    2.76    01/21/2021 01:00:06
       0         0         1         0         0         4         0         0         28        0         0        -4.34    147.13    -638.77    0.39    01/21/2021 01:00:08
       0         0         1         0         0         5         0         0         29        0         0         1.68    147.17     246.72    1.32    01/21/2021 01:00:10
       0         0         1         0         0         2         0         0          3        0         0         0.03    147.17       4.28    5.18    01/21/2021 01:00:11
       0         0         1         0         0         3         0         0          4        0         0        -0.81    147.19    -119.69    0.41    01/21/2021 01:00:13
       0         0         1         0         0         4         0         0          5        0         0         1.22     147.2     179.94    0.52    01/21/2021 01:00:15
       0         0         1         0         0         2         0         0         27        0         0         0.87    147.19     128.65    2.57    01/21/2021 01:00:16
       0         0         1         0         0         3         0         0         28        0         0        -5.16    147.07    -758.41    0.39    01/21/2021 01:00:18
       0         0         1         0         0         4         0         0         29        0         0          2.9    147.15      427.2    0.58    01/21/2021 01:00:20
       0         0         1         0         0         2         0         0          6        0         0       -24.59    147.12    -3617.8     0.1    01/21/2021 01:00:21
       0         0         1         0         0         3         0         0          7        0         0       -24.44    146.97    -3592.2    0.11    01/21/2021 01:00:23
       0         0         1         0         0         4         0         0          8        0         0       -24.26    146.99    -3565.7    0.09    01/21/2021 01:00:25
       0         0         1         0         0         5         0         0          9        0         0       -24.91    146.98    -3661.2    0.11    01/21/2021 01:00:27
       0         0         1         0         0         2         0         0         30        0         0       -27.07    146.97    -3978.1    0.14    01/21/2021 01:00:28
       0         0         1         0         0         3         0         0         31        0         0       -22.67    146.99    -3331.9    0.17    01/21/2021 01:00:30

6 comments

Afshin sadeghi on 13 Mar 2021

@Stephen Cobeldick here is the full error text:

Error using readtable (line 197)

Invalid parameter name: VariableNamingRule.

Error in website (line 8)

T = readtable(F,'VariableNamingRule','preserve');
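This error is consistent with running a MATLAB release older than R2020a, where readtable does not accept 'VariableNamingRule' (R2019b offers 'PreserveVariableNames' instead, and earlier releases have neither). A minimal sketch of a release-tolerant call, assuming that is the cause here; the temporary file and its contents are made up for illustration:

```matlab
% Create a small tab-delimited stand-in file (hypothetical contents)
F = [tempname '.txt'];
fid = fopen(F, 'w');
fprintf(fid, 'C1(xm)\tRho\tI\n0\t-0.3\t147.22\n0\t0.84\t146.98\n');
fclose(fid);

% Pick the option that the running release understands
if verLessThan('matlab', '9.7')          % before R2019b
    T = readtable(F);                    % names are converted to valid identifiers
elseif verLessThan('matlab', '9.8')      % R2019b
    T = readtable(F, 'PreserveVariableNames', true);
else                                     % R2020a and later
    T = readtable(F, 'VariableNamingRule', 'preserve');
end

% 'Rho' is already a valid identifier, so it is available in every branch
rho = T.Rho;
delete(F);
```

Because only the Rho column is needed, the name conversion done by older releases does not get in the way.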

Afshin sadeghi on 15 Mar 2021

@Stephen Cobeldick it's working now ^^

Thank you very much.

Answer 2

ANKUR KUMAR on 10 Mar 2021

Edited: ANKUR KUMAR on 10 Mar 2021

0 votes

Since you have not attached a text file, I put a sample text file with multiple rows into each of several folders and used the code below to build a matrix containing the first column from every file.

files = dir('D:\matlab_ask')
dirFlags = [files.isdir];
subFolders = files(dirFlags);
for i =3:length(subFolders)
    cd(subFolders(i).name)
    filename = 'test.txt';
    fileID = fopen(filename,'r');
    dataArray = textscan(fileID, '%f%*s%[^\n\r]', 'Delimiter', ',');
    merged_matrix(:,i-2) = [dataArray{1:end-1}];
    
    cd ..
end
merged_matrix

2 comments

Stephen23 on 11 Mar 2021

Edited: Stephen23 on 11 Mar 2021

Note that assuming that the first two results from DIR should be discarded is buggy: consider what would happen if the user quite reasonably decides to add a wildcard and file extension to the DIR match string. Consider also what happens to names of files/folders that might be in the same folder and whose names start with '+' or '.'. This can be easily avoided by writing more robust code:

https://www.mathworks.com/matlabcentral/answers/345841-unexplained-error-on-dir#answer_271627

https://www.mathworks.com/matlabcentral/answers/21150-function-call#comment_46042

https://www.mathworks.com/matlabcentral/answers/394807-create-text-file-containing-list-of-file-names#comment_556641

https://www.mathworks.com/matlabcentral/answers/288815-gui-for-loop-not-working-correctly#answer_224956

https://www.mathworks.com/matlabcentral/answers/484430-using-struct-dir-selpath-what-do-and-mean-as-struct-name#answer_395754

https://www.mathworks.com/matlabcentral/answers/40949-omitting-pointers-when-listing-folder-contents#answer_50568

https://www.mathworks.com/matlabcentral/answers/13978-list-of-files-sorting#answer_19278

Note that calling CD like that should be avoided. It is more efficient and more robust to use absolute/relative filenames to access data files. The MATLAB documentation states "Avoid programmatic use of cd, addpath, and rmpath, when possible. Changing the MATLAB path during run time results in code recompilation."

source: https://www.mathworks.com/help/matlab/matlab_prog/techniques-for-improving-performance.html
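As an illustration of the two points above, here is a minimal sketch that removes the dot entries by name rather than by position and builds full file names with fullfile instead of calling cd. The folder layout is created in a temporary location purely for the example; the root folder P and the '+pkg' folder are hypothetical:

```matlab
% Build a throwaway folder tree (hypothetical layout for illustration)
P = tempname;
mkdir(fullfile(P, 'm0000000'));
mkdir(fullfile(P, 'm0000001'));
mkdir(fullfile(P, '+pkg'));              % sorts before '.' and '..'

% List subfolders and drop '.' and '..' explicitly by name
S = dir(P);
S = S([S.isdir]);
S = S(~ismember({S.name}, {'.', '..'}));
names = sort({S.name});

% Build full file names; no cd required
files = cell(size(names));
for k = 1:numel(names)
    files{k} = fullfile(P, names{k}, [names{k} '.txt']);
end

rmdir(P, 's');                           % clean up the example tree
```

With index-based skipping (starting the loop at 3), the '+pkg' folder would have taken the place of one of the dot entries; filtering by name avoids that.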

ANKUR KUMAR on 12 Mar 2021

I am still at the learning stage. Thanks for your suggestions. I appreciate it.
