Filtering multiple .csv files based on data from a table

Question

Simon el 20 de Ag. de 2024

0
Enlazar

Enlace directo a esta pregunta

https://la.mathworks.com/matlabcentral/answers/2146494-filtering-multiple-csv-files-based-on-data-from-a-table

Comentada: Jeremy Hughes el 21 de Ag. de 2024

Hello all,

I am trying to filter multiple .csv files (based on data in a separate table. To be more specific I am trying to filter the data from dataTables by d_Data.Actual.Calculated_Flow variable, so that it filters the data that are below minFlow or above maxFlow, which are listed in dataList. dataList has several columns the first one lists Tablename that corresponds with names of the tables in dataTables, the 4th & 5th lists corresponding maxFlow & minFlow that I would like to apply for filtering. How can I achieve this?

here is the code:

dataListDir="zoznam_merani.xlsx";
    dataDir='C:\Users\U430746\OneDrive - Danfoss\Desktop\EHA\EHA 1\Matlab\Matlab\Simulation EHA V1\Test Data\OneDrive_2024-08-16\Flow control\EHA_Flow_Control_standard mode\EHA_Flow_Control';
    % Loads dataList
    dataList=readtable(dataListDir);
    % Loads direction to .csv files
    dataFiles = dir(fullfile(dataDir, '*.csv'));
    % Inicialize a cell array for input of tables
    dataTables = cell(1, length(dataFiles));
    % Loads tables to the dataTables cell array
    for i = 1:length(dataFiles)
        fileName = fullfile(dataDir, dataFiles(i).name);
        dataTables{i} = readtable(fileName,"VariableNamingRule","preserve");
    end
    % Inicialize a cell array for filtered data
    filteredDataTables = cell(1, length(dataFiles));
    % Runs through all the tables in dataTables
    for i = 1:length(dataTables)
        % Finds a row that corresponds with the TableName
        rowIndex = strcmp(dataList.TableName, dataFiles(i).name);
        
        if any(rowIndex)
            % Gets minFlow & maxFlow for the actual table
            minFlow = dataList.minFlow(rowIndex);
            maxFlow = dataList.maxFlow(rowIndex);
            
            % Filters the tables based on minFlow & maxFlow
            filteredDataTables{i} = dataTables{i}(dataTables{i}.("d_Data.Actual.Calculated_Flow") >= minFlow & dataTables{i}.("d_Data.Actual.Calculated_Flow") <= maxFlow, :);
        else
            % If the tableName is not available executes no changes
            filteredDataTables{i} = dataTables{i};
        end
    end

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

Jeremy Hughes el 20 de Ag. de 2024

0
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/2146494-filtering-multiple-csv-files-based-on-data-from-a-table#answer_1501994

Abrir en MATLAB Online

It looks like you're loading all the data into memory, then looping to do the filtering. And to the best of my ability to tell, it looks like you're doing it right. (Without files, it's hard to say)

Are you having any specific problems?

You might have better luck doing the filtering on each table as you bring it in, so I reorganized this code a bit:

% Loads direction to .csv files
dataListDir="zoznam_merani.xlsx";
dataList=readtable(dataListDir,"TextType","string");
% Truncated this for readability, also One Drive can be problematic when
% accessed in a loop, but I see that more when writing. This is probably
% fine.
dataDir='C:\Users\...whereever...\';
dataFiles = dir(fullfile(dataDir, '*.csv'));
dataFiles = string({datafiles.name})';
% Loads tables to the dataTables cell array
dataTables = cell(1, height(dataFiles));
filteredDataTables = cell(1, height(dataFiles));
for i = 1:height(dataFiles) % Changed to length => height since length can give strange results 
    T = readtable(fullfile(dataDir, dataFiles(i)),"VariableNamingRule","preserve");
    
    % I assume this condition is right assuming TableName lists the files
    % as they are named in dataDir.
    
    rowIndex = (dataList.TableName == dataFiles(i));
    
    % **** Potentially, this is a problem if there are multiple rows with
    % the same file name, but so check here for that case. ****
    if any(rowIndex)
        % Gets minFlow & maxFlow for the actual table
        minFlow = dataList.minFlow(rowIndex);
        maxFlow = dataList.maxFlow(rowIndex);
        
        % Filters the tables based on minFlow & maxFlow
        aboveMin = T.("d_Data.Actual.Calculated_Flow") >= minFlow;
        belowMax = T.("d_Data.Actual.Calculated_Flow") <= maxFlow
        filteredDataTables{i} = T(aboveMin & belowMax, :);
    else
        % If the tableName is not available executes no changes
        filteredDataTables{i} = T;
    end
    dataTables{i} = T;
end

2 comentarios
Mostrar NingunoOcultar Ninguno

Simon el 21 de Ag. de 2024

Editada: Simon el 21 de Ag. de 2024

Hello Jeremy.

Thank you for your answer!

I have already solved the issue. I had done a silly mistake you could not see. I switched the of minFlow & maxFlow in the dataList table.

Thanks again for your feedback.

One more thing. I am new to this forum, is there any way to mark the question as closed/answered to spare other members from unnecessary work?

Jeremy Hughes el 21 de Ag. de 2024

If you accept an answer, it will highlight that answer. Others can still see it and answer, but they will see that you have accepted it.

Iniciar sesión para comentar.

Answer 2

Saurabh el 20 de Ag. de 2024

1
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/2146494-filtering-multiple-csv-files-based-on-data-from-a-table#answer_1501584

Abrir en MATLAB Online

Hi Simon,

What I understood is that you want to filter multiple csv files based on the "d_Data.Actual.Calculated_Flow" column, by applying `minFlow` and `maxFlow` values from a separate `dataList` table that matches table names with CSV filename. I looked over your code and didn't see any issues.

Here are some things to check and consider:

Variable Naming Rule: You are using the "VariableNamingRule","preserve" option in “readtable”. Ensure that the variable name "d_Data.Actual.Calculated_Flow" is exactly as it appears in your CSV files. MATLAB is case-sensitive, and any mismatch will cause an error.
Table Name Matching: When matching “dataFiles (i)”.name with “dataList.TableName,” ensure that the names are consistent. If “dataList.TableName” does not include the file extension (e.g., .csv), you might need to adjust the comparison to remove the extension from “dataFiles(i).name”.
Row Index Check: The condition if any(rowIndex) checks if there is any match. Ensure that “dataList.TableName” contains the expected names and that there are no leading/trailing spaces or mismatches.
Data Types: Verify that “minFlow” and “maxFlow” are numeric values and that “dataTables{i}.("d_Data.Actual.Calculated_Flow")” is a numeric column. Any type mismatch can cause errors during the filtering operation.

Here's a refined version of script with some adjustments:

dataListDir = "zoznam_merani.xlsx";
dataDir = 'C:\Users\U430746\OneDrive - Danfoss\Desktop\EHA\EHA 1\Matlab\Matlab\Simulation EHA V1\Test Data\OneDrive_2024-08-16\Flow control\EHA_Flow_Control_standard mode\EHA_Flow_Control';
% Load dataList
dataList = readtable(dataListDir);
% Load .csv files
dataFiles = dir(fullfile(dataDir, '*.csv'));
% Initialize cell arrays for tables
dataTables = cell(1, length(dataFiles));
filteredDataTables = cell(1, length(dataFiles));
% Load tables into dataTables
for i = 1:length(dataFiles)  
    fileName = fullfile(dataDir, dataFiles(i).name);
    dataTables{i} = readtable(fileName, "VariableNamingRule", "preserve");
end
% Filter dataTables
for i = 1:length(dataTables)
    % Strip the .csv extension for comparison
    tableName = erase(dataFiles(i).name, '.csv');
    rowIndex = strcmp(dataList.TableName, tableName);
    if any(rowIndex)
        % Get minFlow and maxFlow for the current table
        minFlow = dataList.minFlow(rowIndex);
        maxFlow = dataList.maxFlow(rowIndex);  
        % Filter the table     
        flowData = dataTables{i}.("d_Data.Actual.Calculated_Flow");     
        filteredDataTables{i} = dataTables{i}(flowData >= minFlow & flowData <= maxFlow, :);      
    else   
        % No matching table name   
        filteredDataTables{i} = dataTables{i};   
    end
end

Some Debugging tips:

Use “disp” or “fprintf” to print out variables at different stages.
Use “class” and “whos” to verify that variables are of the expected types.

I hope this was helpful

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Simon el 20 de Ag. de 2024

Hello Saurabh,

Thank you for your answer!

I have already solved the issue. I had done a silly mistake you could not see. I switched the of minFlow & maxFlow in the dataList table.

However, your suggestions are very useful as well.

Thank you again. :)

Iniciar sesión para comentar.

Filtering multiple .csv files based on data from a table

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

2 comentarios
Mostrar NingunoOcultar Ninguno

Más respuestas (1)

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

Filtering multiple .csv files based on data from a table

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

2 comentarios Mostrar NingunoOcultar Ninguno

Más respuestas (1)

1 comentario Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

2 comentarios
Mostrar NingunoOcultar Ninguno

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos