Filtering multiple .csv files based on data from a table
    3 visualizaciones (últimos 30 días)
  
       Mostrar comentarios más antiguos
    
    Simon
 el 20 de Ag. de 2024
  
    
    
    
    
    Comentada: Jeremy Hughes
    
 el 21 de Ag. de 2024
            Hello all,
I am trying to filter multiple .csv files (based on data in a separate table. To be more specific I am trying to filter the data from dataTables by d_Data.Actual.Calculated_Flow variable, so that it filters the data that are below minFlow or above maxFlow, which are listed in dataList. dataList has several columns the first one lists Tablename that corresponds with names of the tables in dataTables, the 4th & 5th lists corresponding maxFlow & minFlow that I would like to apply for filtering. How can I achieve this?
here is the code:
dataListDir="zoznam_merani.xlsx";
    dataDir='C:\Users\U430746\OneDrive - Danfoss\Desktop\EHA\EHA 1\Matlab\Matlab\Simulation EHA V1\Test Data\OneDrive_2024-08-16\Flow control\EHA_Flow_Control_standard mode\EHA_Flow_Control';
    % Loads dataList
    dataList=readtable(dataListDir);
    % Loads direction to .csv files
    dataFiles = dir(fullfile(dataDir, '*.csv'));
    % Inicialize a cell array for input of tables
    dataTables = cell(1, length(dataFiles));
    % Loads tables to the dataTables cell array
    for i = 1:length(dataFiles)
        fileName = fullfile(dataDir, dataFiles(i).name);
        dataTables{i} = readtable(fileName,"VariableNamingRule","preserve");
    end
    % Inicialize a cell array for filtered data
    filteredDataTables = cell(1, length(dataFiles));
    % Runs through all the tables in dataTables
    for i = 1:length(dataTables)
        % Finds a row that corresponds with the TableName
        rowIndex = strcmp(dataList.TableName, dataFiles(i).name);
        if any(rowIndex)
            % Gets minFlow & maxFlow for the actual table
            minFlow = dataList.minFlow(rowIndex);
            maxFlow = dataList.maxFlow(rowIndex);
            % Filters the tables based on minFlow & maxFlow
            filteredDataTables{i} = dataTables{i}(dataTables{i}.("d_Data.Actual.Calculated_Flow") >= minFlow & dataTables{i}.("d_Data.Actual.Calculated_Flow") <= maxFlow, :);
        else
            % If the tableName is not available executes no changes
            filteredDataTables{i} = dataTables{i};
        end
    end
0 comentarios
Respuesta aceptada
  Jeremy Hughes
    
 el 20 de Ag. de 2024
        It looks like you're loading all the data into memory, then looping to do the filtering. And to the best of my ability to tell, it looks like you're doing it right. (Without files, it's hard to say)
Are you having any specific problems? 
You might have better luck doing the filtering on each table as you bring it in, so I reorganized this code a bit:
% Loads direction to .csv files
dataListDir="zoznam_merani.xlsx";
dataList=readtable(dataListDir,"TextType","string");
% Truncated this for readability, also One Drive can be problematic when
% accessed in a loop, but I see that more when writing. This is probably
% fine.
dataDir='C:\Users\...whereever...\';
dataFiles = dir(fullfile(dataDir, '*.csv'));
dataFiles = string({datafiles.name})';
% Loads tables to the dataTables cell array
dataTables = cell(1, height(dataFiles));
filteredDataTables = cell(1, height(dataFiles));
for i = 1:height(dataFiles) % Changed to length => height since length can give strange results 
    T = readtable(fullfile(dataDir, dataFiles(i)),"VariableNamingRule","preserve");
    % I assume this condition is right assuming TableName lists the files
    % as they are named in dataDir.
    rowIndex = (dataList.TableName == dataFiles(i));
    % **** Potentially, this is a problem if there are multiple rows with
    % the same file name, but so check here for that case. ****
    if any(rowIndex)
        % Gets minFlow & maxFlow for the actual table
        minFlow = dataList.minFlow(rowIndex);
        maxFlow = dataList.maxFlow(rowIndex);
        % Filters the tables based on minFlow & maxFlow
        aboveMin = T.("d_Data.Actual.Calculated_Flow") >= minFlow;
        belowMax = T.("d_Data.Actual.Calculated_Flow") <= maxFlow
        filteredDataTables{i} = T(aboveMin & belowMax, :);
    else
        % If the tableName is not available executes no changes
        filteredDataTables{i} = T;
    end
    dataTables{i} = T;
end
2 comentarios
  Jeremy Hughes
    
 el 21 de Ag. de 2024
				If you accept an answer, it will highlight that answer. Others can still see it and answer, but they will see that you have accepted it.
Más respuestas (1)
  Saurabh
      
 el 20 de Ag. de 2024
         Hi Simon, 
What I understood is that you want to filter multiple csv files based on the "d_Data.Actual.Calculated_Flow" column, by applying `minFlow` and `maxFlow` values from a separate `dataList` table that matches table names with CSV filename. I looked over your code and didn't see any issues.
Here are some things to check and consider: 
- Variable Naming Rule: You are using the "VariableNamingRule","preserve" option in “readtable”. Ensure that the variable name "d_Data.Actual.Calculated_Flow" is exactly as it appears in your CSV files. MATLAB is case-sensitive, and any mismatch will cause an error.
 - Table Name Matching: When matching “dataFiles (i)”.name with “dataList.TableName,” ensure that the names are consistent. If “dataList.TableName” does not include the file extension (e.g., .csv), you might need to adjust the comparison to remove the extension from “dataFiles(i).name”.
 - Row Index Check: The condition if any(rowIndex) checks if there is any match. Ensure that “dataList.TableName” contains the expected names and that there are no leading/trailing spaces or mismatches.
 - Data Types: Verify that “minFlow” and “maxFlow” are numeric values and that “dataTables{i}.("d_Data.Actual.Calculated_Flow")” is a numeric column. Any type mismatch can cause errors during the filtering operation.
 
Here's a refined version of script with some adjustments:
dataListDir = "zoznam_merani.xlsx";
dataDir = 'C:\Users\U430746\OneDrive - Danfoss\Desktop\EHA\EHA 1\Matlab\Matlab\Simulation EHA V1\Test Data\OneDrive_2024-08-16\Flow control\EHA_Flow_Control_standard mode\EHA_Flow_Control';
% Load dataList
dataList = readtable(dataListDir);
% Load .csv files
dataFiles = dir(fullfile(dataDir, '*.csv'));
% Initialize cell arrays for tables
dataTables = cell(1, length(dataFiles));
filteredDataTables = cell(1, length(dataFiles));
% Load tables into dataTables
for i = 1:length(dataFiles)  
    fileName = fullfile(dataDir, dataFiles(i).name);
    dataTables{i} = readtable(fileName, "VariableNamingRule", "preserve");
end
% Filter dataTables
for i = 1:length(dataTables)
    % Strip the .csv extension for comparison
    tableName = erase(dataFiles(i).name, '.csv');
    rowIndex = strcmp(dataList.TableName, tableName);
    if any(rowIndex)
        % Get minFlow and maxFlow for the current table
        minFlow = dataList.minFlow(rowIndex);
        maxFlow = dataList.maxFlow(rowIndex);  
        % Filter the table     
        flowData = dataTables{i}.("d_Data.Actual.Calculated_Flow");     
        filteredDataTables{i} = dataTables{i}(flowData >= minFlow & flowData <= maxFlow, :);      
    else   
        % No matching table name   
        filteredDataTables{i} = dataTables{i};   
    end
end
Some Debugging tips: 
- Use “disp” or “fprintf” to print out variables at different stages.
 - Use “class” and “whos” to verify that variables are of the expected types.
 
 I hope this was helpful
Ver también
Categorías
				Más información sobre Logical en Help Center y File Exchange.
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!