How to split a sequence based on values from one variable
Matteo Soldini on 3 May 2020
Commented: Ameer Hamza on 4 May 2020
Good evening,
I can't figure out how to solve the following problem.
Assuming that I have a dataset as in the picture, I would like to divide it into many smaller datasets using the variable "State" and keeping the sequence. Actually the real dataset has more than 200000 observations so I can't know when the variable State changes from NORMAL to RECOVERY and vice versa, but I would like to split the dataset into many mini sequences where each one has the same State variable for all the observations.
Then, I would need to divide the variables into a Predictors set (variables Sensor 1, Sensor 2, Sensor 3) and a Response set (variable State).
If we take, as an example, the image, at the end of the problem I would like to have for the Predictors a cell array of size Nx1 (N equal to the number of mini sequences) with the first cell of size 3x2 (the three features and the first two observations), the second cell of size 3x2, the third cell of size 3x1 and so on. Correspondingly, for the Response I would like to have an Nx1 cell array where the first cell is of dimension 1x2, the second is 1x2, the third is 1x1 and so on.

The problem is that with a dataset of 200000 observations I don't know what kind of loop to use and how to use it.
Thank you!
Accepted Answer
Ameer Hamza on 3 May 2020
See the following example.
First, create an example table:
data = {1, 2, 3, 'norm'; 2, 3, 4, 'norm'; 
        2, 3, 1, 'rec' ; 4, 4, 2, 'rec';
        1, 2, 3, 'norm'; 2, 3, 4, 'rec'; 
        2, 3, 1, 'rec' ; 4, 4, 2, 'rec'};
t = cell2table(data, 'VariableNames', ...
    {'sen1', 'sen2', 'sen3', 'state'}); % an example table
Result
t =
  8×4 table
    sen1    sen2    sen3     state  
    ____    ____    ____    ________
    1.00    2.00    3.00    {'norm'}
    2.00    3.00    4.00    {'norm'}
    2.00    3.00    1.00    {'rec' }
    4.00    4.00    2.00    {'rec' }
    1.00    2.00    3.00    {'norm'}
    2.00    3.00    4.00    {'rec' }
    2.00    3.00    1.00    {'rec' }
    4.00    4.00    2.00    {'rec' }
Then run the following code to split the data
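% Convert the state labels to numeric group codes
idx = findgroups(t.state);
% Bin edges: row 1, every row where the state changes, and the last row
partition_idx = [1; find(diff(idx)~=0)+1; size(t,1)];
% Assign each row number to its run (bin) of consecutive equal states
partition_idx = discretize(1:size(t,1), partition_idx);
% Collect the rows of each run into one cell per run
sensor_val = splitapply(@(x) {x}, table2cell(t(:,1:3)), partition_idx.');
state_val = splitapply(@(x) {x}, table2cell(t(:,4)), partition_idx.');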
sensor_val and state_val are cell arrays containing the required values.
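The question asks for numeric predictors of size 3-by-k (features by observations) and responses of size 1-by-k, whereas the code above returns each run as a k-by-3 cell array. A minimal sketch of one way to get the requested layout is shown below. It is not part of the original answer: it finds the run boundaries directly with diff and uses a plain loop over the runs, and the names is_start, first_row, last_row, predictors and responses are chosen here only for illustration.

```matlab
% Rebuild the example table from the answer.
data = {1, 2, 3, 'norm'; 2, 3, 4, 'norm';
        2, 3, 1, 'rec' ; 4, 4, 2, 'rec';
        1, 2, 3, 'norm'; 2, 3, 4, 'rec';
        2, 3, 1, 'rec' ; 4, 4, 2, 'rec'};
t = cell2table(data, 'VariableNames', {'sen1', 'sen2', 'sen3', 'state'});

% A run starts at row 1 and at every row whose state differs from the
% state of the previous row.
idx       = findgroups(t.state);          % numeric code per state label
is_start  = [true; diff(idx) ~= 0];       % logical column, one entry per row
first_row = find(is_start);               % first row of each run
last_row  = [first_row(2:end) - 1; height(t)];  % last row of each run
n_runs    = numel(first_row);

X = t{:, 1:3};                            % numeric predictors, one row per observation
predictors = cell(n_runs, 1);
responses  = cell(n_runs, 1);
for k = 1:n_runs
    rows = first_row(k):last_row(k);
    predictors{k} = X(rows, :).';         % 3-by-(run length) numeric matrix
    responses{k}  = t.state(rows).';      % 1-by-(run length) cell array of labels
end
```

The loop runs once per run rather than once per observation, so it should remain manageable for a table with 200000 rows as long as the state does not change on almost every row.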