How to extract a subset of person identified data from several excel files into one excel file
    3 visualizaciones (últimos 30 días)
  
       Mostrar comentarios más antiguos
    
    Edward Wakelam
 el 31 de Dic. de 2018
  
    
    
    
    
    Comentada: Edward Wakelam
 el 31 de Dic. de 2018
            I have a collection of seven very large (10,000 records in each) excel files where the data (33 attributes) for each student (identified by student ID) is spread across each of the seven excel files.  I'd like to randomly extract 1000 students data from across the seven files into one excel file (1000 rows, 34 columns).  I'm hoping that someone has already written MATLAB code to do something similar and save me from my very patchy programming skills?  Many thanks.
0 comentarios
Respuesta aceptada
  dpb
      
      
 el 31 de Dic. de 2018
        Well, there's a lot of code that will read a spreadsheet, but there's probably nothing that you can just blindly throw at the particular problem.
But, it's not hard problem to solve if the spreadsheets are reasonably regular in their format.
Just read each file into a an array and select the rows desired--if it is truly random selection, then something like
d=dir('AppropriateWildCardString.xls*');   % get the dir() list of files
for i=1:length(d)
  [~,~,raw]=xlsread(d(i).name);            % read raw data from the sheet
  if i==1                                  % generate a random subset indexing vector
    ix=randperm(size(raw,1),1000);
    data=raw(ix,:);                        % and save the selected subset
  else
    data=[data raw(ix,2:end)];             % and append to previous
  end
end
At that point you'll have cell array of the desired data; I'd recommend converting to table and storing the data in appropriate variable types (double, string, categorical, ...) based on its characteristics.
0 comentarios
Más respuestas (1)
  Michael Madelaire
      
 el 31 de Dic. de 2018
        
      Editada: Michael Madelaire
      
 el 31 de Dic. de 2018
  
      Start by saving the excel files as .csv.
Then load them into MatLab with 
data1 = csvread(filename1);
data2 = csvread(filename2);
data3 = csvread(filename3);
data4 = csvread(filename4);
data5 = csvread(filename5);
data6 = csvread(filename6);
data7 = csvread(filename7);
Then combine into one matrix
data = [data1; data2; data3; data4; data5; data6; data7];
Now you have to sample randomly from the new matrix (70,000 x 33). For this I suggest taken a look at:
This should do it.
%% Edit
You can also use 
xlsread()
to load excel files directly, but I prefer csvread.
Ver también
Categorías
				Más información sobre Data Import from MATLAB en Help Center y File Exchange.
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!


