Improving parallelisation of sequentialfs
I am doing a student task on classification with the naive Bayes classifier. I am experimenting with different distributions and forward feature selection on the MNIST Fashion and UCI Spambase datasets. I have a relatively powerful 4-core Intel i7-4770K with 16 GB of RAM, so I set 'UseParallel' to true, but feature selection is still quite slow. In fact, I got frustrated when feature selection with the 'kernel' distribution on the MNIST set failed to finish in 48 hours, and looked for ways to speed it up.
I expected sequentialfs to try the different candidate feature sets of one step in parallel, because they are independent of each other, but found that only the cross-validation is parallelised.
So I copy-pasted the code from sequentialfs.m and made the following modifications:
- commented out line 345, and
- replaced the loop on lines 351-354 with
parfor k = 1:numAvailable
    crit(k) = callfun(fun, [X(:,in), X(:,available(k))], other_data, cv, mcreps, ParOptions);
end
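For the parfor to pass MATLAB's loop analysis, crit must be preallocated as a sliced output, while X, in and the other variables are broadcast to the workers. Here is a self-contained sketch of the same pattern with a trivial stand-in for callfun (the real one lives inside sequentialfs.m, so the criterion below is hypothetical):

```matlab
% Same loop pattern as the modification, runnable on its own.
X = rand(100, 10);               % data matrix
in = false(1, 10);               % features already selected
available = find(~in);           % candidate feature indices
numAvailable = numel(available);

crit = zeros(1, numAvailable);   % preallocated sliced output
parfor k = 1:numAvailable
    % X and in are broadcast to every worker; crit(k) is sliced.
    Xcand = [X(:, in), X(:, available(k))];
    crit(k) = sum(var(Xcand));   % stand-in for callfun(...)
end
```

Without an open parpool the loop simply runs serially, so the sketch behaves the same either way.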
The (renamed and modified) sequentialfs got faster by 1.5-4.2 times, depending on the underlying distribution. I used the UCI Spambase set for testing:
%% load data
clear all
close all
clc
spam_data = dlmread('spambase.data');   % read in data
normal_factor = round(1/min(spam_data(spam_data>0)));
spam_data(:,1:end-1) = round(spam_data(:,1:end-1)*normal_factor);

distribs = {'mn','mvmn','kernel'};
for k = 1:3
    if strcmp(distribs{k},'mvmn')
        fun = @(Xtrain,Ytrain,Xtest,Ytest) ...
            sum(Ytest ~= predict(fitcnb(Xtrain,Ytrain,'Distribution',distribs{k}, ...
            'CategoricalPredictors','all'),Xtest));
    else
        fun = @(Xtrain,Ytrain,Xtest,Ytest) ...
            sum(Ytest ~= predict(fitcnb(Xtrain,Ytrain,'Distribution',distribs{k}),Xtest));
    end

    s = RandStream('mt19937ar','seed',2017);
    RandStream.setGlobalStream(s);
    disp(['Starting feature selection with ', distribs{k}, ', builtin']);
    tic
    fs = sequentialfs(fun,spam_data(:,1:end-1),uint8(spam_data(:,end)), ...
        'options',statset('Display','iter','UseParallel',true));
    toc

    s = RandStream('mt19937ar','seed',2017);
    RandStream.setGlobalStream(s);
    disp(['Starting feature selection with ', distribs{k}, ', custom with UseParallel=true']);
    tic
    fs = parsequentialfs(fun,spam_data(:,1:end-1),uint8(spam_data(:,end)), ...
        'options',statset('Display','iter','UseParallel',true));
    toc

    s = RandStream('mt19937ar','seed',2017);
    RandStream.setGlobalStream(s);
    disp(['Starting feature selection with ', distribs{k}, ', custom with UseParallel=false']);
    tic
    fs = parsequentialfs(fun,spam_data(:,1:end-1),uint8(spam_data(:,end)), ...
        'options',statset('Display','iter','UseParallel',false));
    toc
end
Clearly the CPU is significantly better utilised this way. Perhaps MathWorks could do the same?
I understand that this would require much more rigorous testing with other classifiers, datasets and hardware configurations, but as it stands MathWorks seems to disregard its own recommendation that parallel loops should be placed at the highest level possible.
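That recommendation can be illustrated with a toy example (not taken from sequentialfs): parallelising the outer loop keeps all workers busy on coarse-grained chunks, whereas parallelising only the inner loop pays parfor dispatch overhead on every outer iteration. Actual timings will depend on your pool and hardware.

```matlab
% Toy comparison: the same work parallelised at two different levels.
N = 8; M = 200;

% Inner-level parallelism: a new parfor dispatch per outer iteration.
tic
for i = 1:N
    r = zeros(1, M);
    parfor j = 1:M
        r(j) = sum(svd(rand(80)));   % stand-in for one CV evaluation
    end
end
toc

% Outer-level parallelism: one dispatch, coarse-grained tasks per worker.
tic
parfor i = 1:N
    r = zeros(1, M);
    for j = 1:M
        r(j) = sum(svd(rand(80)));
    end
end
toc
```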
1 comment
Maciej Przydatek on 10 Jul 2018 (edited 23 Jul 2018)
This is excellent and works perfectly in my case! Sample calculations took about 55 seconds on a single worker (100% usage of one core), almost 130 (!) seconds on a 4-worker parpool with the unmodified sequentialfs (about 45% usage of each core), and with the modified sequentialfs I got below 20 seconds (100% usage on all cores).
In R2018a, line numbers are 356 and 362-365 respectively.
EDIT: WARNING! With bigger data, this modification causes a sudden spike in memory consumption at the start of the computation, which later settles back to a normal level. The peak depends on the size of the data sent to the workers. My data variable was 1,208,064,000 bytes (over 1 GB), and I had to add a 16 GB swap partition (doubling my 16 GB of RAM) to keep the workers from crashing. The peak was at approximately 26 GB of memory usage (all RAM and most of the swap), but after half a minute RAM usage dropped to 8 GB. It may be caused by the process of distributing the data to the workers, but that is a guess.
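If the spike does come from re-broadcasting X on each parfor dispatch, one possible mitigation (a sketch only, not tested on this dataset) is to wrap the data in a parallel.pool.Constant, so each worker holds a single persistent copy across iterations:

```matlab
% Sketch: assumes a running parpool and the same variables as in the
% modified sequentialfs loop (X, in, available, numAvailable, callfun, ...).
Xc = parallel.pool.Constant(X);   % copied to each worker once
crit = zeros(1, numAvailable);
parfor k = 1:numAvailable
    Xw = Xc.Value;                % worker-local data, no per-loop broadcast
    crit(k) = callfun(fun, [Xw(:,in), Xw(:,available(k))], ...
                      other_data, cv, mcreps, ParOptions);
end
```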
Answers (0)