Parallel calculating for fast execution

Question

Attachment: new machine performance.png

I wrote a program consisting of two files: 1- a function that returns a required value, Tmax, which depends on SIX input variables; 2- a script that calculates many other quantities and calls the Tmax function. The function has to be called a very large number of times, which takes a long time. I am looking for a way to reduce the computation time.

In the second script I replaced every for loop with a parfor loop wherever possible. The run time improved, but not by much. I have a powerful machine (its configuration is attached). I was hoping to divide the execution time by 32, since I have 32 cores. That is not happening and I wonder why. The points at which I calculate Tmax are independent, so I think it should be possible to give each core n/32 points, where n is the number of points (the sample size). Could you explain this behavior? Can a function be called n times in parallel, with n/32 calls executed on each core?

The two scripts are below:

1- the function:

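function T_max = EPO_OILS_SEMIBATCH(Par)
% Par = [F tadd UA Taj Tj0 CHP_initial]; returns the maximum of state 10.
global UA Tj0 Taj F tadd CHP_initial   % shared with semibatch below
F = Par(1);
tadd = Par(2);
UA = Par(3);
Taj = Par(4);
Tj0 = Par(5);
CHP_initial = Par(6);
tspan = [0:10:10000];   % output times
y0 = [0 CHP_initial 0 ((1-(0.14*CHP_initial)*0.24285)*1000)/18 0.5 1.70 0.00 0.00 0.00 Tj0 0.26];   % 11 initial states
[t, y] = ode23s(@semibatch, tspan, y0);
T_max = max(y(:,10));

function dydt = semibatch(t, y)
global UA Tj0 Taj F tadd
if t > tadd
    F = 0;   % F is global, so it stays 0 for the rest of this integration
end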
dydt=[(1/(1+((1-((y(11)-0.12)/y(11)))/(((y(11)-0.12)/y(11))*9))))*((F/(y(11)-0.12))*(24-y(1))-((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))+((0.0009*exp(-(2429.636)*((1/y(10))-(0.0029411))))*y(3))+(1-((y(11)-0.12)/y(11)))*(((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5))+((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))+((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7))-((0.00339*exp(-(5063.74789)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(1)*((y(1)/y(4))^0.5)))/((y(11)-0.12)/y(11)));...
    -((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))-(F*y(2)/(y(11)-0.12)); ((-F*y(3)/(y(11)-0.12))+((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))-((0.0009*exp(-(2429.636)*((1/y(10))-(0.0029411))))*y(3))-((0.001*exp(-(2405.581)*((1/y(10))-(0.0029411))))*y(3))-(1-((y(11)-0.12)/y(11)))*(((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5))+((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))+((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7))+((0.0592*exp(-(8419.53331)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(3)*((y(1)/y(4))^0.5)))/((y(11)-0.12)/y(11)));...
    ((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))+((0.001*exp(-(2405.581)*((1/y(10))-(0.0029411))))*y(3))-(F*y(4)/(y(11)-0.12))-(1-((y(11)-0.12)/y(11)))*((0.000237*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*((y(1)*y(4))^0.5))/((y(11)-0.12)/y(11)); -((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5)); -((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6)); ((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))-((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7)); (((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5))+((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))+((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7)))-((0.000237*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*((y(1)*y(4))^0.5))-((0.00339*exp(-(5063.74789)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(1)*((y(1)/y(4))^0.5))-((0.0592*exp(-(8419.53331)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(3)*((y(1)/y(4))^0.5));...
    ((0.000237*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*((y(1)*y(4))^0.5))+((0.00339*exp(-(5063.74789)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(1)*((y(1)/y(4))^0.5))+((0.0592*exp(-(8419.53331)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(3)*((y(1)/y(4))^0.5)); (1/(((y(11)-0.12)*1.00+0.12*0.93)*2000))*((-(y(11)-0.12)*(((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))*-5580+((0.001*exp(-(2405.581)*((1/y(10))-(0.0029411))))*y(3))*-359000+((0.0009*exp(-(2429.636)*((1/y(10))-(0.0029411))))*y(3))*-163000)-0.12*((((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5))+((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))+((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7)))*-230000+(((0.000237*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*((y(1)*y(4))^0.5))+((0.00339*exp(-(5063.74789)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(1)*((y(1)/y(4))^0.5))+((0.0592*exp(-(8419.53331)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(3)*((y(1)/y(4))^0.5)))*-90000))+UA*(Tj0-y(10))+24*F*20*(Taj-y(10))); F];

2- the calculation script:

tic
n = 100;   % sample size; n is large in practice (up to 1000000 and more)
p = 6;     % number of input parameters
F_max = 0.002;
F_min = 0.001;
tadd_max = 1200;
tadd_min = 600;
UA_max = 100;
UA_min = 1;
Taj_max = 308.15;
Taj_min = 293.15;
Tj0_max = 343.15;
Tj0_min = 313.15;
CHP_initial_max = 8;
CHP_initial_min = 2.9;
sob1 = sobolset(p);
An = net(sob1, n);   % n Sobol points in the unit hypercube
Par_max = [F_max tadd_max UA_max Taj_max Tj0_max CHP_initial_max];
Par_min = [F_min tadd_min UA_min Taj_min Tj0_min CHP_initial_min];
A = zeros(size(An,1), size(An,2));
parfor i = 1:size(An,1)
    A(i,:) = An(i,:).*(Par_max-Par_min) + Par_min;   % scale to the parameter ranges
end
T_max_A = zeros(1, n);   % preallocate the output
parfor i = 1:n
    T_max_A(i) = EPO_OILS_SEMIBATCH(A(i,:));
end
f_0 = (1/n)*sum(T_max_A)                  % sample mean of T_max
D_T = ((1/n)*sum(T_max_A.^2)) - f_0^2     % sample variance of T_max
toc

2 Comments

Jan on 27 Jan 2017

Edited: Jan on 27 Jan 2017

Please use the "{} Code" button for proper formatting. Currently we cannot run or inspect your code by copy&paste, and editing this massive block of code is error-prone and time-consuming. Thanks.

Sergey Kasyanov on 27 Jan 2017

Can you use code formatting?

Like this:

 A=10;
 B=10;

Answer 1

Jan on 27 Jan 2017

Sorry, this will not solve the problem:

For the speed part: Wow, this is a cruel code! I would not dare to simplify it manually. So just some ideas:

sqrt() is cheaper than ^0.5.
There are a lot of terms of the kind exp(-a*((1/y(10))-b)). Because the exp() function is very expensive, you can try to combine these terms to reduce the number of calls.
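
A minimal sketch of both ideas, using constants copied from the posted dydt. The state vector y is an arbitrary test point and the variable names (invT, k1, rootRatio, e1804, k6, k7) are made up for illustration; this is not a rewrite of the full model.

```matlab
% Sketch: evaluate shared sub-expressions once instead of repeating them
% in every term. Constants are taken from the posted dydt; the test state
% and all variable names are illustrative assumptions.
y = [1.2; 3.0; 0.4; 50; 0.5; 1.7; 0; 0; 0; 330; 0.26];  % arbitrary test state

% Original style: the same exp() and ^0.5 are spelled out in each term.
r_orig = (0.15*exp(-(18041.857)*((1/y(10))-(0.0029411)))) * (0.0017029) ...
         * ((y(1)/y(4))^0.5) * y(1)*y(2);

% Refactored: compute each building block once, then reuse it.
invT      = 1/y(10) - 0.0029411;        % shared argument of most exp() terms
k1        = 0.15*exp(-18041.857*invT);  % one exp() call, reused wherever k1 appears
rootRatio = sqrt(y(1)/y(4));            % sqrt() instead of ^0.5
r_new     = k1 * 0.0017029 * rootRatio * y(1)*y(2);

% Two rate constants share the exponent 1804.1857, so one exp() serves both.
e1804 = exp(-1804.1857*invT);
k6 = 0.00437*e1804;
k7 = 0.004*e1804;

relErr = abs(r_new - r_orig)/abs(r_orig);
fprintf('relative difference: %g\n', relErr);
```

The same pattern applies to the other repeated factors, for example the exp() with exponent 18041.857 that appears with both 0.15 and 0.000237. Whether this gives a noticeable speedup has to be measured, for example with the profiler.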

By the way, you can omit the square brackets in tspan=[0:10:10000], see why-not-use-square-brackets. But here the saved microseconds will not matter.

if t > tadd F=0; end adds a discontinuity to the integration. Matlab's integrators handle smooth functions only, see http://www.mathworks.com/matlabcentral/answers/59582#answer_72047 .
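
One common way around such a switch is to stop the integration at the switching time and restart it with the new parameter. The sketch below shows this on a toy one-state model (dV/dt = F); the values of F, tadd and V0 are placeholders and the model is not the reactor from the question.

```matlab
% Sketch: integrate a feed that stops at t = tadd in two smooth pieces
% instead of switching F inside the right-hand side.
% Toy model and all values are assumptions for illustration only.
F    = 0.0015;     % feed rate during the addition phase
tadd = 900;        % end of the addition phase
tend = 10000;      % end of the simulation
V0   = 0.26;       % initial value of the single state

rhs = @(t, V, Fcur) Fcur;          % right-hand side with the feed passed in

% Phase 1: feed on, stop exactly at the switching time.
[t1, V1] = ode23s(@(t, V) rhs(t, V, F), [0 tadd], V0);

% Phase 2: feed off, restart from the end state of phase 1.
[t2, V2] = ode23s(@(t, V) rhs(t, V, 0), [tadd tend], V1(end));

t = [t1; t2(2:end)];               % drop the duplicated point at t = tadd
V = [V1; V2(2:end)];
fprintf('final value: %.6f (expected %.6f)\n', V(end), V0 + F*tadd);
```

For the real model, the maximum temperature would then be taken over the joined solution of both phases. Passing F as an argument also removes the need to overwrite a global inside the right-hand side.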

I'm wondering whether you can trust the results: most of the constants have only 3 or 5 significant digits, some have 8. The formula has about 100 terms. Without any analysis, I guess that cancellation error might dominate the solution.

Answer 2

moulay ELMOUKRIE on 28 Jan 2017

Thanks for your answers. Your remarks gave me some improvement, but I do not think we can get a worthwhile time reduction by focusing on the function alone, because the problem is the number n of function values that I need for my results to converge. I thought I could calculate many function values at the same time; that is what I expected from the parfor command. But the machine handles the calculation differently. That said, the function is now simpler: I eliminated all the global parameters and kept just the six variable inputs. I put all the constants directly into dydt. I did it step by step and checked each change with some runs.

I attached the program: the file hassanehassane is not complete. The complete calculation needs about 64 times the time taken by hassanehassane! So with n=1000000, it will take at least 15 DAYS.

If each core could calculate the Tmax of (n/32) points at the same time, that would be a big advantage.
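
For reference, a minimal sketch of the pattern described here: a single parfor over all sample points with a preallocated output. expensiveModel is a cheap hypothetical stand-in for EPO_OILS_SEMIBATCH and A is random data instead of the scaled Sobol points, so the numbers mean nothing; only the structure matters.

```matlab
% Sketch: one parfor over all sample points; parfor splits the iterations
% among the available workers by itself.
% expensiveModel and the random A are placeholders (assumptions), not the
% real model or the real Sobol sample.
expensiveModel = @(par) sum(par.^2);

n = 1000;                       % sample size
p = 6;                          % number of input parameters
A = rand(n, p);                 % stand-in for the scaled sample points

T = zeros(n, 1);                % preallocate the sliced output variable
parfor i = 1:n
    T(i) = expensiveModel(A(i,:));   % iterations are independent
end

f_0 = mean(T);                  % sample mean
D_T = mean(T.^2) - f_0^2;       % sample variance, same formula as the script
fprintf('f_0 = %g, D_T = %g\n', f_0, D_T);
```

Some things worth checking before expecting a factor of 32, none of them verified on this machine: the pool may have fewer workers than the number of cores the operating system reports (pool = gcp; pool.NumWorkers shows the actual count), starting the pool takes several seconds and should be excluded from the tic/toc timing, and with n = 100 there are only about three iterations per worker, so uneven ODE solve times leave some workers idle. Without Parallel Computing Toolbox, parfor runs serially.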
