Faster codes in a loop

wesso Dadoyan on 5 Jul 2017
Edited: Jan on 6 Jul 2017
Hi,
I have a sequence of dates and firms. For each firm, I want to compute the median return on equity of its peers in the same state, excluding the firm itself. I wrote the following, but it takes ages because the loop runs over every row in the dataset. Is there a way to make it faster?
% ROEpeer is a vector of NaNs
% ROApeer is a vector of NaNs
% ROE is the return on equity of the firm
% ROA is the return on assets of the firm
% State is the state in which the firm operates
% ID is the ID of the firm
% Year is the year of the financial statements
% Quarter is the quarter of the financial statements
for i=1:length(ROEpeer)
    x0a=find(Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i) & ~isnan(ROE));
    x0b=find(Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i) & ~isnan(ROA));
    if length(x0a)>2
        ROEpeer(i)=median(ROE(x0a));
    end
    if length(x0b)>2
        ROApeer(i)=median(ROA(x0b));
    end
end

Accepted Answer

Jan on 5 Jul 2017
Edited: Jan on 5 Jul 2017
Are you sure that ROEpeer and ROApeer are pre-allocated properly?
isnumROE = ~isnan(ROE);   % computed once, outside the loop
isnumROA = ~isnan(ROA);
for i = 1:length(ROEpeer)
    % Same year, quarter and state, but a different firm
    tmp = (Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i));
    x0a = (tmp & isnumROE);
    x0b = (tmp & isnumROA);
    if sum(x0a) > 2
        ROEpeer(i) = median(ROE(x0a));
    end
    if sum(x0b) > 2
        ROApeer(i) = median(ROA(x0b));
    end
end
Try this first. It avoids recalculating ~isnan(ROE) in each iteration and evaluates the comparison of Year, Quarter, State and ID only once per iteration instead of twice. Omitting find() allows the faster "logical indexing". Perhaps this runs in half the time. But what does the profiler tell you about the bottleneck of the code? Is it median? Then improving the loop will not help much.
For optimizing the run time, test data are useful. Otherwise it is just guessing and avoiding the repeated calculation of the same results.
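As a rough illustration of the kind of test data meant here, the following sketch builds a small synthetic panel and checks that the find()-based loop and the logical-mask loop give the same result. The sizes, value ranges and the assumption that State and ID are stored as numeric codes are invented for illustration and are not taken from the original data.

```matlab
% Synthetic test data (all sizes and ranges are assumptions).
rng(0);                              % reproducible random numbers
n       = 2000;                      % far smaller than the real 600,000 rows
Year    = randi([2010 2012], n, 1);
Quarter = randi([1 4], n, 1);
State   = randi([1 5], n, 1);        % assumes states are numeric codes
ID      = randi([1 300], n, 1);
ROE     = randn(n, 1);
ROE(rand(n, 1) < 0.1) = NaN;         % sprinkle in some missing values

% Version 1: find() inside the loop, as in the question.
peerFind = nan(n, 1);
tic
for i = 1:n
    idx = find(Year==Year(i) & Quarter==Quarter(i) & State==State(i) ...
               & ID~=ID(i) & ~isnan(ROE));
    if length(idx) > 2
        peerFind(i) = median(ROE(idx));
    end
end
tFind = toc;

% Version 2: isnan() hoisted out of the loop, logical indexing.
peerMask = nan(n, 1);
isnumROE = ~isnan(ROE);
tic
for i = 1:n
    mask = Year==Year(i) & Quarter==Quarter(i) & State==State(i) ...
           & ID~=ID(i) & isnumROE;
    if nnz(mask) > 2
        peerMask(i) = median(ROE(mask));
    end
end
tMask = toc;

sameResult = isequaln(peerFind, peerMask);   % NaN-aware comparison
fprintf('find: %.3f s, mask: %.3f s, identical: %d\n', tFind, tMask, sameResult);
```

The timings depend on the machine and on n, so only the equality check is meaningful on its own. Running the same script under the profiler would show how much of the time goes to median compared with the comparisons.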
Another idea: if x0a has more than one element, the median is calculated multiple times for the different i. Does this work:
isnumROE = ~isnan(ROE);
isnumROA = ~isnan(ROA);
for i = 1:length(ROEpeer)
    tmp = (Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i));
    x0a = (tmp & isnumROE);
    x0b = (tmp & isnumROA);
    % Only compute the median if this row has not been filled in yet
    if sum(x0a) > 2 && isnan(ROEpeer(i))
        ROEpeer(x0a) = median(ROE(x0a));
    end
    if sum(x0b) > 2 && isnan(ROApeer(i))
        ROApeer(x0b) = median(ROA(x0b));
    end
end
Now all ROEpeer(x0a) and ROApeer(x0b) are set to the median at once and not repeatedly.
How are Year, Quarter, State and ID defined? Do neighboring elements usually have the same value, or are the data mixed? Are the values sorted?
2 Comments
wesso Dadoyan on 5 Jul 2017
Thanks. The first suggestion works, while the second doesn't because ROEpeer and ROApeer are already NaNs. I have just tried the profiler. It seems that the main problem is simply the number of observations (around 600,000). Thanks again.
Jan on 6 Jul 2017
Edited: Jan on 6 Jul 2017
Are you sure that the 2nd method does not work? Imagine this:
i = 17;
x0a = [17, 18, 19];
if length(x0a) > 2 && isnan(ROEpeer(17)) % Both are TRUE
    ROEpeer([17,18,19]) = median(ROE([17,18,19]));
end
Now in the next iteration:
i = 18;
x0a = [17, 18, 19];
if length(x0a) > 2 && isnan(ROEpeer(18)) % TRUE && FALSE
    % Now this is not calculated again, because ROEpeer(18)
    % was set to the wanted value already!
    ROEpeer([17,18,19]) = median(ROE([17,18,19]));
end
The calculation of the expensive median is saved for i=18 and i=19 in this case. This could and should reduce the runtime massively.
If the combination of years, quarters, states and IDs contains longer runs of identical data, a run-length encoding would also reduce the runtime. Then you do not have to check for each value whether the pattern changes; you would loop only over the indices where the values change.
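One way to sketch the idea of looping only over the positions where the values change: sort the rows by Year, Quarter and State, locate the run boundaries with diff, and compute the peer medians inside each short run. This is an illustrative variant, not code from this thread. The sample data are invented, and State is assumed to be numeric. Because each firm's peer set excludes the firm itself, the median is still evaluated per row, only on the small run instead of on the full vectors.

```matlab
% Small made-up sample (values are assumptions for illustration).
Year    = [2016 2016 2016 2016 2016 2017 2017 2017]';
Quarter = [1 1 1 1 1 1 1 1]';
State   = [1 1 1 1 2 1 1 1]';
ID      = [1 2 3 4 5 1 2 3]';
ROE     = [0.10 0.20 0.30 0.40 0.50 0.10 NaN 0.30]';

n       = numel(ROE);
ROEpeer = nan(n, 1);

% Sort so that rows with the same Year, Quarter and State are adjacent.
[keys, order] = sortrows([Year, Quarter, State]);

% A run starts wherever any key column differs from the previous row.
isStart  = [true; any(diff(keys, 1, 1) ~= 0, 2)];
runStart = find(isStart);
runEnd   = [runStart(2:end) - 1; n];

for k = 1:numel(runStart)
    rows = order(runStart(k):runEnd(k));   % original row numbers of this run
    v    = ROE(rows);
    id   = ID(rows);
    ok   = ~isnan(v);
    for j = 1:numel(rows)
        peers = ok & (id ~= id(j));        % same run, other firms, valid ROE
        if nnz(peers) > 2
            ROEpeer(rows(j)) = median(v(peers));
        end
    end
end
```

For this sample, rows 1 to 4 get 0.3, 0.3, 0.2 and 0.2, and the remaining rows stay NaN because they have fewer than three valid peers. Whether this helps on the real data depends on how long the runs are: each comparison now touches only the rows of one run rather than all 600,000 observations.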
I'm sure the function could still be improved if you post some relevant input data.
