Faster code in a loop
wesso Dadoyan
on 5 Jul 2017
Hi,
I have a sequence of dates and firms. For each firm I want to compute the median return on equity (and return on assets) of its peers in the same state, excluding the firm itself. I wrote the following, but it takes ages because the loop runs over every row of the dataset. Is there a way to make it faster?
% ROEpeer and ROApeer are vectors of NaNs
% ROE is the return on equity for the firm
% ROA is the return on assets for the firm
% State is the state in which the firm operates
% ID is the ID of the firm
% Year is the year of the financial statements
% Quarter is the quarter of the financial statements
for i = 1:length(ROEpeer)
    x0a = find(Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i) & ~isnan(ROE));
    x0b = find(Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i) & ~isnan(ROA));
    if length(x0a) > 2
        ROEpeer(i) = median(ROE(x0a));
    end
    if length(x0b) > 2
        ROApeer(i) = median(ROA(x0b));   % peers' ROA, not ROE
    end
end
Accepted Answer
Jan
on 5 Jul 2017
Edited: Jan on 5 Jul 2017
Are you sure that ROEpeer and ROApeer are preallocated properly?
isnumROE = ~isnan(ROE);   % computed once, outside the loop
isnumROA = ~isnan(ROA);
for i = 1:length(ROEpeer)
    % Year/Quarter/State/ID comparison evaluated once per iteration
    tmp = (Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i));
    x0a = (tmp & isnumROE);
    x0b = (tmp & isnumROA);
    if sum(x0a) > 2
        ROEpeer(i) = median(ROE(x0a));
    end
    if sum(x0b) > 2
        ROApeer(i) = median(ROA(x0b));
    end
end
Try this first. It avoids computing ~isnan(ROE) in every iteration and evaluates the comparison of Year, Quarter, State and ID only once per iteration instead of twice. Dropping find() allows faster logical indexing. Perhaps this halves the run time. But what does the profiler tell you about the bottleneck of the code? If it is median, improving the loop will not help much.
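Both forms select the same elements; logical indexing simply skips building the intermediate index vector. A small sketch with a hypothetical vector; `nnz` is shown as an alternative to `sum` for counting the selected elements (a substitution, not part of the answer above):

```matlab
% Hypothetical toy vector with one missing value
ROE = [0.10; NaN; 0.30; 0.40];
mask = ~isnan(ROE);          % logical mask, computed once outside any loop
viaFind = ROE(find(mask));   % find() first builds a vector of indices
viaMask = ROE(mask);         % logical indexing uses the mask directly
nValid = nnz(mask);          % counts true entries, same result as sum(mask)
```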
Test data would help with optimizing the run time. Without them, all I can do is guess and avoid recomputing the same results.
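Until real data are posted, synthetic data can stand in for timing experiments. The sketch below builds a hypothetical panel (200 firms, years 2010 to 2014, 10 states, about 5% NaNs; all of these numbers are assumptions) and times the loop from the answer with `tic`/`toc`, using ROA for ROApeer:

```matlab
% Hypothetical synthetic data: all sizes and distributions are assumptions
rng(0);                                   % reproducible random numbers
nFirms = 200;                             % assumed number of firms
years  = (2010:2014)';                    % assumed five years of data
[ID, Year, Quarter] = ndgrid(1:nFirms, years, 1:4);
ID = ID(:);  Year = Year(:);  Quarter = Quarter(:);
firmState = randi(10, nFirms, 1);         % each firm sits in one of 10 states
State = firmState(ID);
N = numel(ID);
ROE = randn(N, 1);
ROA = randn(N, 1);
ROE(rand(N, 1) < 0.05) = NaN;             % about 5% missing values
ROA(rand(N, 1) < 0.05) = NaN;

% Baseline: the loop from the answer above, timed with tic/toc
ROEpeer = nan(N, 1);
ROApeer = nan(N, 1);
isnumROE = ~isnan(ROE);
isnumROA = ~isnan(ROA);
tic
for i = 1:N
    tmp = Year == Year(i) & Quarter == Quarter(i) & State == State(i) & ID ~= ID(i);
    x0a = tmp & isnumROE;
    x0b = tmp & isnumROA;
    if nnz(x0a) > 2
        ROEpeer(i) = median(ROE(x0a));
    end
    if nnz(x0b) > 2
        ROApeer(i) = median(ROA(x0b));
    end
end
toc
```

Running the same script between `profile on` and `profile viewer` shows whether the mask construction or `median` dominates.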
Another idea: if x0a has more than one element, the median is computed repeatedly for the different i. Does this work?
isnumROE = ~isnan(ROE);
isnumROA = ~isnan(ROA);
for i = 1:length(ROEpeer)
    tmp = (Year==Year(i) & Quarter==Quarter(i) & State==State(i) & ID~=ID(i));
    x0a = (tmp & isnumROE);
    x0b = (tmp & isnumROA);
    % Skip rows whose value was already filled in an earlier iteration
    if sum(x0a) > 2 && isnan(ROEpeer(i))
        ROEpeer(x0a) = median(ROE(x0a));
    end
    if sum(x0b) > 2 && isnan(ROApeer(i))
        ROApeer(x0b) = median(ROA(x0b));
    end
end
Now all elements of ROEpeer(x0a) (and ROApeer(x0b)) are set to the median at once, not repeatedly.
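One caveat about this shortcut: row i's peer set excludes row i, so the rows in x0a do not all share the same peer median, and copying one median to all of them gives a different result from the original loop. A safer speed-up keeps one median per row but makes the mask cheaper: `unique(..., 'rows')` collapses Year, Quarter and State into a single integer label up front. A minimal sketch on hypothetical toy data, assuming one row per firm per quarter:

```matlab
% Hypothetical toy data (assumes one row per firm per Year/Quarter/State)
Year    = [2016; 2016; 2016; 2016; 2016; 2017];
Quarter = [1; 1; 1; 1; 1; 1];
State   = [7; 7; 7; 7; 7; 7];
ID      = [1; 2; 3; 4; 5; 1];
ROE     = [0.10; 0.20; NaN; 0.40; 0.50; 0.90];

% Collapse Year/Quarter/State into one integer group label per row,
% so each iteration needs one comparison instead of three
[~, ~, g] = unique([Year, Quarter, State], 'rows');
isnumROE = ~isnan(ROE);
ROEpeer = nan(size(ROE));
for i = 1:numel(ROE)
    x0a = g == g(i) & ID ~= ID(i) & isnumROE;   % this row's valid peers
    if nnz(x0a) > 2
        ROEpeer(i) = median(ROE(x0a));          % one median per row
    end
end
```

In this toy data the five rows of 2016 Q1 get peer medians 0.4, 0.4, 0.3, 0.2 and 0.2, which shows why a single shared value cannot reproduce the per-row result.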
How are Year, Quarter, State and ID defined? Do neighboring elements usually have the same values, or are the data mixed? Are the values sorted?
Jan
on 6 Jul 2017
Edited: Jan on 6 Jul 2017
Are you sure that the 2nd method does not work? Imagine this:
i = 17;
x0a = [17, 18, 19];
if length(x0a) > 2 && isnan(ROEpeer(17))   % both are TRUE
    ROEpeer([17, 18, 19]) = median(ROE([17, 18, 19]));
end
Now in the next iteration:
i = 18;
x0a = [17, 18, 19];
if length(x0a) > 2 && isnan(ROEpeer(18))   % TRUE && FALSE
    % Now this is not calculated again, because ROEpeer(18)
    % was already set to the wanted value!
    ROEpeer([17, 18, 19]) = median(ROE([17, 18, 19]));
end
In this case the expensive median calculation is skipped for i = 18 and i = 19. This could, and should, reduce the runtime substantially.
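If every row needs its own leave-one-out peer median, the repeated `median` calls can still be avoided: sorting a group's valid values once is enough to read off every leave-one-out median. A minimal sketch for a single hypothetical group `v` of non-NaN ROE values, assuming one row per firm so that excluding the firm drops exactly one value:

```matlab
% Valid (non-NaN) ROE values of one hypothetical Year/Quarter/State group,
% assuming one row per firm, so "excluding the firm" drops exactly one value
v = [0.12; 0.05; 0.30; 0.08; 0.21];
n = numel(v);
[s, order] = sort(v);              % the only sort for this group
m = n - 1;                         % size of every leave-one-out peer set
loo = zeros(n, 1);                 % loo(j) = median of v without v(j)
for p = 1:n                        % p = sorted position of the removed value
    % Position q of the reduced sorted list is s(q) before p, s(q+1) from p on
    pick = @(q) s(q + (q >= p));
    if mod(m, 2) == 1
        loo(order(p)) = pick((m + 1) / 2);
    else
        loo(order(p)) = (pick(m / 2) + pick(m / 2 + 1)) / 2;
    end
end
```

In the full problem, a row whose own ROE is NaN is not in `v`, so its peer median is simply `median(v)`; the "more than 2 peers" rule becomes `m > 2` for the other rows.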
If the combinations of years, quarters, states and IDs contain long runs of identical values, run-length encoding would also reduce the runtime. Then you do not have to check every row for a change in the pattern; instead you loop only over the indices where the values change.
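A sketch of the run-based idea, assuming the rows may be reordered: `sortrows` makes rows with equal Year/Quarter/State contiguous, `diff` marks where each run starts, and the peer search then scans only the rows of the current run instead of all N rows. The data are hypothetical toy values:

```matlab
% Hypothetical toy data, deliberately unsorted
Year    = [2016; 2017; 2016; 2016; 2016; 2016];
Quarter = [1; 1; 1; 1; 1; 1];
State   = [7; 7; 7; 7; 7; 7];
ID      = [1; 1; 2; 3; 4; 5];
ROE     = [0.10; 0.90; 0.20; NaN; 0.40; 0.50];

N = numel(ROE);
% Sort once so that rows of the same Year/Quarter/State form contiguous runs
[keys, order] = sortrows([Year, Quarter, State]);
runStart = [1; find(any(diff(keys, 1, 1) ~= 0, 2)) + 1];   % first row of each run
runStop  = [runStart(2:end) - 1; N];                        % last row of each run

ROEpeer = nan(N, 1);
for k = 1:numel(runStart)
    rows = order(runStart(k):runStop(k));   % original row numbers in this run
    for r = rows'
        % Peers: same run, different firm, valid ROE
        peers = rows(ID(rows) ~= ID(r) & ~isnan(ROE(rows)));
        if numel(peers) > 2
            ROEpeer(r) = median(ROE(peers));
        end
    end
end
```

The runs are formed from Year, Quarter and State only; ID is still compared inside each run, as in the original mask.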
I'm sure the function could be improved further if you post some representative input data.