# How can I vectorize the nested loops in my code to improve performance?

2 views (last 30 days)
Tianhuojian on 2 Oct 2021
Commented: Tianhuojian on 3 Oct 2021
Hello,
I can't figure out how to vectorize my set of nested loops. How can I improve the speed of my code through vectorization?
Also, is there a faster way to check if a value is currently in the array than ismember? ismember is a significant bottleneck to the program. Any other noticeable optimizations or suggestions would be greatly appreciated too.
The script calculates prime-limit ratios within the defined limits. The defined limits in the example code only take a few seconds to run. As the prime limit increases it takes exponentially more time to calculate, since it must multiply the current prime number by every numerator and denominator previously sent to the array.
This is the code:
%define limits
primeLimit = 11;
integerLimit = intmax('int32');
octaveRatio = 2;
printEachLimit = 0;
sortType = "ƒ Ratio";
%set ratio counter
arrayCounter = 2;
%initialize arrays
primeOptionArray = primes(primeLimit);
primeLimitArray = ;
numeArray = ;
denoArray = ;
fRatioArray = ;
centsArray = ;
tenneyHeightArray = ;
%loop through each prime
for i = 1:length(primeOptionArray)
tic;
currentPrime = primeOptionArray(i); %get current prime
disp(currentPrime+"-limit Calculating...");
progressArrayLength = arrayCounter; %set length based on previous array size, used to calculate progress
%loop through all ratios in array, multiply current prime
for ii = 1:arrayCounter
nume = numeArray(ii); %get numerator
deno = denoArray(ii); %get denominator
%multiply numerator by prime while within limit
while nume <= integerLimit
nume = nume*currentPrime;
%multiply denominator by 2 until ratio is within defined octave
while nume/deno > octaveRatio
deno = deno*2;
end
fRatio = nume/deno; %frequency ratio
if fRatio > 1 && fRatio <= octaveRatio
%reduce ratio to simplest form before checking limit again
ratioGCD = gcd(nume,deno);
reducedNume = nume/ratioGCD;
reducedDeno = deno/ratioGCD;
%check ratio is within limit and if has already been found, then push to arrays
if reducedNume <= integerLimit
if ismember(fRatio,fRatioArray) == 0
primeLimitArray(arrayCounter) = currentPrime;
numeArray(arrayCounter) = reducedNume;
denoArray(arrayCounter) = reducedDeno;
fRatioArray(arrayCounter) = fRatio;
centsArray(arrayCounter) = 1200*log2(fRatio);
tenneyHeightArray(arrayCounter) = log2(reducedNume*reducedDeno);
nume = reducedNume;
deno = reducedDeno;
arrayCounter = arrayCounter+1; %count up 1 since ratio was found and added to arrays
end
end
end
end
%perform inverse calculation
nume = numeArray(ii); %get numerator
deno = denoArray(ii); %get denominator
%multiply numerator by 2 while within limit
while nume <= integerLimit
nume = nume*2;
%multiply denominator by prime until ratio is within defined octave
while nume/deno > octaveRatio
deno = deno*currentPrime;
end
fRatio = nume/deno; %frequency ratio
if fRatio > 1 && fRatio <= octaveRatio
%reduce ratio to simplest form before checking limit again
ratioGCD = gcd(nume,deno);
reducedNume = nume/ratioGCD;
reducedDeno = deno/ratioGCD;
%check ratio is within limit and if has already been found, then push to arrays
if reducedNume <= integerLimit
if ismember(fRatio,fRatioArray) == 0
primeLimitArray(arrayCounter) = currentPrime;
numeArray(arrayCounter) = reducedNume;
denoArray(arrayCounter) = reducedDeno;
fRatioArray(arrayCounter) = fRatio;
centsArray(arrayCounter) = 1200*log2(fRatio);
tenneyHeightArray(arrayCounter) = log2(reducedNume*reducedDeno);
nume = reducedNume;
deno = reducedDeno;
arrayCounter = arrayCounter+1; %count up 1 since ratio was found and added to arrays
end
end
end
end
%update progress...
if currentPrime > 2
if (ii == round(progressArrayLength*0.25))
elapsedTime = toc;
disp("25% "+elapsedTime+"s");
elseif (ii == round(progressArrayLength*0.5))
elapsedTime = toc;
disp("50% "+elapsedTime+"s");
elseif (ii == round(progressArrayLength*0.75))
elapsedTime = toc;
disp("75% "+elapsedTime+"s");
elseif (ii == round(progressArrayLength))
elapsedTime = toc;
disp("100% "+elapsedTime+"s");
end
end
end
%save file for each prime limit, containing all prior data
if currentPrime > 2 && printEachLimit == 1
elapsedTime = toc;
disp("Writing to CSV File...");
dataMatrix = [numeArray;denoArray;fRatioArray;centsArray;tenneyHeightArray;primeLimitArray]; %create matrix
dataMatrix = dataMatrix.'; %transpose rows & columns
if sortType == "ƒ Ratio"
dataMatrix = sortrows(dataMatrix,3);
elseif sortType == "Tenney Height"
dataMatrix = sortrows(dataMatrix,5);
end
dataMatrix = array2table(dataMatrix);
dataMatrix.Properties.VariableNames(1:6) = {'Numerator','Denominator','ƒ Ratio','Cents','Tenney Height','Prime Limit'};
fileTitle = [num2str(length(primeLimitArray)),' Ratios ',num2str(currentPrime),'-limit ',num2str(integerLimit),'-int-limit ',num2str(round(elapsedTime)),'s.csv'];
writetable(dataMatrix,fileTitle);
end
end
%save one file containing all data found
if printEachLimit == 0
elapsedTime = toc;
disp("Writing to CSV File...");
dataMatrix = [numeArray;denoArray;fRatioArray;centsArray;tenneyHeightArray;primeLimitArray]; %create matrix
dataMatrix = dataMatrix.'; %transpose rows & columns
if sortType == "ƒ Ratio"
dataMatrix = sortrows(dataMatrix,3);
elseif sortType == "Tenney Height"
dataMatrix = sortrows(dataMatrix,5);
end
dataMatrix = array2table(dataMatrix);
dataMatrix.Properties.VariableNames(1:6) = {'Numerator','Denominator','ƒ Ratio','Cents','Tenney Height','Prime Limit'};
fileTitle = [num2str(length(primeLimitArray)),' Ratios ',num2str(primeOptionArray(end)),'-limit ',num2str(integerLimit),'-int-limit ',num2str(round(elapsedTime)),'s.csv'];
writetable(dataMatrix,fileTitle);
end
elapsedTime = toc;
disp("Complete. "+length(primeLimitArray)+" Ratios Found. "+elapsedTime+"s");
Note: "gcdFast" reduces the runtime by half, but has been changed to the default gcd for this example to remove dependencies.

Jan on 2 Oct 2021
Edited: Jan on 2 Oct 2021
The editor shows you 12 valuable hints already: Letting an array grow iteratively is very expensive. See this example:
x = [];
for k = 1:1e6
x(k) = rand;
end
The final array x need 8 MB only (8 bytes per element). But Matlab has to create a new vector in each iteration and copy the former contents to it. Theredfore Matlab has to request sum(1:1e6)*8 Bytes and copy sum(1:1e6-1)*8 Bytes. This is more than 4 Terabyte! Of course, this slows doen the procesing massively.
The solution is easy: Pre-allocation:
x = zeros(1, 1e6);
for k = 1:1e6
x(k) = rand;
end
Now Matlab allocates the final array with 8MB only and no further allocations or copies are required.
If you do not know the final size of your array, allocate the maximum posible size, if this matchs in your RAM and crop the array afterwards.
Another option is to allocate in chunks:
chunk = 1e3;
ix = 0;
nx = 0;
x = [];
for k = 1:1e6 * rand
ix = ix + 1;
if ix > nx
nx = nx + chunk;
x(nx) = 0; % Expands the array implicitly
end
x(ix) = rand;
end
x(ix + 1:nx) = []; % Crop unused memory
Prefer larger chunk sizes.
while nume/deno > octaveRatio
deno = deno*2;
end
while nume/deno > octaveRatio
deno = deno*currentPrime;
end
Seriously? There ius no need to do this in a loop.
This has a lot of overhead:
if ismember(fRatio,fRatioArray) == 0
Faster and nicer:
if ~any(fRatio == fRatioArray)
With replacing ismember by any(a==b), the code runs in 7.5 instead of 8.5 seconds in my Matlab R2018b.
Now use the profiler to find the bottleneck: gcd(). You can create a much faster tool omitting the smart features as array input, special treatment of integer types and so on:
function u = myGCD(a, b)
u = max(a, b);
v = min(a, b);
while v > 0
t = rem(u, v);
u = v;
v = t;
end
end
Using this instead of of Matlab's gcd() reduces the tun time to 2.4 seconds.
I do not understand the pile of tocs. You measure the time from the last start of the first loop, where tic is called. Is this really the correct timing for the output file?
Tianhuojian on 3 Oct 2021
Hello Jan,
Thank you so much for your detailed answer and help. I have implemented your suggestions into my code, which has improved the runtime and overhead.
I had tried basic preallocation before, but it seemed to make my code slower since it required allocating such a high value? I'm not sure. Allocating in chunks works best for my needs it seems, wonderful suggestion!
Your GCD function is even faster than a Euclidean algorithm for simplification that I was using previously.
The tic and tocs are currently configured to test how long each limit takes, not the total time elapsed, this will be changed.

R2021b

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!