How to expand a two-column matrix (each row holding the edges of a specific range) into a vector without a for loop?

Hi
I have a matrix with two columns and four rows (4x2). Each row contains the beginning (value in the 1st column) and the end (value in the 2nd column) of a specific range of indexes into a data file. The ranges do NOT all have the same length. I would like to generate all the indexes that fall within each range (row) and collect them in one vector to be used later for data manipulation. One way to do it is a for-loop that goes through each row, creates the indexes in between, and appends them to a new vector that contains all indexes, as in the example below:
% Each row of rangeEdges contains the beginning and end of a specific range of indexes
rangeEdges = [1,10; 20,25; 40,60; 65,71];
allIndexes = []; % start from an empty vector so the first concatenation works
for ixi = 1:size(rangeEdges,1)
    allIndexes = [allIndexes, rangeEdges(ixi,1):1:rangeEdges(ixi,2)];
end
The code above results in the vector allIndexes below:
allIndexes =
Columns 1 through 15
1 2 3 4 5 6 7 8 9 10 20 21 22 23 24
Columns 16 through 30
25 40 41 42 43 44 45 46 47 48 49 50 51 52 53
Columns 31 through 44
54 55 56 57 58 59 60 65 66 67 68 69 70 71
This works fine if I do not have too many rows (ranges). But in my case, the matrix rangeEdges sometimes has millions of rows, and the for-loop needs an extremely long time to execute (hours). That is for only one data file, and I have thousands of data files, so it is practically impossible to do it with a for-loop.
My question: is there a way to do the above-mentioned process and get the vector allIndexes without using a for-loop?
Thanks!

Accepted Answer

Athul Prakash on 27 Jan 2020
Hey,
Here's something I came up with; you may try it out to determine whether it works with your data.
It's not very straightforward, but the idea is to use 'cumsum' (doc is linked below) to come up with a vector of logical indices, which should be suitable for you even though you asked for a vector of index values.
tempVar = zeros(rangeEdges(end,2), 1, 'uint8'); % create a temp vector of length = last index in rangeEdges
tempVar(rangeEdges(:,1)) = 1; % set 1 at all the start points of the ranges
tempVar(rangeEdges(:,2)+1) = -1; % set -1 just after all the end points of the ranges (offset by 1 because that's how cumsum works)
% tempVar is now a vector with 1's and -1's at the start and end points of the required ranges.
allIndices = cumsum(tempVar); % creates 1's in the required ranges, 0's everywhere else
allIndices = logical(allIndices); % converts the type to logical, so that it can be used for indexing
Hope it helps.
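If a vector of index values is needed rather than a logical mask, the mask can be converted with find. The sketch below is a minimal, self-contained illustration on the example ranges from the question. It stores the +1/-1 markers in double precision (a choice of this sketch, so that -1 can be represented) and assumes the ranges are sorted, do not overlap, and are separated by at least one index, since otherwise a -1 marker would overwrite the next +1 marker. The names delta and mask are illustrative only.

```matlab
rangeEdges = [1,10; 20,25; 40,60; 65,71];   % example ranges from the question

% Marker vector: +1 at each range start, -1 just past each range end.
% One extra element is allocated so the last -1 marker fits.
delta = zeros(rangeEdges(end,2) + 1, 1);    % double, so -1 can be stored
delta(rangeEdges(:,1))     = 1;
delta(rangeEdges(:,2) + 1) = -1;

mask = cumsum(delta) > 0;      % true inside every range, false elsewhere
allIndexes = find(mask).';     % row vector of index values, like the loop output
```

The logical mask can be used directly for indexing (for example data(mask(1:numel(data))) for a hypothetical vector data), so the find step is only needed when the index values themselves are required.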
1 Comment
M.Abuasbeh on 29 Jan 2020
Thanks a lot Athul, it works very well!
It reduced the time needed by a few orders of magnitude :)
Just one comment: I needed to delete the 'uint8' in the tempVar definition because it won't accept negative values. I made a quick comparison between the methods, see the code below:
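The behavior described here is consistent with how MATLAB integer classes treat out-of-range values: assignments saturate at the limits of the type, so -1 stored into a uint8 array becomes 0. A small illustration follows (the variable names are made up for this sketch). A signed class such as int8 would be one possible alternative to double if memory is a concern, though that is a suggestion of this sketch and not something tested in the thread.

```matlab
u = zeros(3, 1, 'uint8');   % unsigned 8-bit: representable range is 0..255
u(2) = -1;                  % out of range, so the stored value saturates to 0

s = zeros(3, 1, 'int8');    % signed 8-bit: representable range is -128..127
s(2) = -1;                  % stored as -1

d = zeros(3, 1);            % double, the default class returned by zeros
d(2) = -1;                  % stored as -1

fprintf('uint8: %d, int8: %d, double: %d\n', u(2), s(2), d(2));
```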
rangesSize = [100000, 500000, 1000000, 5000000, 10000000]; % range sizes to test
for ixi1 = 1:numel(rangesSize)
    XX = 10:1000:rangesSize(ixi1); % starting edge of each range
    YY = XX+700; % ending edge of each range
    rangeEdges = [XX',YY']; % each row has two columns (1st col: start edge, 2nd col: end edge)
    %% Using cumsum (without a for loop)
    tic
    % create a temp vector of length = last index in rangeEdges
    % ('uint8' removed from the tempVar definition because it won't accept -1 values)
    tempVar = zeros(rangeEdges(end,2), 1);
    % set 1 at all the start points of the ranges
    tempVar(rangeEdges(:,1)) = 1;
    % set -1 just after all the end points of the ranges (offset by 1 because that's how cumsum works)
    tempVar(rangeEdges(:,2)+1) = -1;
    % tempVar is now a vector with 1's and -1's at the start and end points of the required ranges
    % cumsum creates 1's in the required ranges, 0's everywhere else
    allIndices = cumsum(tempVar);
    % convert to logical so that it can be used for indexing
    allIndices = logical(allIndices);
    t2(ixi1) = toc % t2: time (in sec) to get all the needed indices using cumsum
    %% To compare: using a for loop
    clear allMisRangesIndexes
    allMisRangesIndexes = []; % empty vector to collect all indices in the for loop
    tic
    for ixi4 = 1:size(rangeEdges,1)
        allMisRangesIndexes = [allMisRangesIndexes, ...
            rangeEdges(ixi4,1):1:rangeEdges(ixi4,2)];
    end
    t1(ixi1) = toc % t1: time (in sec) to get all the needed indices using the for loop
    loopCounts(ixi1) = ixi4; % number of ranges, used to plot the comparison later
end
%%
Both allMisRangesIndexes and allIndices serve the same purpose, but the time needed by the for-loop increases drastically (roughly const. * x^2.3, where x is the number of loop iterations) once the number of iterations grows beyond about 500.
%% Plot (log-scaled Y axis) to compare the time needed by both methods
figure (1)
plot(loopCounts,t1,':.r',loopCounts,t2,':.b','LineWidth',1.4)
set(gca, 'YScale', 'log') % make Y axis in log scale
title('For-Loop vs cumsum','FontSize',24, 'FontWeight', 'bold')
legend({'for-loop','cumsum'},'FontSize',12, 'FontWeight', 'bold')
ylabel('Time in Seconds', 'FontSize',16, 'FontWeight', 'bold')
xlabel('Number of for-loops (or Ranges)', 'FontSize',16, 'FontWeight', 'bold')
set(gca,'FontSize',12,'FontWeight', 'bold')
Then you get the plot below:
[Figure: for_loop vs cumsum_logical array.jpg. Time in seconds (log scale) versus number of ranges for the for-loop and cumsum methods]
Thanks a lot again!
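On the loop timing noted above: a likely contributor is that allMisRangesIndexes grows on every iteration, so MATLAB has to reallocate and copy the vector repeatedly. When index values (rather than a logical mask) are needed, one possible loop-free sketch builds the output at its final size from the range lengths. It is shown on the example ranges from the question, assumes every range has start <= end, and uses illustrative names (lens, step, firstPos) that are not part of the thread.

```matlab
rangeEdges = [1,10; 20,25; 40,60; 65,71];           % example ranges from the question

lens  = rangeEdges(:,2) - rangeEdges(:,1) + 1;      % number of indices in each range
step  = ones(1, sum(lens));                         % default increment of 1 inside a range
step(1) = rangeEdges(1,1);                          % the output starts at the first range start

% Output positions where the 2nd, 3rd, ... ranges begin.
firstPos = cumsum(lens(1:end-1)) + 1;
% At those positions, jump from the previous range end to the next range start.
step(firstPos) = (rangeEdges(2:end,1) - rangeEdges(1:end-1,2)).';

allIndexes = cumsum(step);                          % running sum yields the index values
```

Unlike the marker approach, this variant does not need a gap between consecutive ranges, and its memory use scales with the number of indices produced rather than with the largest index.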

Release

R2016a
