Efficient overlapping block processing with nested sub blocks (similar to im2col with stride)

Question

0 votos

Hello, I recently have been doing a side project to teach myself some more efficient and performance optimized MATLAB and I ran into the following problem. My solution feels "ugly" to me and I highly suspect that there is a better version. I am intentionally straying from builtin functions like im2col and blockproc but I definitely am not asking you to restrict your solutions especially if they are high performance.

Problem:

An example of the input to the problem is generated by the following code. It consists of square matrix with nxn blocks. I use ascending integers to make it easier to keep track of the format of the data.

% example input matrix A with nxn blocks 
n = 2; 
N = 2*n^2;
A = kron((1:n^2).',ones(1,2*N)) + (1:n^2:N^2) - 1;
[row,col] = ind2sub([n n],1:n^2);
A((row-1).*n+col,:) = A((col-1).*n+row,:);
A = col2im(A, [n n], [N N], 'distinct').'
A = 8×8
     1     3     5     7     9    11    13    15
     2     4     6     8    10    12    14    16
    17    19    21    23    25    27    29    31
    18    20    22    24    26    28    30    32
    33    35    37    39    41    43    45    47
    34    36    38    40    42    44    46    48
    49    51    53    55    57    59    61    63
    50    52    54    56    58    60    62    64
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

The desired output from the input is to take a sliding block of 2n x 2n with a stride of n in both the x and y directions. This generates a block with 4 subblocks of size n x n. I would like to generate a similar result of im2col, but with the caveat that the column result keeps the individual sub blocks together (as if I had taken im2col of each of the sub blocks and then stacked them into a single column). Normally im2col scans the matrix top to bottom and left to right, but the final desired requirement is that the columns come from a sweep from left to right and then top to bottom.

My solution is:

% intermediate matrix B with 2n patches with n stride
M = length(A(:,1)) + n;
A(end+1:M,end+1:M) = 0;
[x,y] = meshgrid(1:2*n);
[xs,ys] = meshgrid(0:n:(M-2*n));
idx = (reshape(y,4*n^2,[]) + ys(:)') + (M*reshape(x-1,4*n^2,[]) + M*xs(:)');
colB = A(idx);
% these 2 lines not part of my solution but may help visualize the
% intermediate step of expanding the matrix
szB = sqrt(numel(colB));
B = col2im(colB, [2*n 2*n], [szB szB], 'distinct');
% reshape and permute chain for desired format and ordering
C = reshape(colB,n,2,[]);
C = permute(C,[1 3 2]);
C = reshape(C, 4*n^2, n, []);
C = permute(C, [1 3 2]);
C = reshape(C, 4*n^2,[]) % final desired output
C = 16×16
     1     5     9    13    17    21    25    29    33    37    41    45    49    53    57    61
     2     6    10    14    18    22    26    30    34    38    42    46    50    54    58    62
     3     7    11    15    19    23    27    31    35    39    43    47    51    55    59    63
     4     8    12    16    20    24    28    32    36    40    44    48    52    56    60    64
     5     9    13     0    21    25    29     0    37    41    45     0    53    57    61     0
     6    10    14     0    22    26    30     0    38    42    46     0    54    58    62     0
     7    11    15     0    23    27    31     0    39    43    47     0    55    59    63     0
     8    12    16     0    24    28    32     0    40    44    48     0    56    60    64     0
    17    21    25    29    33    37    41    45    49    53    57    61     0     0     0     0
    18    22    26    30    34    38    42    46    50    54    58    62     0     0     0     0
    19    23    27    31    35    39    43    47    51    55    59    63     0     0     0     0
    20    24    28    32    36    40    44    48    52    56    60    64     0     0     0     0
    21    25    29     0    37    41    45     0    53    57    61     0     0     0     0     0
    22    26    30     0    38    42    46     0    54    58    62     0     0     0     0     0
    23    27    31     0    39    43    47     0    55    59    63     0     0     0     0     0
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

The final result is good for general n and the solution runs decently fast. However it just looks like there is still performance to squeeze out of this. I feel like there is a more direct method than constructing the intermediate matrix and then decomposing it. I also feel like there is a better reordering process than my reshape->permute->reshape->permute->reshape. After too many hours trying I am forced to concede but hopefully someone here can teach me something new. Thanks in advance.

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Matthew el 23 de Mzo. de 2026

Accidentally put "non" in the title of the post, to be clear the 2n x 2n blocks are overlapping.

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Follow Question

Answer 1

Stephen23 el 23 de Mzo. de 2026

Editada: Stephen23 el 23 de Mzo. de 2026

Abrir en MATLAB Online

2 votos

Your reshape/permute chain is working backwards: you're extracting in the "wrong" order and then correcting it. The cleaner approach is to build an index array that maps directly to the desired output layout in one shot, eliminating the intermediate B matrix and the entire reshape chain. The linear index of each output element is the sum of a local offset (position within a window, in subblock order) and a global offset (window position).

First lets create your example matrix in a more direct way:

n = 2; 
N = 2*n^2;
nw = N/n;
A = reshape(permute(reshape(1:N^2, n,n, nw,nw), [1,4,2,3]), N,N)
A = 8×8
     1     3     5     7     9    11    13    15
     2     4     6     8    10    12    14    16
    17    19    21    23    25    27    29    31
    18    20    22    24    26    28    30    32
    33    35    37    39    41    43    45    47
    34    36    38    40    42    44    46    48
    49    51    53    55    57    59    61    63
    50    52    54    56    58    60    62    64
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

Now determine the required indices:

% Pad A just once:
Nn = N + n; % padded dimension
Ap = zeros(Nn,Nn,class(A));
Ap(1:N,1:N) = A;
% Local offsets within a 2n×2n window, laid out in subblock order:
% Column-major indices within one n×n subblock
rb  = kron(ones(n,1), (0:n-1).'); % n^2 × 1  row within subblock
cb  = kron((0:n-1).', ones(n,1)); % n^2 × 1  col within subblock
% Four subblocks: (TL, TR, BL, BR) -> (sr,sc) = (0,0),(0,1),(1,0),(1,1)
sr4 = [0;0;1;1];
sc4 = [0;1;0;1];
row_off = kron(n*sr4, ones(n^2,1)) + repmat(rb,4,1); % 4n^2 × 1
col_off = kron(n*sc4, ones(n^2,1)) + repmat(cb,4,1); % 4n^2 × 1
% Window positions: wx (column) varies fast -> left-to-right, top-to-bottom
[wx,wy] = ndgrid(0:nw-1, 0:nw-1); % wx fast, wy slow
row_win = n * wy(:).';            % 1 × nw^2
col_win = n * wx(:).';            % 1 × nw^2
% Single index expression via broadcast outer-sum
idx = (row_off+row_win) + Nn*(col_off+col_win) + 1; % 4n^2 × nw^2
C = Ap(idx)
C = 16×16
     1     5     9    13    17    21    25    29    33    37    41    45    49    53    57    61
     2     6    10    14    18    22    26    30    34    38    42    46    50    54    58    62
     3     7    11    15    19    23    27    31    35    39    43    47    51    55    59    63
     4     8    12    16    20    24    28    32    36    40    44    48    52    56    60    64
     5     9    13     0    21    25    29     0    37    41    45     0    53    57    61     0
     6    10    14     0    22    26    30     0    38    42    46     0    54    58    62     0
     7    11    15     0    23    27    31     0    39    43    47     0    55    59    63     0
     8    12    16     0    24    28    32     0    40    44    48     0    56    60    64     0
    17    21    25    29    33    37    41    45    49    53    57    61     0     0     0     0
    18    22    26    30    34    38    42    46    50    54    58    62     0     0     0     0
    19    23    27    31    35    39    43    47    51    55    59    63     0     0     0     0
    20    24    28    32    36    40    44    48    52    56    60    64     0     0     0     0
    21    25    29     0    37    41    45     0    53    57    61     0     0     0     0     0
    22    26    30     0    38    42    46     0    54    58    62     0     0     0     0     0
    23    27    31     0    39    43    47     0    55    59    63     0     0     0     0     0
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

3 comentarios
Mostrar 1 comentario más antiguo Ocultar 1 comentario más antiguo

Stephen23 el 23 de Mzo. de 2026

Editada: Stephen23 el 23 de Mzo. de 2026

Abrir en MATLAB Online

"With 3 dimensions it is easy for me to visualize and predict the output of my permutation but with more than 3 my intuition is muted"

Attributed to John von Neumann: "In mathematics you don't understand things. You just get used to them". Intuition only takes you so far, I don't think of arrays as having a physical representation (although they can). For me using PERMUTE is mostly about 1) keeping track of the parts/features of the array that you are trying to keep together, 2) keeping track of the dimension requirements (e.g. input order, output order, etc). Keeping a firm grasp on the order of data in memory is critical, because a lot of times (as used here) PERMUTE is just setting things up for RESHAPE.

So to answer your question "how you came up with this permutation?": 1) RESHAPE into useful blocks, 2) Identify/confirm the feaures/groups/whatever, 3) PERMUTE the data in those blocks to get them into the required memory order for the next step 4) RESHAPE into the ouput array.

n = 2; 
nw = 2; 
N = n*nw;  % N = 4, N^2 = 16
V = 1:N^2
V = 1×16
     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
R = reshape(V, n,n, nw,nw) % RESHAPE into useful blocks
R = 
R(:,:,1,1) =

     1     3
     2     4


R(:,:,2,1) =

     5     7
     6     8


R(:,:,1,2) =

     9    11
    10    12


R(:,:,2,2) =

    13    15
    14    16

We can use these blocks to map the required features to the dimensions: for the 1st column 1,2 are already along dim1, the next required elements 9,10 mean we require dim4. For the 2nd column we need 3,4 which requires dim2. That only leaves dim3 which goes at the end.

B = permute(R, [1,4,2,3]) % PERMUTE the blocks so that they suit memory order
B = 
B(:,:,1,1) =

     1     9
     2    10

B(:,:,2,1) =

     3    11
     4    12

B(:,:,1,2) =

     5    13
     6    14

B(:,:,2,2) =

     7    15
     8    16

Note after PERMUTE we can already see the data is in the required order in memory: the elements are shown in order 1, 2, 9, 10, 3, 4, 11, 12, ... etc.

C = reshape(B, N,N) % RESHAPE to the required output size
C = 4×4
     1     3     5     7
     2     4     6     8
     9    11    13    15
    10    12    14    16
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

Experiment! Most likely it won't be correct on the first attempt.

Try smaller examples: my tip is to use dimensions that do not have common divisors.

Matthew el 23 de Mzo. de 2026

Thanks for your insightful responses, I've learned a quite a bit here

Iniciar sesión para comentar.

Answer 2

Matt J el 24 de Mzo. de 2026

Editada: Matt J el 24 de Mzo. de 2026

Abrir en MATLAB Online

1 voto

If the reordering is something you want to repeat on multiple matrices of the same size and block structure, then you can use your current method once, and recycle some intermediate results to make subsequent processing faster. This generalizes very easily to other dimensions and other kinds of reorderings:

clc,
n = 2; 
N = 2*n^2;
A = kron((1:n^2).',ones(1,2*N)) + (1:n^2:N^2) - 1;
[row,col] = ind2sub([n n],1:n^2);
A((row-1).*n+col,:) = A((col-1).*n+row,:);
A = col2im(A, [n n], [N N], 'distinct').'*10
A = 8×8
    10    30    50    70    90   110   130   150
    20    40    60    80   100   120   140   160
   170   190   210   230   250   270   290   310
   180   200   220   240   260   280   300   320
   330   350   370   390   410   430   450   470
   340   360   380   400   420   440   460   480
   490   510   530   550   570   590   610   630
   500   520   540   560   580   600   620   640
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
tic;  %pre-computation
[tmp,mask,idx]=Reordering(size(A),n);
toc;
Elapsed time is 0.003690 seconds.
tic;  %subsequent passes
C=tmp;
C(mask)=A(idx);
toc;
Elapsed time is 0.001299 seconds.
C
C = 16×16
    10    50    90   130   170   210   250   290   330   370   410   450   490   530   570   610
    20    60   100   140   180   220   260   300   340   380   420   460   500   540   580   620
    30    70   110   150   190   230   270   310   350   390   430   470   510   550   590   630
    40    80   120   160   200   240   280   320   360   400   440   480   520   560   600   640
    50    90   130     0   210   250   290     0   370   410   450     0   530   570   610     0
    60   100   140     0   220   260   300     0   380   420   460     0   540   580   620     0
    70   110   150     0   230   270   310     0   390   430   470     0   550   590   630     0
    80   120   160     0   240   280   320     0   400   440   480     0   560   600   640     0
   170   210   250   290   330   370   410   450   490   530   570   610     0     0     0     0
   180   220   260   300   340   380   420   460   500   540   580   620     0     0     0     0
   190   230   270   310   350   390   430   470   510   550   590   630     0     0     0     0
   200   240   280   320   360   400   440   480   520   560   600   640     0     0     0     0
   210   250   290     0   370   410   450     0   530   570   610     0     0     0     0     0
   220   260   300     0   380   420   460     0   540   580   620     0     0     0     0     0
   230   270   310     0   390   430   470     0   550   590   630     0     0     0     0     0
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
function [C,mask,idx]=Reordering(dims,n)
    A=reshape(1:prod(dims),dims);
    M = length(A(:,1)) + n;
    A(end+1:M,end+1:M) = 0;
    [x,y] = meshgrid(1:2*n);
    [xs,ys] = meshgrid(0:n:(M-2*n));
    idx = (reshape(y,4*n^2,[]) + ys(:)') + (M*reshape(x-1,4*n^2,[]) + M*xs(:)');
    colB = A(idx);
    % reshape and permute chain for desired format and ordering
    C = reshape(colB,n,2,[]);
    C = permute(C,[1 3 2]);
    C = reshape(C, 4*n^2, n, []);
    C = permute(C, [1 3 2]);
    C = reshape(C, 4*n^2,[]); % final desired output
    mask=logical(C);
    idx=C(mask);
end

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Matthew el 25 de Mzo. de 2026

Interesting, the original impetus of this block rearrangement was the realization that a couple for loops could be replaced by a precomputation of all these vectors. Constant memory calls into the original matrix was slowing down the process. So in my use case, this subblock processing is a single computation step that is not repeated but I definitely see the value in repeatability of the process.

The matrix A in my code is on the order of about 1500x1500 so with n=5 the colB array can have quite a large number of columns. One saving grace is that I only need the uppertriangular portion of the result so I can remove a lot of column indices.

Also I should thank you further as I remember digging through some of your block manipulation codes on the FEX prior to posting this question orginally.

Iniciar sesión para comentar.

Efficient overlapping block processing with nested sub blocks (similar to im2col with stride)

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Respuesta aceptada

3 comentarios
Mostrar 1 comentario más antiguo Ocultar 1 comentario más antiguo

Más respuestas (1)

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Categorías

Productos

Versión

Etiquetas

Community Treasure Hunt

Efficient overlapping block processing with nested sub blocks (similar to im2col with stride)

1 comentario Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Respuesta aceptada

3 comentarios Mostrar 1 comentario más antiguo Ocultar 1 comentario más antiguo

Más respuestas (1)

1 comentario Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Categorías

Productos

Versión

Etiquetas

Ver también

Community Treasure Hunt

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

3 comentarios
Mostrar 1 comentario más antiguo Ocultar 1 comentario más antiguo

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos