Fast vector reshaping/permutation

Question

0 votes

I'm trying to optimize a very specific vector operation, namely taking a large (2^20 x 1) vector, reshaping it, permuting the indices, and reshaping once more. To be concrete, an example:

A  = rand(2^20,1);          % Large vector, with one dimension a power of 2
A  = A./norm(A);            % Normalize just for convenience
DR = 2^6;                   % DR,DL,DM are powers of 2 which multiply to form the size of A
DL = 2^6;
DM = 2^8;
tic;
B = reshape(permute(reshape(A,DR,DM,DL),[2,1,3]),DM,DR*DL);
toc;                        % On my machine this takes ~1.2 ms

The above operation is very simple, and its speed is entirely limited by the permute step. As I understand it, permute in MATLAB requires the entire array to be copied, so time is lost both allocating the copy and moving the data into it. I am wondering if there is any clever way to get around this requirement for this specific use case.

I have tried putting the operation on a GPU, for instance by calling:

A = rand(2^20,1,'gpuArray')

This does improve the runtime by a factor of ~4, but it also hurts some other areas of my application. I have not yet tried to mexify the code, but I would be interested to know whether that seems a viable way to improve things as well.

Edit from the comments: Ultimately this reshaped vector/matrix "B" is then multiplied by a matrix (DM x DM), and then permuted/reshaped back into its original form. If there is some fast way to combine all of those operations, then that would of course be even better.

Edit 2 for further context: As the answers/comments asked for more clarification of the overall use case, I will provide a toy model of a larger chunk of the code. Essentially this is the type of overall operation we are looking to do:

L = 20;
mid_size = 4;
DM = 2^mid_size;
A  = rand(2^L,1);          
A  = A./norm(A);            
Ms = rand(DM,DM,L-mid_size+1);
tic;
for left_size = 0:mid_size:(L-mid_size)
    right_size = L - mid_size - left_size;
    
    DR = 2^right_size;                   
    DL = 2^left_size;
    B = reshape(permute(reshape(A,DR,DM,DL),[2,1,3]),DM,DR*DL);
    B_prime = Ms(:,:,left_size+1) * B;
    A = permute(reshape(B_prime,DM,DR,DL),[2,1,3]);
    
    
end
A = reshape(A, 2^L, 1);
toc;

This is of course embedded in a larger program, but I think this is essentially an isolated "kernel".

5 comments (3 older comments not shown)

James Tursa on 15 Jun 2021

That's strange. I may have to experiment with some mex code to figure out what is going on with those permute( ) and pagetranspose( ) timings. I would have thought pagetranspose( ) would be optimized to be at least as fast as permute( ), but this is obviously not the case.

Adam Shaw on 15 Jun 2021

Ultimately this reshaped vector is then multiplied by a matrix (DM x DM), and then permuted/reshaped back into its original form. If there is some fast way to combine all of those operations, then that would of course be even better.


Answer 1

Matt J on 17 Jun 2021

Edited: Matt J on 17 Jun 2021

1 vote

Edit 2 for further context: ...Essentially this is the type of overall operation we are looking to do:

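This will be more efficient. pagemtimes multiplies each DR x DM page of the reshaped array by the pre-transposed matrix directly, so no permute is needed inside the loop: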

L = 20;
mid_size = 4;
DM = 2^mid_size;
A  = rand(2^L,1);          
A  = A./norm(A);            
Ms = rand(DM,DM,L-mid_size+1);
Ms = permute(Ms,[2,1,3]);  % <--- pre-transpose each page once, outside the loop
tic;
for left_size = 0:mid_size:(L-mid_size)
    
    right_size = L - mid_size - left_size;
    DR = 2^right_size;                   
    DL = 2^left_size;
    
    A = pagemtimes(reshape(A,DR,DM,DL), Ms(:,:,left_size+1));
end
A = reshape(A, 2^L, 1);
toc;
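
As a sanity check, the sketch below compares the loop above with the permute-based loop from the question on a smaller problem. The size L = 12 and the names A0, Mt, A_ref, A_new and rel_err are choices made here for illustration and do not come from the original posts. pagemtimes requires R2020b or later.

```matlab
% Assumed small problem size so the check runs quickly (not from the original post)
L = 12;
mid_size = 4;
DM = 2^mid_size;
A0 = rand(2^L,1);
A0 = A0./norm(A0);
Ms = rand(DM,DM,L-mid_size+1);

% Reference: permute-based loop from the question
A_ref = A0;
for left_size = 0:mid_size:(L-mid_size)
    right_size = L - mid_size - left_size;
    DR = 2^right_size;
    DL = 2^left_size;
    B = reshape(permute(reshape(A_ref,DR,DM,DL),[2,1,3]),DM,DR*DL);
    B_prime = Ms(:,:,left_size+1) * B;
    A_ref = permute(reshape(B_prime,DM,DR,DL),[2,1,3]);
end
A_ref = reshape(A_ref,2^L,1);

% pagemtimes-based loop with the matrices transposed once up front
Mt = permute(Ms,[2,1,3]);
A_new = A0;
for left_size = 0:mid_size:(L-mid_size)
    right_size = L - mid_size - left_size;
    DR = 2^right_size;
    DL = 2^left_size;
    A_new = pagemtimes(reshape(A_new,DR,DM,DL), Mt(:,:,left_size+1));
end
A_new = reshape(A_new,2^L,1);

% Relative difference between the two results; should be at rounding-error level
rel_err = norm(A_new - A_ref) / norm(A_ref);
```

The two loops agree because permuting, multiplying by M from the left and permuting back is the same as multiplying each DR x DM page by M.' from the right.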


Answer 2

Matt J on 15 Jun 2021

1 vote

No, permute() will be the fastest way (on the CPU). How does the GPU hurt other areas of your application?
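
One way to check this on a given machine is to time the candidates with timeit, which runs the operation several times and is usually more reliable than a single tic/toc. This is only a sketch: the sizes are taken from the question, pagetranspose (R2020b or later) is the alternative mentioned in the comments, and the variable names are made up here. No timings are claimed, since they depend on hardware and MATLAB release.

```matlab
% Sizes from the question
DR = 2^6; DM = 2^8; DL = 2^6;
X = rand(DR,DM,DL);

% Time each candidate; timeit calls the handle repeatedly and returns a typical time in seconds
t_permute       = timeit(@() permute(X,[2,1,3]));
t_pagetranspose = timeit(@() pagetranspose(X));

% Both calls should produce identical arrays
same = isequal(permute(X,[2,1,3]), pagetranspose(X));

fprintf('permute: %.3g ms, pagetranspose: %.3g ms\n', 1e3*t_permute, 1e3*t_pagetranspose);
```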

2 comments

Adam Shaw on 15 Jun 2021

Maybe it's something I could sit down and fix, but for instance just the brute force approach of making "A" into a gpuArray causes the overall runtime of the program to be slower - though I admit I haven't dug enough into this to say exactly where/why.

Matt J on 15 Jun 2021

You should be doing all your large computations, including the creation of A, on the GPU.
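
A minimal sketch of what that could look like for the toy model in the question, assuming Parallel Computing Toolbox and a supported GPU are available. The name A_host is made up here. The point is that A and Ms are created on the GPU and only the final result is transferred back.

```matlab
% Requires Parallel Computing Toolbox and a supported GPU (assumption)
L = 20;
mid_size = 4;
DM = 2^mid_size;
A  = rand(2^L,1,'gpuArray');                 % created directly on the GPU
A  = A./norm(A);
Ms = rand(DM,DM,L-mid_size+1,'gpuArray');    % matrices also live on the GPU

for left_size = 0:mid_size:(L-mid_size)
    right_size = L - mid_size - left_size;
    DR = 2^right_size;
    DL = 2^left_size;
    % Same steps as in the question; every intermediate stays on the GPU
    B = reshape(permute(reshape(A,DR,DM,DL),[2,1,3]),DM,DR*DL);
    B_prime = Ms(:,:,left_size+1) * B;
    A = permute(reshape(B_prime,DM,DR,DL),[2,1,3]);
end
A = reshape(A,2^L,1);

A_host = gather(A);   % single transfer back to host memory at the end
```

If other parts of the program pull data back to the CPU on every iteration, those transfers may be what makes the overall runtime slower.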


Answer 3

James Tursa on 15 Jun 2021

Edited: James Tursa on 15 Jun 2021

1 vote

Don't do the permute( ) operation. Just use pagemtimes( ) downstream in your code with the appropriate 'transpose' option. This will cause the matrix multiply to use code that "virtually" transposes the matrix without actually physically forming it first.

https://www.mathworks.com/help/matlab/ref/pagemtimes.html?searchHighlight=pagemtimes&s_tid=srchtitle

E.g., something like this if I understand your dimensions:

result = reshape(pagemtimes(Matrix,'none',reshape(A,DR,DM,DL),'transpose'),DM,DR*DL);

I think pagemtimes( ) is multi-threaded and uses BLAS in the background so I doubt a mex routine could be written to beat this for speed.
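
The following sketch checks the one-liner above against the explicit permute on a small case. The sizes and the names expected, A_back, expected_back, max_diff and max_diff_back are assumptions chosen for illustration. It also shows that, if the product is going to be permuted back anyway, putting the 'transpose' option on the matrix instead appears to give the final array in one call.

```matlab
% Assumed small sizes for a quick check (not from the original post)
DR = 2^3; DM = 2^4; DL = 2^2;
A = rand(DR*DM*DL,1);
Matrix = rand(DM,DM);          % stands in for the DM x DM matrix from the question

% Explicit permute followed by an ordinary matrix product
B = reshape(permute(reshape(A,DR,DM,DL),[2,1,3]),DM,DR*DL);
expected = Matrix * B;

% Same product with the transpose folded into pagemtimes
result = reshape(pagemtimes(Matrix,'none',reshape(A,DR,DM,DL),'transpose'),DM,DR*DL);
max_diff = max(abs(result(:) - expected(:)));

% Multiply and permute back in one step: each page X(:,:,l) becomes X(:,:,l)*Matrix.'
A_back = pagemtimes(reshape(A,DR,DM,DL),'none',Matrix,'transpose');
expected_back = permute(reshape(expected,DM,DR,DL),[2,1,3]);
max_diff_back = max(abs(A_back(:) - expected_back(:)));
```

Both differences should be at rounding-error level.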

10 comments (8 older comments not shown)

James Tursa on 16 Jun 2021

Well, I guess OP is going to have to weigh in on what he really wants. His original calculation only had one permute (for the 3D transpose operation) with the two reshapes going between vectors and 3D arrays.

Adam Shaw on 16 Jun 2021

Thanks for the spirited discussion. I've added another edit to the original post with a toy model which is approximately my use case to try and give more context to the broader problem. I can clarify any part of it, but essentially the idea is you have to do this reshaping/permuting operation with multiple different tensor dimensions in sequence. I thought just the reshape(permute(reshape())) line would be enough to try and improve, but from your discussion it seems there are probably better ways to optimize the overall problem....
