Incremental median across pages of a 3D array

Matt J on 25 Jan 2021
Edited: Matt J on 28 Jan 2021
I am trying to compute the inter-page median,
B=median(A,3)
of a 3D array A, except that A is too large to be held in memory in its entirety and its pages A(:,:,k) occupy separate files on disk (EDIT: and I do not have the means to read in strict subchunks of a page). Is there an algorithm, and ideally also a MATLAB implementation somewhere, that will compute B incrementally by looping over successive pages A(:,:,k), or chunks of pages?
  4 comments
Cris LaPierre on 25 Jan 2021
Looking as well. I see that reshape is supported for tall arrays, as is cat. How do you envision loading your data? Could you load each page as a tall array and use cat to turn it into a 3D array?
Matt J on 28 Jan 2021
Edited: Matt J on 28 Jan 2021
That sounds like it would require representing each page as a datastore. I don't have the means to load in subsections of a page, e.g., individual rows, so I don't see how such a datastore could be set up.


Answers (2)

Gaurav Garg on 28 Jan 2021
Hi Matt,
You can compute the median of each column by converting the column into a tall column vector and then calculating its median. Repeat this step for each column in your case.
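T=tall(A(:,1));      %wrap one column as a tall array
m=median(T);         %evaluation is deferred for tall arrays
answer = gather(m);  %run the computation and bring the result into memory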
Alternatively, you can convert the array into a distributed array and then compute the median (though the former solution might be more useful).
A=zeros(100000,3);
D = distributed(A);  %requires Parallel Computing Toolbox
e = median(D);       %median of each column
  1 comment
Matt J on 28 Jan 2021
Edited: Matt J on 28 Jan 2021
Hi Gaurav,
The idea is to obtain the median of A(i,j,:) for each fixed pair (i,j). What you propose seems to require converting the 3D array into a 2D array, permuted and reshaped somehow so that the pages are now columns. But I don't see how you would accomplish that since, as I said, the pages A(:,:,k) cannot be held simultaneously in RAM, nor are they available contiguously on disk.
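To make the distinction concrete, here is a minimal sketch on a toy array that does fit in memory (the 2x2x3 array and the names A2 and B2 are made up for illustration). It shows what median(A,3) computes and the 2D reshaping that a column-wise median would need, which is exactly the step that is unavailable when the pages cannot be held in RAM together.

```matlab
A=cat(3,[1 2;3 4],[5 6;7 8],[9 10;11 12]);  %toy 2x2x3 array, 3 pages
B=median(A,3);                               %median over k for each fixed (i,j)

%2D route: one row per pixel (i,j), one column per page
A2=reshape(A,[],size(A,3));                  %4x3, column k is A(:,:,k) unrolled
B2=reshape(median(A2,2),size(A,1),size(A,2));%row-wise median, back to 2x2
```

Both B and B2 equal [5 6;7 8] here, but building A2 requires every page at once.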



Matt J on 28 Jan 2021
Edited: Matt J on 28 Jan 2021
One solution would be to decimate and concatenate the pages, as below. If I choose a modest stride, it becomes possible to store the decimated pages simultaneously in RAM and take their median,
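stride=5;
B=zeros(M,N); %preallocate the output; M-by-N is the page size (assumed known)
Asubsets=cell(1,numPages);
for i=1:stride
    for j=1:stride
        for k=1:numPages
            temp=read(___); %read k-th page
            Asubsets{k}=temp(i:stride:end,j:stride:end); %keep every stride-th pixel, offset (i,j)
        end
        %median across pages of the decimated pixels, written back to the same positions
        B(i:stride:end,j:stride:end)=median( cat(3,Asubsets{:}) , 3);
    end
end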
However, this approach requires stride^2 passes through the files, and each pass discards most of the page data it reads. So, I was hoping for a method that needs only a single pass, eliminating the outer two loops.
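One possible single-pass direction, offered only as a sketch and not as an exact replacement for median(A,3), is to accumulate a per-pixel histogram while looping over the pages and read the median off the cumulative counts afterwards. It assumes the value range is known in advance (the bin edges below are hypothetical), and the result is only accurate to the bin width. The in-memory array A and the handle readPage are stand-ins for the real page files and reader.

```matlab
rng(0);
M=4; N=3; numPages=11;                 %toy sizes; odd page count
A=randi([0 15],M,N,numPages);          %stand-in for the on-disk pages
readPage=@(k) A(:,:,k);                %placeholder for the real file reader

edges=-0.5:1:15.5;                     %assumed known range, one bin per integer
nBins=numel(edges)-1;
counts=zeros(M,N,nBins);               %per-pixel histogram
[rowIdx,colIdx]=ndgrid(1:M,1:N);
for k=1:numPages                       %single pass over the pages
    page=readPage(k);
    bin=discretize(page,edges);        %bin index of every pixel
    idx=sub2ind([M N nBins],rowIdx,colIdx,bin);
    counts(idx)=counts(idx)+1;         %idx has no repeats within one page
end
cum=cumsum(counts,3);
[~,medBin]=max(cum>=numPages/2,[],3);  %first bin reaching half the pages
centers=(edges(1:end-1)+edges(2:end))/2;
Bapprox=centers(medBin);               %M-by-N approximate median
```

With integer data, one bin per value, and an odd number of pages, this reproduces median(A,3); for an even number of pages it returns the lower of the two middle values instead of their average. The histogram occupies nBins times the memory of one page, so the idea only helps when nBins is much smaller than numPages.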
