GPU and CPU Parallelization and Bicg Optimization
Zulkuf Azizoglu
on 11 Oct 2022
Commented: Zulkuf Azizoglu
on 26 Jun 2023
I use a MATLAB script to solve a large sparse linear system with the bicg function. In simplified form, my code looks like this:
for i = 1:n
    ...
    % AS, BS, L, and U are different in each iteration:
    %   AS   : 10^6 x 10^6 sparse complex double
    %   BS   : 10^6 x 1 sparse complex double
    %   L, U : 10^6 x 10^6 sparse complex double
    Pvect = bicg(AS, BS, tol, maxit, L, U);
    ...
end
The loop iterations are independent of each other, so I recently parallelized the script with parfor. The computer I use has 128 CPU cores, but I noticed that going beyond 32 workers (i.e., parpool with anything more than 32) does not decrease the run time significantly. However, I usually use n=32 (i.e., I run the loop for 32 different scenarios), so this is not a big issue for me. The code currently looks something like this:
parpool(32)
parfor i = 1:n
    ...
    % AS, BS, L, and U are different in each iteration:
    %   AS   : 10^6 x 10^6 sparse complex double
    %   BS   : 10^6 x 1 sparse complex double
    %   L, U : 10^6 x 10^6 sparse complex double
    Pvect = bicg(AS, BS, tol, maxit, L, U);
    ...
end
I want to further speed up the code using gpuArray (which bicg supports). The main reason is that I also use another script in which I call bicg sequentially many times. In that case n is 1, but the many sequential calls make it computationally expensive. If possible, I also want to use gpuArrays for cases where n is 32 or more (i.e., the code described above).
I checked the documentation and other user questions, but I am a little lost on how to use CPU and GPU power concurrently. The computer I use has 3 GPUs that I can utilize.
- Should I try to use only the GPUs for both the parfor loop and the bicg solve?
- Or should I run the parfor loop on the CPU and use all the GPUs for the bicg solve? If so, how can I do this? As far as I understand, the GPU resources will be distributed among the workers in this case.
- Or what would you suggest as the proper way to do this? Thank you very much in advance for any kind of guidance!
The computer that I use is the following (I can also try to use 2 of these computers/nodes in the future. Do you think that would help with any of the scenarios described above?):
GPU: 3x NVIDIA A100 PCIE 40GB
(1 per socket)
gpu0: socket 0
gpu1: socket 1
gpu2: socket 1
GPU Memory: 40 GB HBM2
CPU: 2x AMD EPYC 7763 64-Core Processor ("Milan")
Total cores per node: 128 cores on two sockets (64 cores / socket)
Hardware threads per core: 1 per core
Hardware threads per node: 128 x 1 = 128
Clock rate: 2.45 GHz
RAM: 256 GB
Cache: 32KB L1 data cache per core
512KB L2 per core
32 MB L3 per core complex
(1 core complex contains 8 cores)
256 MB L3 total (8 core complexes)
Each socket can cache up to 288 MB
(sum of L2 and L3 capacity)
Local storage: 144GB /tmp partition on a 288GB SSD.
Accepted Answer
Alvaro
on 26 Jan 2023
You cannot run a parfor loop on a GPU, but you can have each worker access a GPU to perform its computations.
The documentation shows how to assign GPUs to workers, but whether each worker needs its own GPU is not a straightforward question.
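As a rough illustration of assigning GPUs to workers, the sketch below opens a pool with one worker per available GPU and records which device each parfor iteration ran on. It assumes Parallel Computing Toolbox and a release that accepts gpuDeviceCount("available"); the variable names (nGPUs, deviceUsed) and the value n = 6 are hypothetical. As far as I know, MATLAB gives each pool worker a different GPU by default when enough GPUs are available, but that is worth confirming on your own machine.

```matlab
% Sketch: one pool worker per available GPU (assumes Parallel Computing Toolbox).
nGPUs = gpuDeviceCount("available");   % expected to be 3 on the machine described above

delete(gcp("nocreate"));               % close any existing pool (no-op if none is open)
pool = parpool(nGPUs);                 % one worker per GPU

n = 6;                                 % hypothetical number of scenarios
deviceUsed = zeros(1, n);
parfor i = 1:n
    d = gpuDevice;                     % GPU currently selected on this worker
    deviceUsed(i) = d.Index;
    % ... build AS and BS for scenario i and call bicg with gpuArray inputs ...
end

disp(deviceUsed)                       % shows which GPU index handled each iteration
```

With more workers than GPUs, several workers share one device and compete for its memory, so a 32-worker pool on 3 GPUs may or may not be faster than a 3-worker pool; timing both is the only reliable way to tell.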
To have bicg use the GPU, simply pass its arguments as gpuArray objects.
Note that bicg has some limitations when it is called with gpuArray inputs; check the GPU usage notes in the bicg documentation.
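A minimal sketch of calling bicg with gpuArray inputs is shown here. The test matrix is a small random stand-in, not the 10^6 x 10^6 system from the question, and all sizes and values are hypothetical. The preconditioners L and U are deliberately left out: the GPU usage notes restrict how preconditioners can be supplied when the matrix is sparse, so check them before adding L and U back. Converting the right-hand side to a full vector is also an assumption made to keep the GPU call simple.

```matlab
% Small stand-in problem (hypothetical sizes and values).
rng(0);
m  = 2000;
AS = sprandn(m, m, 0.005) + 1i*sprandn(m, m, 0.005) + 50*speye(m);
BS = sparse(complex(randn(m, 1), randn(m, 1)));
tol   = 1e-8;
maxit = 500;

% Move the operands to the GPU. The right-hand side is made full first.
A_gpu = gpuArray(AS);
b_gpu = gpuArray(full(BS));

% Solve on the GPU; no preconditioners are passed in this sketch.
[x_gpu, flag, relres, iter] = bicg(A_gpu, b_gpu, tol, maxit);

% Bring the solution back to host memory for further CPU-side processing.
Pvect = gather(x_gpu);
fprintf("flag = %d, relres = %.2e, iterations = %d\n", ...
    gather(flag), gather(relres), gather(iter));
```

A flag of 0 means bicg converged to the requested tolerance; any other value should be checked before Pvect is used. For the sequential script (n = 1), keeping intermediate data on the GPU between calls and calling gather only when the result is needed on the CPU avoids repeated host-device transfers.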