Multiple GPU setup slower than single GPU

arvid Martens on 24 Apr 2018
Commented: Joss Knight on 17 May 2018
For my research I have to repeat the same optimization many times (for statistics). I already found that my fitness function is much faster on the GPU, so I run those calculations on the available GPUs. Fortunately, I have 3 GPUs at my disposal. I worked out a scheme where I open a parallel pool and use parfeval to assign each GPU to a different optimization.
When I checked the performance of this setup, I noticed that the speed of a single GPU drops considerably (by about half) when it is used in the multiple-GPU setup (3 workers) compared to a single-GPU setup (1 worker).
I rechecked the implementation and saw no sign that data has to be sent from one GPU to another, so the GPUs should never need to synchronize.
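The scheme described above might look roughly like the following. This is only a sketch: `runOnGpu`, the placeholder fitness function, and the use of `fminsearch` are assumptions made for illustration, since the actual optimization code is not shown in the thread.

```matlab
% Sketch: one parfeval task per GPU, each running an independent optimization.
nGpus = gpuDeviceCount;              % 3 in the setup described above
pool  = parpool(nGpus);              % one worker per GPU

futures(1:nGpus) = parallel.FevalFuture;
for k = 1:nGpus
    % Request 1 output and pass the index of the GPU the task should select.
    futures(k) = parfeval(pool, @runOnGpu, 1, k);
end

best = fetchOutputs(futures);        % blocks until all tasks finish; one row per task
delete(pool);

function xBest = runOnGpu(gpuIndex)
    % Hypothetical task function, not the original poster's code.
    gpuDevice(gpuIndex);             % select the GPU for this task
    % Placeholder fitness (assumption): evaluated on the GPU, scalar gathered to the CPU.
    fitness = @(x) gather(sum((gpuArray(x) - 1).^2));
    xBest = fminsearch(fitness, zeros(1, 4));
end
```

Selecting the device inside the task means the result does not depend on which worker picks up which future.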
Solutions I have tried:
- Make a separate fitness function m-file for each GPU (did not work)
- Open a separate MATLAB instance for each GPU (did not work)
Any suggestions on this problem are appreciated.
9 comments
arvid Martens on 14 May 2018
The double-precision (DP) performance of the Quadro is at the same level as that of the Tesla. It is only in the newer architectures (Maxwell, Pascal) that the DP performance of Quadros is low compared to Tesla.
My problem actually occurs when the GPUs are working independently, that is, in three separate MATLAB sessions with the variable T loaded. If I perform the ifft on a single GPU, the utilization is at a stable 60% (Titan V). However, when a second operation is started in another MATLAB instance on a different GPU, the utilization of the first GPU drops and fluctuates. The utilization of the second GPU also fluctuates, and the performance of both GPUs drops.
In my current model I have circumvented the problem by limiting the amount of data in the variable T. I noticed that if the amount of data is below a threshold, the problem does not occur and GPU utilization is at a stable 95%-100% on all three GPUs. Above the threshold, the utilization starts to fluctuate and calculation time increases.
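One possible way to limit the amount of data per operation, sketched here purely as an assumption about how this could be done, is to apply the ifft to T in column blocks. The size of T and the block width `blockCols` are made-up values; the thread gives neither the real size nor the threshold.

```matlab
% Stand-in for T (assumption): complex data already resident on the GPU.
T = complex(rand(4096, 512, 'gpuArray'), rand(4096, 512, 'gpuArray'));
blockCols = 128;                     % hypothetical block width, to be tuned by timing
Y = zeros(size(T), 'like', T);       % complex gpuArray of the same size

for c = 1:blockCols:size(T, 2)
    cols = c:min(c + blockCols - 1, size(T, 2));
    % ifft on a matrix transforms each column independently,
    % so processing column blocks gives the same result as ifft(T).
    Y(:, cols) = ifft(T(:, cols));
end
```

Whether this helps depends on what actually causes the threshold, which the thread does not establish.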
Joss Knight on 17 May 2018
You're right, sorry (about the double precision performance).
I wouldn't put too much stock in the Utilization measure; it is only weakly linked to performance. It would be much better to look at how long your code takes to run.
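A minimal sketch of timing the ifft directly could look like this. The array below is only a stand-in for T, whose real size is not given in the thread. Running it once with a single session active, and again while the other sessions are busy, would show the slowdown in seconds rather than in utilization.

```matlab
gpu = gpuDevice;                     % currently selected GPU
% Stand-in for T (assumption): complex double data on the GPU.
T = complex(rand(4096, 512, 'gpuArray'), rand(4096, 512, 'gpuArray'));

% gputimeit calls the function several times and accounts for GPU synchronization.
tIfft = gputimeit(@() ifft(T));
fprintf('%s: ifft takes %.4f s\n', gpu.Name, tIfft);

% Manual alternative: wait(gpu) blocks until all queued GPU work has finished,
% so toc measures the computation and not only the kernel launch.
tic;
Y = ifft(T);
wait(gpu);
tManual = toc;
fprintf('%s: manual timing %.4f s\n', gpu.Name, tManual);
```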
The only thing I can think of is that you are being limited by shared system resources. All three processes share the PCI bus and system memory, so perhaps there is a lot of data transfer. Or perhaps you are doing some large computations on the CPU that use all your cores? Even some GPU functions do that because they are hybrid algorithms (e.g. mldivide, eig, chol). Waiting for the CPU would slow the rate at which kernels are launched on the GPU.
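To get a rough idea of whether transfers over the PCI bus are a factor, one could measure host-to-device and device-to-host bandwidth in one session while the other sessions are idle, and then again while they are busy. The following is a sketch under assumed sizes, not a definitive benchmark.

```matlab
gpu = gpuDevice;                     % currently selected GPU
Th = rand(4096, 4096);               % 128 MiB of doubles in host memory (size is an assumption)
nBytes = numel(Th) * 8;              % 8 bytes per double

tUp = gputimeit(@() gpuArray(Th));   % host -> device transfer time
Tg = gpuArray(Th);
tDown = gputimeit(@() gather(Tg));   % device -> host transfer time

fprintf('%s upload:   %.2f GB/s\n', gpu.Name, nBytes / tUp / 1e9);
fprintf('%s download: %.2f GB/s\n', gpu.Name, nBytes / tDown / 1e9);

% Number of computational threads this MATLAB session may use on the CPU.
fprintf('CPU threads available to this session: %d\n', maxNumCompThreads);
```

A clear drop in bandwidth when the other sessions are active would point to the shared bus or system memory; little or no change would suggest looking at CPU contention instead.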
If you are running on Linux it would be interesting to see whether you can get any benefit out of using the Multi-Process Service.


Answers (0)
