Gradient Calculation with fmincon and UseParallel
I am trying to optimize a 6-variable function with bound constraints. The objective function is expensive to call (20 seconds to 3 minutes depending on variable values) and has no analytical derivative. It contains element-wise matrix operations, summations, matrix algebra, a while-loop, and a for-loop, some of which sit inside other functions.
When I enable the UseParallel option for fmincon, the gradient calculation takes about 5.1 times longer than a single function evaluation in each iteration. I have been using a parallel pool of 16 workers, so my impression was that the gradient calculation would take roughly as long as one function evaluation. Is fmincon calculating the gradient sequentially, rather than assigning each finite-difference evaluation to a separate worker? If so, does anyone have suggestions for making sure the gradient is calculated in parallel?
I have explored the Parallel Computing and Optimization Toolbox documentation, but it is certainly possible I missed something.
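For reference, the setup described above might look roughly like the following. This is only a sketch: the objective `fun`, the starting point, and the bound values are hypothetical placeholders, not the actual problem. Only the 6 variables, the bound constraints, the 16-worker pool, and the UseParallel option come from the description.

```matlab
% Hypothetical stand-in for the expensive objective (not the real function).
fun = @(x) sum((x - 0.5).^2);

x0 = zeros(6, 1);      % 6 design variables (illustrative starting point)
lb = -ones(6, 1);      % lower bounds (illustrative values)
ub =  ones(6, 1);      % upper bounds (illustrative values)

% Open a pool only if none is running; 16 workers as in the question.
% This assumes the cluster profile allows that many workers.
if isempty(gcp('nocreate'))
    parpool(16);
end

opts = optimoptions('fmincon', ...
    'UseParallel', true, ...   % estimate finite-difference gradients in parallel
    'Display', 'iter');

% Bound constraints only: no linear or nonlinear constraints.
[x, fval] = fmincon(fun, x0, [], [], [], [], lb, ub, [], opts);
```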
Alan Weiss on 14 Apr 2022
This is not the expected performance. I would usually expect that parallel gradient estimation would give you quite a bit of speedup (not 6x, but I would usually expect well over 2x speedup).
But perhaps there are aspects of your objective function that cause difficulties for parallel evaluation. Do you have file reads or writes within the function? Persistent or global variables? Other things that are not MATLAB functions, but are calls to external software? Each of these things can cause difficulties for parallel computation.
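One way to check whether the objective itself parallelizes well, independent of fmincon, is to time the same batch of evaluations in a plain for-loop and in a parfor-loop. The sketch below assumes a placeholder objective `fun` and an illustrative step size `h`; substitute the real function. Start the pool beforehand so that pool startup time is not counted in the parfor timing.

```matlab
fun = @(x) sum((x - 0.5).^2);   % hypothetical placeholder objective
n  = 6;                          % number of variables
x0 = zeros(n, 1);
h  = 1e-6;                       % illustrative perturbation size
X  = x0 + h * eye(n);            % column i is x0 perturbed in variable i

% Serial evaluation of the n perturbed points.
fSerial = zeros(1, n);
tic
for i = 1:n
    fSerial(i) = fun(X(:, i));
end
tSerial = toc;

% The same evaluations distributed over the pool workers.
fPar = zeros(1, n);
tic
parfor i = 1:n
    fPar(i) = fun(X(:, i));
end
tParallel = toc;

fprintf('serial: %.2f s, parfor: %.2f s\n', tSerial, tParallel);
```

If the parfor timing is not clearly shorter than the serial one for the real objective, the cause is more likely inside the function (file access, persistent or global state, external calls) than in fmincon.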
Matt J on 14 Apr 2022
"I am trying to optimize a 6 variable function with bound constraints. ... I have been using a parallel pool of 16 workers"
Because you have only 6 variables, the finite-difference operations require only 7 function evaluations per gradient calculation. Therefore, anything more than 7-fold parallelization will not benefit you (unless possibly you are using an algorithm other than the default interior-point method). I suggest reducing the pool size to 7, or maybe a bit less, so that each worker has more cores available to it.
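A minimal sketch of that suggestion, assuming the default forward finite differences and that replacing the existing pool is acceptable. The variable name `poolSize` is just for illustration. Central differences (`'FiniteDifferenceType', 'central'`) would use about twice as many evaluations per gradient, so the useful pool size would change accordingly.

```matlab
poolSize = 7;   % pool size suggested above for a 6-variable problem

% Replace any existing pool (for example the 16-worker one) with a smaller one.
delete(gcp('nocreate'));
parpool(poolSize);

opts = optimoptions('fmincon', ...
    'Algorithm', 'interior-point', ...        % the default algorithm, stated explicitly
    'FiniteDifferenceType', 'forward', ...    % the default difference scheme
    'UseParallel', true);
```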