Cannot replicate results of nonlinear optimization w/ nonlinear constraints from replication files

I cannot seem to replicate simulation results generated by my MATLAB code despite using the same code, functions, starting values, random number generator, etc. Since I am literally using the same code and conditions used to generate the results the first time, I have omitted the code itself.
General description of code used: I simulate N nonlinear optimization problems with nonlinear constraints using fmincon, parallelized over my machine's cores. I iterate over these simulations until a fixed point is reached. I am using fmincon's SQP solver with central finite differences and have provided the analytical gradients for both my objective and constraint functions. Generally, CheckGradients does not fail; I will occasionally get one failure among a large number of passed checks, presumably due to a scaling issue in a particular simulation.
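For context, the option settings being described might look something like this (illustrative only; the actual option values are not shown in the post):

```matlab
% Illustrative fmincon options matching the described setup (names are placeholders):
opt = optimoptions('fmincon', ...
    'Algorithm','sqp', ...                 % SQP solver
    'SpecifyObjectiveGradient',true, ...   % analytical objective gradient supplied
    'SpecifyConstraintGradient',true, ...  % analytical constraint gradients supplied
    'FiniteDifferenceType','central', ...  % central finite differences
    'CheckGradients',true);                % compare analytic vs. finite-difference gradients
```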
Description of problem: I ran five experiments in the latter months of 2021: three in September on MATLAB R2018a, and two in December on MATLAB R2021b. All five experiments generated results that were generally consistent with one another.
In January 2022, I attempted to replicate all three of the September experiments using code archived in October for the express purpose of replicating the September results (there were no changes to the code between September and October). In particular, when I attempt to simply re-run the last iteration of the fixed-point algorithm, i.e., the iteration in which the code should terminate, the code does not terminate. In fact, examining the "replicate" simulations, I see results that are very different from my original results.
Final context: I believe I installed the following updates in between the last time my code was generating consistent results and the point at which things started going haywire:
  • Feature update to Windows 10, version 21H2
  • Cumulative Update for .NET Framework 3.5 and 4.8 for Windows 10 Version 20H2 for x64 (KB5008876)
  • Update for Windows 10 Version 21H2 for x64-based systems (KB5008212)
Again – I am using the same code files, same environments, same starting environments, etc. Can someone shed some light as to why I am encountering this phenomenon?

Answers (1)

Matt J
Matt J on 4 Feb 2022
Edited: Matt J on 4 Feb 2022
"when I attempt to simply re-run the last iteration of the fixed-point algorithm"
You wouldn't be able to re-run the last iteration unless you had the second to last iteration as a start point. If you mean you used the October 2021 output of fmincon as the initial guess, then,
(1) Whether the iterations would continue depends on which stopping criterion caused the October 2021 runs to stop. If you simply reached the MaxIterations limit, then convergence never happened and there is no reason the iterations should not continue when resumed.
(2) Some stopping criteria are defined relative to the norm of the gradient at the initial point (see the fmincon documentation on tolerances and stopping criteria). Since you are now launching fmincon at a different point, those stopping rules would be affected.
(3) The SQP algorithm uses quasi-Newton methods to estimate the Hessian of the Lagrangian, which means the Hessian is iteratively updated from some starting guess. I don't think you have any way of resuming the Hessian updates from where they left off in October.
(4) You may have multiple solutions. What is the first-order optimality measure at the new and old solutions? How different are the objective values at the two solutions?
(5) You say the new solutions are very different, but we don't know how you are evaluating that. Even if the differences are on the order of 10^15, we can't know whether to consider that a large number. If the gradients in the neighborhood of the October solution are also O(10^15), it makes sense to view that as a small change.
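One way to make that comparison concrete is to put the change on a scale set by the local gradient magnitude (a sketch; xOld, xNew, and gradOld are placeholders for values stored from the two runs):

```matlab
% Sketch: judge the size of the change relative to the local gradient scale.
absDiff   = norm(xNew - xOld);            % raw difference between the two solutions
gradScale = norm(gradOld);                % gradient magnitude near the old solution
relDiff   = absDiff / max(gradScale, 1);  % "large" only if big on this scale
```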

7 comments

Sorry, I think I must not have explained things properly. Some fake code:
PARAM = ...             % generate parameters for each of N simulations
start = exprssn(PARAM);
opt = optimset( ... );  % options I need, e.g., SQP algorithm, tighter constraint tolerance
P_now  = P_init;        % initialize "industry" variable
P_then = Inf;           % initialize "industry" variable (so the first loop test passes)
tol = small;
j = 1;
% when I refer to the fixed-point algorithm, I refer to this while loop
while abs(P_now/P_then - 1) > tol
    parfor i = 1:N
        param  = PARAM(i,:);
        obj    = @(x) OBJ(x,param);
        constr = @(x) CONSTR(x,param);
        % this is the actual optimization
        [x,fval,exitflag] = fmincon(obj,start(i,:),[],[],[],[],lbd,ubd(i,:),constr,opt);
        Y(i,:) = [exitflag,x,fval];  % store optimization values
    end
    % update start values so that new start values use optima where they are found
    start = (Y(:,1)>0).*Y(:,2) + (1-(Y(:,1)>0)).*exprssn(PARAM);
    Y(Y(:,1)<1,:) = [];              % purge all simulations that appear to have no solution
    P_then = P_now;                  % store old "industry" variable
    P_now  = industry_fn(Y(:,2));    % use simulations to generate new "industry" variable
    j = j + 1;
end
My experiments perturb a random parameter vector, e.g., PARAM(:,k) = PARAM(:,k).*(1.01). Each experiment begins where the benchmark experiment ends; that is, it begins using the starting values and industry variable from iteration j-1 of the while loop, but with the new vector of parameters.
(0) When I say I am re-running the last iteration, I mean I am starting from iteration j-1 of the while loop with all of the attendant stored starting values.
(1) Sorry, but when I refer to "iteration" in the original post, I am referring to a complete pass of the "while" loop above. The stopping criterion used to achieve convergence is
abs(P_now/P_then-1) < tol
Moreover, I am paying attention to why my solver stops for an individual simulation - I need to be able to discern between simulations that appear to have a solution and those that do not, as they have distinct interpretations in my application and must be treated differently.
(2) This may be true, but I should not be generating wildly different results given the above. More to the point, if this were a concern, it should always have been a concern, and I should never have generated results that were consistent with each other. That is: this explanation does not explain why I should suddenly stop generating results that are consistent with each other, as it should have precluded obtaining consistent results to begin with.
(3) See the response to (2).
(4) I do not have multiple solutions. By my assumptions on the underlying primitives, each simulation will have a unique solution or none at all (which means I really, really need to pay attention to fmincon's exitflag).
(5) When I attempt to literally replicate my results using identical conditions and initializations, I get different results. Setting aside that you do not know how I am evaluating the difference between runs of my code - why should they be different at all given that I am feeding my code the same inputs?
The outer while loop is not something we can help you with. It's a home-brewed algorithm devised by you, and only you have any way of predicting its behavior. We also have no way of testing your claim that you are initializing the loop step consistently with October 2021. If you aren't (even if you think you are), that would explain why the results are different now.
The question we can explore with you is whether fmincon is doing its job. Is it giving you exitflag=1? Is it giving you a small optimality measure at convergence? Is the new solution feasible and giving a lower objective value than the previous solution? If yes to all of the above, what reason is there to doubt that fmincon is producing valid output?
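Those diagnostics can all be read off fmincon's third and fourth outputs (a sketch; obj, constr, x0, and the bounds are placeholders for your own variables):

```matlab
% Request the output struct so convergence can be verified explicitly.
[x,fval,exitflag,output] = fmincon(obj,x0,[],[],[],[],lbd,ubd,constr,opt);
exitflag                 % 1 => first-order optimality conditions satisfied
output.firstorderopt     % should be small at a genuine solution
output.constrviolation   % should be within ConstraintTolerance
fval                     % compare against the previous solution's objective value
```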
To clarify, I am not asking for help with the outer while loop; I am asking about things that could influence the behavior of fmincon. The outer-loop material is provided only for context.
I am aware that I might have inadvertently changed something; it's just that whatever changed (as far as I can tell) cannot be in the code itself, as I am running the code archived in October with the same static input values, for replication purposes: the same code used to generate experiments in December that were consistent with my September experiments. Now, attempts to replicate all experiments from the original start points, the three in September and the two in December, generate results that are consistent with each other, but not with the results obtained in September/December. The new results sit stably around a new equilibrium; however, the new equilibrium is still inexplicably different from the old one.
To me, it seems that any change would have to come through a change to general MATLAB preferences or something affecting MATLAB performance overall. The only such preferences that I can recall touching between the December experiments and the failed January replication attempts are parallelization preferences (number of workers and threads per worker). However, when I attempt to re-create the parallelization preferences of the older environment (using the information stored in a saved parpool object), I still cannot replicate results.
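For what it's worth, rebuilding the pool from a saved configuration might look like this (a sketch assuming a local cluster profile; the worker and thread counts are illustrative, not the actual saved values):

```matlab
% Rebuild the parallel pool with explicitly pinned-down settings.
delete(gcp('nocreate'));   % close any existing pool
c = parcluster('local');
c.NumThreads = 1;          % threads per worker, as recorded in the saved parpool object
parpool(c, 8);             % worker count, as recorded in the saved parpool object
maxNumCompThreads(1);      % also limit client-side multithreading
```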
I will also add: the new results appear to be more accurate (in terms of generating smaller fvals), so I'm not necessarily upset with them; however, I would still like to know why I am getting different fvals. It seems strange to me that I could inadvertently change something and get better results. Isn't that, like, not what happens?
I guess I am asking, if one were to take what I am saying at face value: given the same environmental and code inputs (functions, fixed start values, randomization, and code), what, if anything, could explain better fmincon outputs? An OS update? Something about the parallelization environment beyond the number of workers/threads per worker? Some other set of preferences that affects fmincon performance? I am responsible for the replication of these results, yet I cannot replicate them nor explain why these new results are not only different but better than the previous results.
The only thing I can think of is that the parallel deployment could be different. If you have different non-MATLAB processes going on in the background since October, MATLAB may not be able to assign the exact same cores to each worker. Therefore, the operations inside your objective function may not be multithreaded the same way, leading to differences in floating point errors. However, if the solutions to the optimization problems are unique and well-conditioned, as you say they are, the difference shouldn't be very big.
One very important thing to note, however, is that if the new fvals are substantially better than the old ones, it means that your original optimization never succeeded, assuming the objective function and constraints indeed haven't changed. That means your original code had a bug in it, because the exitflag and conditions for convergence weren't as well-checked as you say they should be. A further possibility therefore is that you later fixed the bug, but neglected to rerun the computation before you archived the code in October.
"Therefore, the operations inside your objective function may not be multithreaded the same way, leading to differences in floating point errors. However, if the solutions to the optimization problems are unique and well-conditioned, as you say they are, the difference shouldn't be very big."
Unless, perhaps, the October solutions were saddle points of the objective function. Saddle points are unstable, so small floating point differences could tip the iterations in the direction of a substantially different minimum, like in the simplified problem below:
fun = @(x) x(1)^2 - x(2)^2;
lb = [-1;-1]; ub = -lb;
opts = optimoptions('fmincon','Algorithm','sqp'); % note: not passed below, so the default interior-point algorithm runs, as the output confirms
[x,fval,exitflag,output] = fmincon(fun,[1e-8,1e-8],[],[],[],[],lb,ub) % unstable
Initial point is a local minimum that satisfies the constraints. Optimization completed because at the initial point, the objective function is non-decreasing in feasible directions to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance.
x = 1×2
   1.0e-08 *
   1.0000   1.0000
fval = 0
exitflag = 1
output = struct with fields:
         iterations: 0
          funcCount: 3
    constrviolation: 0
           stepsize: 0
          algorithm: 'interior-point'
      firstorderopt: 2.3267e-08
       cgiterations: 0
            message: 'Initial point is a local minimum that satisfies the constraints. ... The first-order optimality measure, 2.326744e-08, is less than options.OptimalityTolerance = 1.000000e-06, and the maximum constraint violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
       bestfeasible: [1×1 struct]
[x,fval,exitflag,output] = fmincon(fun,[1e-8,1e-6],[],[],[],[],lb,ub) % stable
Local minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance.
x = 1×2
   0.0000   1.0000
fval = -1.0000
exitflag = 1
output = struct with fields:
         iterations: 11
          funcCount: 36
    constrviolation: 0
           stepsize: 9.9000e-07
          algorithm: 'interior-point'
      firstorderopt: 1.9446e-06
       cgiterations: 0
            message: 'Local minimum found that satisfies the constraints. ... The relative first-order optimality measure, 9.723043e-07, is less than options.OptimalityTolerance = 1.000000e-06, and the relative maximum constraint violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
       bestfeasible: [1×1 struct]
"Therefore, the operations inside your objective function may not be multithreaded the same way, leading to differences in floating point errors. However, if the solutions to the optimization problems are unique and well-conditioned, as you say they are, the difference shouldn't be very big....
Unless, perhaps, the October solutions were saddle points of the objective function. Saddle points are unstable, so small floating point differences could tip the iterations in the direction of a substantially different minimum..."
Just wanted to provide an update, as this was really helpful. While my program should not have saddle points, it does have another source of instability: scaling. Previously, I had thought that my extant fixes had vanquished the scaling issue, as I continued to get what appeared to be stable optimization results even as I varied the program's structural parameters significantly.
However, given your point regarding solution stability and multithreading changes, it appears that what was really happening was that the solver was returning solutions of poor quality that were consistent with one another. My guess is that an OS update in early January changed the nature of the non-MATLAB processes running in the background in a manner that allowed the solver to perform better, generating a structural break in the solutions generated by the solver.
The final clue that scaling is the culprit: since encountering the structural break in January, re-calibrating the model has been a nightmare, as my solutions, while better (in an fval sense) than those obtained before, have been extremely unstable with respect to changes in the underlying structural parameters. Inspired by your comment, I tweaked my function definitions to ensure the control variables are closer in magnitude. While it's too early to tell whether this will definitively end the issue, initial results have been encouraging. Thank you!
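The rescaling described above can be sketched as a change of variables (hypothetical code, not the poster's actual functions; note that any supplied analytical gradients must be rescaled by the same factors via the chain rule):

```matlab
% Optimize in scaled variables z = x ./ s so all controls are O(1).
s = max(abs(start), [], 1);              % hypothetical per-variable scale factors
objScaled    = @(z) OBJ(z .* s, param);  % objective in scaled coordinates
constrScaled = @(z) CONSTR(z .* s, param);
z0 = start(i,:) ./ s;
[z,fvalScaled] = fmincon(objScaled, z0, [],[],[],[], lbd./s, ubd(i,:)./s, constrScaled, opt);
x = z .* s;                              % map back to original units
```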


Asked: 3 Feb 2022
Last comment: 5 Feb 2022
