Cannot replicate results of nonlinear optimization w/ nonlinear constraints from replication files

I cannot seem to replicate simulation results generated by my MATLAB code despite using the same code, functions, starting values, random number generator, etc. Since I am literally using the same code and conditions used to generate the results the first time, I have omitted the code itself.
General description of code used: I simulate N nonlinear optimization problems with nonlinear constraints using fmincon, parallelized over my machine's cores. I iterate over these simulations until a fixed point is reached. I am using fmincon's SQP solver with central finite differences and have provided the analytical gradients for both my objective and constraint functions. Generally, CheckGradients does not fail; I will occasionally get one failure among a large number of passed checks, presumably due to a scaling issue in a particular simulation.
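For context, the option settings being described might look something like this (illustrative only; the actual option values are not shown in the post):

```matlab
% Illustrative fmincon options matching the described setup (names are placeholders):
opt = optimoptions('fmincon', ...
    'Algorithm','sqp', ...                 % SQP solver
    'SpecifyObjectiveGradient',true, ...   % analytical objective gradient supplied
    'SpecifyConstraintGradient',true, ...  % analytical constraint gradients supplied
    'FiniteDifferenceType','central', ...  % central finite differences
    'CheckGradients',true);                % compare analytic vs. finite-difference gradients
```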
Description of problem: I ran five experiments in the latter months of 2021: three in September on MATLAB R2018a, and two in December on MATLAB R2021b. All five experiments generated results that were generally consistent with one another.
In January 2022, I attempted to replicate all three of the September experiments using code archived in October for the express purpose of replicating the September results (there were no changes to the code between September and October). In particular, when I attempt to simply re-run the last iteration of the fixed-point algorithm, i.e., the iteration in which the code should terminate, the code does not terminate. In fact, examining the "replicate" simulations, I see results that are very different from my original results.
Final context: I believe I installed the following updates in between the last time my code was generating consistent results and the point at which things started going haywire:
  • Feature update to Windows 10, version 21H2
  • Cumulative Update for .NET Framework 3.5 and 4.8 for Windows 10 Version 20H2 for x64 (KB5008876)
  • Update for Windows 10 Version 21H2 for x64-based systems (KB5008212)
Again – I am using the same code files, same environments, same starting environments, etc. Can someone shed some light as to why I am encountering this phenomenon?

Answers (1)

Matt J
Matt J on 4 Feb 2022
Edited: Matt J on 4 Feb 2022
"when I attempt to simply re-run the last iteration of the fixed-point algorithm"
You wouldn't be able to re-run the last iteration unless you had the second to last iteration as a start point. If you mean you used the October 2021 output of fmincon as the initial guess, then,
(1) Whether the iterations would continue depends on which stopping criterion caused the October 2021 runs to stop. If you simply reached the MaxIterations limit, then convergence never happened and there is no reason the iterations should not continue when resumed.
(2) Some stopping criteria are defined relative to the norm of the gradient at the initial point (see the fmincon documentation on tolerances and stopping criteria). Since you are now launching fmincon at a different point, those stopping rules would be affected.
(3) The SQP algorithm uses quasi-Newton methods to estimate the Hessian of the Lagrangian, which means the Hessian is iteratively updated from some starting guess. I don't think you have any way of resuming the Hessian updates from where they left off in October.
(4) You may have multiple solutions. What is the first-order optimality measure at the new and old solutions? How different are the objective values at the two solutions?
(5) You say the new solutions are very different, but we don't know how you are evaluating that. Even if the differences are on the order of 10^15, we can't know whether to consider that a large number. If the gradients in the neighborhood of the October solution are also O(10^15), it makes sense to view that as a small change.
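One way to make that comparison concrete is to put the change on a scale set by the local gradient magnitude (a sketch; xOld, xNew, and gradOld are placeholders for values stored from the two runs):

```matlab
% Sketch: judge the size of the change relative to the local gradient scale.
absDiff   = norm(xNew - xOld);            % raw difference between the two solutions
gradScale = norm(gradOld);                % gradient magnitude near the old solution
relDiff   = absDiff / max(gradScale, 1);  % "large" only if big on this scale
```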

7 comments

Sorry, I think I must not have explained things properly. Some fake code:
PARAM = ...             % generate parameters for each of N simulations
start = exprssn(PARAM);
opt = optimset( ... );  % options I need, e.g., SQP algorithm, tighter constraint tolerance
P_now  = P_init;        % initialize "industry" variable
P_then = Inf;           % initialize "industry" variable (so the first loop test passes)
tol = small;
j = 1;
% when I refer to the fixed-point algorithm, I refer to this while loop
while abs(P_now/P_then - 1) > tol
    parfor i = 1:N
        param  = PARAM(i,:);
        obj    = @(x) OBJ(x,param);
        constr = @(x) CONSTR(x,param);
        % this is the actual optimization
        [x,fval,exitflag] = fmincon(obj,start(i,:),[],[],[],[],lbd,ubd(i,:),constr,opt);
        Y(i,:) = [exitflag,x,fval];  % store optimization values
    end
    % update start values so that new start values use optima where they are found
    start = (Y(:,1)>0).*Y(:,2) + (1-(Y(:,1)>0)).*exprssn(PARAM);
    Y(Y(:,1)<1,:) = [];              % purge all simulations that appear to have no solution
    P_then = P_now;                  % store old "industry" variable
    P_now  = industry_fn(Y(:,2));    % use simulations to generate new "industry" variable
    j = j + 1;
end
My experiments perturb a random parameter vector, e.g., PARAM(:,k) = PARAM(:,k).*(1.01). Each experiment begins where the benchmark experiment ends; that is, it begins using the starting values and industry variable from iteration j-1 of the while loop, but with the new vector of parameters.
(0) When I say I am re-running the last iteration, I mean I am starting from iteration j-1 of the while loop with all of the attendant stored starting values.
(1) Sorry, but when I refer to "iteration" in the original post, I am referring to a complete pass of the "while" loop above. The stopping criterion used to achieve convergence is
abs(P_now/P_then-1) < tol
Moreover, I am paying attention to why my solver stops for an individual simulation - I need to be able to discern between simulations that appear to have a solution and those that do not, as they have distinct interpretations in my application and must be treated differently.
(2) This may be true, but I should not be generating wildly different results given the above. More to the point, if this were a concern, it should always have been a concern, and I should never have generated results that were consistent with each other. That is: this explanation does not explain why I should suddenly stop generating results that are consistent with each other, as it should have precluded obtaining consistent results to begin with.
(3) See the response to (2).
(4) I do not have multiple solutions. By my assumptions on the underlying primitives, each simulation will have a unique solution or none at all (which means I really, really need to pay attention to fmincon's exitflag).
(5) When I attempt to literally replicate my results using identical conditions and initializations, I get different results. Setting aside that you do not know how I am evaluating the difference between runs of my code - why should they be different at all given that I am feeding my code the same inputs?
The outer while loop is not something we can help you with. It's a home-brewed algorithm devised by you, and only you have any way of predicting its behavior. We also have no way of testing your claim that you are initializing the loop step consistently with October 2021. If you aren't (even if you think you are), that would explain why the results are different now.
The question we can explore with you is whether fmincon is doing its job. Is it giving you exitflag=1? Is it giving you a small optimality measure at convergence? Is the new solution feasible and giving a lower objective value than the previous solution? If yes to all of the above, what reason is there to doubt that fmincon is producing valid output?
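Those diagnostics can all be read off fmincon's third and fourth outputs (a sketch; obj, constr, x0, and the bounds are placeholders for your own variables):

```matlab
% Request the output struct so convergence can be verified explicitly.
[x,fval,exitflag,output] = fmincon(obj,x0,[],[],[],[],lbd,ubd,constr,opt);
exitflag                 % 1 => first-order optimality conditions satisfied
output.firstorderopt     % should be small at a genuine solution
output.constrviolation   % should be within ConstraintTolerance
fval                     % compare against the previous solution's objective value
```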
To clarify, I am not asking for help with the outer while loop; I am asking about things that could influence the behavior of fmincon. The outer-loop material is provided only for context.
I am aware that I might have inadvertently changed something; it's just that whatever changed (as far as I can tell) cannot be in the code itself, as I am running the code archived in October with the same static input values, for replication purposes: the same code used to generate experiments in December that were consistent with my September experiments. Now, attempts to replicate all experiments from the original start points, the three in September and the two in December, generate results that are consistent with each other, but not with the results obtained in September/December. The new results sit stably around a new equilibrium; however, the new equilibrium is still inexplicably different from the old one.
To me, it seems that any change would have to come through a change to general MATLAB preferences or something affecting MATLAB performance overall. The only such preferences that I can recall touching between the December experiments and the failed January replication attempts are parallelization preferences (number of workers and threads per worker). However, when I attempt to re-create the parallelization preferences of the older environment (using the information stored in a saved parpool object), I still cannot replicate results.
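For what it's worth, rebuilding the pool from a saved configuration might look like this (a sketch assuming a local cluster profile; the worker and thread counts are illustrative, not the actual saved values):

```matlab
% Rebuild the parallel pool with explicitly pinned-down settings.
delete(gcp('nocreate'));   % close any existing pool
c = parcluster('local');
c.NumThreads = 1;          % threads per worker, as recorded in the saved parpool object
parpool(c, 8);             % worker count, as recorded in the saved parpool object
maxNumCompThreads(1);      % also limit client-side multithreading
```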
I will also add: the new results appear to be more accurate (in terms of generating smaller fvals), so I'm not necessarily upset with them; however, I would still like to know why I am getting different fvals. It seems strange to me that I could inadvertently change something and get better results. Isn't that, like, not what happens?
I guess I am asking, if one were to take what I am saying at face value: given the same environmental and code inputs (functions, fixed start values, randomization, and code), what, if anything, could explain better fmincon outputs? An OS update? Something about the parallelization environment beyond the number of workers/threads per worker? Some other set of preferences that affects fmincon performance? I am responsible for the replication of these results, yet I cannot replicate them nor explain why these new results are not only different but better than the previous results.
The only thing I can think of is that the parallel deployment could be different. If you have different non-MATLAB processes going on in the background since October, MATLAB may not be able to assign the exact same cores to each worker. Therefore, the operations inside your objective function may not be multithreaded the same way, leading to differences in floating point errors. However, if the solutions to the optimization problems are unique and well-conditioned, as you say they are, the difference shouldn't be very big.
One very important thing to note, however, is that if the new fvals are substantially better than the old ones, it means that your original optimization never succeeded, assuming the objective function and constraints indeed haven't changed. That means your original code had a bug in it, because the exitflag and conditions for convergence weren't as well-checked as you say they should be. A further possibility therefore is that you later fixed the bug, but neglected to rerun the computation before you archived the code in October.
"Therefore, the operations inside your objective function may not be multithreaded the same way, leading to differences in floating point errors. However, if the solutions to the optimization problems are unique and well-conditioned, as you say they are, the difference shouldn't be very big."
Unless, perhaps, the October solutions were saddle points of the objective function. Saddle points are unstable, so small floating point differences could tip the iterations in the direction of a substantially different minimum, like in the simplified problem below:
fun = @(x) x(1)^2 - x(2)^2;
lb = [-1;-1]; ub = -lb;
opts = optimoptions('fmincon','Algorithm','sqp'); % note: not passed below, so the default interior-point algorithm runs, as the output confirms
[x,fval,exitflag,output] = fmincon(fun,[1e-8,1e-8],[],[],[],[],lb,ub) % unstable
Initial point is a local minimum that satisfies the constraints. Optimization completed because at the initial point, the objective function is non-decreasing in feasible directions to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance.
x = 1×2
   1.0e-08 *
   1.0000   1.0000
fval = 0
exitflag = 1
output = struct with fields:
         iterations: 0
          funcCount: 3
    constrviolation: 0
           stepsize: 0
          algorithm: 'interior-point'
      firstorderopt: 2.3267e-08
       cgiterations: 0
            message: 'Initial point is a local minimum that satisfies the constraints. ... The first-order optimality measure, 2.326744e-08, is less than options.OptimalityTolerance = 1.000000e-06, and the maximum constraint violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
       bestfeasible: [1×1 struct]
[x,fval,exitflag,output] = fmincon(fun,[1e-8,1e-6],[],[],[],[],lb,ub) % stable
Local minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance.
x = 1×2
   0.0000   1.0000
fval = -1.0000
exitflag = 1
output = struct with fields:
         iterations: 11
          funcCount: 36
    constrviolation: 0
           stepsize: 9.9000e-07
          algorithm: 'interior-point'
      firstorderopt: 1.9446e-06
       cgiterations: 0
            message: 'Local minimum found that satisfies the constraints. ... The relative first-order optimality measure, 9.723043e-07, is less than options.OptimalityTolerance = 1.000000e-06, and the relative maximum constraint violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
       bestfeasible: [1×1 struct]
"Therefore, the operations inside your objective function may not be multithreaded the same way, leading to differences in floating point errors. However, if the solutions to the optimization problems are unique and well-conditioned, as you say they are, the difference shouldn't be very big....
Unless, perhaps, the October solutions were saddle points of the objective function. Saddle points are unstable, so small floating point differences could tip the iterations in the direction of a substantially different minimum..."
Just wanted to provide an update, as this was really helpful. While my program should not have saddle points, it does have another source of instability: scaling. Previously, I had thought that my extant fixes had vanquished the scaling issue, as I continued to get what appeared to be stable optimization results even as I varied the program's structural parameters significantly.
However, given your point regarding solution stability and multithreading changes, it appears that what was really happening was that the solver was returning solutions of poor quality that were consistent with one another. My guess is that an OS update in early January changed the nature of the non-MATLAB processes running in the background in a manner that allowed the solver to perform better, generating a structural break in the solutions generated by the solver.
The final clue that scaling is the culprit: since encountering the structural break in January, re-calibrating the model has been a nightmare, as my solutions, while better (in an fval sense) than those obtained before, have been extremely unstable with respect to changes in the underlying structural parameters. Inspired by your comment, I tweaked my function definitions to ensure the control variables are closer in magnitude. While it's too early to tell whether this will definitively end the issue, initial results have been encouraging. Thank you!
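The rescaling described above can be sketched as a change of variables (hypothetical code, not the poster's actual functions; note that any supplied analytical gradients must be rescaled by the same factors via the chain rule):

```matlab
% Optimize in scaled variables z = x ./ s so all controls are O(1).
s = max(abs(start), [], 1);              % hypothetical per-variable scale factors
objScaled    = @(z) OBJ(z .* s, param);  % objective in scaled coordinates
constrScaled = @(z) CONSTR(z .* s, param);
z0 = start(i,:) ./ s;
[z,fvalScaled] = fmincon(objScaled, z0, [],[],[],[], lbd./s, ubd(i,:)./s, constrScaled, opt);
x = z .* s;                              % map back to original units
```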


Asked: 3 Feb 2022
Last comment: 5 Feb 2022
