First-Order Optimality Measure

What Is First-Order Optimality Measure?

First-order optimality is a measure of how close a point x is to optimal. Most Optimization Toolbox™ solvers use this measure, though it has different definitions for different algorithms. First-order optimality is a necessary condition, but it is not a sufficient condition. In other words:

The first-order optimality measure must be zero at a minimum.
A point with first-order optimality equal to zero is not necessarily a minimum.

For general information about first-order optimality, see Nocedal and Wright [31]. For specifics about the first-order optimality measures for Optimization Toolbox solvers, see Unconstrained Optimality, Constrained Optimality Theory, and Constrained Optimality in Solver Form.

Stopping Rules Related to First-Order Optimality

The OptimalityTolerance tolerance relates to the first-order optimality measure. Typically, if the first-order optimality measure is less than OptimalityTolerance, solver iterations end.

Some solvers or algorithms use relative first-order optimality as a stopping criterion. Solver iterations end if the first-order optimality measure is less than μ times OptimalityTolerance, where μ is either:

The infinity norm (maximum) of the gradient of the objective function at x0
The infinity norm (maximum) of inputs to the solver, such as f or b in linprog or H in quadprog

A relative measure attempts to account for the scale of a problem. Multiplying an objective function by a very large or small number does not change the stopping condition for a relative stopping criterion, but does change it for an unscaled one.

Solvers with enhanced exit messages state, in the stopping criteria details, when they use relative first-order optimality.

Unconstrained Optimality

For a smooth unconstrained problem,

$\min_{x} f (x),$

the first-order optimality measure is the infinity norm (meaning maximum absolute value) of ∇f(x), which is:

$first-order optimality measure = \max_{i} | {(\nabla f (x))}_{i} | = {‖ \nabla f (x) ‖}_{\infty} .$

This measure of optimality is based on the familiar condition for a smooth function to achieve a minimum: its gradient must be zero. For unconstrained problems, when the first-order optimality measure is nearly zero, the objective function has gradient nearly zero, so the objective function could be near a minimum. If the first-order optimality measure is not small, the objective function is not minimal.

Constrained Optimality Theory

This section summarizes the theory behind the definition of first-order optimality measure for constrained problems. The definition as used in Optimization Toolbox functions is in Constrained Optimality in Solver Form.

For a smooth constrained problem, let g and h be vector functions representing all inequality and equality constraints respectively (meaning bound, linear, and nonlinear constraints):

$\min_{x} f (x) subject to g (x) \leq 0, h (x) = 0.$

The meaning of first-order optimality in this case is more complex than for unconstrained problems. The definition is based on the Karush-Kuhn-Tucker (KKT) conditions. The KKT conditions are analogous to the condition that the gradient must be zero at a minimum, modified to take constraints into account. The difference is that the KKT conditions hold for constrained problems.

The KKT conditions use the auxiliary Lagrangian function:

L (x, λ) = f (x) + \sum λ_{g, i} g_{i} (x) + \sum λ_{h, i} h_{i} (x) .

(1)

The vector λ, which is the concatenation of λ_g and λ_h, is the Lagrange multiplier vector. Its length is the total number of constraints.

The KKT conditions are:

\nabla_{x} L (x, λ) = 0,

(2)

λ_{g, i} g_{i} (x) = 0 \forall i,

(3)

{\begin{matrix} g (x) \leq 0, \\ h (x) = 0, \\ λ_{g, i} \geq 0. \end{matrix}

(4)

Solvers do not use the three expressions in Equation 4 in the calculation of optimality measure.

The optimality measure associated with Equation 2 is

‖ \nabla_{x} L (x, λ) ‖ = ‖ \nabla f (x) + \sum λ_{g, i} \nabla g_{i} (x) + \sum λ_{h, i} \nabla h_{h, i} (x) ‖ .

(5)

The optimality measure associated with Equation 3 is

‖ \vec{λ_{g} g} (x) ‖,

(6)

where the norm in Equation 6 means infinity norm (maximum) of the vector

\vec{λ_{g, i} g_{i}} (x)

The combined optimality measure is the maximum of the values calculated in Equation 5 and Equation 6. Solvers that accept nonlinear constraint functions report constraint violations g(x) > 0 or |h(x)| > 0 as ConstraintTolerance violations. See Tolerances and Stopping Criteria.

Constrained Optimality in Solver Form

Most constrained toolbox solvers separate their calculation of first-order optimality measure into bounds, linear functions, and nonlinear functions. The measure is the maximum of the following two norms, which correspond to Equation 5 and Equation 6:

\begin{array}{l} ‖ \nabla_{x} L (x, λ) ‖ = ‖ \nabla f (x) + A^{T} λ_{i n e q l i n} + A e q^{T} λ_{e q l i n} \\ + \sum λ_{i n e q n o n l i n, i} \nabla c_{i} (x) + \sum λ_{e q n o n l i n, i} \nabla c e q_{i} (x) ‖, \end{array}

(7)

‖ \vec{| l_{i} - x_{i} | λ_{l o w e r, i}}, \vec{| x_{i} - u_{i} | λ_{u p p e r, i}}, \vec{| {(A x - b)}_{i} | λ_{i n e q l i n, i}}, \vec{| c_{i} (x) | λ_{i n e q n o n l i n, i}} ‖,

(8)

where the norm of the vectors in Equation 7 and Equation 8 is the infinity norm (maximum). The subscripts on the Lagrange multipliers correspond to solver Lagrange multiplier structures. See Lagrange Multiplier Structures. The summations in Equation 7 range over all constraints. If a bound is ±Inf, that term is not constrained, so it is not part of the summation.

Linear Equalities Only

For some large-scale problems with only linear equalities, the first-order optimality measure is the infinity norm of the projected gradient. In other words, the first-order optimality measure is the size of the gradient projected onto the null space of Aeq.

Bounded Least-Squares and Trust-Region-Reflective Solvers

For least-squares solvers and trust-region-reflective algorithms, in problems with bounds alone, the first-order optimality measure is the maximum over i of |v_i*g_i|. Here g_i is the ith component of the gradient, x is the current point, and

$v_{i} = {\begin{array}{l} | x_{i} - b_{i} | & if the negative gradient points toward bound b_{i} \\ 1 & otherwise . \end{array}$

If x_i is at a bound, v_i is zero. If x_i is not at a bound, then at a minimizing point the gradient g_i should be zero. Therefore the first-order optimality measure should be zero at a minimizing point.