Backslash does not provide the solution with the smallest 2-norm

Question

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022


Direct link to this question

https://la.mathworks.com/matlabcentral/answers/1774000-backslash-does-not-provided-the-solution-with-the-smallest-2-norm

Commented: Bruno Luong on 6 Aug 2022

I was debugging a constraint solver when I encountered a problem with the backslash operator in MATLAB.

I have isolated the problem into the attached minimal working example. The issue occurs when solving a consistent but under-determined linear system Ax=b. In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right? I believe that the attached script demonstrates that this is not the case and that the computed result depends on whether A is treated as a sparse or a dense matrix.

Would you please take a look and see if I have missed something or if this is indeed an issue that should be fixed?

Kind regards

Carl Christian K. Mikkelsen.

% The backslash operator and underdetermined linear systems
% 
% The result of A\b depends on whether A is dense or sparse and 
% the computed results do not minimize the 2-norm over the set
% of solutions of Ax=b.
%
% PROGRAMMING by Carl Christian Kjelgaard Mikkelsen (spock@cs.umu.se)
%   2022-08-04 Initial programming and testing
% Define a wide matrix 
A=[1 1 0; 0 1 1];
% Define a solution
t=(1:3)';
% Define the right hand side so that Ax=b is consistent
b=A*t;
% Solve the underdetermined linear equation Ax=b
% using a dense representation of A
x=A\b;
% Solve the underdetermined linear equation Ax=b 
% using a sparse representation of A
y=sparse(A)\b;
% Solve the underdetermined linear equation Ax=b
% in the linear least squares sense
z=A'*((A*A')\b);
% Compute the residuals
rx=b-A*x;
ry=b-A*y;
rz=b-A*z;
% Compute the 2-norm of the solutions
nx=norm(x);
ny=norm(y);
nz=norm(z);
% Display the results
fprintf('The solutions\n');
The solutions
display([x y z]);
   -2.0000    3.0000    0.3333
    5.0000         0    2.6667
         0    5.0000    2.3333
fprintf('The residuals\n');
The residuals
display([rx ry rz]);
   1.0e-14 *

    0.1776         0    0.0444
    0.1776         0   -0.1776
fprintf('The norm of the solutions\n');
The norm of the solutions
display([nx ny nz]);
    5.3852    5.8310    3.5590
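
The three vectors printed above can be related by hand. The following worked computation is an editorial sketch rather than part of the original post; the symbols $n$ (a null-space vector of $A$) and $s$ (a free parameter) are introduced here for illustration only.

```latex
\[
A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
b = \begin{pmatrix} 3 \\ 5 \end{pmatrix}, \qquad
AA^{T} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
z = A^{T}(AA^{T})^{-1} b = \tfrac{1}{3}\begin{pmatrix} 1 \\ 8 \\ 7 \end{pmatrix}.
\]
\[
\ker A = \operatorname{span}\{n\}, \quad n = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \quad z^{T} n = 0
\;\Longrightarrow\;
x(s) = z + s\,n, \quad \|x(s)\|_2^2 = \|z\|_2^2 + 3s^2 = \tfrac{38}{3} + 3s^2 .
\]
\[
x = z - \tfrac{7}{3}\,n = \begin{pmatrix} -2 \\ 5 \\ 0 \end{pmatrix}, \quad \|x\|_2^2 = 29; \qquad
y = z + \tfrac{8}{3}\,n = \begin{pmatrix} 3 \\ 0 \\ 5 \end{pmatrix}, \quad \|y\|_2^2 = 34; \qquad
\|z\|_2^2 = \tfrac{38}{3}.
\]
```

All three vectors solve Ax=b exactly; they differ only in the null-space component s, and only s=0 (the vector z) minimizes the 2-norm. The square roots of 29, 34 and 38/3 agree with the printed norms 5.3852, 5.8310 and 3.5590.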

15 comments

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022

@Steven Lord Thank you for your response. I have discussed this point elsewhere on this page, but I shall develop the point further right here.

Some authors will carefully distinguish between solving a tall linear system in the least squares sense and computing the least norm solution of a wide linear system. However, it is quite common to use the term "least squares solution" to cover both cases. In the case of a consistent and underdetermined linear system, there are solutions and it makes no sense to minimize the residual; rather, it is the norm of the solution which is minimized.

Therefore, when a user reads the text provided by the command "help \" there is only one way to interpret the line:

"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B"

The key is the use of the article "the" as in "the solution". It implies uniqueness. Therefore, the reader will believe that x=A\b minimizes the 2-norm of the residual when A is a tall matrix of full column rank and that x=A\b has the smallest 2-norm of all the solutions of Ax=b when A is a wide matrix of full row rank. Why? Because these two problems connected to the 2-norm have unique solutions.
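
For readers following the terminology dispute, the two problems referred to in this comment can be written out as follows. This is an editorial sketch using standard definitions; the labels $x_{\mathrm{LS}}$ and $x_{\mathrm{MN}}$ are introduced here and do not appear in the thread.

```latex
% Tall A (m > n) with full column rank: least squares problem, unique solution
\[
x_{\mathrm{LS}} = \arg\min_{x} \|Ax - b\|_2 = (A^{T}A)^{-1}A^{T}b .
\]
% Wide A (m < n) with full row rank: minimum norm problem, unique solution
\[
x_{\mathrm{MN}} = \arg\min_{x} \{\, \|x\|_2 : Ax = b \,\} = A^{T}(AA^{T})^{-1}b .
\]
% In both cases the result coincides with the pseudoinverse solution
\[
x_{\mathrm{LS}} = A^{+}b \quad (\text{tall case}), \qquad x_{\mathrm{MN}} = A^{+}b \quad (\text{wide case}).
\]
```

In the wide case every solution of Ax=b already attains the minimal residual of zero, so minimizing the residual alone does not single out one vector.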

Paul on 5 Aug 2022

I agree with @Carl Christian Kjelgaard Mikkelsen that help and the documentation should be clarified. The help states

s = split(string(help('mldivide')),'.');
s = strtrim(s(8))
s = 
    "If A is an M-by-N rectangular matrix with M~=N and B is a column
         vector with M components, or a matrix with several such columns,
         then X = A\B is the solution in the least squares sense to the
         under- or overdetermined system of equations A*X = B"

Note the phrase "the solution."

However, the doc page mldivide, \ states:

"If A is a rectangular m-by-n matrix with m ~= n, and B is a matrix with m rows, then A\B returns a least-squares solution to the system of equations A*x= B."

Note the phrase "a ... solution."

So if reading the former, it's reasonable to expect that mldivide returns "the [unique] solution" and if the latter it returns "a ... solution [of many]."

As has been pointed out, if A is not square, then there can be many solutions that yield the same, minimum 2-norm of the residual. So, IMO, the help should be changed to not refer to "the solution."

What would be the harm in modifying the doc page (and the help I suppose) to read something like "... then A\B returns a least-squares solution to the system of equations A*x= B, i.e., a solution that minimizes norm(A*x - b)." Then there would be no confusion between least-squares and minimum norm solutions.

It would be even better to also explain that in some cases such a solution is unique and in others it isn't, and for the latter cases explain which criterion is used to return the result, unless such logic is too difficult to explain when considering all of the possible paths that mldivide can take.
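
The non-uniqueness mentioned above can be seen on a small case with a nonzero residual. The following is a minimal sketch on a hypothetical rank-deficient matrix that does not appear in the thread; the matrix, right-hand side and candidate vectors are assumptions chosen so that the arithmetic is easy to check by hand.

```matlab
% Hypothetical rank-deficient example (not from the thread): the two columns
% of A are identical, so only the sum x(1)+x(2) affects the residual.
A = [1 1; 1 1; 0 0];
b = [1; 3; 1];

% Three candidate vectors with x(1)+x(2) = 2, the value that minimizes
% norm(A*x-b). Each column of X is one candidate.
X = [2 0 1;
     0 2 1];

for k = 1:size(X,2)
    x = X(:,k);
    fprintf('x = [%g %g]''  norm(A*x-b) = %.4f  norm(x) = %.4f\n', ...
        x(1), x(2), norm(A*x-b), norm(x));
end

% Among all minimizers of the residual, pinv returns the one with the
% smallest norm(x).
xmin = pinv(A)*b;
```

All three candidates are least-squares solutions with the same residual norm sqrt(3), yet their own norms differ, so "a least-squares solution" and "the minimum norm least-squares solution" are different statements.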

Paul on 5 Aug 2022

You are correct, I shouldn't have used "unique." According to my dictionary "the" is a definite article to "indicate that a following noun or noun equivalent is definite or has been previously specified by context or by circumstance." By pointing her finger, Sandy is making clear by circumstance the specific car. On the other hand, it would be quite confusing if Sandy said "look at the car" w/o any other indication if there are many cars to see.

If stating "the solution" it should be clear to the reader which solution is the referent, which might or might not be a unique solution, but at least we'll know which solution we're talking about.

In this case, there can, in general, be many least-squares solutions, so referring to "the (definite) least-squares solution" w/o any additional context is not correct, IMO.

OTOH, the indefinite article "a" means the referent is unspecified. Sandy pointed at a car. I don't know which car she pointed to. mldivide returns a least-squares solution. I don't know which one it returns, just that whatever it returns is a least-squares solution.

Though I wouldn't use "minimum norm solution" synonymously with "least-squares solution", apparently others do, as shown by @Carl Christian Kjelgaard Mikkelsen in this comment and also here. So if one were to (correctly IMO) read "the solution" as referring to a particular solution, it makes a lot of sense to think it's referring to the unique, minimum norm solution for the underdetermined problem.

And now I realize that @Carl Christian Kjelgaard Mikkelsen already pointed out the "a" / "the" issue in this comment. I just should have responded there.

Bruno Luong on 5 Aug 2022

"'a solution' and 'the solution' are not grammatically equivalent."

Granted, but they are not exclusive.

Given that

"the solution" (in a proper context) is "a solution".

Both can perfectly well be used without any contradiction, so I don't see why they cannot coexist.

Les Beckham on 5 Aug 2022

Holy cow, guys. This has pretty much degenerated into "that depends on what the definition of "is" is". Let it go, already. Have a good night. :)


Answer 1

Matt J on 4 Aug 2022

Direct link to this answer

https://la.mathworks.com/matlabcentral/answers/1774000-backslash-does-not-provided-the-solution-with-the-smallest-2-norm#answer_1021145

Edited: Matt J on 4 Aug 2022

In this case, we expect that MATLAB returns the solution x which has the smallest 2-norm, right?

No, the backslash operator will use a QR-solver to produce a solution in that case. This won't necessarily be the least 2-norm solution.
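
To illustrate why a QR-based solve tends to return a solution with zeros rather than the least 2-norm one, here is a minimal sketch of a "basic" solution built from a column-pivoted QR factorization. It follows the description in the help text quoted elsewhere in this thread (effective rank K from pivoted QR, at most K nonzero components), but it is an assumption-laden illustration and not the actual mldivide implementation; the names x_basic, x_minnorm and the tolerance tol are hypothetical.

```matlab
% Sketch only: mimics the idea of a "basic" solution via column-pivoted QR.
% This is NOT the actual mldivide implementation.
A = [1 1 0; 0 1 1];
b = [3; 5];
[m, n] = size(A);

[Q, R, P] = qr(A);                 % A*P = Q*R, P is a permutation matrix
d = abs(diag(R));
tol = max(m, n) * eps(d(1));       % hypothetical rank tolerance
k = sum(d > tol);                  % effective rank

% Solve using the k leading pivot columns only; the remaining unknowns
% are set to zero, which yields at most k nonzero components.
y = R(1:k, 1:k) \ (Q(:, 1:k)' * b);
x_basic = P * [y; zeros(n - k, 1)];

x_minnorm = pinv(A) * b;           % minimum 2-norm solution for comparison

fprintf('residuals: %.2e %.2e\n', norm(A*x_basic - b), norm(A*x_minnorm - b));
fprintf('norms:     %.4f %.4f\n', norm(x_basic), norm(x_minnorm));
```

Both vectors satisfy A*x = b up to rounding, but only the pinv result is guaranteed to have the smallest norm. Which columns the pivoting selects depends on the factorization, which is one reason different solvers can return different basic solutions.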

12 comments

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022

Firstly, I would like to thank you both for reading and responding to my question.

While I am grateful for references to the various functions that can be used to compute the least squares solution of an underdetermined linear system, this was never an issue for me.

However, two issues remain:

Firstly, the computed result depends on the representation of the matrix. We get one result if we treat the matrix as a dense matrix and we get another result which is completely different if we treat the matrix as a sparse matrix. Is this really the intended behavior? If the answer is yes, then this should be made very clear in the documentation.

Secondly, the documentation does not reflect the reality of the software. I quote from the website: https://se.mathworks.com/help/matlab/ref/mldivide.html, specifically the line:

"If A is a rectangular m-by-n matrix with m~=n, and B is a matrix with m rows, then A\B is a least-squares solution to the system of equations A*x=B."

The website is correct in the sense that A\B is a solution, but as I have demonstrated it is not the least squares solution. I also quote from the internal documentation from MATLAB 2020b, i.e., the command "help \":

"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B. The effective rank, K, of A is determined from the QR decomposition with pivoting. A solution X is computed which has at most K nonzero components per column. If K < N this will usually not be the same solution as PINV(A)*B. A\EYE(SIZE(A)) produces a generalized inverse of A."

The internal documentation is wrong because as I have demonstrated, A\B is not the least squares solution. Moreover, the first and the third sentence of the internal documentation appear to contradict each other.

Matt J on 4 Aug 2022

Edited: Matt J on 4 Aug 2022

We get one result if we treat the matrix as a dense matrix and we get another result which is completely different if we treat the matrix as a sparse matrix. Is this really the intended behavior? If the answer is yes, then this should be made very clear in the documentation.

Even if you had agreement between dense-type and sparse-type, you could not count on getting the same result on different computer architectures.

When a continuum of solutions exists to any minimization problem, then the solution is unstable. It is never possible to guarantee which one you will get from a numerical routine. This is a general fact from numerical analysis. See below how much the solution can change when we add small random perturbations to A and b in your proposed example:

% Define a wide matrix 
A=[1 1 0; 0 1 1];
% Define a solution
t=(1:3)';
% Define the right hand side so that Ax=b is consistent
b=A*t
b = 2×1
     3
     5
t1=A\b  %solution 1
t1 = 3×1
   -2.0000
    5.0000
         0
noise=@(z) z+0.0001*randn(size(z));
An=noise(A), bn=noise(b),  %add noise
An = 2×3
    0.9999    1.0001   -0.0002
    0.0000    1.0000    1.0000
bn = 2×1
    2.9999
    5.0003
t2=An\bn  %noisy solution
t2 = 3×1
         0
    2.9999
    2.0003

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022

Thank you all for taking the time to respond. I shall reply to each individual.

@Torsten You are certainly correct that the least squares solution of a tall linear system is distinctly different from the least norm solution of a wide linear system. However, it is quite common to use the term least squares solution to refer to both situations. When I read the internal documentation I took note of the use of the definite article "the" as in "the solution", a terminology which always implies uniqueness. I therefore assumed that A\b would minimize the 2-norm of the residual when the matrix is tall and minimize the 2-norm of the solution when the matrix is wide.

@Bruno Luong In my case of a consistent underdetermined linear system the residual is zero. It is therefore natural that the reader assumes that the term "the solution in the least squares sense" refers to the problem of finding the solution with the smallest 2-norm. It is very kind of you to admit that the documentation is not always rigorously written. We now know how to improve this specific section.

@Matt J Thank you for demonstrating that MATLAB's backslash operator is unstable for this use case. The algorithm that I use, i.e., the expression z=A'*((A*A')\b), does not suffer from this problem, as the central matrix A*A' is exceedingly well-conditioned for the example that I have provided.
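
The stability claim in this comment can be checked with a small experiment. The sketch below is an editorial illustration: it replaces the random noise used earlier in the thread with a fixed, hypothetical perturbation of comparable size (dA, db) so that the result is reproducible, and the helper name minnorm is introduced here.

```matlab
A = [1 1 0; 0 1 1];
b = [3; 5];

% Hypothetical fixed perturbation of size about 1e-4 (used instead of randn
% so that the sketch is reproducible).
dA = 1e-4 * [ 1  -1   2;
              0.5 1  -1];
db = 1e-4 * [-1; 3];

% Minimum norm solution via the normal equations of the second kind,
% i.e. the expression used for z in the question.
minnorm = @(M, v) M' * ((M*M') \ v);

z1 = minnorm(A, b);
z2 = minnorm(A + dA, b + db);

fprintf('cond(A*A'') = %.4f\n', cond(A*A'));
fprintf('change in minimum norm solution: %.2e\n', norm(z2 - z1));
```

Here A*A' has eigenvalues 1 and 3, so its condition number is 3, and the minimum norm solution moves by an amount comparable to the perturbation. For matrices where A*A' is ill-conditioned, forming it explicitly squares the condition number of A, so this approach is not generally recommended; pinv or lsqminnorm avoid that step.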

Matt J on 5 Aug 2022

Edited: Matt J on 5 Aug 2022

You are quite far from the issue at hand, i.e., the result of x=A\b depends on the data type of A and the 2-norm of x is not minima

The fact that the 2-norm of x is not minimized is not the issue. We have already established that A\b does not pledge to deliver an x with this property, either for sparse or for full A.

The fact that the solution depends on the data type of A is the only issue that remains, and my comments are applicable to it. I have argued that when the solution is under-determined, you cannot expect different numerical routines - nor even the same numerical routine on a different computer - to deliver the same solution, ever. So, it is not clear why you think the results should be the same.

Perhaps you think that sparse and full should at least use the same algorithm to select one of the non-unique solutions? That defeats the purpose, though, of having sparse type. The purpose of sparse type is to process things differently from full type, and in a way that exploits the sparsity of A.

Carl Christian Kjelgaard Mikkelsen on 5 Aug 2022

@Matt J No, I am afraid that I cannot agree with you. I believe that we have established that the documentation of backslash can be improved to clarify which problem is being solved and that the terminology is not consistent across the international community or even within the USA, see the link to Wiscon state university that I provided above.

I have already explained to @Steven Lord how the current documentation can be read to support my interpretation. I have also stated clearly that if the desired behavior is to produce different solutions for the same matrix depending on the data type, then this should figure prominently in the documentation.

Matt J on 5 Aug 2022

Edited: Matt J on 5 Aug 2022

@Carl Christian Kjelgaard Mikkelsen Even if you disagree or think the documentation should be modified, your question has been answered, right?

It should now be clear that A\b purports to do no more than to minimize norm(A*x-b). When there exists more than one solution to this problem, there is no commitment on the MathWorks' part as to which of the non-unique solutions you will get. Furthermore, there is no commitment that the solutions will be the same from full-to-sparse or from computer-to-computer. None of this is anything the MathWorks regards as a bug.


Answer 2

Bruno Luong on 4 Aug 2022

Direct link to this answer

https://la.mathworks.com/matlabcentral/answers/1774000-backslash-does-not-provided-the-solution-with-the-smallest-2-norm#answer_1021240

Edited: Bruno Luong on 4 Aug 2022

A few other methods to get the least-norm solution

A = rand(3,6)
A = 3×6
    0.2748    0.9301    0.9530    0.4196    0.5850    0.3316
    0.4791    0.8315    0.6904    0.6636    0.2678    0.6999
    0.4191    0.4202    0.0789    0.2761    0.9976    0.3208
b = rand(3,1)
b = 3×1
    0.4586
    0.9898
    0.8222
% Method 1, Christine, recommended
x = lsqminnorm(A,b)
x = 6×1
    0.5627
    0.0942
   -0.3752
    0.5117
    0.2032
    0.7243
% Method 2, Pseudo inverse
x = pinv(A)*b
x = 6×1
    0.5627
    0.0942
   -0.3752
    0.5117
    0.2032
    0.7243
% Method 3, backslash on the KKT system
[m,n] = size(A);
y=[eye(n), A'; A, zeros(m)] \ [zeros(n,1); b];
x = y(1:n)
x = 6×1
    0.5627
    0.0942
   -0.3752
    0.5117
    0.2032
    0.7243
% Method 4, QR on A'
[Q,R,p] = qr(A',0);
x = Q*(R'\b(p,:))
x = 6×1
    0.5627
    0.0942
   -0.3752
    0.5117
    0.2032
    0.7243
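
Methods 3 and 4 above are stated without derivation. The following editorial sketch shows why both return the minimum norm solution, assuming A has full row rank; the multiplier $\lambda$ and the auxiliary vector $w$ are symbols introduced here for illustration.

```latex
% Method 3: KKT conditions of  min (1/2) x^T x  subject to  A x = b
\[
\mathcal{L}(x,\lambda) = \tfrac12 x^{T}x + \lambda^{T}(Ax - b), \qquad
\nabla_x \mathcal{L} = x + A^{T}\lambda = 0, \quad Ax = b
\;\Longleftrightarrow\;
\begin{pmatrix} I_n & A^{T} \\ A & 0 \end{pmatrix}
\begin{pmatrix} x \\ \lambda \end{pmatrix}
=
\begin{pmatrix} 0 \\ b \end{pmatrix}.
\]
\[
x = -A^{T}\lambda, \quad -AA^{T}\lambda = b
\;\Longrightarrow\;
x = A^{T}(AA^{T})^{-1} b .
\]
% Method 4: economy-size pivoted QR of A^T, with permutation vector p
\[
A(p,:)^{T} = QR \;\Longrightarrow\; A(p,:) = R^{T}Q^{T}, \qquad
x = Qw, \quad R^{T}w = b(p)
\;\Longrightarrow\;
x = Q R^{-T} b(p).
\]
```

In Method 4 the solution is sought in the range of Q, which equals the row space of A. A solution of Ax=b that lies in the row space has no null-space component and is therefore the one of minimum 2-norm.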

2 comments

Carl Christian Kjelgaard Mikkelsen on 4 Aug 2022

Thank you for reading and responding to my question.

Please note that my issue is not the problem of computing the least squares solution of a consistent underdetermined linear system, but the fact that the backslash operator provides very different results depending on the matrix representation and that the computed results do not minimize the 2-norm over the set of solutions.

Bruno Luong on 4 Aug 2022

Edited: Bruno Luong on 4 Aug 2022

"Please note that my issue is not the problem of computing the least squares solution"

I think you need to revisit your definition of "least squares solution". I guess all the confusion arises because you equate it with the minimum norm solution, which in general it is not.

I know this, and I guess most people who have worked with MATLAB for some time know it too.

https://www.math-forums.com/threads/underdetermined-systems-backslash.188674/

https://fr.mathworks.com/matlabcentral/answers/814450-mldivide-algorithm-for-an-underdetermined-system-of-equations

https://fr.mathworks.com/matlabcentral/answers/1755095-why-matrix-division-returns-different-answers?s_tid=srchtitle


Answer 3

John D'Errico on 5 Aug 2022

Direct link to this answer

https://la.mathworks.com/matlabcentral/answers/1774000-backslash-does-not-provided-the-solution-with-the-smallest-2-norm#answer_1021605

Edited: John D'Errico on 5 Aug 2022

It seems the gist of your question comes down to the idea that for a wide matrix, MATLAB should (in your eyes only) always produce a solution that minimizes the norm of x. For square matrices, or matrices that are taller than wide, it should do something else? That alone seems highly inconsistent to me.

So first, where have you ever seen the claim that this is true? No place in the documentation is this claimed. I even took a quick look. Maybe you have confused the idea of a solution that minimizes the norm of the residuals with a solution that minimizes the norm of x itself. We can't really know where you got the idea.

Should something be fixed? Of course not! MATLAB already provides multiple solutions that yield the minimum norm of x when they are used.

A = [1 2 3;4 5 7];
b = [2;3];
A\b
ans = 3×1
   -1.0000
         0
    1.0000

So backslash produces a basic solution, as expected for the underdetermined case, with one element exactly zero. For a solution that is minimum norm in x, we already have:

lsqminnorm(A,b)
ans = 3×1
   -1.0571
    0.2857
    0.8286
pinv(A)*b
ans = 3×1
   -1.0571
    0.2857
    0.8286
lsqr(A,b)
lsqr converged at iteration 2 to a solution with relative residual 1.7e-14.
ans = 3×1
   -1.0571
    0.2857
    0.8286
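
The relation between the basic solution and the minimum norm solution shown above can be made explicit: the two differ only by a component in the null space of A. The following is a minimal editorial sketch; x0 is copied from the backslash output printed above, and the names N and xmin are introduced here.

```matlab
A = [1 2 3; 4 5 7];
b = [2; 3];

x0 = [-1; 0; 1];          % the basic solution reported above
N  = null(A);             % orthonormal basis of the null space of A

% Removing the null-space component leaves the part of x0 that lies in the
% row space of A, which is the minimum 2-norm solution.
xmin = x0 - N * (N' * x0);

fprintf('norm(x0)   = %.4f, residual = %.2e\n', norm(x0),   norm(A*x0 - b));
fprintf('norm(xmin) = %.4f, residual = %.2e\n', norm(xmin), norm(A*xmin - b));
fprintf('difference to pinv(A)*b: %.2e\n', norm(xmin - pinv(A)*b));
```

Any solution of the system can be used in place of x0, which is why the particular vector that backslash returns does not matter if the minimum norm solution is what is ultimately wanted.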

5 comments

John D'Errico on 6 Aug 2022

@Carl Christian Kjelgaard Mikkelsen

The only place I can see where you claim the documentation says the solution is a minimum norm solution is here:

"The key is the use of the article "the" as in "the solution". It implies uniqueness. Therefore, the reader will believe that x=A\b minimizes the 2-norm of the residual when A is a tall matrix of full column rank and that x=A\b has the smallest 2-norm of all the solutions of Ax=b when A is a wide matrix of full row rank. Why? Because these two problems connected to the 2-norm have unique solutions."

NO place in the documentation for backslash has it said anything about minimum norm for x. You have made a jump to an unfounded conclusion from nothing more than their use of the word "THE", to decide that this implies your version of what you want to be true.

The problem is, there are certainly other ways to choose a UNIQUE solution. For example, why assume a minimum 2-norm?

And to jump to a conclusion from the inclusion of a three letter word in the documentation is a bit strange. Surely if they were going to SPECIFICALLY do something different for one shape matrix than for another shape matrix, they would have said something more than just an oblique reference with the word "the"?

We see this line:

"If A is an M-by-N matrix with M < or > N and B is a column vector with M components, or a matrix with several such columns then X = A\B is the solution in the least squares sense to the under- or overdetermined system of equations A*X = B"

The phrase "In the least squares sense" means a solution that minimizes norm(A*X-b). It says nothing about norm(x).

Honestly, I think you are grasping at straws here, based only on the use of the word "the".

Bruno Luong on 6 Aug 2022

" grasping at straws"

I disagree, he is only grasping at the straw. ;-)
