Hello, I'd like to find a transformation of a set of vectors X that maps them to known vectors Y. This can be expressed as a standard least squares optimization problem, i.e. min |WX - Y|^2, where W is the transformation matrix I'm looking for.
Additionally, I would like to minimize the L2 norm of the matrix W, so the full minimization problem becomes: min |W|^2 + |WX - Y|^2
This problem can be solved with gradient descent, and I was just wondering which function from the MATLAB Optimization Toolbox I should use?
Thanks!

4 comments

Beverly Grunden on 26 Feb 2018
I have a similar question: I want argmin_X { |WX-Y|^2 + |X|_1 }. In case my notation is confusing, I want to find the X that minimizes the square of the 2-norm of (WX-Y), regularized by the 1-norm of X. W is m x n, Y is m x 1. How do I do this in MATLAB? Thank you!
John D'Errico on 26 Feb 2018 (edited 26 Feb 2018)
I think your notation seems spot on. But your goal? Sorry, but this seems to be a bit silly as a goal. I do accept that you can choose any goal that you want, but why this?
By having a mixed problem, minimizing the 2-norm of (W*X-Y) combined with the 1-norm of X, you end up in a world where nothing is simple to write or solve, whereas the very similar problem in which the regularization is on the 2-norm of X is trivial to solve.
You would need to formulate this as a general nonlinear optimization, with the caveat that, due to the 1-norm, the problem is non-differentiable in the parameters. So using optimization tools like fminunc (or whatever), the optimization may run poorly and may fail to converge. It will probably be dependent on starting values. While admittedly you can use the 2-norm solution to provide what may be decent starting values, if they are that good, then why bother with this at all?
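The general-optimizer route described here might be sketched as follows. This is my own construction with made-up data, not code from the thread; the objective handle and the choice of starting point are assumptions, and, as noted, the non-differentiable 1-norm term means fminunc may behave poorly:

```matlab
% Sketch only: minimize |W*X - Y|^2 + |X|_1 with fminunc, seeded with
% the 2-norm (ridge) solution as a starting value. The 1-norm term is
% non-differentiable at zero, so convergence may be unreliable.
rng(0);                                  % made-up data for illustration
m = 20;  n = 10;
W = randn(m, n);
Y = randn(m, 1);
obj = @(X) norm(W*X - Y)^2 + norm(X, 1);
X0 = [W; eye(n)] \ [Y; zeros(n, 1)];     % ridge solution as starting point
opts = optimoptions('fminunc', 'Display', 'off');
Xhat = fminunc(obj, X0, opts);
```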
I don't see much benefit in such a problem formulation. Seemingly little to gain, for increased complexity, convergence issues, starting values, etc. And when you can write the fully L2 norm problem in essentially one line of code that has no issues, why?
You can view regularized linear least squares as a biased solution, where you have an a priori belief that the solution has a small norm. (I am taking a somewhat reluctant Bayesian stance here.) Typically this only matters for rank-deficient or somewhat ill-posed systems, where the ill-posed nature tends to inflate any noise in the data, giving you garbage for results. The regularizer tends to bias the results towards what you want to see. Such regularization is hugely valuable; in fact I use it to great benefit on many problems. But both the choice of regularizer and exactly how much bias to apply are quite ad hoc. So in this context, insisting that the regularizer be based on the 1-norm just seems strange.
Beverly Grunden on 27 Feb 2018
I appreciate your comments and suggestions. Unfortunately I did not choose this minimization problem. The one norm is intended to force sparsity on the vector components. I am attempting to replicate what the authors of a certain article did and then make a comparison to my novel method.
Thanks for your input. I will explore your ideas.
John D'Errico on 27 Feb 2018
You can obviously do as you wish, even if it seems absurd to me.
You will have little choice but to use an optimization, because of the mixed norms. Were it entirely an L1-norm problem, you could solve it using linprog. Were it entirely an L2-norm problem, you could just use backslash. The hybrid? Just a brute-force optimization, and as I said, it may exhibit some issues at that.
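For what it's worth, one standard way to avoid a general-purpose optimizer here (a sketch of my own, not something suggested in the thread) is to split X into nonnegative parts and hand the mixed problem to quadprog:

```matlab
% Sketch: recast argmin_X |W*X - Y|^2 + |X|_1 as a quadratic program
% by writing X = u - v with u, v >= 0, so that |X|_1 = sum(u + v)
% at the optimum. W and Y here are made-up data for illustration.
rng(0);
m = 20;  n = 10;
W = randn(m, n);
Y = randn(m, 1);
A  = [W, -W];                         % acts on the stacked vector z = [u; v]
H  = 2 * (A' * A);                    % quadratic term of |A*z - Y|^2
f  = -2 * (A' * Y) + ones(2*n, 1);    % linear term, plus the L1 penalty
lb = zeros(2*n, 1);                   % u, v >= 0
z  = quadprog(H, f, [], [], [], [], lb, []);
X  = z(1:n) - z(n+1:end);             % recover the (typically sparse) solution
```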


Accepted Answer

John D'Errico on 2 Jun 2011 (edited 10 Mar 2017)

1 vote

Using a nonlinear optimizer to solve this is overkill and completely silly, when a linear solver does it for you. Besides, with an optimizer you need to worry about convergence issues, starting values, etc.
Append an identity matrix to X, in this case as extra columns of X. Also append zeros to Y. Then use slash (i.e. /) to solve the problem.
W = [Y,zeros(1,N)]/[X,eye(N)];
This solves the regularized problem, where N is the number of rows in X, with no optimizer needed at all. See that when [X,eye(N)] has more columns than rows, slash solves the least squares problem you are asking to solve.
doc slash
help slash
For example, make up some data:
X = rand(10,20);
Y = rand(1,20);
We wish to solve the problem
W*X = Y
This is easily accomplished using slash, if we do no regularization at all.
u = Y/X
u =
-0.10061 -0.039857 -0.087301 0.14731 0.40622 0.36113 -0.31992 0.32343 0.071708 0.16989
With regularization, thus solving the penalized sum of squares, we do it as:
u = [Y,zeros(1,10)]/[X,eye(10)]
u =
-0.013147 0.042605 0.0085151 0.089504 0.24349 0.1602 -0.11616 0.20876 0.14126 0.13649
The solution has been biased towards zero, as we should expect.
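As a cross-check (my addition, not part of the original answer), the augmented-slash result should agree with the closed-form ridge solution obtained by setting the gradient of |W*X - Y|^2 + |W|^2 to zero, which gives W = Y*X'/(X*X' + I):

```matlab
% Verify the augmented-slash solve against the closed-form ridge solution.
rng(1);                                       % fresh made-up data
X = rand(10, 20);
Y = rand(1, 20);
u1 = [Y, zeros(1, 10)] / [X, eye(10)];        % augmented-slash solve
u2 = (Y * X') / (X * X' + eye(10));           % closed-form ridge solution
norm(u1 - u2)                                 % should be near machine precision
```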

2 comments

Shankararama Sharma R on 10 Mar 2017
This gives a W with dimension 1*(M+N). But we should only get 1*M, right?
John D'Errico on 10 Mar 2017 (edited 10 Mar 2017)
It was a long time ago that I wrote that. :)
As it turns out, I was sloppy and wrote the wrong expression: I forgot to append zeros to Y. I've fixed that, as well as adding an example. As you can see, the solution has the correct number of values.
You are mistaking the solution for one where you solve the problem
X*W = Y
Here, you would use BACKSLASH. Thus
W = X\Y
or with the simple ridge regression regularization employed here, that becomes
W = [X;eye(N)]\[Y;zeros(N,1)]
Note the differences. In this case, I append an identity matrix as rows to X, and zero rows to Y.
It is important whether you are using / or \ to solve the problem. In turn, that depends on which form your original problem was in.
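A quick sketch of that X*W = Y orientation, with made-up sizes rather than data from the thread:

```matlab
% X is m x n, W is n x 1: backslash solves the least squares problem,
% and appending eye(n) rows to X (and zeros to Y) adds the |W|^2 penalty.
rng(2);
m = 20;  n = 10;
X = rand(m, n);
Y = rand(m, 1);
W_plain = X \ Y;                              % unregularized least squares
W_ridge = [X; eye(n)] \ [Y; zeros(n, 1)];     % minimizes |X*W - Y|^2 + |W|^2
% norm(W_ridge) comes out smaller than norm(W_plain): the penalty shrinks W.
```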


More Answers (1)

Laura Proctor on 2 Jun 2011

1 vote

The slash operator is exactly what you need to solve your system of equations.
For example, you have indicated that you have the equation WX = Y where X and Y are given. To arrive at the least-squares fit for an overdetermined system, MATLAB has the algorithm built into the forward slash operator, so you simply need to type the following line:
W = Y/X
Often the system of equations is written as Ax = b where one must solve for x --- in this case use the backslash operator:
x = A\b
If you are interested in the optimization functions available, check out this page which shows all of the functions available in base MATLAB:
Optimization Toolbox and Global Optimization Toolbox also have many other functions available.

1 comment

John D'Errico on 2 Jun 2011
Note that slash by itself fails to solve the problem given by the OP. You must augment the system with an identity matrix to introduce a term consistent with the square of the norm of W.


Asked on 2 Jun 2011
Last commented on 27 Feb 2018

