Multivariate regression for constrained dependent variables
I have a database with 6 independent variables and 2 dependent variables. The equation for each dependent variable should be something like:
y1 = beta11*x1 + beta12*x2 + ... + beta16*x6
y2 = beta21*x1 + beta22*x2 + ... + beta26*x6
I want to find betas that satisfy
y1 + y2 = 1
for every observation. Which methodology should I use?
I tried using mvregress, but I couldn't constrain the dependent variables. I am also thinking of using optimization, but I still couldn't figure out how to constrain the dependent variables.
Thank you so much!
8 comments
Torsten
And the condition y1j + y2j = 1 should hold for your complete database
x11j x21j ... x61j
x12j x22j ... x62j
?
Thus
(b11*x11j + b21*x21j + ... + b61*x61j) + (b12*x12j + b22*x22j + ... + b62*x62j) = 1
for all j?
Or do I misunderstand your regression problem?
Torsten
I still don't get your problem.
Is it correct that you want to estimate 12 parameters (the beta values)?
Is it correct that you have row vectors [x11i,x12i,...,x16i], [x21i,x22i,...,x26i] (1 <= i <= N) and you want to estimate the 12 beta values such that
sum_i {x11i*beta11 + x12i*beta12 + ... + x16i*beta16 + x21i*beta21 + x22i*beta22 + ... + x26i*beta26 - 1}^2
is minimized?
Umar
on 10 Aug 2024
Hi @Torsten,
If the answer to all the questions listed above is yes, @Ismail, then you can use the least squares method to estimate the 12 beta values that minimize the given expression.
Ismail
on 10 Aug 2024
John D'Errico
on 10 Aug 2024
It is actually a problem unlike any I can recall seeing in many years.
The point is, though, that you cannot use an equality constraint on the sum y1+y2, since you will have N data points. It would not be just ONE constraint, but one constraint for each data point.
This leaves you with too few degrees of freedom to solve the problem. It is why I showed how to implement it in the form of a goal, where you try to make the sum as close to 1 as possible.
That there is no tool out there to handle it directly speaks to the uncommon nature of your problem.
Umar
on 11 Aug 2024
Hi @Ismail,
To address your comment, "Ah I see, so I don't need to put any constrained values in that case, right? Also, does that mean that there is no function that could deliberately put dependent variables as the constraints, right?":

You are correct that if you choose not to impose constraints, you allow greater flexibility in fitting your model. However, if you were to impose equality constraints on the dependent variables (like ensuring their sum equals a certain value), you would indeed face challenges. As you noted, with N data points this generates one constraint per data point, which reduces your degrees of freedom significantly. In situations with many data points, such equality constraints lead to an over-constrained system: each constraint removes an independent piece of information, which can prevent you from finding a unique solution for the beta values.

Since traditional methods may not accommodate your needs, consider reformulating the problem as an optimization goal rather than a strict equality constraint. For instance, you could minimize the deviation of the sum from 1 while allowing some tolerance; this can be expressed as a penalty function or through regularization. If overfitting is a concern, or if you want some control over the estimated parameters, regularization techniques such as lasso or ridge regression are also worth considering. I would suggest taking @John D'Errico's advice into account as well.
John D'Errico
on 11 Aug 2024
Edited: John D'Errico
on 11 Aug 2024
A ridge regression as suggested by @Umar may be a great idea here. Classically, ridge regression is a technique to bias a regression estimator, pushing the estimated coefficients towards zero for a singular or nearly singular problem. But you can also use different forms of bias. One I have often used is to bias the coefficients towards smoothness, in the context of a curve fitting problem (think of splines, or low-order piecewise functions).
In your context, I would build a scheme that biases the coefficients so that the expression (y1hat+y2hat-1) is always small. That is, write the target expression as a quadratic form. The nice thing is, you could implement that scheme without needing a nonlinear tool like lsqnonlin. Backslash would be sufficient in the end, if you do it carefully.
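For example, a minimal sketch of that quadratic-form scheme (my notation: X1 and X2 are the two N-by-6 design matrices, Y1 and Y2 the responses, and lambda a penalty weight you must choose):
% Penalized least squares: minimize ||Xall*b - yall||^2 + lambda*||S*b - 1||^2,
% where S*b = y1hat + y2hat, so a large lambda pushes the sum toward 1.
Xall = [X1, zeros(N,6); zeros(N,6), X2];   % 2N-by-12 block design matrix
yall = [Y1; Y2];
S = [X1, X2];                              % N-by-12: each row computes y1hat + y2hat
lambda = 10;                               % illustrative penalty weight
B12 = (Xall'*Xall + lambda*(S'*S)) \ (Xall'*yall + lambda*(S'*ones(N,1)));
Equivalently, you can stack sqrt(lambda)-weighted rows onto the design matrix and apply a single backslash, which is better conditioned than forming the normal equations.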
Accepted Answer
More Answers (2)
Torsten
Define a Nx12 matrix A as
A = [x11,x12,x13,x14,x15,x16,x21,x22,x23,x24,x25,x26]
and a Nx1 vector b as
b = ones(N,1)
and solve for beta = [beta11;beta12;beta13;beta14;beta15;beta16;beta21;beta22;beta23;beta24;beta25;beta26] by using
beta = A\b
If you also have measured values for y1 and y2, your matrix A becomes 3*N x 12 as
A = [x11,x12,x13,x14,x15,x16,x21,x22,x23,x24,x25,x26;
     x11,x12,x13,x14,x15,x16,zeros(N,6);
     zeros(N,6),x21,x22,x23,x24,x25,x26];
your vector b becomes 3*N x 1 as
b = [ones(N,1);y1;y2]
and you solve for beta = [beta11;beta12;beta13;beta14;beta15;beta16;beta21;beta22;beta23;beta24;beta25;beta26] by using
beta = A\b
If it's the latter case you have to consider, you can also give weights to the three different error components as @John D'Errico does in his suggestion (alpha1, alpha2, alpha3).
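For concreteness, a minimal sketch of that weighted variant (the weight values are illustrative; I write X1 = [x11,...,x16] and X2 = [x21,...,x26] for the two N-by-6 blocks, and weight the rows by square roots since backslash minimizes a sum of squared rows):
a1 = 1; a2 = 1; a3 = 1;                  % illustrative weights for the three goals
A = [sqrt(a1)*[X1, X2];                  % rows pushing y1hat + y2hat toward 1
     sqrt(a2)*[X1, zeros(N,6)];          % rows fitting y1
     sqrt(a3)*[zeros(N,6), X2]];         % rows fitting y2
b = [sqrt(a1)*ones(N,1); sqrt(a2)*y1; sqrt(a3)*y2];
beta = A\b;                              % beta(1:6) for the y1 model, beta(7:12) for y2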
John D'Errico
Sorry. You CANNOT ensure that y1+y2 == 1, EXACTLY. At least, not if there is any noise in your data.
All you can do is to use it as a target. And then that target will conflict with your goal of minimizing the errors in y1 and y2. As such, you might set it up as a problem of the form
alpha1*norm(1 - y1hat - y2hat)^2 + alpha2*norm(y1 - y1hat)^2 + alpha3*norm(y2 - y2hat)^2
where y1hat and y2hat are the predictions from the separate regression models, and alpha1, alpha2, and alpha3 are parameters you must choose to trade off the residuals from each piece of the problem.
An important point I want to make is that the lack of a constant term in your models may well be a significant problem.
Anyway, how might you do this? I suppose you could solve it using lsqnonlin, though, since the parameters enter the models linearly, a linear modeling tool must be an option too.
Let me see if I can give an example. I'll need to make up some data, since you have provided none that I see.
N = 50;                      % number of observations
X1 = randn(N,6);             % predictors for the y1 model
X2 = rand(N,6);              % predictors for the y2 model
Y1 = 0.4 + randn(N,1)/100;   % y1 data: roughly 0.4 plus small noise
Y2 = 0.6 + randn(N,1)/100;   % y2 data: roughly 0.6, so y1+y2 is near 1
As you can see, the data does have the property that y1+y2 should be 1, but for the noise in the system.
If I model each process separately, I get this:
B1 = X1\Y1;
B2 = X2\Y2;
[B1,B2]
It is just random crap data, so I really don't care about how well the models fit. I expect the models will be pretty poor, in fact.
Y1hat = X1*B1;
Y2hat = X2*B2;
[norm(Y1 - Y1hat),norm(Y2 - Y2hat)]
But what happens with the predictions? As done separately, how well does it do?
norm(1 - (Y1 + Y2))
norm(1 - (Y1hat + Y2hat))
So, with no target put on the parallel models, things actually got WAY worse in terms of the sum of the predictions being 1. Again, the lack of a constant term in your model is probably a huge factor here.
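As an aside, a minimal sketch of what adding that constant term would look like (my illustration, reusing the data above):
% Add an intercept by augmenting the design matrix with a column of ones.
X1c = [ones(N,1), X1];   % the y1 model now has 7 coefficients; the first is the intercept
B1c = X1c\Y1;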
Anyway, having built the models separately, can we formulate this as a regression model, but with the sum target built in?
alph = [1/3 1/3 1/3];        % equal weights on the three residual blocks
B12start = randn(12,1);      % random starting point for the solver
[B12est,~,~,exitflag] = lsqnonlin(@(B12) obj(B12,X1,X2,Y1,Y2,alph),B12start);
reshape(B12est,6,2)
Y1hatest = X1*B12est(1:6);
Y2hatest = X2*B12est(7:12);
[norm(Y1 - Y1hatest),norm(Y2 - Y2hatest)]
norm(1 - (Y1hatest + Y2hatest))
As you can see, the models are a little different. The coefficients have changed, but not by a lot. But now the target for the sum of the two models has had its error reduced.
function res = obj(B12,X1,X2,Y1,Y2,alph)
% Weighted residual vector: the sum-to-one target plus the two model fits.
Y1hat = X1*B12(1:6);
Y2hat = X2*B12(7:12);
res = [alph(1)*(1 - (Y1hat + Y2hat)); alph(2)*(Y1-Y1hat); alph(3)*(Y2-Y2hat)];
end