Multivariate regression for constrained dependent variables

I have a dataset with 6 independent variables and 2 dependent variables. The equation for each dependent variable should be something like:
y1j = β11*x1j + β21*x2j + ... + β61*x6j
y2j = β12*x1j + β22*x2j + ... + β62*x6j
I want to find βs that satisfy y1j + y2j = 1.
Which methodology should I use?
I tried using mvregress, but I couldn't constrain the dependent variables. I am also thinking of using optimization, but I still couldn't figure out how to constrain the dependent variables there either.
Thank you so much!

8 comments

Torsten
Torsten on 10 Aug 2024
Edited: Torsten on 10 Aug 2024
And the condition y1j + y2j = 1 should hold for your complete database
x11j x21j ... x61j
x12j x22j ... x62j
?
Thus
(b11*x11j + b21*x21j + ... b61*x61j) + (b12*x12j + b22*x22j + ... + b62*x62j) = 1
for all j ?
Or do I misunderstand your regression problem ?
Ismail
Ismail on 10 Aug 2024
Edited: Ismail on 10 Aug 2024
Do you mean the current state of my data?
If that is your question then yes, currently y1j, y2j share the same x1j x2j ... x6j. So, for example if x(j) = [4 5 2 6 4 1] and y(j) = [0.4 0.6], then the equations would be
y1j = 4*β11 + 5*β21 + 2*β31 + 6*β41 + 4*β51 + 1*β61 = 0.4
y2j = 4*β12 + 5*β22 + 2*β32 + 6*β42 + 4*β52 + 1*β62 = 0.6
Thus, y1j+y2j=1.
Edit: Yes, your explanation is right. Currently the database is all like that.
Torsten
Torsten on 10 Aug 2024
Edited: Torsten on 10 Aug 2024
I still don't get your problem.
Is it correct that you want to estimate 12 parameters (the beta values) ?
Is it correct that you have row vectors [x11i,x12i,...,x16i], [x21i,x22i,...,x26i] (1 <= i <= N) and you want to estimate the 12 beta values such that
sum_i {x11i*beta11+x12i*beta12+...+x16i*beta16 + x21i*beta21+x22i*beta22+...+x26i*beta26 - 1}^2
is minimized ?
Hi @Torsten,
If @Ismail's answer to all the questions listed above is yes, then the least squares method can be used to estimate the 12 beta values that minimize the given expression.
Yes, the ideal result would be:
Hi @Umar,
Ah I see, so I don't need to put any constraint values in that case, right?
Also, does that mean that there is no function that could deliberately treat the dependent variables as constraints?
It is actually a problem unlike any other I can recall in many years.
The point is though, you cannot use an equality constraint on the sum of y1+y2, since you will have N data points. It will not be just ONE constraint, but one constraint for each data point.
This leaves you with too few degrees of freedom to solve the problem. It is why I showed how to implement it in the form of a goal, where you try to make the sum as close to 1 as possible.
That there is no tool out there to handle it directly speaks to the uncommon nature of your problem.
Hi @Ismail,
To address your comments: "Ah I see, so I don't need to put any constraint values in that case, right? Also, does that mean that there is no function that could deliberately treat the dependent variables as constraints?"
You are correct that if you choose not to impose constraints on your beta values, you allow greater flexibility in fitting your model. However, if you were to impose equality constraints on the dependent variables (like ensuring their sum equals a certain value), you would indeed face challenges. As you noted, with N data points this generates one constraint per data point, not just one constraint overall. Each constraint reduces your available degrees of freedom (the number of independent pieces of information you have), so the system quickly becomes over-constrained, which hinders your ability to find a unique solution for the beta values.
Given that traditional methods may not accommodate your specific needs, consider reformulating the problem as an optimization goal rather than applying strict equality constraints. For instance, you could minimize the deviation from 1 while allowing some tolerance; this can be expressed as a penalty function or through regularization techniques. In addition, if overfitting is a concern, or if you wish to control the size of the estimated parameters, consider regularization techniques such as Lasso or Ridge regression. I would also suggest taking @John D'Errico's advice into account.
John D'Errico
John D'Errico on 11 Aug 2024
Edited: John D'Errico on 11 Aug 2024
A ridge regression as suggested by @Umar may be a great idea here. Classically, ridge regression is a technique to bias a regression estimator, pushing the estimated coefficients towards zero for a singular or nearly singular problem. But you can also use different bias forms. One common one I have used is to bias coefficients towards smoothness, in the context of a curve fitting problem (think about splines, or low order piecewise functions here.)
In your context, I would build a scheme to bias the coefficients such that the expression (y1hat+y2hat-1) is always small. That is, write the target expression as a quadratic form. The nice thing is, you could implement that scheme without needing a nonlinear tool like lsqnonlin. Backslash would be sufficient in the end if you do it carefully.
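A minimal sketch of that scheme, with made-up data standing in for the real dataset and an arbitrary weight w that you would tune: stack the two fitting problems together with w-weighted rows targeting y1hat + y2hat = 1, and let backslash solve the combined linear least squares problem.

```matlab
% Combine the two regressions and the soft target X1*b1 + X2*b2 ~ 1
% into one overdetermined linear system solved by backslash.
N = 50;
X1 = randn(N, 6);                  % made-up predictors standing in for yours
X2 = randn(N, 6);
y1 = 0.4 + randn(N, 1)/100;
y2 = 0.6 + randn(N, 1)/100;
w = 10;                            % weight on the sum-to-one target rows
Z = zeros(N, 6);
A = [X1,   Z;                      % rows fitting y1
     Z,    X2;                     % rows fitting y2
     w*X1, w*X2];                  % rows pushing y1hat + y2hat toward 1
b = [y1; y2; w*ones(N,1)];
beta = A \ b;                      % 12-by-1 vector [b1; b2]
y1hat = X1*beta(1:6);
y2hat = X2*beta(7:12);
```

Increasing w trades fit quality on y1 and y2 for a smaller violation of the sum-to-one target, which is the same tradeoff the alpha weights control in the lsqnonlin formulation elsewhere in this thread.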


Accepted Answer

Umar
Umar on 10 Aug 2024
Hi @Ismail,
I would suggest a constrained optimization approach: formulate the problem as an optimization task where you minimize a cost function subject to linear equality constraints. To start, define your data by generating random data for independent variables X and dependent variables Y.
X = randn(10, 6); % Independent variables (10 samples, 6 features)
Y = randn(10, 2); % Dependent variables (10 samples, 2 outputs)
Then, define the equation constraint by specifying the coefficients matrix Aeq and the right-hand side beq of the equation constraint.
Aeq = [1, 2, 3, 4, 5, 6]; % Coefficients matrix for the equation constraint
beq = 10; % Right-hand side of the equation constraint
Afterwards, define the objective function by creating a function that represents the error you want to minimize. In this case, it can be the least squares error between the actual Y and the predicted values based on X and beta.
fun = @(beta) norm(Y(:,1) - X*beta)^2; % least-squares error for one output column (beta is 6x1)
Now, provide an initial guess for the coefficients beta.
beta0 = zeros(6, 1); % Initial guess for beta (6 coefficients)
Use fmincon function for constrained optimization with the specified equality constraint.
options = optimoptions('fmincon', 'Algorithm', 'interior-point');
beta = fmincon(fun, beta0, [], [], Aeq, beq, [], [], [], options);
Display optimal beta values by outputting the optimal values of beta that satisfy the equation constraint.
disp('Optimal beta values:');
disp(beta);
By implementing this method in your code, you should be able to use constrained optimization to find the βs that satisfy your equation constraint while considering the dependencies between the variables. This methodology also allows you to optimize the coefficients under the specified constraints, providing a solution to your problem. Please let me know if you have any further questions.

6 comments

Ismail
Ismail on 10 Aug 2024
Edited: Ismail on 10 Aug 2024
Thank you for your answer!
I'm still confused about the Aeq and beq values, since your example puts: Aeq = [1, 2, 3, 4, 5, 6]; % Coefficients matrix for the equation constraint
beq = 10; % Right-hand side of the equation constraint
As far as I understand, we're creating a function of beta where X and Y are the constants, right? If that's the case, then Aeq and beq would mean: 1*β1 + 2*β2 + 3*β3 + 4*β4 + 5*β5 + 6*β6 = 10
And that makes me a bit confused. Can you explain why you suggested that?
Hi @Ismail,
When setting up an optimization problem with equality constraints, the coefficients matrix Aeq defines the linear combination of the decision variables (in this case, the coefficients beta) that must equal the specified right-hand side beq; the equation represented by Aeq and beq constrains the feasible solutions to lie on a hyperplane. In my provided example,
Aeq = [1, 2, 3, 4, 5, 6]; % Coefficients matrix for the equation constraint
beq = 10; % Right-hand side of the equation constraint
This equation constraint can be interpreted as: 1 * β1 + 2 * β2 + 3 * β3 + 4 * β4 + 5 * β5 + 6 * β6 = 10
and represents a specific relationship that the coefficients beta must satisfy in addition to minimizing the least squares error between the actual Y and the predicted values based on X and beta. So the Aeq and beq values are not arbitrary; they are derived from the problem's constraints and requirements.
By incorporating the equality constraint defined by Aeq and beq, you guide the optimization algorithm to search for solutions that not only minimize the error function but also adhere to the specified linear relationship among the coefficients. This makes sure the optimized coefficients beta satisfy both the error-minimization objective and the defined equality constraint, and it provides a structured approach to finding solutions that meet all specified conditions.
I hope this clarifies the role of Aeq and beq in constrained optimization and their importance in ensuring that the solution aligns with the defined constraints. If you still have any further questions or need additional clarification, please feel free to ask.
Thank you for your answer!
May I know where the numbers come from?
Hi @Ismail,
To address your query, "May I know where the numbers come from?":
I defined those values manually in the example to demonstrate how to set up an optimization problem with equality constraints. In a real-world scenario, you would replace these example numbers with your own data or with values derived from your specific problem. I hope this clarifies the origin of the numbers used in the example. If you have any further questions or need additional clarification, please feel free to ask.
Thank you for your answer!
That explains a lot!
Hi @Ismail,
Have you noticed the comments shared by @Torsten and @John D'Errico? They have provided excellent examples for your queries. Please let us know if all your questions have been answered, and don't forget to click "Accept answer" and vote for my friends. They have years of experience and have helped tackle many problems more complex than this one.


More Answers (2)

Torsten
Torsten on 10 Aug 2024
Edited: Torsten on 10 Aug 2024
Define a Nx12 matrix A as
A = [x11,x12,x13,x14,x15,x16,x21,x22,x23,x24,x25,x26]
and a Nx1 vector b as
b = ones(N,1)
and solve for beta = [beta11;beta12;beta13;beta14;beta15;beta16;beta21;beta22;beta23;beta24;beta25;beta26] by using
beta = A\b
If you also have values for the y, your matrix A becomes 3*N x 12 as
A = [x11,x12,x13,x14,x15,x16,x21,x22,x23,x24,x25,x26;x11,x12,x13,x14,x15,x16,zeros(N,6);zeros(N,6),x21,x22,x23,x24,x25,x26];
your vector b becomes 3*N x 1 as
b = [ones(N,1);y1;y2]
and you solve for beta = [beta11;beta12;beta13;beta14;beta15;beta16;beta21;beta22;beta23;beta24;beta25;beta26] by using
beta = A\b
If it's the latter case you have to consider, you can also give weights to the three different error components as @John D'Errico does in his suggestion (alpha1, alpha2, alpha3).
John D'Errico
John D'Errico on 11 Aug 2024
Sorry. You CANNOT ensure that y1+y2 == 1, EXACTLY. At least not if there is any noise in your data.
All you can do is to use it as a target. And then that target will conflict with your goal of minimizing the errors in y1 and y2. As such, you might set it up as a problem of the form
alpha1*norm(1 - y1hat - y2hat)^2 + alpha2*norm(y1 - y1hat)^2 + alpha3*norm(y2 - y2hat)^2
where y1hat and y2hat are the predictions from the separate regression models, and alpha1, alpha2, and alpha3 are parameters you must choose to tradeoff the residuals from each piece of the problem.
An important point I want to make is that the lack of a constant term in your models may well be a significant problem.
Anyway, how might you do this? I suppose you could solve it using lsqnonlin, though since the model parameters enter linearly, a linear modeling tool must be an option too.
Let me see if I can give an example. I'll need to make up some data, since you have provided none that I see.
N = 50;
X1 = randn(N,6);
X2 = rand(N,6);
Y1 = 0.4 + randn(N,1)/100;
Y2 = 0.6 + randn(N,1)/100;
As you can see, the data does have the property that y1+y2 should be 1, but for the noise in the system.
If I model each process separately, I get this:
B1 = X1\Y1;
B2 = X2\Y2;
[B1,B2]
ans = 6x2
    0.0587    0.3051
   -0.0032    0.1956
    0.0102    0.1781
    0.0732    0.1198
    0.0657    0.2308
   -0.0492    0.1005
It is just random crap data, so I really don't care about how well the models fit. I expect the models will be pretty poor, in fact.
Y1hat = X1*B1;
Y2hat = X2*B2;
[norm(Y1 - Y1hat),norm(Y2 - Y2hat)]
ans = 1x2
    2.6661    1.0332
But what happens with the predictions? As done separately, how well does it do?
norm(1 - (Y1 + Y2))
ans = 0.1054
norm(1 - (Y1hat + Y2hat))
ans = 3.0541
So, with no target put on the parallel models, things actually got WAY worse in terms of the sum of the predictions being 1. Again, the lack of a constant term in your model is probably a huge factor here.
Anyway, having built the models separately, can we formulate this as a regression model, but with the sum target built in?
alph = [1/3 1/3 1/3];
B12start = randn(12,1);
[B12est,~,~,exitflag] = lsqnonlin(@(B12) obj(B12,X1,X2,Y1,Y2,alph),B12start);
Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance.
reshape(B12est,6,2)
ans = 6x2
    0.0599    0.4327
   -0.0137    0.2133
    0.0158    0.2106
    0.0582    0.1975
    0.0538    0.3339
   -0.0636    0.0829
Y1hatest = X1*B12est(1:6);
Y2hatest = X2*B12est(7:12);
[norm(Y1 - Y1hatest),norm(Y2 - Y2hatest)]
ans = 1x2
    2.6715    1.6373
norm(1 - (Y1hatest + Y2hatest))
ans = 2.1121
As you can see, the models are a little different. The coefficients have changed, but not by a lot. But now the target for the sum of the two models has had its error reduced.
function res = obj(B12,X1,X2,Y1,Y2,alph)
Y1hat = X1*B12(1:6);Y2hat = X2*B12(7:12);
res = [alph(1)*(1 - (Y1hat + Y2hat));alph(2)*(Y1-Y1hat);alph(3)*(Y2-Y2hat)];
end

Products

Version

R2024a

Asked:

on 10 Aug 2024

Edited:

on 11 Aug 2024
