Fittype and fit do not work as expected (in a curve fitting)

78 visualizaciones (últimos 30 días)
Sim
Sim el 30 de Oct. de 2024 a las 13:42
Comentada: Sim el 31 de Oct. de 2024 a las 10:31
Let's consider the following curve fitting of a set of data, by using the fittype and fit functions:
% inputs
x = [1 4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [0 335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];
% curve fitting
ft = fittype('a*(x^n)','dependent','y','independent','x');
[fitresult, ~] = fit(x',y',ft,'Startpoint', [0 0]);
% plots
hold on
p1 = plot(x,y,'o-','LineWidth',1.5,'color','b','DisplayName','inputs');
p2 = plot(x,(fitresult.a .* (x.^fitresult.n)),'-','LineWidth',1.5,'color','k','DisplayName','best fit');
set(gca, 'XScale', 'log', 'YScale', 'log');
legend([p1 p2])
However if I perform the curve fitting with EzyFit, I get the following (correct) result, as expected:
Why don't fittype and fit functions work correctly in this example?
Did I use those two functions in a wrong way?

Respuesta aceptada

John D'Errico
John D'Errico el 30 de Oct. de 2024 a las 14:12
Editada: John D'Errico el 30 de Oct. de 2024 a las 14:23
x = [1 4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [0 335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];
Look CAREFULLY at your data!
What is y(1)? Do you see that it is tremendously out of character, inconsistent with the rest of your data?
y(1)
ans = 0
Of course, when you plot it on a log scale, MATLAB does not plot that point, because log(0) is undefined. (-inf if you insist.) HOWEVER, when you do a fit, fit sees it!
And what that does is completely drag your curve around. y(1) = 1000 or larger might be more consistent. But not zero.
ft = fittype('a*(x^n)','dependent','y','independent','x');
Next, starting ANY optimization at [0,0] is a HIGHLY, HUGELY dangerous thing to do.
[fitresult, ~] = fit(x(2:end)',y(2:end)',ft,'Startpoint', [1 -1])
fitresult =
General model: fitresult(x) = a*(x^n) Coefficients (with 95% confidence bounds): a = 1.222e+04 (4056, 2.038e+04) n = -2.596 (-3.072, -2.12)
p1 = plot(x,y,'o-','LineWidth',1.5,'color','b','DisplayName','inputs');
hold on
p2 = plot(x,(fitresult.a .* (x.^fitresult.n)),'-','LineWidth',1.5,'color','k','DisplayName','best fit');
set(gca, 'XScale', 'log', 'YScale', 'log');
legend([p1 p2])
hold off
Is the fit very good, when plotted on a log scale? Well, not really. But it never is a good idea to do this sort of thing. Why not? Because MATLAB will not worry about TINY deviations in those lower points, even though in proportional terms, it will be very obvious on a log scale.
What you should see is the fit fits the first two points quite well, and then misses the rest. But again, think about what a log scale does!!!!!
fit assumes an additive error model, with the errors presumed to be homogeneous, and additive. That is a bad thing when your dats spans 4 or 5 powers of 10. It would be far smarter to log your data and your model, THEN do the fit. Now instead of proportional error, you will have additive error. See how nice it gets?
When you log your model, the problem becomes trivial.
Your model: y = a*x^n * (proportional noise)
Logged model: log(y) = log(a) + n*log(x) + (additive homogeneous noise)
We still need to drop the bogus first data point of course. But now your model is just a linear 'poly1' model.
mdl = fit(log(x(2:end)'),log(y(2:end)'),'poly1')
mdl =
Linear model Poly1: mdl(x) = p1*x + p2 Coefficients (with 95% confidence bounds): p1 = -2.292 (-2.525, -2.059) p2 = 9.469 (8.431, 10.51)
Don't forget to exponentiate the coefficient out front!
a = exp(mdl.p2)
a = 1.2953e+04
n = mdl.p1
n = -2.2920
Or, we could use fittype
ft = fittype('log(a) + n*log(x)','dependent','y','independent','x');
Note that I logged x inside fittype, but not y.
[fitresult, ~] = fit(x(2:end)',log(y(2:end)'),ft,'Startpoint', [1 -1])
fitresult =
General model: fitresult(x) = log(a) + n*log(x) Coefficients (with 95% confidence bounds): a = 1.295e+04 (-488, 2.639e+04) n = -2.292 (-2.525, -2.059)
And you see this generates the same result for the parameters.
  8 comentarios
John D'Errico
John D'Errico el 31 de Oct. de 2024 a las 9:23
Bound constraints in optimization can sometimes be a problem. As with all constraints, they must employ a tolerance, such that the constraint can sometimes be exceeded by a tiny amount, the constraint tolerance. And while that amount can seem tiny, it is highly important.
For example, suppose you decide to compute the log(x), and you never wish to see a complex result? Is it acceptable to have any negative value of x?
log(-1e-300)
ans = -6.9078e+02 + 3.1416e+00i
The point is, a function like a log or sqrt can never accept ANY negative value at all as an argument, as you will see a complex result.
But when you set a lower bound of 0 for some parameter x, the optimization tool will allow that constraint to be exceeded by the constraint tolerance. And you cannot set the constraint tolerance to be exactly 0.
As I said, when working with logs, constraints can be problematic. You will need to set the lower bound high enough that the lower bound minus the constraint tolerance does not go below zero.
Sim
Sim el 31 de Oct. de 2024 a las 10:31
Very interesting! Yes working with logs can be messy sometimes.. To avoid as much as possible possible troubles, I am following the "path" with the weights that you showed me, thanks a lot :-)

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Más información sobre Linear and Nonlinear Regression en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by