Curve fitting toolbox can return bogus results for 2 term exponential functions. Is this a bug?

I am fitting some test data using the curve fitting toolbox and the built-in 2-term exponential form, f(x) = a*exp(b*x) + c*exp(d*x). For some reason, one of the test fits returns coefficients that cause the function to go to zero everywhere. However, the curve and GOF data that are displayed do not agree with the reported results. If f(x)=0 everywhere, there is no way that R-square would be anywhere close to 1 given the original data.
Is this a bug that needs to be reported? Bad data? Please help!
The input vectors are as follows:
x=[0,12.5450000000000, 25.0900000000000, 54.3900000000000, 81.6900000000000, 109.090000000000, 138.790000000000, 139.090000000000, 139.090000000000, 163.890000000000, 195.390000000000, 202.390000000000]
y=[0, 0.00337831199999972, 0.688170198000000, 1.39894291200000, 2.42361711600000, 3.96215670600000, 7.38800638200000, 7.38893139600000 ,7.35868746000000, 11.7483615060000, 19.7050504080000, 23.7202546560000]

Accepted Answer

Walter Roberson on 2 Nov 2022
Sum of exponentials is notoriously difficult to fit.
One of the problems is that you did not put bounds on your variables. When you have a*exp(b*x) + c*exp(d*x) then a and b can change places with c and d and you would have exactly the same sum. If a and c are different signs (not at all uncommon for this kind of fitting) then when you do not put in bounds, the fitting will not be able to tell which of the two is to be positive or negative, and so the error bounds will cross the entire range, and the fit will be useless. Any time you have terms with identical forms, you need to impose constraints to have any hope of getting a useful fit.
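A minimal sketch of what such constraints could look like with the data from the question. The `Lower`, `Upper`, and `StartPoint` options of `fit` take values in coefficient order (a, b, c, d for `exp2`). The sign pattern and the start point below are illustrative assumptions, not something the data dictates.

```matlab
% Data from the question, as column vectors.
x = [0 12.545 25.09 54.39 81.69 109.09 138.79 139.09 139.09 163.89 195.39 202.39]';
y = [0 0.00337831199999972 0.688170198 1.398942912 2.423617116 3.962156706 ...
     7.388006382 7.388931396 7.35868746 11.748361506 19.705050408 23.720254656]';

% Assumed constraints (illustrative only): first term positive and growing,
% second term negative and decaying. This removes the (a,b) <-> (c,d) swap.
lower = [0    0    -Inf -Inf];
upper = [Inf  Inf   0    0  ];
start = [0.5  0.02 -0.1 -0.01];   % a guess that lies inside the bounds

[mdl, gof] = fit(x, y, 'exp2', 'Lower', lower, 'Upper', upper, 'StartPoint', start);
p = coeffvalues(mdl);   % full-precision [a b c d]
disp(p)
```

With the two terms confined to different sign regions, the optimizer can no longer trade one term for the other, although the fit can still be poorly determined if the data do not support a second term.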
But even then, it is quite common that one of b or d goes towards negative infinity times sign(x) so that exp(d*x) goes to 0, nearly removing the term -- or that one of the two goes towards 0, making exp(d*x) go to 1, making c into nearly an additive constant. Yes, in theory you can do better mathematically, but in practice the error often increases as you move away from one of those two positions. If your starting positions do not happen to fall inside the right range, then the exponential increase in error as you move towards the peak error leads the minimizers to move further from the actual best location. Starting values are crucial for the fitting techniques that are used by the curve fitting toolbox.
I have read that there exist fitting algorithms specifically for sum of exponentials, that should do a better job, but I have not researched those algorithms. Some kind of transform has to be used, I gather.
4 Comments
Matt Brown on 3 Nov 2022
I think you nailed it on that comment. I was only working in the toolbox and hadn't exported any fitobjects to check coefficients. I figured I would go check and when I re-ran the fit, I got a slightly different result. The a and c coefficients on the second attempt are 2 orders of magnitude less and do show some disagreement in the last decimal. I assume the same is true of b and d.
I wonder, is there a way to change the number format of the results window in cftool? It's a little annoying to not have any (obvious) control over that inside the toolbox.
Thanks for your help!
Walter Roberson on 3 Nov 2022
Sorry, there is not much control over the provided apps. Sometimes if you dig into the source code for long enough you can come up with a usable code change, but most of the time it is a change in code that is needed, not a change in some setting. (Occasionally if you dig harder still, it is possible to figure out how to dig into the internals far enough to change some settings without changing the source code.)
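One workaround outside the app is to export the fit to the workspace and inspect the coefficients at full precision, for example with `format long` and `coeffvalues`. The sketch below uses made-up coefficient values (not the ones from the original fit) to show how a display rounded to about four significant digits can make two nearly cancelling terms look as if they cancel exactly.

```matlab
% Hypothetical full-precision coefficients [a b c d]; NOT taken from the
% original fit, chosen only so that a ~ -c and b ~ d.
p = [1234.5678 0.0185312 -1234.5672 0.0185309];
pShown = round(p, 4, 'significant');   % roughly what a 4-digit display shows

f = @(q, x) q(1)*exp(q(2)*x) + q(3)*exp(q(4)*x);
x = [0 100 200];

fFull  = f(p, x);        % difference of two large terms, clearly nonzero at x = 200
fShown = f(pShown, x);   % rounded terms cancel, so this is numerically zero

format long
disp(p); disp(pShown); disp(fFull); disp(fShown)
% With an exported fit object (say, fittedmodel, a hypothetical name),
% coeffvalues(fittedmodel) returns the coefficients without display rounding.
```

The plotted curve and the goodness-of-fit statistics come from the unrounded coefficients, which is why they can disagree with what the rounded display suggests.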


More Answers (1)

John D'Errico on 2 Nov 2022
Edited: John D'Errico on 2 Nov 2022
Is it a bug? NO.
Is it due to poor starting values? Almost always, yes. At least, unless the curve is simply not well fit by a two term exponential.
You have a dozen data points, and you want to fit 4 parameters? Using exponentials? And you want to see good results? Sigh.
x=[0,12.5450000000000, 25.0900000000000, 54.3900000000000, 81.6900000000000, 109.090000000000, 138.790000000000, 139.090000000000, 139.090000000000, 163.890000000000, 195.390000000000, 202.390000000000];
y=[0, 0.00337831199999972, 0.688170198000000, 1.39894291200000, 2.42361711600000, 3.96215670600000, 7.38800638200000, 7.38893139600000 ,7.35868746000000, 11.7483615060000, 19.7050504080000, 23.7202546560000];
numel(x)
ans = 12
plot(x,y,'o')
When I look at that curve, I might bet that a single exponential will fit entirely reasonably. As such, if this next plot is a straight line, then it will be.
semilogy(x,y,'o')
So except for the VERY first data point, it virtually IS a straight line. And that means to fit a second term in that exponential fit, you have only ONE piece of data, maybe two, to support estimating that pair of coefficients.
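The straight line on the semilog plot can also be turned into rough starting values for the main term, since log(y) = log(a) + b*x for a single exponential. A minimal sketch using only base MATLAB; dropping the first two points (y = 0 and the near-zero second value) is an assumption made here so that the logarithm is finite and the line is not dragged down.

```matlab
x = [0 12.545 25.09 54.39 81.69 109.09 138.79 139.09 139.09 163.89 195.39 202.39];
y = [0 0.00337831199999972 0.688170198 1.398942912 2.423617116 3.962156706 ...
     7.388006382 7.388931396 7.35868746 11.748361506 19.705050408 23.720254656];

keep = 3:numel(x);                        % assumption: skip the two points near zero
c1 = polyfit(x(keep), log(y(keep)), 1);   % straight-line fit: log(y) ~ c1(2) + c1(1)*x

b0 = c1(1);        % slope is the rate estimate
a0 = exp(c1(2));   % intercept is log(a)
fprintf('a0 = %.4g, b0 = %.4g\n', a0, b0)
```

These are only estimates in the log domain, so they weight the small values more heavily than a direct least-squares fit does, but they are usually close enough to serve as a start point.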
Should you be even remotely surprised the two term fit looks strange to you, in the sense that one of those exponentials seemed to be nonsense? NO!!!!!
[mdl1,G1] = fit(x',y','exp1')
mdl1 =
     General model Exp1:
     mdl1(x) = a*exp(b*x)
     Coefficients (with 95% confidence bounds):
       a =      0.5475  (0.441, 0.654)
       b =     0.01853  (0.0175, 0.01957)
G1 = struct with fields:
           sse: 1.8182
       rsquare: 0.9973
           dfe: 10
    adjrsquare: 0.9970
          rmse: 0.4264
plot(mdl1,x,y,'ro')
So am I even REMOTELY surprised that R^2 is very near 1? WHY? THE FIT LOOKS QUITE GOOD, even for a 1-term exponential. Far too many people seem to be ruled by R^2. In my opinion, R^2 is slightly more valuable than a pile of rubbish, but not by a lot. If the curve appears to fit well when you plot it, don't worry about R^2.
This is not a bug in the curve fitting toolbox. It is a problem in your understanding of modeling and curve fitting. Can we try to fit a 2 term exponential? POSSIBLY. But ONLY if we use good starting values would there be much chance. And even then, again, you have WAY too little data. At least we have decent starting values for the main term in the model, so I will use them, and then guess at the other term. (I tried a couple of times before I was satisfied with the results.) Your data is pretty much useless for that model, yet your expectations are really high. Double sigh.
[mdl2,G2] = fit(x',y','exp2','start',[0.54 0.018,-0.1,-0.01])
mdl2 =
     General model Exp2:
     mdl2(x) = a*exp(b*x) + c*exp(d*x)
     Coefficients (with 95% confidence bounds):
       a =      0.5936  (0.4233, 0.7638)
       b =     0.01812  (0.01667, 0.01956)
       c =     -0.6606  (-1.324, 0.002361)
       d =    -0.01838  (-0.07988, 0.04312)
G2 = struct with fields:
           sse: 1.0043
       rsquare: 0.9985
           dfe: 8
    adjrsquare: 0.9979
          rmse: 0.3543
plot(mdl2,x,y,'ro')
Is that result meaningful? I doubt it is worth much, since that second exponential term is literally based on about 1 data point. Note the width of the confidence bounds on parameters c and d. Do you see that even the sign of that second rate parameter is in question?
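That check on the confidence bounds can be done programmatically rather than by eye. A minimal sketch, assuming the same data and start point as above: `confint` returns a 2-by-4 matrix with the lower bounds in row 1 and the upper bounds in row 2, and any coefficient whose interval straddles zero has an undetermined sign.

```matlab
x = [0 12.545 25.09 54.39 81.69 109.09 138.79 139.09 139.09 163.89 195.39 202.39]';
y = [0 0.00337831199999972 0.688170198 1.398942912 2.423617116 3.962156706 ...
     7.388006382 7.388931396 7.35868746 11.748361506 19.705050408 23.720254656]';

mdl2 = fit(x, y, 'exp2', 'StartPoint', [0.54 0.018 -0.1 -0.01]);

ci = confint(mdl2);                        % 95% bounds; columns are a, b, c, d
crossesZero = ci(1,:) < 0 & ci(2,:) > 0;   % true where the interval contains zero

names = coeffnames(mdl2)';                 % 1-by-4 cell array of coefficient names
disp(names(crossesZero))                   % coefficients whose sign is in question
```

The exact bounds may vary a little between releases, so treat the flagged names as a diagnostic rather than a fixed result.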
Why is it that every time someone sees something they don't understand, it must be a bug? This just requires experience in curve fitting.
2 Comments
Matt Brown on 2 Nov 2022
Edited: Matt Brown on 2 Nov 2022
Sigh? With all due respect, you don't have the context nor did you speak to my question. The issue I noted was that the curve that was plotted and for which statistics (R-square) were reported was not the curve defined by the coefficients that were returned by the toolbox. There is a discrepancy when you say f(x) works out to zero then show a curve that is not zero. This is what I thought was a bug. How would poor starting values result in this mathematically? I don't follow. If I missed this in your reply, I apologize. It was so long, and frankly rude, that I just gave up on it. You spent a lot of energy telling me why my understanding is wrong without the perspective necessary to offer that advice.
The fit you show looks fine, but the one I show in my screenshot has a=-c and b=d so that f(x)=a*exp(b*x)+c*exp(d*x) is actually a*exp(b*x)- a*exp(b*x)=0.
As it turns out I have a decent understanding of curve fitting and the limitations of fitting small data sets to functions with multiple coefficients. Unfortunately, I have hundreds of similar curves of data (most of which have 2x to 3x the points) to which I have to find fits which share the same form. I am in an exploratory phase, trying out different things to see what works and what doesn't and was seemingly faced with a contradiction when f(x)=0 was not zero in the plot. I just happened to be on the 2-term exponential when this issue arose. I have tried many different forms, both built into the toolbox and custom forms, including 1-term exponentials, and my personal favorite at this stage f(x)=exp(a*x)+b*x-1. It isn't perfect and it presented its own buggy issues which is why I was trying the built in 2-term exponential. (That’s a story for a different thread).
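For reference, a custom form like f(x)=exp(a*x)+b*x-1 can be set up with `fittype` and passed to `fit`. This is only a sketch: the start point is a rough guess (a taken from log(1+y)/x at the last data point with b = 0), and the variable names `mdl3` and `G3` are made up for illustration.

```matlab
x = [0 12.545 25.09 54.39 81.69 109.09 138.79 139.09 139.09 163.89 195.39 202.39]';
y = [0 0.00337831199999972 0.688170198 1.398942912 2.423617116 3.962156706 ...
     7.388006382 7.388931396 7.35868746 11.748361506 19.705050408 23.720254656]';

% Custom two-coefficient model that passes through (0,0) by construction.
ft = fittype('exp(a*x) + b*x - 1', 'independent', 'x', 'coefficients', {'a', 'b'});

% Rough start point (an assumption): log(1 + 23.72)/202.39 is about 0.016.
[mdl3, G3] = fit(x, y, ft, 'StartPoint', [0.015 0]);

format long
disp(coeffvalues(mdl3))   % full-precision [a b]
disp(G3.rsquare)
```

Supplying an explicit `StartPoint` matters for custom models, because without one `fit` picks random start values and repeated runs can land on different answers.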
Matt Brown on 2 Nov 2022
After re-reading my original question, I see that I could have made the issue more clear. I apologize for any confusion that sprang from that.


Release

R2022a
