R2 with a loglog plot

Hi everyone.
I am quite new to MATLAB and I'd like to add an R2 value to my loglog plot. I've seen some solutions in a few other posts, but none really does the job. My code is really simple:
bx= figure;
set(bx,'visible', 'on');
f = fit(x,y,'power1');
loglog(x,y,'O');
hold on;
plot(f);
[...]
The result looks like this:
So far I haven't found any way of determining the R2. The approach in this post (https://uk.mathworks.com/matlabcentral/answers/182998-r-squared-value-for-fitted-line) overestimates my R2, and so does this bit of code found elsewhere on the MATLAB forum:
ba = [x,ones(size(x))]\y;
ypred = ba(1)*x + ba(2);
SSE=sum((y-ypred).^2);
SST=sum((y-mean(y)).^2);
Rsq = 1 - (SSE/SST);
I also tried this approach, but I think it only works for linear relationships:
X = [ones(length(x),1) x];
b = X\y;
yCalc = X*b;
r = 1 - sum((y - yCalc).^2)/sum((y - mean(y)).^2);
Thank you very much for your help :)
Flo
PS: the R2 in Excel is equal to 0.9597

Answers (1)

dpb on 8 Aug 2016

1 vote

I see nothing wrong with Star's Answer nor follow-up comment. To compute SSE from such a model requires evaluating the residual of the fit in original metric, not in log space.
But, fit returns a cfit for curves (your case) and some additional optional outputs, the second of which is a goodness-of-fit structure, gof in the documentation. Fields in gof include
sse - Sum of squares due to error
rsquare - Coefficient of determination ("raw" R-square)
dfe - Degrees of freedom in the error
adjrsquare - DOF-adjusted R-square
rmse - RMS (or "standard") error
so to save yourself some effort, use it...
[f,gof] = fit(x,y,'power1');
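To put that value on the plot, something along these lines should work. This is only a sketch: it assumes the Curve Fitting Toolbox is installed, and the x and y below are synthetic stand-ins made up for illustration, not the poster's data.

```matlab
% Synthetic, deterministic data for illustration only (not the poster's data)
x = logspace(-3, 0, 15)';                          % column vector spanning 3 decades
y = 1.02 * x.^1.01 .* (1 + 0.05*sin((1:15)'));     % power law with mild scatter

% Fit a*x^b and request the goodness-of-fit structure as second output
[f, gof] = fit(x, y, 'power1');

% Plot data and fitted curve on log-log axes
figure
loglog(x, y, 'o')
hold on
xs = logspace(log10(min(x)), log10(max(x)), 100)';
loglog(xs, f(xs), '-r')                            % a cfit object can be evaluated like a function
hold off

% Annotate with the R-square that fit computed in the original (non-log) metric
text(min(x), max(y), sprintf('R^2 = %.4f', gof.rsquare), ...
    'VerticalAlignment', 'top')
```

The text position (top-left corner of the data range) is an arbitrary choice; any data coordinates inside the axes would do.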

7 comments

Flo on 9 Aug 2016
Thank you for your help. For my data, the R2 and adjusted R2 are similar, and very, very high: ~0.993.
However, the Excel result for the same dataset is 0.95. Would you have any idea why there is such a difference between the two methods?
Flo
dpb on 9 Aug 2016
W/o the data, no, not really. Hmmm...I wonder. Check on SSE, SSQ; does the Matlab routine return the model values or the underlying values from log()? If you don't compute over the real values, but over SS(log(x0)) instead R-sq estimates will be inflated. Wouldn't think that'd be so, but w/o data to check can only hypothesize.
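One way to see how much the choice of metric matters is to compute R-square twice for the same fitted power law, once on log-transformed values and once on the original values. The sketch below uses only base MATLAB and made-up synthetic data; it does not establish which metric Excel or any other tool uses, which would have to be checked separately.

```matlab
% Synthetic data for illustration only: power law with multiplicative scatter
x = logspace(-3, 0, 15)';
y = x .* exp(0.3*sin((1:15)'));          % deterministic "noise" so the run is repeatable

% Straight-line least squares in log space: log(y) = p(1)*log(x) + p(2)
p       = polyfit(log(x), log(y), 1);
logPred = polyval(p, log(x));
yPred   = exp(logPred);                  % same model mapped back to original units

% R-square computed on log-transformed values
rsqLog = 1 - sum((log(y) - logPred).^2) / sum((log(y) - mean(log(y))).^2);

% R-square of the same model computed on the original values
rsqRaw = 1 - sum((y - yPred).^2) / sum((y - mean(y)).^2);

fprintf('R^2 in log space: %.4f, R^2 in original space: %.4f\n', rsqLog, rsqRaw);
```

The two numbers generally differ, so an R-square reported by one program can only be compared with another if both were computed in the same metric.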
Flo on 12 Aug 2016
Hi,
Thank you for your answer. I am unsure what to do with that, as I am far from being a stats expert!
May I give you some data here that give me an R2 > 0.99?
Again, thank you for your help :)
Flo
Star Strider on 12 Aug 2016
The ‘problem’, if there is one, appears to be in your data, which are very close to being linear and have a very wide range. In the ‘Rsq’ calculation, particularly the ‘SST’ calculation, note that the mean is very sensitive to extreme values, and your data have extreme values. (To experiment with this, compare the mean and median of your data.) The result is that ‘SSE’ is relatively low (with a good fit) and ‘SST’ is relatively high, leading to a very high ‘Rsq’ value.
x = [0.737543298694378
0.110045297095657
0.0434319211297629
0.0239808153477218
0.0189181987743139
0.0165201172395417
0.0101252331468159
0.00746069810818012
0.00452970956568079
0.00346389555022649
0.00319744204636291
0.00479616306954436
0.00186517452704503
0.00213162803090861
0.00133226751931788];
y = [0.752928647497338
0.116879659211928
0.0388711395101172
0.0194355697550586
0.0133120340788072
0.0103833865814696
0.00692225772097977
0.00878594249201278
0.00532481363152290
0.00399361022364217
0.00505857294994675
0.00159744408945687
0.00292864749733759
0.00159744408945687
0.00159744408945687];
yfit = @(b,x) exp(b(1)) .* x.^b(2); % Power Function
SSECF = @(b) sum((yfit(b,x) - y).^2); % Sum-Squared-Error Cost Function
B = fminsearch(SSECF, [1; 1]);
ypred = yfit(B,x);
SSE=sum((y-ypred).^2);
SST=sum((y-mean(y)).^2);
Rsq = 1 - (SSE/SST);
xplot = linspace(min(x), max(x));
figure(1)
plot(x, y, 'bp')
hold on
plot(xplot, yfit(B,xplot), '-r')
hold off
grid
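The mean-versus-median experiment suggested above can be sketched as follows. The vector yDemo holds rounded, hypothetical values chosen only to mimic a wide-range data set; it is not the posted data, and the variable names are made up for this illustration.

```matlab
% Hypothetical wide-range values, for illustration only
yDemo = [0.75; 0.12; 0.039; 0.019; 0.013; 0.010; 0.007; 0.005; 0.003; 0.0016];

% The mean is pulled towards the single large value; the median is not
fprintf('mean = %.4f, median = %.4f\n', mean(yDemo), median(yDemo));

% Contribution of each point to SST, the denominator of Rsq
sstTerms   = (yDemo - mean(yDemo)).^2;
SSTdemo    = sum(sstTerms);
shareOfMax = max(sstTerms) / SSTdemo;    % fraction of SST due to the most extreme point

fprintf('largest point contributes %.0f%% of SST\n', 100*shareOfMax);
```

When one point dominates SST like this, even a moderate SSE gives a ratio SSE/SST close to zero, and hence an Rsq close to 1.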
dpb on 12 Aug 2016
Edited: dpb on 12 Aug 2016
Very good observations, IA...I wondered if perhaps the "problem" was the curve fit in Excel excluded the one real outlier, the first observation, so just deleted it and reran -- Rsq = 0.996, a fair drop but still far from the 0.95 reported from Excel. I have no way to explain that other than it doesn't seem to match the data...OP should compare the results of the fitting from each.
Well, I took your plot and changed it to loglog, which looks like this:
Clearly, this isn't the same data set as in the OP's figure -- similar but not the same. For the most obvious case, the max point therein is ~[0.25 0.25], not ~[0.75 0.75], and while I didn't do a detailed examination and the patterns are similar, it doesn't look to me like any of the data points are exactly the same. So it seems to be an "apples to oranges" comparison problem as far as the numerical value goes.
Flo on 12 Aug 2016
Maybe my previous graph was not the right version. I apologize for that, but I ran the code again, and my graph looks like yours, dpb.
What did you mean by "OP should compare the results of the fitting from each."?
Cheers everyone! flo
dpb on 12 Aug 2016
Edited: dpb on 12 Aug 2016
Well, you need to look at the results obtained from the models (for the same, not disparate datasets) from Excel and Matlab to uncover where there's a difference. Clearly the results for the data you posted appear correct; if you got wildly different results from Excel, the most likely cause given the plot you posted is that it isn't the same dataset you're actually comparing to.
ADDENDUM
I attempted to read values off the above plot to see what kind of fit it actually provided; the Rsq was somewhat lower, but about 0.998 rather than 0.95, although I could see that guesstimating the values didn't get terribly close to the plot.
Again, I can only suggest that if you can reproduce the results you first quoted in Excel, you attach that set of data along with the model coefficients and results.
>> [f,gof]=fit(x,y,'power1')
f =
General model Power1:
f(x) = a*x^b
Coefficients (with 95% confidence bounds):
a = 1.026 (1.011, 1.04)
b = 1.013 (0.9852, 1.042)
gof =
sse: 1.6864e-04
rsquare: 0.9997
dfe: 13
adjrsquare: 0.9996
rmse: 0.0036
>> B % results from S-S:
B =
0.0254
1.0134
>> Rsq
Rsq =
0.9997
>>
BTW, I used fit in comparison to the fminsearch solution--results are essentially identical--
ADDENDUM 2
"*fit* in comparison to the fminsearch"
Actually, I just noticed there are two different solutions reached; the coefficient term with fminsearch is about the magnitude of that of the fit solution less 1.0 -- 0.0254 vis a vis 1.026. How can that be???
Oh,
yfit = @(b,x) exp(b(1)) .* x.^b(2); % Power Function
has a definition problem; it's estimating log(A) instead of A directly, where A is the coefficient in A*x^B...let's see what happens if we redefine it in the same model space as fit uses--
>> yfit = @(b,x) b(1).*x.^b(2); % model A*x^B; B(1),B(2)-->A,B
>> SSECF = @(b) sum((yfit(b,x) - y).^2);
>> B = fminsearch(SSECF, [1; 1]);
>> B
B =
1.0256
1.0134
>>
Ah! As expected, now we agree...whew! :) Was worried there for a minute...
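As a quick sanity check on that explanation, the first parameter of the exp(b(1)) form should map onto the directly estimated coefficient through exp(). The snippet below just plugs in the rounded values printed in this thread; the variable names are made up for illustration.

```matlab
% Rounded values copied from the outputs shown in this thread
bLog    = 0.0254;    % b(1) from the exp(b(1)).*x.^b(2) parameterization
aDirect = 1.0256;    % b(1) from the b(1).*x.^b(2) parameterization

% exp() of the log-scale parameter should reproduce the direct coefficient
aFromLog = exp(bLog);

fprintf('exp(%.4f) = %.4f, direct estimate = %.4f\n', bLog, aFromLog, aDirect);
```

So the two fminsearch runs found the same curve; only the meaning of the first parameter differed.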

Asked: Flo on 8 Aug 2016

Edited: dpb on 12 Aug 2016