Non-constant statistical weights in non-linear regression

2 visualizaciones (últimos 30 días)
thanos
thanos el 6 de Mayo de 2020
Editada: thanos el 7 de Mayo de 2020
Hello everyone, hope this message finds you and your families safe and healthy.
I am trying to perform weighted non-linear regression, where the weight factors are not contant. To clarify everything consider the following example:
exp_data = importdata('data.txt');
xdata = exp_data(:,1);
ydata = [exp_data(:,2),exp_data(:,3)];
x0 = [1 1 1 1];
fitfcn = @(c,xdata)test_function(c,xdata);
[lb] = [1 1 1 1]; ub = [10 10 10 10];
problem = createOptimProblem('lsqcurvefit','x0',x0,'objective',fitfcn,...
'lb',lb,'ub',ub,'xdata',xdata,'ydata',ydata);
ms = MultiStart('PlotFcns',@gsplotbestf);
[xmulti,errormulti] = run(ms,problem,500)
Normally, when a constant weight, w, is used one can simply multiply ydata and the expression in the function handle with w:
exp_data = importdata('data.txt');
xdata = exp_data(:,1);
ydata = [exp_data(:,2),exp_data(:,3)];
w = 1./sqrt(((ydata(:,1).^2)+(ydata(:,2).^2)));
ydata2 = ydata.*w;
x0 = [1 1 1 1];
fitfcn = @(c,xdata)test_function(c,xdata).*w;
[lb] = [1 1 1 1]; ub = [10 10 10 10];
problem = createOptimProblem('lsqcurvefit','x0',x0,'objective',fitfcn,...
'lb',lb,'ub',ub,'xdata',xdata,'ydata',ydata2);
ms = MultiStart('PlotFcns',@gsplotbestf);
[xmulti,errormulti] = run(ms,problem,500)
In my case I have two options. Either calculating w by using the experimental (imported) values or the calculated ones. The first case is straightforward (since the weight vector is defined and constant) and can be solved as above. However, due to the existence of random errors and an increased possibility of bias, I have to resort to the second case which is more complicated (at least for me), since the weight vector changes each time the c vector changes (from the x0 guess values) to produce an output based on the objective function. The objective function produces as an ouput a matrix of n x 2 dimensions (the real and imaginary part of a complex number). My goal is to use as weight the inverse of the modulus of the produced (calculated) values, i.e. the ydata in w above would be the calculated ydata for each c.
My question is how I can assign these calculated values to w.
Any help/suggestions would be highly appreciated.
Thank you in advance for your time.
Regards

Respuestas (1)

Bjorn Gustavsson
Bjorn Gustavsson el 6 de Mayo de 2020
When I have this type of problems (fitting of functions to photon-counting-data where the weights should be calculated from the function - Poisson-statistics using the expected values), I typically get good enough results with your first approach (the weights calculated from the data doesn't differ much from the weights calculated from the fitted function). For the next-level attempt I've thought about running this iteratively, first use the data-weights, then repeat with weights calculated from the first solution - which hopefully should be close to the optimal solution.
If that is not good enough, you might have to do this "properly" and calculate the weights as you should - you might have to be careful and make sure that the weights don't blow up/down. If your data doesn't have normal-distributed noise you might have to use the proper "norm" to minimize - this might force you away from any lsq-function - unfortunately.
HTH
  3 comentarios
Bjorn Gustavsson
Bjorn Gustavsson el 6 de Mayo de 2020
What you want to do is to get a maximum-likelihood solution, right? This requires you to have some knowledge/grasp/estimate of the statistical characteristics of your data (if all data can be assumed to be samples from some normal-distributions we get the least-square-solution, weighted or not. If you have some other noise-characteristics, for example Cauchy-distribution (with much longer tails) you have to maximize another likelihood-function). If you think that you have data that is reasonably normal-distributed except for some fraction of outliers you could take a look at Huber-like norms, that's one way to reduce the impact of outliers.
thanos
thanos el 7 de Mayo de 2020
Editada: thanos el 7 de Mayo de 2020
Thank you very much for your illuminating comments. Electrical impedance measurements are usually measured by the application of a perturbation signal (sinuosidal excitation signal). The excitation and response signals (voltage and current) are recorded using a constant sample rate and then are transferred to the frequency domain (by FT) to calculate the impedance of the system. For an impedance spectrum, several frequencies are applied. The noise component of the signal at each frequency (isolated by FT) results in a Gaussian distribution around the true value of the impedance. I believe that this is why the least-square-solution is implemented (as you pointed out). However, I will totally have in mind the different distributions you mentioned and try to see if they can be applied to my case (especially the Huber function).

Iniciar sesión para comentar.

Categorías

Más información sobre Surrogate Optimization en Help Center y File Exchange.

Productos


Versión

R2018b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by