"diff" function doesn't work properly with small numbers
Sylwester
on 22 Dec 2025 at 14:10
Edited: Fangjun Jiang
about 10 hours ago
For some reason, when the difference between points n and n+1 is too small, the diff function treats the result as 0.
There are about 290 data points on the plot and the precision is 10^(-10). As far as I know, MATLAB works with 16 or 32 digits, so that shouldn't be a problem.
Technically there should be no constant sections on the plot, only increases and decreases in value.
Pomiary=cisnienie300920151701average300
Czas = Pomiary{:, 4};
Temperatura = Pomiary{:, 5};
CzasDMY= Czas / 86400 + datenum(1970, 1, 1);
y = Temperatura;
x = CzasDMY;
ydiff=diff(y,1);
wieksze = (ydiff > 0);
mniejsze = (ydiff < 0);
gora = y;
dol = y;
gora(~wieksze) = NaN;
dol(~mniejsze) = NaN;
plot(x,y,'b',x, gora, 'r', x, dol, 'g');
grid on;
xlim tight;
xlim("auto");
ylim("auto");
legend("Constant", "Increasing", "Decreasing");
legend("Position", [0.15754,0.1468,0.20438,0.12165]);

8 comments
dpb
about 9 hours ago
Edited: dpb
about 8 hours ago
whos -file x
whos -file y
d=dir('cisni*.mat');
whos('-file',d.name)
load x
load y
X=[x y];
fprintf('%.12f %.12f\n',X(1:10,:).')
dy=diff(y);
iy=find(dy==0);
nnz(iy)
This shows there are 5 separate places in the y vector where a value repeats.
iy
shows that no value repeats more than twice in a row, in this data set at least, so the averaging technique in the earlier Answer would produce something with no zero differences, if that is the ultimate goal.
Why that matters, rather than just accepting the result as it is, is so far unclear. But, as noted, the problem is not in diff() or machine precision; the data have been rounded such that there really are identical values.
fprintf('%.14f\n',y(iy(1)+[-1:2]))
plot(x(iy(1)+[-1:2]),y(iy(1)+[-1:2]),'*-')
This reproduces exactly the problem illustrated before. The data are identical to machine precision because the values were rounded to seven (7) decimal digits, and when those values were read from the input file they were stored identically in memory. Ergo, the diff() between those consecutive positions is identically zero.
As my Answer over the same subset of the data shows, if you find this result unacceptable your only choices are either to provide the data at full precision as input, in the hope that the original values differ in later digits before the rounding, or, as illustrated there, to interpolate across the duplicated values so that the second (repeated) value changes and a subsequent diff() is nonzero. The caveats noted there are still in play, of course.
The basic answer is that your data are, indeed, not changing in a positive or negative direction at every point but are unchanged over at least two consecutive positions, and diff() is just doing its job.
Fangjun Jiang
about 8 hours ago
Edited: Fangjun Jiang
about 7 hours ago
@dpb, @Sylwester, there is no problem with diff(). There is no problem with data accuracy or precision. It is a visual misconception.
First, as @dpb pointed out, in the whole set of 288 data points there are only 5 places where the data value is unchanged and is thus regarded as a "Constant" trend.
@Sylwester's idea was this: plot all the data in BLUE, plot the "Increasing" trend data in RED, and plot the "Decreasing" trend data in GREEN. Since RED and GREEN are drawn over BLUE, the resulting plot should show almost no BLUE, because only 5 of the 288 data points have a "Constant" trend.
But the diff() function is not at fault. The effect comes from how plot(time,data,'r') connects data points with a line style and color when the "data" set contains NaN values: a point with NaN on both sides has no neighbor to connect to, so no line is drawn there.
I changed only this line:
plot(x,y,'b.',x, gora, 'r+', x, dol, 'g*');
and the resulting plot gives the correct visual impression (that there is almost no BLUE "Constant" data).
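If connected lines are preferred over markers, one possible alternative is to draw every step as its own two-point segment, separated by NaN, so that a single increasing or decreasing step is still visible. This is only a sketch on made-up sample data; the names up, dn, xu, yu, xd and yd are assumptions and do not come from the original code.

```matlab
% Hypothetical sample data (column vectors), not the OP's file.
x = (1:6).';
y = [1; 3; 2; 2; 4; 3];
dy = diff(y);                 % dy(k) is the step from y(k) to y(k+1)

up = find(dy > 0);            % indices where a rising step starts
dn = find(dy < 0);            % indices where a falling step starts

% Each row is [start, end, NaN]; transposing and flattening with (:)
% gives start1, end1, NaN, start2, end2, NaN, ...
% The NaN separators stop plot() from joining unrelated segments.
xu = [x(up), x(up+1), nan(numel(up),1)].';
yu = [y(up), y(up+1), nan(numel(up),1)].';
xd = [x(dn), x(dn+1), nan(numel(dn),1)].';
yd = [y(dn), y(dn+1), nan(numel(dn),1)].';

% Blue is drawn first and stays visible only where the step is flat.
plot(x, y, 'b', xu(:), yu(:), 'r', xd(:), yd(:), 'g');
legend('Constant', 'Increasing', 'Decreasing');
```

With this layout the color belongs to the step between two samples rather than to a sample, which also avoids the off-by-one question about the length of the diff() output.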

Accepted Answer
Paul
about 8 hours ago
The data in gora and dol are on the plot, as can be seen below when using markers. However, if the y-data pattern is
increasing->decreasing->increasing ...
then gora and dol will contain data->NaN->data ...
and so the data points in gora and dol won't be connected on the plot (and won't be visible at all without markers).
load x
load y
ydiff=diff(y,1);
wieksze = (ydiff > 0);
mniejsze = (ydiff < 0);
gora = y;
dol = y;
gora(~wieksze) = NaN;
dol(~mniejsze) = NaN;
figure
plot(x,y,'b',x, gora, 'r-o', x, dol, 'g-x');
xlim([7.3623688,7.3623691]*1e5)
xl = xlim;
counts = (1:numel(x)).';
index = x>xl(1) & x < xl(2);
format long
[counts(index),x(index),y(index),gora(index),dol(index),wieksze(index),mniejsze(index)]
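To see how many points are affected, one could count the entries of gora that have no non-NaN neighbor on either side, since those are the ones a plain line style cannot draw. The following is a minimal sketch on made-up data; valid, hasNeighbor and isolated are illustrative names, not part of the code above.

```matlab
% Hypothetical zig-zag followed by a steady rise (column vector).
y = [1; 3; 2; 4; 3; 5; 6; 7];
dy = diff(y);

gora = y;
gora(~(dy > 0)) = NaN;        % same masking as in the question

valid = ~isnan(gora);
% A point can be part of a drawn line only if a direct neighbor is
% also non-NaN.
hasNeighbor = [false; valid(1:end-1)] | [valid(2:end); false];
isolated = valid & ~hasNeighbor;

fprintf('%d of %d non-NaN points are isolated\n', nnz(isolated), nnz(valid));
```

The same check can be applied to dol. Every isolated point is present in the data but invisible unless a marker is used.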
1 comment
Fangjun Jiang
about 7 hours ago
With this "increasing->decreasing->increasing" extreme case, nothing is wrong. I call it a visual misconception on the OP's part.
More Answers (2)
Fangjun Jiang
on 22 Dec 2025 at 15:45
The data values and results make sense. Based on your example data, there is no problem using diff() to process your data.
%%
format long
y=[36 1023.08766260000
37 1023.03861350000
38 1023.01522350000
39 1023.01522350000
40 1022.96080630000]
ydiff=diff(y,1)
wieksze = (ydiff > 0)
mniejsze = (ydiff < 0)
By default, MATLAB uses 64-bit floating-point values (double) to represent numeric data.
Near the value 1023, the spacing between adjacent doubles is about 1.1e-13, which is sufficient to represent your data precision of 1e-10.
The problem you observed comes from your raw data. Note that y(3,2) and y(4,2) are exactly the same on visual inspection.
eps(1023)
Check the documentation for eps(). You will understand the issue better.
doc eps
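As a small illustration of the point about eps(), the sketch below compares the double spacing near 1023 with the differences in question. The values a and b are copied from the example above; c is a made-up value that differs only in the 8th decimal place.

```matlab
% Spacing between adjacent doubles near the data values (2^-43).
spacing = eps(1023);

% Two values from the example that are identical as printed.
a = 1023.0152235;
b = 1023.0152235;
d = diff([a b]);              % exactly 0: both literals are the same double

% Hypothetical value differing in the 8th decimal place. The change is
% far larger than the spacing, so diff() resolves it.
c = 1023.01522351;
d2 = diff([a c]);

fprintf('eps(1023)                    = %.4g\n', spacing);
fprintf('diff of identical values     = %g\n', d);
fprintf('diff with 8th-decimal change = %.3g\n', d2);
```

A zero from diff() here therefore means the stored values are equal, not that a small difference was lost.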
3 comments
Fangjun Jiang
about 1 hour ago
The length of the diff() output is 1 less than the length of its input. Your code doesn't seem to account for this.
diff(1:3)
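If the masks are meant to line up with y element by element, one option is to pad them to the full length. The sketch below assumes each point should be labeled by the step that arrives at it; that convention, the sample data, and the names rising, falling and flat are all assumptions.

```matlab
y = [3; 5; 5; 4; 6];          % hypothetical sample, column vector
dy = diff(y);                 % numel(dy) is numel(y) - 1

% dy(k) describes the step from y(k) to y(k+1). A leading false pads
% the masks to the length of y; the first point has no incoming step.
rising  = [false; dy > 0];
falling = [false; dy < 0];
flat    = [false; dy == 0];

disp([y, rising, falling, flat]);
```

Padding with a trailing false instead would label each point by the step that leaves it.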
Fangjun Jiang
about 7 hours ago
Edited: Fangjun Jiang
about 7 hours ago
The length difference of 1 between the input and output of diff() is not an issue in this case either.
There is no issue with the diff() function or with data accuracy/precision. The OP has a visual misconception caused by the way plot(time,data,'b') connects data points with color and line style when there are NaN values in the "data" set.
dpb
about 16 hours ago
Edited: dpb
14 minutes ago
X=[
36 1023.08766260000
37 1023.03861350000
38 1023.01522350000
39 1023.01522350000
40 1022.96080630000];
dx=diff(X)
As hypothesized above, some of the temperature/pressure values are identical owing to the apparent rounding to seven (7) decimal digits.
The 3rd and 4th data values above would need to differ in at least one more decimal place for the difference not to be identically zero.
If you're transferring data from one place to another, avoid this by not using text files; keep the full internal precision by using .mat files, or a binary transfer format if the data come from an external source. Besides retaining full precision (note that precision does not necessarily imply accuracy), this is also much more efficient in speed and memory/disk space.
As for your comment above about the values, that "They are meant to be the same, The issue is that for some reason function for marking if value increased/decreased has holes in it and skips points unless difference is high enough", that makes no sense at all. The two values are identical, so how can there be any sense of the change that "increased/decreased" implies?
If you're trying to measure an overall change, then diff is entirely the wrong function: it works pointwise and so will indeed report every point for which the difference is actually zero.
Looking at your small subsample of data
plot(X(:,1),X(:,2),'*-')
Indeed, there is an overall negative trend, but the data aren't decreasing at every point, only overall. If you want indications of trend that exclude such points, you'd have to do something like find the inflection points, take (say) the two points on either side, and then use the adjusted temperature to compute the change.
Note that you would also have to locate any runs of more than two successive identical points and then do something over those ranges. Also, in doing something like this you'll run into the issue that @Fangjun Jiang raised: the differenced vector is shorter than the original, so the indices are offset by one.
For the simple example here
ix=find(dx(:,2)==0); % locate the zero difference
fprintf('%d %15.10f\n',X(ix+[0:1],:).') % display the two repeated rows
X(:,3)=X(:,2); % augment the X array with a copy of column 2
X(ix+1,3)=mean(X(ix+[0 2],3)); % replace the repeated value with the mean of its neighbors (linear interpolation)
hold on
plot(X(:,1),X(:,3),'rx-')
legend('Original','Interpolated','location','northeast')
diff(X)
Now there are no zeros in the third column of the diff() output.
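For runs of more than two equal values, a variation on the same idea is to keep only the first sample of each run and let interp1 fill in the repeats. This is a sketch on made-up data, not a recommendation to alter measurements; t, keep and yi are illustrative names, and the earlier caveats about interpolating over real data apply.

```matlab
% Hypothetical sample with runs of three and two equal values.
t = (1:7).';
y = [5; 4; 4; 4; 3; 3; 2];

% Keep the first sample of every run of identical values.
keep = [true; diff(y) ~= 0];

% Linear interpolation between the kept samples replaces the repeats.
% If the data END with a repeated run, those trailing points fall
% outside the kept range and interp1 returns NaN for them.
yi = interp1(t(keep), y(keep), t, 'linear');

disp([t, y, yi]);
disp(diff(yi).');             % no zeros left in this example
```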
0 comments