zscore of an array with NaNs?
Hi there!
What is the best way to take the zscore of this array while keeping NaNs in their place, and without simply indexing the part of the array where there aren't NaNs?
Thanks!
test =
NaN
NaN
25.0182
8.1078
0.7304
-8.4954
13.1454
31.7136
-7.2601
10.2586
16.4023
-55.0183
-53.4840
-4.5846
NaN
Answers (4)
Mbvalentin
on 21 Dec 2016
All the zscore function does is rescale your data so that it has zero mean and a standard deviation of 1.
Say you have a matrix X where rows are measurements and columns are different features (measured on different scales, so you need to normalize them before comparing one with another). Then you can obtain a normalized X by simply doing:
X = [rand(100,1) 100*rand(100,1)];
% Take a look at how different the scales of X are
figure, plot(X,'o'), title('Unnormalized features');
% Now we can subtract the mean, per column:
X = bsxfun(@minus,X,mean(X,1));
% And make standard deviation equal to 1, per column:
X = bsxfun(@rdivide,X,std(X,[],1));
% Now they can be compared!
figure, plot(X,'o'), title('Normalized Features');
Now, if you have NaNs in your data, the only difference is that instead of the mean and std functions you should use nanmean and nanstd, which ignore the NaNs in your data:
% First let's put some NaNs
X = [rand(100,1) 100*rand(100,1)];
X(round(rand(20,1)*(numel(X)-1))+1) = NaN;
% And now normalize:
X = bsxfun(@minus,X,nanmean(X,1));
X = bsxfun(@rdivide,X,nanstd(X,[],1));
figure, plot(X,'o'), title('Normalized Features');
Remember that X still contains NaNs even after using nanmean and nanstd. If you want to delete them from each column (the result is a cell array, because the columns can end up with different lengths):
X_without_nans = arrayfun(@(col) X(~isnan(X(:,col)),col), 1:size(X,2), 'UniformOutput', false);
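For the vector in the question, the same idea can be sketched without the toolbox functions, assuming MATLAB R2015a or later, where mean and std accept the 'omitnan' flag. The variable names mu, sigma, and z are only illustrative.

```matlab
% The data from the question, as a column vector
test = [NaN; NaN; 25.0182; 8.1078; 0.7304; -8.4954; 13.1454; 31.7136; ...
        -7.2601; 10.2586; 16.4023; -55.0183; -53.4840; -4.5846; NaN];

% Mean and standard deviation computed over the non-NaN entries only
mu    = mean(test, 'omitnan');
sigma = std(test, 'omitnan');

% Scalar expansion: NaN entries stay NaN, everything else is z-scored
z = (test - mu) ./ sigma;

% Check that the NaN positions are unchanged
sameNaNs = isequal(isnan(z), isnan(test));
```

Because any arithmetic on NaN yields NaN, no indexing is needed to keep the missing entries in place.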
Steven Lord
on 15 Nov 2018
test = [NaN, NaN, 25.0182, 8.1078, 0.7304, -8.4954, ...
13.1454, 31.7136, -7.2601, 10.2586, 16.4023, -55.0183, ...
-53.4840, -4.5846, NaN];
N = normalize(test) % zscore is the default normalization method
M = mean(N, 'omitnan')
S = std(N, 'omitnan')
Pablo Rivas
on 16 Jul 2014
Edited: Pablo Rivas on 16 Jul 2014
Dear Kate, This is what I've done in the past:
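if any(isnan(X(:)))
    % NaNs present: use statistics that ignore them
    xmu = nanmean(X);
    xsigma = nanstd(X);
    % size(X,1) is the number of rows; length(X) would return the largest dimension
    x = (X - repmat(xmu,size(X,1),1)) ./ repmat(xsigma,size(X,1),1);
else
    [x,xmu,xsigma] = zscore(X);
end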
This first checks whether there really is a NaN in your data, X. If there is not, it just uses the traditional zscore function; if there is, it uses the functions Sean suggested. Either way, your z-scored data ends up in x. I hope this helps.
Note: if your data is very large, repmat may not be the best alternative, because it builds full-size copies of the mean and standard deviation. You may be better off using for loops to subtract the mean and divide by the standard deviation; it will take longer, but it is computationally feasible for large data.
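The loop-based alternative mentioned in the note might look like the sketch below. It assumes X holds observations in rows and features in columns, and it uses mean and std with the 'omitnan' flag (R2015a or later) in place of nanmean and nanstd. The example data and the loop variable names are illustrative.

```matlab
% Example data: two columns on different scales, with a few NaNs
X = [rand(100,1) 100*rand(100,1)];
X([3 17 120]) = NaN;

x      = nan(size(X));          % preallocate the output
xmu    = zeros(1, size(X,2));   % per-column means
xsigma = zeros(1, size(X,2));   % per-column standard deviations

for c = 1:size(X,2)
    col       = X(:,c);
    xmu(c)    = mean(col, 'omitnan');
    xsigma(c) = std(col, 'omitnan');
    x(:,c)    = (col - xmu(c)) ./ xsigma(c);   % NaNs propagate to the output
end
```

Only one column is processed at a time, so no full-size replicated copies of xmu and xsigma are created.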
Peace.
Sean de Wolski
on 11 Nov 2013
Edited: Sean de Wolski on 11 Nov 2013
I think any function that will calculate the zscore will inevitably have to remove the nans.
You may find nanvar, nanmean, nan* etc. useful.
4 comments
Sean de Wolski
on 12 Nov 2013
I mean, this is about as simple as it gets:
x = randn(100,1);
x(rand(100,1)>0.95) = nan; %add nans
zscore(x(~isnan(x)))
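To keep the NaNs in their original positions with this approach, the z-scores can be written back through the same logical mask. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available for zscore; the names valid and z are illustrative.

```matlab
x = randn(100,1);
x(rand(100,1) > 0.95) = NaN;    % add NaNs

valid = ~isnan(x);              % mask of usable entries
z = nan(size(x));               % output starts as all NaN
z(valid) = zscore(x(valid));    % fill in z-scores only where data exists
```

The output z has the same size as x, with NaN wherever x was NaN.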