# Removing NaN in Linear Regression Problem. Error in line 66.

NATALIA ARREGUI GONZALEZ on 18 Mar 2020
Edited: dpb on 20 Mar 2020
Hello guys,
I am trying to conduct a multivariable linear regression problem. The predictors (X) form a table sized 52824x9.
When trying to remove all the NaN values using this piece of code, included in the regress function:
% Remove missing values, if any
wasnan = (isnan(y) | any(isnan(X),2)); % line 66
havenans = any(wasnan);
if havenans
    y(wasnan) = []; % line 69
    X(wasnan,:) = [];
    n = length(y);
end
At first, I got an error stating:
Undefined function 'isnan' for input arguments of type 'table'.
Error in regress (line 66)
wasnan = (isnan(y) | any(isnan(X),2));
I searched for solutions and found one saying that the isnan function cannot operate on tables directly; the suggested fix was to extract the data with curly braces, like this:
wasnan = (isnan(y{:,:}) | any(isnan(X{:,:}),2));
Now I get an error in line 69 saying the following:
Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts) is not supported. Use a row subscript and a variable subscript.
If anyone knows how to solve this problem, or can suggest another way of accessing the data with the isnan function, it would be very much appreciated. I have been trying to solve this for some days now.
Many thanks,
Natalia

dpb on 18 Mar 2020
Edited: dpb on 20 Mar 2020
You don't pass the table to regress but the variables to be used in the regression -- then you won't run into the issue inside regress.
And you DEFINITELY DO NOT WANT TO BE MUCKING INSIDE THE SUPPLIED REGRESS FUNCTION!!!!
We don't know the function you're trying to fit nor the variable names in your table, but assuming
Y ~ 1 + AX1 + BX2 + ...
for variables X and Y in the table and a linear model plus intercept, then the syntax for regress would be
b=regress(t.Y,[ones(height(t),1) t.X]); % response first, then the design matrix
where the table variable is t. Use your table variable name and variable names within the table, of course.
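A minimal sketch of that call, assuming a hypothetical table t with a response Y and two predictors X1 and X2 (the names and the data are made up for illustration; regress itself needs the Statistics and Machine Learning Toolbox). Note that regress treats NaN as missing and omits those rows on its own, so no manual cleanup is needed first.

```matlab
% Hypothetical data: the table name t and the variable names Y, X1, X2
% are assumptions for illustration, not the OP's actual names.
X1 = [1; 2; 3; 4; 5; 6];
X2 = [2; 1; 4; 3; 6; 5];
Y  = 1 + 2*X1 + 3*X2;        % exact linear relation, no noise
Y(3)  = NaN;                 % plant a missing response
X2(5) = NaN;                 % and a missing predictor
t = table(Y, X1, X2);

% Pull the predictors out as a numeric matrix with curly braces and
% prepend a column of ones for the intercept.
X = [ones(height(t),1) t{:, {'X1','X2'}}];

% Response first, design matrix second. regress skips the NaN rows itself.
b = regress(t.Y, X);
```

With this noise-free data, b should come out close to [1; 2; 3] even though two of the six rows contain NaN.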
If you have the Curve Fitting Toolbox besides Statistics, I would suggest that the fit function in it is a little more user friendly than the core regress function. Lacking it, see the Alternative Functionality section of the documentation for regress that suggests using LinearModel instead for similar reasons/purposes.
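For the LinearModel route, a hedged sketch using fitlm, which builds a LinearModel object and, unlike regress, accepts a table directly. The table and variable names below are again hypothetical.

```matlab
% Hypothetical table: response Y, predictors X1 and X2 (assumed names).
X1 = [1; 2; 3; 4; 5; 6];
X2 = [2; 1; 4; 3; 6; 5];
Y  = 1 + 2*X1 + 3*X2 + 0.01*[1; -1; 1; -1; 1; -1];  % small fixed perturbation
Y(3) = NaN;                                          % one missing response
t = table(Y, X1, X2);

% fitlm takes the table as-is; the formula names the response and the
% predictors, and an intercept is included by default.
mdl = fitlm(t, 'Y ~ X1 + X2');

coeffs = mdl.Coefficients.Estimate;   % intercept first, then X1, X2
nUsed  = mdl.NumObservations;         % rows actually used in the fit
```

Here nUsed is 5 rather than 6, because the row with the missing response is left out of the fit automatically.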
Read the section in the documentation for table on how to address data within a table; it covers which forms of addressing return the variables as their native type and which return tables. In particular, note that addressing a table with parentheses returns another table containing the addressed rows and columns, which is probably the root cause of your troubles.
x=t(:,1); % returns x as a table: all rows of table t, column 1
while
x=t.X; % presuming X is the first column in table t, returns X as an array
% or
x=t{:,1}; % returns x as an array -- NB: the "curlies" {} instead of ()
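The difference between the three forms is easy to check at the command line. The sketch below uses a small made-up table; it also shows why isnan works once the data is extracted with curly braces, and why deleting rows from a table needs two subscripts rather than one.

```matlab
% Hypothetical two-variable table for illustration.
t = table([1; NaN; 3], [4; 5; NaN], 'VariableNames', {'X', 'Y'});

a = t(:,1);     % parentheses: a 3x1 table
b = t.X;        % dot notation: a 3x1 double
c = t{:,1};     % curly braces: a 3x1 double

class(a)        % 'table'
class(b)        % 'double'
class(c)        % 'double'

% isnan works on the extracted numeric array, not on the table itself.
wasnan = any(isnan(t{:,:}), 2);

% Deleting rows from a table needs a row AND a variable subscript;
% t(wasnan) = [] would raise the "linear indexing" error.
t(wasnan,:) = [];
```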

#### 1 Comment

NATALIA ARREGUI GONZALEZ on 19 Mar 2020
Best,
Natalia

Cris LaPierre on 18 Mar 2020
Try using rmmissing. It accepts vectors, matrices, cell arrays, tables, and timetables as input.
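A short sketch of rmmissing on a table, with placeholder names and values. The optional second output flags which rows were dropped.

```matlab
% Hypothetical table; names and values are placeholders.
t = table([1; NaN; 3; 4], [10; 20; NaN; 40], 'VariableNames', {'X', 'Y'});

% Remove every row that contains a missing value in any variable.
% The second output marks the rows of the original that were removed.
[tClean, removed] = rmmissing(t);

height(tClean)      % 2
find(removed)'      % 2 3
```

The result tClean is still a table, so a function that expects numeric arrays would need its variables extracted, for example tClean.X and tClean.Y.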

dpb on 18 Mar 2020
Hmmm... that would work outside regress, if the OP extracted that code from the regress function and is running it elsewhere. It looked to me like the OP was trying to patch regress itself instead.
But even so, unless the OP changes the way regress is called, the result of rmmissing is still a table, and passing that in will fail again at the input check inside regress.
Cris LaPierre on 18 Mar 2020
Ah, I didn't realize that code snippet was from the regress function. Yes, don't go changing code inside the function. Use rmmissing to clean up your table before passing its variables to regress.
And yes, regress does not support tables as inputs. Use the dot notation to pass in variables.
NATALIA ARREGUI GONZALEZ on 19 Mar 2020
Many thanks for your answer as well, I will keep this in mind from now on.
Best,
Natalia