Gradient descent giving me NaN

Hello,
I am running linear regression on real estate data. Gradient descent is giving me NaN values for theta, but the normal equation gives me actual numbers for theta. Can you please advise why my gradient descent is not working?
Thanks for your help.
This is my code:
clear
clc
data=importfile('realestate.csv');
%% Setting data
X = data(:,2:7);
y= data(:,8);
m=height(y);
%Feature normalization
X=table2array(X);
y=table2array(y);
[X_norm, mu, sigma]=featureNormalize(X);
% Add intercept term to X
X = [ones(m, 1) X];
%% Setting data for gradient descent
theta = rand(7, 1);
J = computeCostMulti(X, y, theta);
% choose some alpha and number of iterations
alpha = 0.5;
num_iters = 1500;
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');
% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');
%% using normal equation
clear
clc
%% Setting data
data=importfile('realestate.csv');
X = data(:,2:7);
y= data(:,8);
m=height(y);
X=table2array(X);
y=table2array(y);
% Add intercept term to X
X = [ones(m, 1) X];
% Calculate the parameters from the normal equation
theta = normalEqn(X, y);
% Display normal equation's result
fprintf('Theta computed from the normal equations: \n');
fprintf(' %f \n', theta);
fprintf('\n');
The functions are as follows:
function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
% J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
% You should set J to the cost.
h = X * theta;
J = (1/(2*m)) * sum((h - y).^2);
% =========================================================================
end
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
% theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
% theta.
%
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCostMulti) and gradient here.
%
h = X * theta;
theta = theta - (alpha/m) * ( (h - y)' * X)';
% ============================================================
% Save the cost J in every iteration
J_history(iter) = computeCostMulti(X, y, theta);
end
end
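One detail worth checking in the script above: featureNormalize returns X_norm, but the intercept column is prepended to the original, unnormalized X, so gradient descent runs on unscaled features. A minimal sketch of the intended flow (assuming featureNormalize z-scores each column, as in the course exercises):

```matlab
% Sketch: build the design matrix from the NORMALIZED features,
% not the raw X, before running gradient descent.
[X_norm, mu, sigma] = featureNormalize(X);
X = [ones(m, 1) X_norm];   % intercept + scaled features
theta = rand(7, 1);
alpha = 0.5;
num_iters = 1500;
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
```

With scaled features, an alpha of 0.5 is typically stable; on raw features with very different magnitudes, the same alpha can make the updates diverge.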

5 comments

Adam Danz
Adam Danz on 31 Aug 2020
I edited your question and title to change Naan to NaN.
NaN values are missing-value indicators.
Naan is a flatbread, and it is usually much better to have naan than NaNs. 😄
One difficulty in troubleshooting this remotely is that you have 3 different outputs named "Theta", two of which are overwritten in your code. If you provide us with 'realestate.csv' and any other variables needed to run your code, troubleshooting would be a lot faster.
Maha Almubarak
Maha Almubarak on 1 Sep 2020
Thanks for your reply, Adam. It looks like I was hungry when I posted my question :). Yes, I have theta in two functions as well as in the main code, but they are all related, so would you suggest I name them differently? I also uploaded the real estate data with the post; should I upload it again?
Thanks a lot for your comments.
Adam Danz
Adam Danz on 1 Sep 2020
The multiple "theta" outputs aren't a problem with the code, they were a problem for us reading the code without being able to run it. I didn't see that you already attached the dataset, sorry about that.
We don't have your importfile() or featureNormalize() functions but I was still able to look into why you're getting NaN values in the gradientDescentMulti() function.
There's a hint in your code,
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCostMulti) and gradient here.
I'll recommend the same advice 🙂 Here are instructions for using debug mode so you can watch how your theta values evolve.
Specifically, at the start of roughly the 48th iteration, the theta values are (they will differ slightly each run because theta is initialized randomly):
theta =
-8.4981e+299
-1.7108e+303
-1.5135e+301
-1.2774e+303
-2.9784e+300
-2.1217e+301
-1.0328e+302
These values are very close to the largest magnitude a double-precision value can represent in MATLAB:
K>> realmax
ans =
1.7977e+308
On the next iteration, they all overflow to -Inf. On that same iteration, the result of ((h - y)'*X) is
ans =
-Inf -Inf -Inf -Inf -Inf -Inf -Inf
Then theta becomes Inf (subtracting -Inf yields +Inf). Because X has some zero-valued entries, multiplying 0 by Inf gives NaN. That introduces some NaN values into theta, which soon spread to all of your theta values.
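The two arithmetic facts behind this blow-up can be reproduced in isolation; this toy snippet (made-up values, not from the real estate data) shows overflow saturating to -Inf and a zero entry then producing NaN:

```matlab
% Overflow: exceeding realmax in magnitude saturates to -Inf
t = -1e308;
t = t * 10;      % |-1e309| > realmax, so t is now -Inf
% A zero feature value multiplied by an infinite gradient term gives NaN
x0 = 0;
g  = x0 * t;     % 0 * (-Inf) = NaN per IEEE 754
disp([t, g])     % shows -Inf and NaN
```

Once a single NaN enters theta, every subsequent X*theta product propagates it to all entries.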
Maha Almubarak
Maha Almubarak el 1 de Sept. de 2020
Hello Adam,
Thanks so much for your reply. I really appreciate your time looking into the code. I found that when I make alpha very small (alpha = 0.000000001) it gives me numbers for theta; however, these numbers are different from the normal equation's, and the RMSE is higher than with the normal equation. I attached two pictures of my cost-function value at each iteration: the first with alpha = 0.1 and the second after I set alpha = 0.000000001.
I still think I have a problem with my gradient descent, as it gives me different theta values from the normal equation.
Muzamil Shah
Muzamil Shah on 12 Sep 2020
theta = theta - (alpha/m) * (X' * (X * theta - y))
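Note that X' * (X*theta - y) is just the transpose of ((X*theta - y)' * X), so this update is algebraically identical to the one already in gradientDescentMulti; a quick check with made-up toy matrices:

```matlab
% Verify the two gradient expressions agree (random toy data)
X = [ones(4,1) rand(4,2)];
y = rand(4,1);
theta = rand(3,1);
g1 = ((X*theta - y)' * X)';   % form used in the question
g2 = X' * (X*theta - y);      % form suggested above
disp(max(abs(g1 - g2)))       % essentially zero
```

So switching between the two forms changes readability but not the numbers, and it would not by itself fix the NaN issue.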


Answers (2)

Robert Misior
Robert Misior on 2 Sep 2020

0 votes

Hi,
Your theta_change calculation (alpha/m) * ((h - y)' * X)' is not correct.
Look at the notes provided with this problem: https://www.coursera.org/learn/machine-learning/resources/O756o
The change in theta (the "gradient") is the sum of the products of X and the error vector, scaled by alpha and 1/m. Since X is (m x n) and the error vector is (m x 1), and the result you want is the same size as theta (n x 1), you need to transpose X before you can multiply it by the error vector.
Maha Almubarak
Maha Almubarak on 3 Sep 2020

0 votes

Thank you all for your help. I found that the problem was with my normalization. Once I used MATLAB's normalize function, it worked!
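For reference, a minimal end-to-end sketch using the built-in normalize (which z-scores each column by default), assuming the same column layout and importfile helper as in the question:

```matlab
% Sketch: normalize features BEFORE adding the intercept column,
% then run gradient descent on the scaled design matrix.
data = importfile('realestate.csv');     % same helper as in the question
Xraw = table2array(data(:,2:7));
y    = table2array(data(:,8));
m    = length(y);
Xn   = normalize(Xraw);                  % z-score each feature column
X    = [ones(m,1) Xn];                   % intercept stays unscaled
theta = zeros(7,1);
[theta, J_history] = gradientDescentMulti(X, y, theta, 0.5, 1500);
```

When predicting on new samples, apply the same centering and scaling (the second and third outputs of normalize) to the new features before multiplying by theta.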

Asked: on 31 Aug 2020
Last commented: on 17 Jan 2022