Neural Network Stock price prediction - Extremely accurate results

Soham Acharjee on 7 Apr 2015
Answered: David Willingham on 5 Oct 2020
Hi,
I have implemented a narxnet neural network to predict the next-day closing price of stocks, and I have been running this experiment on offshore stocks listed on the Singapore Exchange. The problem is that the network gives extremely accurate predictions, with a mean absolute error in the range of 5e-3 to 7e-3. I cannot tell whether the network really is this accurate at predicting stock prices or whether I have made a mistake in implementing the ANN. I am using 60% of the data for training, 5% for validation and 35% for testing, and I use early prediction to evaluate the performance. Could you please look at the code below and suggest where I am going wrong?
if true
    %code
    clc, clear
    Stock_names = {'CAO' 'CHO' 'CHW' 'DMHL' 'ezion' 'EZRA' 'falcon' 'keppel' 'KS' 'SCI' 'SMM' 'swiber'};
    b = Stock_names{2};
    data = csvread(strcat(b,'.csv'), 1,1);
    inputSeries = tonndata(data(1:end,2:end),false,false);
    targetSeries = tonndata(data(1:end,1),false,false);
    mae = zeros(3,1);
    mape = zeros(3,1);
    rmse = zeros(3,1);
    msre = zeros(3,1);
    start = int32(5);
    last = int32(9);
    for k = 1: % loop for increasing the hidden neurons in steps of 5 (upper bound missing in the original post)
        Performance_Matrix5 = zeros(5,4);
        s = strcat('D', int2str(start), ':', 'G', int2str(last)); % Excel range for this block of results
        for j = 2:6 % looping through number of delays from 2 to 6
            for i = 1:3 % loop for taking average of results for each iteration
                inputDelays = 1:j;
                feedbackDelays = 1:j;
                hiddenLayerSize = 5*k; % hidden layer neurons
                net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
                net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
                net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'};
                [inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
                net.divideFcn = 'divideblock'; % Divide data into contiguous blocks
                net.divideMode = 'value'; % Divide up every value
                net.divideParam.trainRatio = 60/100;
                net.divideParam.valRatio = 5/100;
                net.divideParam.testRatio = 35/100;
                net.trainFcn = 'trainlm'; % Levenberg-Marquardt
                net.performFcn = 'mse'; % Mean squared error
                net.plotFcns = {'plotperform','plottrainstate','plotresponse', ...
                    'ploterrcorr', 'plotinerrcorr'};
                % Train the Network
                [net,tr] = train(net,inputs,targets,inputStates,layerStates);
                % Test the Network
                outputs = net(inputs,inputStates,layerStates);
                errors = gsubtract(targets,outputs);
                performance = perform(net,targets,outputs);
                % Recalculate Training, Validation and Test Performance
                trainTargets = gmultiply(targets,tr.trainMask);
                valTargets = gmultiply(targets,tr.valMask);
                testTargets = gmultiply(targets,tr.testMask);
                %trainPerformance = perform(net,trainTargets,outputs)
                %valPerformance = perform(net,valTargets,outputs)
                %testPerformance = perform(net,testTargets,outputs);
                % Closed Loop Network
                netc = closeloop(net);
                [xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
                netc.name = [net.name ' - Closed Loop'];
                yc = netc(xc,xic,aic);
                closedLoopPerformance = perform(netc,tc,yc);
                N_closedLoopPerformance = closedLoopPerformance/mean(cell2mat(tc));
                % Early Prediction
                nets = removedelay(net);
                nets.name = [net.name ' - Predict One Step Ahead'];
                [xs,xis,ais,ts] = preparets(nets,inputSeries,{},targetSeries);
                ys = nets(xs,xis,ais);
                earlyPredictPerformance = perform(nets,ts,ys);
                c1 = cell2mat(ts);
                c2 = cell2mat(ys);
                b1 = c1(:,1:(end-1));
                b2 = c2(:,1:(end-1));
                N_earlyPredictPerformance = earlyPredictPerformance / mean(b1);
                % error calculation through errperf.m external file
                MAE_ep = errperf(b1, b2, 'mae');
                MAPE_ep = errperf(b1, b2, 'mape');
                RMSE_ep = errperf(b1, b2, 'rmse');
                MSRE_ep = errperf(b1, b2, 'msre');
                % disp('Next day closing price was forecasted to')
                % ys(end)
                % figure
                % plot([cell2mat(yc);cell2mat(tc)]')
                % legend('Network Predictions','Expected Outputs')
                % figure
                % plot([cell2mat(ys);cell2mat(ts)]')
                % legend('Network Predictions','Expected Outputs');
                mae(i) = MAE_ep;
                mape(i) = MAPE_ep;
                rmse(i) = RMSE_ep;
                msre(i) = MSRE_ep;
            end
            Performance_Matrix5(j-1,1) = mean(mae);
            Performance_Matrix5(j-1,2) = mean(mape);
            Performance_Matrix5(j-1,3) = mean(rmse);
            Performance_Matrix5(j-1,4) = mean(msre);
            xlswrite(strcat(b, '_results.xlsx'), Performance_Matrix5, s);
        end
    end
    start = start + 6;
    last = last + 6;
end
I loop over a range of hidden-neuron counts and, for each count, over a range of delays, as a trial-and-error search for the best configuration. I have used both the closed-loop net and early prediction to predict the stock price, but the MAE, RMSE and MAPE errors seem absurdly small, which should be extremely difficult to achieve in stock market prediction given the market's efficiency. I have also uploaded the CSV file with the stock data, 'CHO.csv', in case you would like to test the network. Where have I gone wrong?
The MATLAB program writes the results for each combination of hidden neurons and delays to an Excel file. The errperf.m file contains the external error measures.
  2 comments
Rimi Khongji on 6 May 2017
I'd really like a detailed explanation of this code. Please mail me @rkhongji@gmail.com
Rajesh Vemulapalli on 31 Oct 2017
Please mail me @rajeshvemulapalli1997@gmail.com a detailed explanation of this code.


Answers (3)

Greg Heath on 9 Apr 2015
% Neural Network Stock price prediction - Extremely accurate results
% Asked by Soham Acharjee about 10 hours ago
% Hi,
% I have implemented a narxnet neural network to predict the next day
% ...
% with Mean Absolute error in the range of 5*E^-3 to 7*E^-3. I am not able
That means nothing unless the scale of the data is known
% understand if the Neural Network really so accurate in predicting stock
% ...
% I am using 60% Data for Training, 5% Validation and 35% Testing
Strange division combination ... Why?
% and Early Prediction to predict the performance.
Although you can get an OPENLOOP early prediction when min(TARGET feedback delay) = 1,
you CANNOT get a CLOSELOOP early prediction when min(OUTPUT feedback delay) = 1.
%Please could you have a look at the code below and suggest
% where I am going wrong?
% ...
%
% Stock_names = {'CAO' 'CHO' 'CHW' 'DMHL' 'ezion' 'EZRA' 'falcon' 'keppel' ...
% 'KS' 'SCI' 'SMM' 'swiber'};
% b = Stock_names{2};
% data=csvread(strcat(b,'.csv'), 1,1);
% inputSeries = tonndata(data(1:end,2:end),false,false);
% targetSeries = tonndata(data(1:end,1),false,false);
[I N ] = size(inputSeries) % = ?
[ O N ] = size(targetSeries) % = ?
whos data inputSeries targetSeries ?
% mae = zeros(3,1);
mae is the name of a MATLAB function
% mape= zeros(3,1);
% rmse = zeros(3,1);
% msre = zeros(3,1);
What is msre ?
start = int32(5);
last = int32(9);
Why the need for int32 ?
% for k = 1: %loop for iteratively increasing the hidden layers by 5
ERROR: k = 1 : ?
% Performance_Matrix5 = zeros(5,4);
% s = strcat('D', int2str(start), ':', 'G', int2str(last));
s = D5:G9 What does that mean ???
% for j = 2:6 % looping through number of delays from 2 to 6
% for i = 1:3 % loop for taking average of results for each iteration
% ...
% net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
% net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'};
Why include mapminmax defaults?
% [inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
% ...
% net.divideParam.valRatio = 5/100;
% net.divideParam.testRatio = 35/100;
Weird division. Why this choice?
% net.trainFcn = 'trainlm'; % Levenberg-Marquardt
% net.performFcn = 'mse'; % Mean squared error
% net.plotFcns = {'plotperform','plottrainstate','plotresponse', ...
% 'ploterrcorr', 'plotinerrcorr'};
Why include? These are defaults.
% % Train the Network
% [net,tr] = train(net,inputs,targets,inputStates,layerStates);
No output states ? ... How to predict beyond data ?
% ...
% % Closed Loop Network
% netc = closeloop(net);
% [xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
% netc.name = [net.name ' - Closed Loop'];
% yc = netc(xc,xic,aic);
% closedLoopPerformance = perform(netc,tc,yc);
% N_closedLoopPerformance = closedLoopPerformance/mean(cell2mat(tc));
Reference MSE should be mean(var(cell2mat(tc)',1))
What is the numerical difference between open loop and closed loop answers?
% %Early Prediction
% nets = removedelay(net);
Only works for OPENLOOP
% Where have I gone wrong?
Dunno
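The reference-MSE remark above can be illustrated with a minimal sketch. It assumes a deterministic synthetic series in place of the real targets (it is not the CHO.csv data), and `t`, `y`, `mseNet`, `mseRef`, `nmse` and `Rsq` are names chosen only for this illustration. Dividing the MSE by the variance of the targets, rather than by their mean as in the posted code, gives a scale-free number: 1 means no better than predicting the constant mean, 0 means a perfect fit.

```matlab
% Synthetic stand-in for cell2mat(tc): a 1-by-N row in a 0.6..1.2 range.
% ASSUMPTION: illustrative data only, not the CHO.csv series.
N = 500;
t = 0.9 + 0.3*sin(2*pi*(1:N)/100);   % "targets"
y = t + 0.01;                        % hypothetical output with a constant error of 0.01

mseNet = mean((t - y).^2);           % plain mean squared error of the "network"
mseRef = mean(var(t',1));            % MSE of the naive constant model y = mean(t)
nmse   = mseNet/mseRef;              % normalized MSE (0 = perfect, 1 = constant model)
Rsq    = 1 - nmse;                   % fraction of target variance explained

fprintf('MSE = %.3e, reference MSE = %.3e, NMSE = %.3e, R^2 = %.5f\n', ...
        mseNet, mseRef, nmse, Rsq);
```

An absolute MSE or MAE of a few thousandths only becomes interpretable once it is set against a reference of this kind computed on the actual data.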
  2 comments
Soham Acharjee on 13 Apr 2015
Neural Network Stock price prediction - Extremely accurate results
% Asked by Soham Acharjee about 10 hours ago
% Hi,
% I have implemented a narxnet neural network to predict the next day
% ...
% with Mean Absolute error in the range of 5*E^-3 to 7*E^-3. I am not able
That means nothing unless the scale of the data is known
  • The range of data is from 0.3 to 1.5
% understand if the Neural Network really so accurate in predicting stock
% ...
% I am using 60% Data for Training, 5% Validation and 35% Testing
Strange division combination ... Why?
% and Early Prediction to predict the performance.
Although you can get an OPENLOOP early prediction when min(TARGET feedback delay) = 1,
you CANNOT get a CLOSELOOP early prediction when min(OUTPUT feedback delay) = 1.
  • Yes, I am aware of that. Therefore I am iteratively increasing the output and feedback delay from 2 to 6 (loop j)
%Please could you have a look at the code below and suggest
% where I am going wrong?
% ...
%
% Stock_names = {'CAO' 'CHO' 'CHW' 'DMHL' 'ezion' 'EZRA' 'falcon' 'keppel' ...
% 'KS' 'SCI' 'SMM' 'swiber'};
% b = Stock_names{2};
% data=csvread(strcat(b,'.csv'), 1,1);
% inputSeries = tonndata(data(1:end,2:end),false,false);
% targetSeries = tonndata(data(1:end,1),false,false);
[I N ] = size(inputSeries) % = ?
[ O N ] = size(targetSeries) % = ?
  • I = 1 N = 791
  • O = 1 N = 791
whos data inputSeries targetSeries ?
  • The 'data' is obtained after the file CHO.csv (attached) is read using the command
data = csvread('CHO.csv',1,1);
inputSeries and targetSeries are then obtained from this 'data'
% mae = zeros(3,1);
mae is the name of a MATLAB function
% mape= zeros(3,1);
% rmse = zeros(3,1);
% msre = zeros(3,1);
What is msre ?
  • msre is the mean square relative error (defined in the attached file errperf.m)
start = int32(5);
last = int32(9);
Why the need for int32 ?
  • no specific need, can remove it.
% for k = 1: %loop for iteratively increasing the hidden layers by 5
ERROR: k = 1 : ?
% Performance_Matrix5 = zeros(5,4);
% s = strcat('D', int2str(start), ':', 'G', int2str(last));
s = D5:G9 What does that mean ???
  • It is an Excel sheet range. The sheet is preformatted with headers and column names, so the program just writes the results into the range D5:G9.
% for j = 2:6 % looping through number of delays from 2 to 6
% for i = 1:3 % loop for taking average of results for each iteration
% ...
% net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
% net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'};
Why include mapminmax defaults?
% [inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
% ...
% net.divideParam.valRatio = 5/100;
% net.divideParam.testRatio = 35/100;
Weird division. Why this choice?
  • It was just chosen randomly as I wanted to keep the training percentage above 50 and validation to a minimum. Since the testing error was very small, I was scared there wasn't enough data for testing. Hence this choice.
% net.trainFcn = 'trainlm'; % Levenberg-Marquardt
% net.performFcn = 'mse'; % Mean squared error
% net.plotFcns = {'plotperform','plottrainstate','plotresponse', ...
% 'ploterrcorr', 'plotinerrcorr'};
Why include? These are defaults.
  • no reason. can remove it.
% % Train the Network
% [net,tr] = train(net,inputs,targets,inputStates,layerStates);
No output states ? ... How to predict beyond data ?
% ...
% % Closed Loop Network
% netc = closeloop(net);
% [xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
% netc.name = [net.name ' - Closed Loop'];
% yc = netc(xc,xic,aic);
% closedLoopPerformance = perform(netc,tc,yc);
% N_closedLoopPerformance = closedLoopPerformance/mean(cell2mat(tc));
Reference MSE should be mean(var(cell2mat(tc)',1))
What is the numerical difference between open loop and closed loop answers?
  • Open loop performance: 1.0441e-04
  • Closed loop performance: 6.6112e-04
  • Early predict performance: 6.5612e-05
  • Configuration: input delays 3, feedback delays 3, hidden neurons 5
% %Early Prediction
% nets = removedelay(net);
Only works for OPENLOOP
% Where have I gone wrong?
Greg Heath on 14 Apr 2015
Neural Network Stock price prediction - Extremely accurate results
% Asked by Soham Acharjee about 10 hours ago
% ... with Mean Absolute error in the range of 5*E^-3 to 7*E^-3.
>> That means nothing unless the scale of the data is known
> The range of data is from 0.3 to 1.5
It helps immensely to ALWAYS scale data BEFORE training. Then, regardless of the problem and data source, you can be familiar with the range of numbers at different stages in the design. I prefer to scale with ZSCORE to detect outliers and to estimate significant auto- and cross-correlation lags using NNCORR.
> ... Therefore I am iteratively increasing the output and feedback delay from 2 to 6 (loop j)
You can estimate, DIRECTLY, significant correlation lags using NNCORR
>> whos data inputSeries targetSeries ?
>The 'data' is obtained after the file CHO.csv (attached) is read using the command
Sorry, what I meant is to cut and paste that whos statement (without the "?") into the command line to verify dimensions and class
% % Train the Network
% [net,tr] = train(net,inputs,targets,inputStates,layerStates);
>>No output states ? ... How to predict beyond data ?
Can use [ net tr y e Xof Aof ] = train(neto,...
...
% % Closed Loop Network
% netc = closeloop(net);
% [xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
% netc.name = [net.name ' - Closed Loop'];
% yc = netc(xc,xic,aic);
% closedLoopPerformance = perform(netc,tc,yc);
% N_closedLoopPerformance = closedLoopPerformance/mean(cell2mat(tc));
>>Reference MSE should be mean(var(cell2mat(tc)',1))
>>What is the numerical difference between open loop and closed loop answers?
>Open loop performance: 1.0441e-04
>closed loop performance: 6.6112e-04
>early predict performance: 6.5612e-05
>Configuration (Input delays: 3, feedback delays: 3, hidden neurons: 5)
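The scaling and lag-selection advice above can be sketched in base MATLAB. This is only an illustration under stated assumptions: the series is synthetic (not CHO.csv), the standardization is written out by hand instead of calling zscore, the autocorrelation is a hand-rolled biased estimate rather than the output of nncorr, and the 1.96/sqrt(N) cutoff is the usual large-sample white-noise bound, which is an assumption of this sketch and not a threshold stated in this thread.

```matlab
% ASSUMPTION: synthetic series standing in for the target prices.
x = 0.9 + 0.3*sin(2*pi*(1:400)/50);
N = numel(x);

% Standardize to zero mean and unit variance. std(x,1) divides by N;
% zscore uses the N-1 form by default, a negligible difference at this N.
zx = (x - mean(x))/std(x,1);

% Biased autocorrelation estimate at lags 0..maxLag.
maxLag = 10;
acf = zeros(1,maxLag+1);
for lag = 0:maxLag
    acf(lag+1) = sum(zx(1+lag:N).*zx(1:N-lag))/N;
end

% Flag lags whose autocorrelation exceeds the assumed white-noise bound.
thresh  = 1.96/sqrt(N);
sigLags = find(abs(acf(2:end)) > thresh);   % index k corresponds to lag k

disp(sigLags);
```

Lags flagged this way are candidates for feedbackDelays; the same calculation between the input and target series would suggest inputDelays, instead of looping blindly over delays 2 to 6.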



Brian Whatcott on 31 Aug 2016
I enjoyed reading this question asked by Soham Acharjee on 7 Apr 2015. It embodied the puzzlement of seeing wonderfully small error measures from an open net addressing a stock series forecast, knowing the difficulty of prognosticating stock prices. Here's the problem, as I see it: stock price movements are a Drunkard's Walk, so it is unrealistic to expect great precision in the forecast that really matters, the next day's closing price. Kudos for providing everything needed to run his script!

I expect neural nets to extract seasonal patterns and significant connections between prior prices and a relevant price series. In this case, Soham's excellent demonstration looks for the closing price given a history of closing prices and prices at the open, so he demands only an eight-hour prediction. Even a weatherman can make a fair prediction of rainfall today by asking if rain fell yesterday!

I liked his application of loops to search the solution space for optimal delays and neuron numbers, and his averaging of several trial solutions at each parameter pair. I was disappointed that I could not write his error tables to Excel (though the MATLAB help suggested a text file would be created in Excel's absence, it was not!).

Using much the same technique, I was able to see the close direction in nine of twelve trials, stepping one day ahead for each trial: nothing to write home about, but rather better than chance, at low significance.

Brian W. Okla
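The two points above, that yesterday's value is already a fair forecast and that nine of twelve is only weakly significant, can be checked with a short sketch. It assumes a synthetic random walk with an arbitrary step size of 0.005 (not the CHO.csv data), and the variable names are chosen only for this illustration.

```matlab
% ASSUMPTION: synthetic random walk, NOT the CHO.csv series.
rng(0);                                % repeatable random numbers
N = 791;                               % same length as the series in the thread
p = 0.9 + cumsum(0.005*randn(1,N));    % hypothetical closing prices

% Persistence ("no-change") forecast: tomorrow's close = today's close.
target    = p(2:end);
naive     = p(1:end-1);
maeNaive  = mean(abs(target - naive));
rmseNaive = sqrt(mean((target - naive).^2));
fprintf('Persistence MAE = %.2e, RMSE = %.2e\n', maeNaive, rmseNaive);

% Chance of getting the direction right in at least 9 of 12 trials
% when each call is a fair coin flip.
pTail = sum(arrayfun(@(k) nchoosek(12,k), 9:12))/2^12;
fprintf('P(at least 9 of 12 by chance) = %.4f\n', pTail);
```

The tail probability is 299/4096, about 0.073, which matches "better than chance, at low significance". For the error measures, the useful comparison is the network's MAE against the persistence MAE computed on the real prices; a network that does not beat that baseline has learned little beyond "tomorrow looks like today".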

David Willingham on 5 Oct 2020
Hi,
If the intent is to forecast price to make trading decisions, I suggest looking at this example on GitHub:
This example shows how to teach an RL agent to learn to buy, sell or hold based on historical data. The reward function has been set up to reward the agent when a trade doesn't result in a loss.
Regards,
