K-means for stock market timeseries
Abdelrazzaq
on 19 Jan 2014
Answered: Abdelrazzaq
on 2 Feb 2014
Hi
I am doing research to test the accuracy of different volatility models in forecasting stock market volatility using index time series. I need to cluster the data with K-means into two groups. I already have the time series from different stock markets, and they all have the same length. I just need to cluster each of them into two subsets. The first subset will be used to train the models and the second will be used to test the models and their forecasts. Could you give me the code directly, or at least show me how to get started with k-means in MATLAB?
I seriously look forward to hearing from you very soon.
Regards, Abdelrazzaq.
Accepted Answer
AJ von Alt
on 20 Jan 2014
Edited: AJ von Alt
on 20 Jan 2014
The function kmeans is part of the Statistics Toolbox in MATLAB. The following code demonstrates how to use k-means to cluster data into two groups and pull out the individual groups.
% Generate random data
nSamples = 100;
sampleWidth = 5;
X = rand(nSamples,sampleWidth);
trainingSetSize = 20;
% Separate into two groups using squared Euclidean distance.
% IDX will be size nSamples x 1, where each element is the cluster label
% (1 or 2) of the corresponding row of X
IDX = kmeans( X , 2 , 'distance' , 'sqEuclidean');
% separate the data into two groups
G1 = X(IDX == 1 , : );
G2 = X(IDX == 2 , : );
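Before using the groups, it can help to look at how large each cluster is and where its centroid lies. The following is only a minimal sketch on synthetic data: the fixed seed, the 'Replicates' value of 5, and the variable names C and counts are assumptions made for illustration, not part of the original answer.

```matlab
% Sketch: inspect the clusters returned by kmeans (synthetic data).
rng(0);                          % assumption: fixed seed so the run is repeatable
nSamples = 100;
sampleWidth = 5;
X = rand(nSamples, sampleWidth);

% Request the centroids as a second output. 'Replicates' reruns k-means
% from several random starts and keeps the best solution.
[IDX, C] = kmeans(X, 2, 'Distance', 'sqeuclidean', 'Replicates', 5);

% Count how many rows of X were assigned to each cluster.
counts = accumarray(IDX, 1);

disp(counts.');                  % cluster sizes
disp(C);                         % one centroid per row, sampleWidth columns
```

k-means does not control the group sizes, so the two clusters will usually be unequal.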
Because k-means groups similar observations together, each group will be self-similar and systematically different from the other, so the two groups would likely make very poor training and test data for an ML algorithm. A much more suitable function for generating training and test sets is randsample, also in the Statistics Toolbox. Sampling the population uniformly at random gives your ML algorithm more diverse training data and helps improve its robustness.
% Randomly select trainingSetSize samples without replacement
rsIDX = randsample( size(X,1) , trainingSetSize );
% Create a logical mask for the selected values
tsMASK = false( nSamples , 1 );
tsMASK( rsIDX ) = true;
% Separate the data into training and test samples.
GTraining = X( tsMASK , : );
GTest = X( ~ tsMASK , : ) ;
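To make the difference between the two approaches concrete, one rough check is to compare how far apart the two group means end up under each split. The sketch below is an illustration on synthetic data only: the fixed seed and the names dKmeans and dRandom are assumptions introduced here, and the distance between group means is just one simple measure of how different the groups are.

```matlab
% Sketch: distance between group means for a k-means split versus a
% uniform random split (synthetic data).
rng(1);                          % assumption: fixed seed so the run is repeatable
nSamples = 100;
sampleWidth = 5;
trainingSetSize = 20;
X = rand(nSamples, sampleWidth);

% Split 1: the two k-means clusters
IDX = kmeans(X, 2, 'Distance', 'sqeuclidean');
dKmeans = norm(mean(X(IDX == 1, :), 1) - mean(X(IDX == 2, :), 1));

% Split 2: uniform random sample without replacement
rsIDX = randsample(nSamples, trainingSetSize);
tsMASK = false(nSamples, 1);
tsMASK(rsIDX) = true;
dRandom = norm(mean(X(tsMASK, :), 1) - mean(X(~tsMASK, :), 1));

fprintf('Distance between group means: k-means %.3f, random %.3f\n', ...
    dKmeans, dRandom);
```

A larger distance for the k-means split would indicate that the two clusters cover different regions of the data, which is what makes them a poor stand-in for a training and test set.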