How to select the number of samples to train a Machine Learning algorithm?

I am working with a dataset of 12,000 samples covering about 5 years of an industrial process.
It is likely that the plant has changed during this time (equipment, the drop in performance itself, chemical products).
Is there a tool for identifying the best subset of this data? In my view, a temporal cut in the data could improve the quality of the resulting models.

3 Comments

As I understand it, newer data is more relevant than older data. So I would start with a recent temporal cut, try to reach the required performance, and then add older data iteratively as needed. The size needed for the initial attempt depends entirely on the data and on the net being trained. Shallower nets hold less information than deeper ones (with a comparable number of neurons).
Maybe you can provide more details about the net used and the nature of the data (number of dimensions and so on).
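
The "start recent, then add older data" idea above can be sketched as a search over training-window sizes. This is only an illustration under stated assumptions: the data is synthetic, the one-variable least-squares fit stands in for whatever model is actually used, and the names `fit_line`, `mse` and `best_recent_window` are hypothetical helpers, not functions from any toolbox.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for any model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def best_recent_window(xs, ys, n_val, window_sizes):
    """Hold out the newest n_val samples, train on the k samples just
    before them for each k, and return (best_k, {k: validation MSE})."""
    x_val, y_val = xs[-n_val:], ys[-n_val:]
    x_hist, y_hist = xs[:-n_val], ys[:-n_val]
    scores = {}
    for k in window_sizes:
        model = fit_line(x_hist[-k:], y_hist[-k:])
        scores[k] = mse(model, x_val, y_val)
    return min(scores, key=scores.get), scores

# Synthetic data, oldest sample first. The slope changes at sample 600
# to imitate a change in the plant (an assumption for illustration only).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(1000)]
ys = [(1.0 if i < 600 else 3.0) * x + random.gauss(0, 0.5)
      for i, x in enumerate(xs)]

best_k, scores = best_recent_window(
    xs, ys, n_val=100, window_sizes=[100, 200, 300, 600, 900])
print(best_k, scores)
```

Validating on the newest block rather than on a random split keeps the comparison honest: the model is judged on the regime it will actually be used in.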
Thanks for the comment!
The dataset has 426 inputs (I am using techniques for feature selection too).
I am using four algorithms to create the models: Regression Tree, Bagged Trees, SVM and Neural Networks.
As a common-sense rule of thumb, I try to use at least 10 to 30 times as many training points as there are unknown parameters to estimate.
In addition, I use 10 to 20 sets of random initial weights.
I assume, of course, that you have examined plots of the data to initialize your common sense.
Hope this helps.
Greg
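
To see what the 10x to 30x rule of thumb implies for this dataset, here is a rough count for a neural network. It is a sketch under assumptions that the thread does not confirm: a single hidden layer, one output, and all 12,000 samples used for training. The input counts 50 and 10 are hypothetical results of feature selection; only 426 comes from the question.

```python
def mlp_weight_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a single-hidden-layer network."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def max_hidden_nodes(n_train, n_inputs, n_outputs, ratio):
    """Largest hidden-layer size H such that
    n_train >= ratio * mlp_weight_count(n_inputs, H, n_outputs).
    Returns 0 if even one hidden node breaks the rule."""
    h = 0
    while n_train >= ratio * mlp_weight_count(n_inputs, h + 1, n_outputs):
        h += 1
    return h

N_TRAIN = 12000   # assumption: every sample is used for training
N_OUTPUTS = 1     # assumption: the thread does not state the number of targets

# 426 inputs is from the question; 50 and 10 are hypothetical
# feature counts after feature selection.
for n_inputs in (426, 50, 10):
    for ratio in (10, 30):
        h = max_hidden_nodes(N_TRAIN, n_inputs, N_OUTPUTS, ratio)
        print(f"inputs={n_inputs:3d} ratio={ratio:2d} max hidden nodes={h}")
```

With all 426 inputs the rule allows only 2 hidden nodes at a ratio of 10 and none at a ratio of 30, which is one reason feature selection matters before cutting the dataset further.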

Answers (1)

You can use deep belief networks; they are a strong choice for feature selection and mapping. Train your network on chunks of data, randomly choosing pairs of (inputs, targets), and at the same time pay attention to your approximation function: you must keep your error function at its local minimum. Deep belief nets are built from stacked layers that are pretrained one at a time (restricted Boltzmann machines, or autoencoders in the closely related stacked-autoencoder approach), which allows all the parameters of the network to be tuned with a small amount of training data.
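
Training on randomly chosen chunks of (inputs, targets) pairs, as suggested above, amounts to mini-batch sampling. A minimal sketch follows, assuming the data fits in memory as two parallel lists; `minibatches` is a hypothetical helper name and the toy data is made up for illustration.

```python
import random

def minibatches(inputs, targets, batch_size, seed=None):
    """Yield (input_batch, target_batch) in random order,
    keeping each input aligned with its own target."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    order = list(range(len(inputs)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [inputs[i] for i in idx], [targets[i] for i in idx]

# Toy data: each target is twice the first input feature.
inputs = [[i, i + 0.5] for i in range(10)]
targets = [2 * i for i in range(10)]

batches = list(minibatches(inputs, targets, batch_size=4, seed=1))
for x_batch, y_batch in batches:
    print(x_batch, y_batch)
```

Shuffling indices rather than the two lists separately is what keeps every input paired with its target.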

Asked: 31 Jan 2019

Commented: 4 Feb 2019
