TreeBagger Training, large datasets

13 views (last 30 days)
Claire Br on 27 Mar 2015
Edited: TED MOSBY on 18 Nov 2024 at 19:49
I want to train the TreeBagger classifier with a large dataset (a 4 million x 1 array). My PC runs out of memory if I try to do this in one run! Is there a chance to run the training in a loop? I was wondering if I could first use subsets of the training data to train the TreeBagger algorithm and then update it with the remaining subsets. Could I use the results of the first training run as some kind of prior for the next?
Thanks, Claire

Answers (1)

TED MOSBY on 15 Nov 2024 at 9:34
Edited: TED MOSBY on 18 Nov 2024 at 19:49
The ‘TreeBagger’ class in MATLAB does not natively support incremental learning, which means you can't directly update an existing model with new data subsets.
You can try the following approaches to reduce memory usage:
Train multiple models on data subsets (a sketch follows this list):
  • Shuffle and divide your dataset so that each subset is representative rather than biased
  • Train a separate TreeBagger model on each chunk
  • Combine the models by averaging their predicted class scores
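A minimal sketch of the chunk-and-average idea, assuming predictors X, class labels Y, and test data Xtest are already in the workspace (the chunk count and tree counts below are placeholders):

% Split the data into chunks, train one TreeBagger per chunk.
numChunks = 4;
n = size(X, 1);
idx = randperm(n);                            % shuffle so chunks are unbiased
edges = round(linspace(0, n, numChunks + 1)); % chunk boundaries

models = cell(numChunks, 1);
for k = 1:numChunks
    rows = idx(edges(k)+1 : edges(k+1));
    models{k} = TreeBagger(50, X(rows, :), Y(rows));  % 50 trees per chunk
end

% Combine: average the class scores of all models and pick the class with
% the highest mean score. This assumes every chunk contains all classes,
% so the score columns line up across models.
[~, scores] = predict(models{1}, Xtest);
for k = 2:numChunks
    [~, s] = predict(models{k}, Xtest);
    scores = scores + s;
end
scores = scores / numChunks;
[~, best] = max(scores, [], 2);
predictedLabels = models{1}.ClassNames(best);

Averaging scores rather than majority-voting on labels keeps the combination well defined even when the per-chunk models disagree narrowly.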
Preprocess data:
Consider downsampling or preprocessing your data before training. Feature selection, dimensionality reduction (e.g., PCA), or using a smaller, more representative subset of the data all help reduce the memory footprint.
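A minimal sketch of that preprocessing, assuming the same X and Y; the 95% variance threshold and 100k subset size are arbitrary placeholders, and pca and randsample are from the Statistics and Machine Learning Toolbox:

% Reduce dimensionality with PCA, keeping enough components for 95% of
% the variance. Note pca itself needs memory; it can also be run on a
% subset of rows if the full matrix is too large.
[~, score, ~, ~, explained] = pca(X);
numComp = find(cumsum(explained) >= 95, 1);
Xreduced = score(:, 1:numComp);               % lower-dimensional predictors

% Optionally also train on a random, representative subset of rows.
subsetIdx = randsample(size(Xreduced, 1), 1e5);
mdl = TreeBagger(100, Xreduced(subsetIdx, :), Y(subsetIdx));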
Alternative algorithms:
If the above methods don't work, you can consider other machine learning algorithms, such as XGBoost or LightGBM, that handle large datasets efficiently.
Hope this helps!
