TreeBagger Training, large datasets
I want to train the TreeBagger classifier with a large dataset (a 4 million x 1 array). My PC runs out of memory if I try to do this in one run. Is there a way to run the training in a loop? I was wondering if I could first use a subset of the training data to train the TreeBagger algorithm and then update it with the remaining subsets. Could I use the results of the first training run as some kind of prior for the next?
Thanks, Claire
Answers (1)
TED MOSBY
on 15 Nov 2024 at 9:34
Edited: TED MOSBY
on 18 Nov 2024 at 19:49
The ‘TreeBagger’ class in MATLAB does not natively support incremental learning, which means you can't directly update an existing model with new data subsets.
You can try the following methods for efficient memory usage:
Train Multiple Models on Data Subsets:
- Divide your dataset carefully so that the subsets are not biased (a stratified split keeps the class proportions in every chunk).
- Train a separate TreeBagger model on each chunk.
- Combine the models at prediction time by averaging their predictions, as shown in the sketch below.
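A minimal sketch of that idea, assuming the predictors X, class labels Y, and a test set Xtest are already in memory; numChunks and all variable names here are illustrative choices, not part of the TreeBagger API:

```matlab
% Hypothetical chunk count; cvpartition stratifies the split so every
% chunk keeps roughly the overall class proportions (avoids biased subsets).
numChunks = 8;
c = cvpartition(Y, 'KFold', numChunks);
models = cell(numChunks, 1);
for k = 1:numChunks
    rows = test(c, k);                 % logical index of the k-th chunk
    models{k} = TreeBagger(50, X(rows, :), Y(rows), 'Method', 'classification');
end

% Combine: average the per-class scores from all models, then take the
% argmax. Score columns line up only if every chunk contains every class.
[~, scores] = predict(models{1}, Xtest);
for k = 2:numChunks
    [~, s] = predict(models{k}, Xtest);
    scores = scores + s;
end
[~, col] = max(scores / numChunks, [], 2);
predictedLabels = models{1}.ClassNames(col);
```

Averaging the class scores rather than majority-voting the labels keeps the probability information from each sub-model, which usually gives a smoother combined decision.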
Preprocess data:
Consider downsampling or preprocessing your data before training. Feature selection, dimensionality reduction (e.g., PCA), or using a smaller, more representative subset of the data reduces the memory footprint. A sketch follows below.
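For example, a rough sketch of stratified downsampling followed by PCA, again assuming X and Y are in memory and with illustrative variable names:

```matlab
% Stratified downsampling: keep ~25% of the rows, preserving class balance.
cv = cvpartition(Y, 'Holdout', 0.75);          % 75% held out, 25% kept
Xsmall = X(training(cv), :);
Ysmall = Y(training(cv));

% Dimensionality reduction: keep the principal components that together
% explain 95% of the variance in the downsampled data.
[~, score, ~, ~, explained] = pca(Xsmall);
nComp = find(cumsum(explained) >= 95, 1);
Xreduced = score(:, 1:nComp);                   % train TreeBagger on this
```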
Alternative algorithms:
If the above methods don't work, you can consider other machine learning algorithms, such as XGBoost and LightGBM, that handle large datasets efficiently. See the sketch after this paragraph for a MATLAB-native option.
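XGBoost and LightGBM are external libraries rather than MATLAB functions, so as a rough MATLAB-native stand-in, here is a boosted tree ensemble via fitcensemble; the 'NumBins' option bins the predictors, which can cut memory use and training time on large data:

```matlab
% Boosted tree ensemble as an in-MATLAB analogue of gradient boosting.
% 'LogitBoost' assumes a binary label; for multiclass, use 'AdaBoostM2'.
mdl = fitcensemble(X, Y, ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 200, ...
    'NumBins', 50);                % predictor binning for speed/memory
```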
Hope this helps!