Classification with a huge dataset
I'm trying to do classification with a huge dataset containing 6 persons for training, and from just one person's dataset I'm getting this error: "Requested 248376x39305 (9.1GB) array exceeds maximum array size preference." First of all, I'm trying the Bagged Trees and Neural Network classifiers, and I want to ask how I can do this. Is it possible to train these classifiers on portions of the dataset (i.e., continue training a saved classification model)?
9 comments
Greg Heath
on 7 Nov 2016
Please explain how 248376 x 39305 constitutes a 1 person data set
[ I N ] = size(input)
[ O N ] = size(target)
Thanks,
Greg
Mindaugas Vaiciunas
on 7 Nov 2016
Edited: Walter Roberson on 7 Nov 2016
Walter Roberson
on 7 Nov 2016
Please show your Tree Bagging code. https://www.mathworks.com/help/stats/treebagger.html does not return matrices.
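[Editor's note: for reference, a minimal TreeBagger call looks like the sketch below. The variable names `X` (an N-by-P predictor matrix) and `Y` (an N-by-1 vector of class labels) are placeholders, not the poster's actual variables.]

```matlab
% Minimal TreeBagger sketch: X is N-by-P predictors, Y is N-by-1 labels.
% The first argument is the number of trees in the ensemble.
mdl = TreeBagger(50, X, Y, 'Method', 'classification');

% predict returns a cell array of class labels plus per-class scores;
% the model itself is an object, not a matrix.
[labels, scores] = predict(mdl, X);
```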
Mindaugas Vaiciunas
on 7 Nov 2016
Walter Roberson
on 7 Nov 2016
Have you considered reducing the number of trees?
Mindaugas Vaiciunas
on 8 Nov 2016
Greg Heath
on 9 Nov 2016
Edited: Greg Heath on 9 Nov 2016
I still don't get it:
39305/765
ans =
51.3791
Regardless, I think you should use dimensionality reduction via feature extraction.
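[Editor's note: one common way to do the feature extraction Greg suggests is PCA, sketched below. This assumes the data is held as an N-by-P numeric matrix `X` (a placeholder name) and that the Statistics and Machine Learning Toolbox `pca` function is available; the 95% variance threshold is illustrative.]

```matlab
% PCA-based dimensionality reduction sketch (X is N-by-P, double).
% Keep the fewest components that explain ~95% of the variance.
[coeff, score, ~, ~, explained] = pca(X);
k = find(cumsum(explained) >= 95, 1);
Xreduced = score(:, 1:k);   % N-by-k, with k typically much less than P
```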
Hope this helps,
Greg
Mindaugas Vaiciunas
on 9 Nov 2016
Greg Heath
on 10 Nov 2016
Of course it will affect it. However, the way to choose is to set a limit on the acceptable loss of accuracy.
Answers (1)
Walter Roberson
on 7 Nov 2016
0 votes
Add more memory (RAM) to your computer. Then check or adjust Preferences -> MATLAB -> Workspace -> MATLAB array size limit.
Or, you could set the division ratios so that a much smaller fraction is used for training and validation, with most of the data left for test. This effectively trains on only a small subset of the data, but a different small subset each time the network is trained.
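[Editor's note: for a Neural Network Toolbox classifier, the division ratios Walter describes are set via `divideParam`, as sketched below. The ratios are illustrative, and `input`/`target` are placeholder names following the I-by-N / O-by-N column-per-sample convention from Greg's `size()` check above.]

```matlab
% Sketch: train a pattern-recognition net on a small random fraction
% of the data, leaving most of it for test (ratios are illustrative).
net = patternnet(10);                 % 10 hidden units, as an example
net.divideParam.trainRatio = 0.10;    % train on 10% of the columns
net.divideParam.valRatio   = 0.05;    % validate on 5%
net.divideParam.testRatio  = 0.85;    % leave the rest for test
net = train(net, input, target);      % input is I-by-N, target is O-by-N
```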
6 comments
Mindaugas Vaiciunas
on 7 Nov 2016
Walter Roberson
on 7 Nov 2016
Amazon Web Services, among other providers, offers machines with more than 36 GB of RAM. If you had that much RAM, your program would run; therefore, adding RAM is a solution to the problem.
Mindaugas Vaiciunas
on 8 Nov 2016
Walter Roberson
on 8 Nov 2016
https://www.mathworks.com/products/parallel-computing/matlab-parallel-cloud/ offers 16 workers and 60 GB at US$4.32 per hour with educational pricing, including compute services.
Or, if you provide your own EC2 instance, https://www.mathworks.com/products/parallel-computing/parallel-computing-on-the-cloud/distriben-ec2.html costs US$0.07 per worker per hour for the software licensing from MathWorks. For example, at https://aws.amazon.com/ec2/pricing/on-demand/ you could use an m4.4xlarge instance (16 cores, 64 GB) at US$0.958 per hour for the EC2 service. Between that and the US$0.07 per worker from MathWorks, it would come to less than US$2.50 per hour, about the price of a Starbucks "Grande" coffee.
Remember, your time is not really "free". At the very least you need to take opportunity costs into account: an hour spent fighting a memory issue is an hour you could have spent working, even at a minimum-wage job.
Mindaugas Vaiciunas
on 9 Nov 2016
Walter Roberson
on 9 Nov 2016
Let me put it this way:
- You do not wish to reduce the number of trees or the data, because doing so might decrease the recognition rate.
- We do not have a magic low-memory implementation of TreeBagger available.
- You do not have enough memory on your system to run the classification using the existing software.
Your choices would seem to be:
- write the classifier yourself, somehow not using as much memory; or
- obtain more memory for your own system; or
- obtain use of a system with more memory
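[Editor's note: one way to pursue the first option, writing a lower-memory classifier yourself, is to train several small TreeBagger ensembles on random row subsets and vote across them at predict time. The sketch below is illustrative only: `X`, `Y`, and `Xnew` are placeholder names, the chunk and tree counts are arbitrary, and an ensemble of sub-ensembles is not statistically equivalent to one large ensemble trained on all the data.]

```matlab
% Low-memory workaround sketch: train small ensembles on row subsets.
nChunks = 10; treesPerChunk = 10;
N = size(X, 1);
models = cell(nChunks, 1);
for c = 1:nChunks
    idx = randsample(N, round(N / nChunks));   % random row subset
    models{c} = TreeBagger(treesPerChunk, X(idx, :), Y(idx), ...
                           'Method', 'classification');
end

% At predict time, collect each sub-ensemble's labels for Xnew and
% take a majority vote across the sub-ensembles.
preds = cellfun(@(m) predict(m, Xnew), models, 'UniformOutput', false);
vote  = mode(categorical([preds{:}]), 2);
```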