Why is LASSO in MATLAB so slow in the case of highly correlated predictors?

I am using LASSO with 4-fold cross-validation in a regression problem. I observed that as the number of predictors increases, the computation time of the MATLAB LASSO function increases dramatically, to the point where it becomes infeasible for me (since I need to run the LASSO several thousand times). For example, with 100 predictors, LASSO needs more than 60 seconds. The same example in Python takes only a few seconds. What could be the reason for such a difference in computation speed?

Added later: I observed that it is not the number of predictors that affects LASSO computation time, but the degree of collinearity among the predictors. The MATLAB algorithm 'cDescentCycle' takes almost all of the computation time. MATLAB help suggests using ELASTIC NET (set alpha < 1) in the case of highly correlated predictors. ELASTIC NET is a bit faster, but is still infeasibly slow. I have not done further tests with LASSO implemented in Python. I still don't know how to speed up LASSO in the case of highly correlated predictors (reducing the number of Lambda values or increasing the RelTol parameter helps only a little, saving a few seconds).
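The behaviour described above can be probed with a small timing script. This is only a sketch under assumed settings: the sample size, the sparse ground truth, and the equicorrelated data model (every pair of predictors shares correlation rho) are illustrative choices, not the data from the question. Only `lasso` with the `'CV'` option comes from the setup described.

```matlab
% Sketch: time 4-fold cross-validated lasso as predictor correlation grows.
% n, betaTrue, and the rho values are illustrative assumptions.
rng(0);                                % reproducible random numbers
n = 200;                               % observations (assumed)
p = 100;                               % predictors, as in the question
betaTrue = [ones(5,1); zeros(p-5,1)];  % sparse ground truth (assumed)

for rho = [0 0.9 0.999]
    % Equicorrelated predictors: each column shares a common factor z,
    % so every pair of columns has population correlation rho.
    z = randn(n,1);
    X = sqrt(rho)*repmat(z,1,p) + sqrt(1-rho)*randn(n,p);
    y = X*betaTrue + randn(n,1);

    tic;
    [B, FitInfo] = lasso(X, y, 'CV', 4);
    fprintf('rho = %.3f: %.2f s\n', rho, toc);
end
```

If the run time climbs with rho while p stays fixed, that supports the observation that collinearity, not the predictor count, drives the cost.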

Accepted Answer

Ilya on 1 Dec 2015
There could be many reasons. The lasso function has a lot of flexibility, so make sure you are comparing apples to apples. To make it run faster, you could:
  1. Use fewer values of lambda.
  2. Increase the relative tolerance.
  3. Try standardizing or not standardizing predictors.
  4. Try running in parallel if you have a Parallel Computing Toolbox license.
The function would likely still be slower than C/C++ or Fortran code.
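As a rough illustration, the four suggestions could be combined in a single call like the one below. The data are random placeholders and the specific values (20 lambdas, a tolerance of 1e-3) are assumptions chosen for the example, not recommended settings.

```matlab
% Sketch: the four suggestions combined in one lasso call.
% The data and the chosen option values are illustrative assumptions.
rng(0);
X = randn(200, 100);                   % placeholder predictors (assumed)
y = X(:,1:5)*ones(5,1) + randn(200,1); % placeholder response (assumed)

% Set to false if Parallel Computing Toolbox is not available.
opts = statset('UseParallel', true);

tic;
[B, FitInfo] = lasso(X, y, ...
    'CV', 4, ...
    'NumLambda', 20, ...               % 1. fewer lambda values (default 100)
    'RelTol', 1e-3, ...                % 2. looser tolerance (default 1e-4)
    'Standardize', false, ...          % 3. try both true and false
    'Options', opts);                  % 4. cross-validate in parallel
toc

% Coefficients at the lambda with the minimum cross-validated MSE.
bBest = B(:, FitInfo.IndexMinMSE);
```

Loosening `RelTol` and shrinking `NumLambda` trade accuracy of the regularization path for speed, so it is worth checking that `bBest` stays stable when the settings change.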
2 Comments
Marlis Hofer on 2 Dec 2015
Edited: Marlis Hofer on 2 Dec 2015
Thanks for your answer! I have already tried different LASSO options (e.g., increasing RelTol by one order of magnitude, decreasing NumLambda to 50, using the Parallel option). This reduced the total run time only by a small fraction, so it is still too slow. I agree that I should not compare Python with MATLAB without specifying the exact options in each algorithm. However, I observed (as also updated in my question) that it is not the number of predictors, but the collinearity among them, that affects the speed.
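One way to put a number on the collinearity mentioned here is to look at the pairwise correlations and the condition number of the correlation matrix. The snippet below is a sketch on synthetic data; the equicorrelated model and the value rho = 0.99 are assumptions made for illustration.

```matlab
% Sketch: simple collinearity diagnostics for a predictor matrix X.
% The synthetic data model and rho are illustrative assumptions.
rng(0);
n = 200; p = 100; rho = 0.99;
z = randn(n,1);
X = sqrt(rho)*repmat(z,1,p) + sqrt(1-rho)*randn(n,p);

R = corrcoef(X);                 % p-by-p sample correlation matrix
offDiag = abs(R(~eye(p)));       % magnitudes of off-diagonal entries

fprintf('max |r| = %.3f, median |r| = %.3f\n', max(offDiag), median(offDiag));
fprintf('cond(R) = %.1f\n', cond(R));   % large values indicate collinearity
```

Reporting these values alongside the timings makes it easier to compare runs across data sets and across implementations.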
Ilya on 2 Dec 2015
Edited: Ilya on 2 Dec 2015
If you are willing to experiment a bit, try this. Find the cdescentCycle function inside lasso and replace line 799 (line numbers could be different in your version)
for j=find(active);
with these 3 lines:
a = find(active);
a = a(randperm(numel(a)));
for j=a
Does this help?
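To show what the suggested change does, here is a minimal, self-contained sketch of coordinate descent for the lasso objective in which the visiting order is reshuffled on every sweep. It is not the code inside lasso.m: the variable names, the data, the fixed lambda, and the stopping rule are all hypothetical, and for simplicity it shuffles all coordinates rather than only the active set.

```matlab
% Toy coordinate descent for the lasso objective
%   (1/(2n))*||y - X*b||^2 + lambda*||b||_1
% with the sweep order reshuffled on every pass. NOT the code in lasso.m;
% all names and values below are hypothetical.
rng(0);
n = 200; p = 20; lambda = 0.1;
X = randn(n,p);
y = X(:,1:3)*[2; -1; 0.5] + 0.1*randn(n,1);

softThresh = @(z,t) sign(z).*max(abs(z)-t, 0);  % soft-thresholding operator
colNorm2 = sum(X.^2,1)/n;        % (1/n)*x_j'*x_j for each column
b = zeros(p,1);
r = y;                           % residual y - X*b, kept in sync below

for sweep = 1:500
    bOld = b;
    order = randperm(p);         % random visiting order for this sweep
    for j = order
        % Correlation of column j with the partial residual (j excluded).
        rhoJ = X(:,j)'*r/n + colNorm2(j)*b(j);
        bNew = softThresh(rhoJ, lambda)/colNorm2(j);
        r = r - X(:,j)*(bNew - b(j));   % update residual incrementally
        b(j) = bNew;
    end
    if max(abs(b - bOld)) < 1e-8  % stop when a full sweep changes nothing
        break
    end
end
```

With a fixed order, strongly correlated columns can pass the same small correction back and forth for many sweeps; a random order is a common heuristic for breaking that pattern, though whether it helps on a given data set has to be measured.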
