Is there a better way to compute metrics on labeled array elements?

Question

Burke Rosen on 17 Jun 2018

https://la.mathworks.com/matlabcentral/answers/406050-is-there-a-better-way-to-compute-metrics-on-labeled-array-elements

Edited: Burke Rosen on 18 Jun 2018

For example, I have a 1-D double array 'data' and a 1-D cell array of strings called 'labels'. For each unique label, I want the mean of the corresponding data. The best I have come up with is below, but I don't believe it is fully vectorized. Is there a better way?

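%% make sample dataset
n = 1000;
data = rand(n,1);
labels = char(randsample(97:122,n,true)'); % n-by-1 char array of letters [a-z]
%% get means for each label
[uniLab,~,labIdx] = unique(labels,'stable'); % stable for speed
% labIdx(i) is the position in uniLab of the label of data(i)
mu = arrayfun(@(x) mean(data(labIdx==x)),1:numel(uniLab)); % one mean per unique label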


Answer 1

Walter Roberson on 17 Jun 2018

https://www.mathworks.com/help/stats/grpstats.html

2 comments

Walter Roberson on 17 Jun 2018

The last step of your code can be replaced by

accumarray(labIdx, data, [], @mean)
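
To make the substitution concrete, here is a minimal sketch that runs both formulations on the same sample data and compares them. It assumes labels is an n-by-1 char array as in the question; randi stands in for randsample so that no toolbox is needed, and the names muLoop, muAcc and maxDiff are only illustrative.

```matlab
% Sample data in the same shape as the question (randi is used here instead
% of randsample so the sketch needs no toolbox).
rng(0);                                   % fixed seed so the run is repeatable
n = 1000;
data = rand(n,1);
labels = char(randi([97 122], n, 1));     % n-by-1 char array, letters a-z

% Map every element to the index of its unique label.
[uniLab,~,labIdx] = unique(labels,'stable');

% Original formulation: one logical mask per unique label.
muLoop = arrayfun(@(x) mean(data(labIdx==x)), 1:numel(uniLab));

% accumarray formulation: group data by labIdx and apply mean to each group.
muAcc = accumarray(labIdx, data, [], @mean);

% The two results should agree up to floating-point rounding
% (muLoop is a row vector, muAcc is a column vector).
maxDiff = max(abs(muLoop(:) - muAcc));
fprintf('largest difference: %g\n', maxDiff);
```

Note that accumarray returns a column vector with one entry per unique label, in the same order as uniLab, while the arrayfun version returns a row vector.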

Burke Rosen on 18 Jun 2018

Edited: Burke Rosen on 18 Jun 2018

This yields a ~25% speed increase at n = 1e3 and ~5% at n = 1e5 (500 trials per algorithm, randomized order). Thank you.
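
The benchmark harness behind these numbers is not shown in the thread. One way to run a comparable measurement is sketched below, using timeit rather than 500 hand-rolled randomized trials. This is an assumption about how one might measure it, not the original procedure: it times only the last step (not the call to unique), so the percentages will not necessarily match those quoted above.

```matlab
% Rough timing of the last step only; unique is run once up front.
rng(0);
n = 1e3;                                   % try 1e5 as well
data = rand(n,1);
labels = char(randi([97 122], n, 1));      % randi used in place of randsample
[uniLab,~,labIdx] = unique(labels,'stable');
k = numel(uniLab);

% Wrap each formulation in a zero-argument handle, as timeit expects.
fLoop = @() arrayfun(@(x) mean(data(labIdx==x)), 1:k);
fAcc  = @() accumarray(labIdx, data, [], @mean);

% timeit calls each handle repeatedly and returns a typical run time in seconds.
tLoop = timeit(fLoop);
tAcc  = timeit(fAcc);
fprintf('arrayfun: %.3g s, accumarray: %.3g s\n', tLoop, tAcc);
```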

Answer 2

Burke Rosen on 17 Jun 2018

Thank you for that tip @Walter.

After further review:

1. The way I wrote the sample data set, labels is actually a character array, not a cell array; one has to call cellstr on it to get a cell array.

2. mu = grpstats(data,labels,'mean') is compact, easy to read, and maybe 1 or 2 percent faster than my formulation, if one adds the cellstr.

3. My solution is 5x faster than grpstats if labels is a character array rather than a cell array.

4. My guess is that unique operates much faster on character arrays than on cell arrays, and the runtime of the loop (or arrayfun) over the unique labels is negligible compared to the unique itself.
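
The points above can be checked with a short sketch like the following. It assumes the Statistics and Machine Learning Toolbox is available for grpstats, uses randi in place of randsample, and matches groups by name instead of assuming both methods return them in the same order. The variable names are illustrative, and the timings it prints will vary by machine and release.

```matlab
% Requires the Statistics and Machine Learning Toolbox for grpstats.
rng(0);
n = 1000;
data = rand(n,1);
labels = char(randi([97 122], n, 1));      % n-by-1 char array
labelsCell = cellstr(labels);              % n-by-1 cell array of char vectors

% Grouped means via grpstats; 'gname' also returns the group names.
[muGrp, names] = grpstats(data, labelsCell, {'mean','gname'});

% Grouped means via unique + accumarray on the char array.
[uniLab,~,labIdx] = unique(labels,'stable');
muAcc = accumarray(labIdx, data, [], @mean);

% Match groups by name rather than assuming both methods use the same order.
[found, pos] = ismember(cellstr(uniLab), names);
maxDiff = max(abs(muAcc - muGrp(pos)));

% Time unique on each label representation to probe the guess in point 4.
tChar = timeit(@() unique(labels,'stable'));
tCell = timeit(@() unique(labelsCell,'stable'));
fprintf('max diff %g, unique(char) %.3g s, unique(cellstr) %.3g s\n', ...
    maxDiff, tChar, tCell);
```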
