Faster/Simpler way to update my data in a table

44 visualizaciones (últimos 30 días)
stone_ward
stone_ward el 18 de Feb. de 2020
Comentada: Jon el 21 de Feb. de 2020
I have the following code:
H = height(T);
count = 0;
z=num2cell(0);
for i=1:1000
if strcmp(T{i,2},'sage')
count = count+1;
z = num2cell(20*count);
T{i,1} = z;
else
T{i,1} = z;
end
end
And it does precisely what I want it to. My issue is this loop is being applied to a massive table. The table has just over 658k rows in it, and I will need this code to apply to any table I upload, so the row count could potentially be even more for other datasets. After the complete if statement, I added a second if statement of "if i = 1000; break; end" and then I did the "run and time". That portion took about 36 seconds, and I did the math and found that would mean that it takes about 6.7 hours for this one for loop to execute throughout the entire table.
Here is a screenshot of part of the table as an example:
I inlclude this only to show that the string 'sage' shows up at random intervals, so I don't know what commands I could use to skip using a for loop altogether (as that would be the faster option).
  4 comentarios
Ruger28
Ruger28 el 18 de Feb. de 2020
is the delta between 'grey' always the same? so once you find grey, the index from the first to the next will always be 20?
stone_ward
stone_ward el 18 de Feb. de 2020
No. And I just realized, it was supposed to be 'sage', not 'grey'... I apologize, I changed the character strings to colors as this is for a work project, and got mixed up. I've updated the code above to reflect that. But as you can see in the screenshot I included, sage first occurs on row 19, the second occurance is on row 30, and the third occurance is row 33. It's random.

Iniciar sesión para comentar.

Respuesta aceptada

Jon
Jon el 18 de Feb. de 2020
Editada: Jon el 18 de Feb. de 2020
Here's an approach that I think does what you want.
You would have to benchmark the run time, but the approach is quite vectorized and I would assume it is quite a bit more efficient than what you were doing
% create some data just to demonstrate
% you would read from your actual table
time = 1:100;
colorIdx = randi([1,4],100,1)
colorChoices = {'red','sage','fuschia','gray'}
color = colorChoices(colorIdx)'
% make the table
T = table(time(:),color)
% make the new column incrementing by 20 for each occurrence of sage
% make a logical vector of indices (rows) where color is gray
idlColorMatch = strcmp(T.color,'sage')
% make the new time column
T.newTime = 20*(cumsum(idlColorMatch))
  2 comentarios
stone_ward
stone_ward el 21 de Feb. de 2020
This worked perfect, thank you SO much!
Jon
Jon el 21 de Feb. de 2020
Glad to hear that was what you needed. Thanks for letting me know. The cumsum counting of the logical indices comes in handy anytime you need to count occurences, and is so amazingly simple.

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Más información sobre Characters and Strings en Help Center y File Exchange.

Productos


Versión

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by